All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4 v2] BDI lifetime fix
@ 2017-01-31 12:54 Jan Kara
  2017-01-31 12:54 ` [PATCH 1/4] block: Unhash block device inodes on gendisk destruction Jan Kara
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Jan Kara @ 2017-01-31 12:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

Hello,

this is a second version of the patch series that attempts to solve the
problems with the life time of a backing_dev_info structure. Currently it lives
inside request_queue structure and thus it gets destroyed as soon as request
queue goes away. However the block device inode still stays around and thus
inode_to_bdi() call on that inode (e.g. from flusher worker) may happen after
request queue has been destroyed resulting in oops.

This patch set tries to solve these problems by making backing_dev_info
independent structure referenced from block device inode. That makes sure
inode_to_bdi() cannot ever oops. I gave some basic testing to the patches
in KVM and on a real machine, Dan was running them with libnvdimm test suite
which was previously triggering the oops and things look good. So they should
be reasonably healthy. Laurent, if you can give these patches testing in your
environment where you were triggering the oops, it would be nice.

								Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] block: Unhash block device inodes on gendisk destruction
  2017-01-31 12:54 [PATCH 0/4 v2] BDI lifetime fix Jan Kara
@ 2017-01-31 12:54 ` Jan Kara
  2017-02-01  9:48   ` Christoph Hellwig
  2017-01-31 12:54 ` [PATCH 2/4] block: Use pointer to backing_dev_info from request_queue Jan Kara
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-01-31 12:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

Currently, block device inodes stay around after corresponding gendisk
hash died until memory reclaim finds them and frees them. Since we will
make block device inode pin the bdi, we want to free the block device
inode as soon as the device goes away so that bdi does not stay around
unnecessarily. Furthermore we need to avoid issues when new device with
the same major,minor pair gets created since reusing the bdi structure
would be rather difficult in this case.

Unhashing block device inode on gendisk destruction nicely deals with
these problems. Once last block device inode reference is dropped (which
may be directly in del_gendisk()), the inode gets evicted. Furthermore if
the major,minor pair gets reallocated, we are guaranteed to get new
block device inode even if old block device inode is not yet evicted and
thus we avoid issues with possible reuse of bdi.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c      |  2 ++
 fs/block_dev.c     | 15 +++++++++++++++
 include/linux/fs.h |  1 +
 3 files changed, 18 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index fcd6d4fae657..f2f22d0e8e14 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -648,6 +648,8 @@ void del_gendisk(struct gendisk *disk)
 	disk_part_iter_init(&piter, disk,
 			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
 	while ((part = disk_part_iter_next(&piter))) {
+		bdev_unhash_inode(MKDEV(disk->major,
+					disk->first_minor + part->partno));
 		invalidate_partition(disk, part->partno);
 		delete_partition(disk, part->partno);
 	}
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 5db5d1340d69..ed6a34be7a1e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -954,6 +954,21 @@ static int bdev_set(struct inode *inode, void *data)
 
 static LIST_HEAD(all_bdevs);
 
+/*
+ * If there is a bdev inode for this device, unhash it so that it gets evicted
+ * as soon as last inode reference is dropped.
+ */
+void bdev_unhash_inode(dev_t dev)
+{
+	struct inode *inode;
+
+	inode = ilookup5(blockdev_superblock, hash(dev), bdev_test, &dev);
+	if (inode) {
+		remove_inode_hash(inode);
+		iput(inode);
+	}
+}
+
 struct block_device *bdget(dev_t dev)
 {
 	struct block_device *bdev;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2ba074328894..702cb6c50194 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2342,6 +2342,7 @@ extern struct kmem_cache *names_cachep;
 #ifdef CONFIG_BLOCK
 extern int register_blkdev(unsigned int, const char *);
 extern void unregister_blkdev(unsigned int, const char *);
+extern void bdev_unhash_inode(dev_t dev);
 extern struct block_device *bdget(dev_t);
 extern struct block_device *bdgrab(struct block_device *bdev);
 extern void bd_set_size(struct block_device *, loff_t size);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/4] block: Use pointer to backing_dev_info from request_queue
  2017-01-31 12:54 [PATCH 0/4 v2] BDI lifetime fix Jan Kara
  2017-01-31 12:54 ` [PATCH 1/4] block: Unhash block device inodes on gendisk destruction Jan Kara
@ 2017-01-31 12:54 ` Jan Kara
  2017-02-01  9:48   ` Christoph Hellwig
  2017-01-31 12:54 ` [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info Jan Kara
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-01-31 12:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

We will want to have struct backing_dev_info allocated separately from
struct request_queue. As the first step add pointer to backing_dev_info
to request_queue and convert all users touching it. No functional
changes in this patch.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-cgroup.c             |  6 +++---
 block/blk-core.c               | 27 ++++++++++++++-------------
 block/blk-integrity.c          |  4 ++--
 block/blk-settings.c           |  2 +-
 block/blk-sysfs.c              |  8 ++++----
 block/blk-wbt.c                |  8 ++++----
 block/genhd.c                  |  2 +-
 drivers/block/aoe/aoeblk.c     |  4 ++--
 drivers/block/drbd/drbd_main.c |  6 +++---
 drivers/block/drbd/drbd_nl.c   | 12 +++++++-----
 drivers/block/drbd/drbd_proc.c |  2 +-
 drivers/block/drbd/drbd_req.c  |  2 +-
 drivers/block/pktcdvd.c        |  4 ++--
 drivers/block/rbd.c            |  2 +-
 drivers/md/bcache/request.c    | 10 +++++-----
 drivers/md/bcache/super.c      |  8 ++++----
 drivers/md/dm-cache-target.c   |  2 +-
 drivers/md/dm-era-target.c     |  2 +-
 drivers/md/dm-table.c          |  2 +-
 drivers/md/dm-thin.c           |  2 +-
 drivers/md/dm.c                |  6 +++---
 drivers/md/linear.c            |  2 +-
 drivers/md/md.c                |  6 +++---
 drivers/md/multipath.c         |  2 +-
 drivers/md/raid0.c             |  6 +++---
 drivers/md/raid1.c             |  4 ++--
 drivers/md/raid10.c            | 10 +++++-----
 drivers/md/raid5.c             | 12 ++++++------
 fs/gfs2/ops_fstype.c           |  2 +-
 fs/nilfs2/super.c              |  2 +-
 fs/super.c                     |  2 +-
 include/linux/blkdev.h         |  3 ++-
 mm/page-writeback.c            |  4 ++--
 33 files changed, 90 insertions(+), 86 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 8ba0af780e88..d673a69b61b4 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -184,7 +184,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 		goto err_free_blkg;
 	}
 
-	wb_congested = wb_congested_get_create(&q->backing_dev_info,
+	wb_congested = wb_congested_get_create(q->backing_dev_info,
 					       blkcg->css.id,
 					       GFP_NOWAIT | __GFP_NOWARN);
 	if (!wb_congested) {
@@ -469,8 +469,8 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
 const char *blkg_dev_name(struct blkcg_gq *blkg)
 {
 	/* some drivers (floppy) instantiate a queue w/o disk registered */
-	if (blkg->q->backing_dev_info.dev)
-		return dev_name(blkg->q->backing_dev_info.dev);
+	if (blkg->q->backing_dev_info->dev)
+		return dev_name(blkg->q->backing_dev_info->dev);
 	return NULL;
 }
 EXPORT_SYMBOL_GPL(blkg_dev_name);
diff --git a/block/blk-core.c b/block/blk-core.c
index 61ba08c58b64..a9ff1b919ae7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -74,7 +74,7 @@ static void blk_clear_congested(struct request_list *rl, int sync)
 	 * flip its congestion state for events on other blkcgs.
 	 */
 	if (rl == &rl->q->root_rl)
-		clear_wb_congested(rl->q->backing_dev_info.wb.congested, sync);
+		clear_wb_congested(rl->q->backing_dev_info->wb.congested, sync);
 #endif
 }
 
@@ -85,7 +85,7 @@ static void blk_set_congested(struct request_list *rl, int sync)
 #else
 	/* see blk_clear_congested() */
 	if (rl == &rl->q->root_rl)
-		set_wb_congested(rl->q->backing_dev_info.wb.congested, sync);
+		set_wb_congested(rl->q->backing_dev_info->wb.congested, sync);
 #endif
 }
 
@@ -116,7 +116,7 @@ struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 
-	return &q->backing_dev_info;
+	return q->backing_dev_info;
 }
 EXPORT_SYMBOL(blk_get_backing_dev_info);
 
@@ -584,7 +584,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	blk_flush_integrity();
 
 	/* @q won't process any more request, flush async actions */
-	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
+	del_timer_sync(&q->backing_dev_info->laptop_mode_wb_timer);
 	blk_sync_queue(q);
 
 	if (q->mq_ops)
@@ -596,7 +596,7 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
-	bdi_unregister(&q->backing_dev_info);
+	bdi_unregister(q->backing_dev_info);
 
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
@@ -708,17 +708,18 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (!q->bio_split)
 		goto fail_id;
 
-	q->backing_dev_info.ra_pages =
+	q->backing_dev_info = &q->_backing_dev_info;
+	q->backing_dev_info->ra_pages =
 			(VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
-	q->backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK;
-	q->backing_dev_info.name = "block";
+	q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
+	q->backing_dev_info->name = "block";
 	q->node = node_id;
 
-	err = bdi_init(&q->backing_dev_info);
+	err = bdi_init(q->backing_dev_info);
 	if (err)
 		goto fail_split;
 
-	setup_timer(&q->backing_dev_info.laptop_mode_wb_timer,
+	setup_timer(&q->backing_dev_info->laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, (unsigned long) q);
 	setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
 	INIT_LIST_HEAD(&q->queue_head);
@@ -768,7 +769,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 fail_ref:
 	percpu_ref_exit(&q->q_usage_counter);
 fail_bdi:
-	bdi_destroy(&q->backing_dev_info);
+	bdi_destroy(q->backing_dev_info);
 fail_split:
 	bioset_free(q->bio_split);
 fail_id:
@@ -1196,7 +1197,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
 	 * disturb iosched and blkcg but weird is bettern than dead.
 	 */
 	printk_ratelimited(KERN_WARNING "%s: dev %s: request aux data allocation failed, iosched may be disturbed\n",
-			   __func__, dev_name(q->backing_dev_info.dev));
+			   __func__, dev_name(q->backing_dev_info->dev));
 
 	rq->rq_flags &= ~RQF_ELVPRIV;
 	rq->elv.icq = NULL;
@@ -2696,7 +2697,7 @@ void blk_finish_request(struct request *req, int error)
 	BUG_ON(blk_queued_rq(req));
 
 	if (unlikely(laptop_mode) && req->cmd_type == REQ_TYPE_FS)
-		laptop_io_completion(&req->q->backing_dev_info);
+		laptop_io_completion(req->q->backing_dev_info);
 
 	blk_delete_timer(req);
 
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index d69c5c79f98e..9f0ff5ba4f84 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -443,10 +443,10 @@ void blk_integrity_revalidate(struct gendisk *disk)
 		return;
 
 	if (bi->profile)
-		disk->queue->backing_dev_info.capabilities |=
+		disk->queue->backing_dev_info->capabilities |=
 			BDI_CAP_STABLE_WRITES;
 	else
-		disk->queue->backing_dev_info.capabilities &=
+		disk->queue->backing_dev_info->capabilities &=
 			~BDI_CAP_STABLE_WRITES;
 }
 
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 529e55f52a03..6eb19bcbf3cb 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -253,7 +253,7 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
 	max_sectors = min_not_zero(max_hw_sectors, limits->max_dev_sectors);
 	max_sectors = min_t(unsigned int, max_sectors, BLK_DEF_MAX_SECTORS);
 	limits->max_sectors = max_sectors;
-	q->backing_dev_info.io_pages = max_sectors >> (PAGE_SHIFT - 9);
+	q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
 }
 EXPORT_SYMBOL(blk_queue_max_hw_sectors);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1dbce057592d..64fb54c6b41c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -89,7 +89,7 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
 
 static ssize_t queue_ra_show(struct request_queue *q, char *page)
 {
-	unsigned long ra_kb = q->backing_dev_info.ra_pages <<
+	unsigned long ra_kb = q->backing_dev_info->ra_pages <<
 					(PAGE_SHIFT - 10);
 
 	return queue_var_show(ra_kb, (page));
@@ -104,7 +104,7 @@ queue_ra_store(struct request_queue *q, const char *page, size_t count)
 	if (ret < 0)
 		return ret;
 
-	q->backing_dev_info.ra_pages = ra_kb >> (PAGE_SHIFT - 10);
+	q->backing_dev_info->ra_pages = ra_kb >> (PAGE_SHIFT - 10);
 
 	return ret;
 }
@@ -236,7 +236,7 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 
 	spin_lock_irq(q->queue_lock);
 	q->limits.max_sectors = max_sectors_kb << 1;
-	q->backing_dev_info.io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
+	q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
 	spin_unlock_irq(q->queue_lock);
 
 	return ret;
@@ -799,7 +799,7 @@ static void blk_release_queue(struct kobject *kobj)
 		container_of(kobj, struct request_queue, kobj);
 
 	wbt_exit(q);
-	bdi_exit(&q->backing_dev_info);
+	bdi_exit(q->backing_dev_info);
 	blkcg_exit_queue(q);
 
 	if (q->elevator) {
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index f0a9c07b4c7a..1aedb1f7ee0c 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -96,7 +96,7 @@ static void wb_timestamp(struct rq_wb *rwb, unsigned long *var)
  */
 static bool wb_recent_wait(struct rq_wb *rwb)
 {
-	struct bdi_writeback *wb = &rwb->queue->backing_dev_info.wb;
+	struct bdi_writeback *wb = &rwb->queue->backing_dev_info->wb;
 
 	return time_before(jiffies, wb->dirty_sleep + HZ);
 }
@@ -279,7 +279,7 @@ enum {
 
 static int __latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
 {
-	struct backing_dev_info *bdi = &rwb->queue->backing_dev_info;
+	struct backing_dev_info *bdi = rwb->queue->backing_dev_info;
 	u64 thislat;
 
 	/*
@@ -339,7 +339,7 @@ static int latency_exceeded(struct rq_wb *rwb)
 
 static void rwb_trace_step(struct rq_wb *rwb, const char *msg)
 {
-	struct backing_dev_info *bdi = &rwb->queue->backing_dev_info;
+	struct backing_dev_info *bdi = rwb->queue->backing_dev_info;
 
 	trace_wbt_step(bdi, msg, rwb->scale_step, rwb->cur_win_nsec,
 			rwb->wb_background, rwb->wb_normal, rwb->wb_max);
@@ -423,7 +423,7 @@ static void wb_timer_fn(unsigned long data)
 
 	status = latency_exceeded(rwb);
 
-	trace_wbt_timer(&rwb->queue->backing_dev_info, status, rwb->scale_step,
+	trace_wbt_timer(rwb->queue->backing_dev_info, status, rwb->scale_step,
 			inflight);
 
 	/*
diff --git a/block/genhd.c b/block/genhd.c
index f2f22d0e8e14..d9ccd42f3675 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -613,7 +613,7 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 	disk_alloc_events(disk);
 
 	/* Register BDI before referencing it from bdev */
-	bdi = &disk->queue->backing_dev_info;
+	bdi = disk->queue->backing_dev_info;
 	bdi_register_owner(bdi, disk_to_dev(disk));
 
 	blk_register_region(disk_devt(disk), disk->minors, NULL,
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index ec9d8610b25f..027b876370bc 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -396,8 +396,8 @@ aoeblk_gdalloc(void *vp)
 	WARN_ON(d->gd);
 	WARN_ON(d->flags & DEVFL_UP);
 	blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-	q->backing_dev_info.name = "aoe";
-	q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_SIZE;
+	q->backing_dev_info->name = "aoe";
+	q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
 	d->bufpool = mp;
 	d->blkq = gd->queue = q;
 	q->queuedata = d;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 83482721bc01..d305f05be648 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2462,7 +2462,7 @@ static int drbd_congested(void *congested_data, int bdi_bits)
 
 	if (get_ldev(device)) {
 		q = bdev_get_queue(device->ldev->backing_bdev);
-		r = bdi_congested(&q->backing_dev_info, bdi_bits);
+		r = bdi_congested(q->backing_dev_info, bdi_bits);
 		put_ldev(device);
 		if (r)
 			reason = 'b';
@@ -2834,8 +2834,8 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
 	/* we have no partitions. we contain only ourselves. */
 	device->this_bdev->bd_contains = device->this_bdev;
 
-	q->backing_dev_info.congested_fn = drbd_congested;
-	q->backing_dev_info.congested_data = device;
+	q->backing_dev_info->congested_fn = drbd_congested;
+	q->backing_dev_info->congested_data = device;
 
 	blk_queue_make_request(q, drbd_make_request);
 	blk_queue_write_cache(q, true, true);
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index f35db29cac76..908c704e20aa 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1328,11 +1328,13 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
 	if (b) {
 		blk_queue_stack_limits(q, b);
 
-		if (q->backing_dev_info.ra_pages != b->backing_dev_info.ra_pages) {
+		if (q->backing_dev_info->ra_pages !=
+		    b->backing_dev_info->ra_pages) {
 			drbd_info(device, "Adjusting my ra_pages to backing device's (%lu -> %lu)\n",
-				 q->backing_dev_info.ra_pages,
-				 b->backing_dev_info.ra_pages);
-			q->backing_dev_info.ra_pages = b->backing_dev_info.ra_pages;
+				 q->backing_dev_info->ra_pages,
+				 b->backing_dev_info->ra_pages);
+			q->backing_dev_info->ra_pages =
+						b->backing_dev_info->ra_pages;
 		}
 	}
 	fixup_discard_if_not_supported(q);
@@ -3345,7 +3347,7 @@ static void device_to_statistics(struct device_statistics *s,
 		s->dev_disk_flags = md->flags;
 		q = bdev_get_queue(device->ldev->backing_bdev);
 		s->dev_lower_blocked =
-			bdi_congested(&q->backing_dev_info,
+			bdi_congested(q->backing_dev_info,
 				      (1 << WB_async_congested) |
 				      (1 << WB_sync_congested));
 		put_ldev(device);
diff --git a/drivers/block/drbd/drbd_proc.c b/drivers/block/drbd/drbd_proc.c
index be2b93fd2c11..8378142f7a55 100644
--- a/drivers/block/drbd/drbd_proc.c
+++ b/drivers/block/drbd/drbd_proc.c
@@ -288,7 +288,7 @@ static int drbd_seq_show(struct seq_file *seq, void *v)
 			seq_printf(seq, "%2d: cs:Unconfigured\n", i);
 		} else {
 			/* reset device->congestion_reason */
-			bdi_rw_congested(&device->rq_queue->backing_dev_info);
+			bdi_rw_congested(device->rq_queue->backing_dev_info);
 
 			nc = rcu_dereference(first_peer_device(device)->connection->net_conf);
 			wp = nc ? nc->wire_protocol - DRBD_PROT_A + 'A' : ' ';
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index de279fe4e4fd..cb6bdb75d52d 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -938,7 +938,7 @@ static bool remote_due_to_read_balancing(struct drbd_device *device, sector_t se
 
 	switch (rbm) {
 	case RB_CONGESTED_REMOTE:
-		bdi = &device->ldev->backing_bdev->bd_disk->queue->backing_dev_info;
+		bdi = device->ldev->backing_bdev->bd_disk->queue->backing_dev_info;
 		return bdi_read_congested(bdi);
 	case RB_LEAST_PENDING:
 		return atomic_read(&device->local_cnt) >
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 1b94c1ca5c5f..f735c1e48549 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1243,7 +1243,7 @@ static int pkt_handle_queue(struct pktcdvd_device *pd)
 	 		&& pd->bio_queue_size <= pd->write_congestion_off);
 	spin_unlock(&pd->lock);
 	if (wakeup) {
-		clear_bdi_congested(&pd->disk->queue->backing_dev_info,
+		clear_bdi_congested(pd->disk->queue->backing_dev_info,
 					BLK_RW_ASYNC);
 	}
 
@@ -2370,7 +2370,7 @@ static void pkt_make_request_write(struct request_queue *q, struct bio *bio)
 	spin_lock(&pd->lock);
 	if (pd->write_congestion_on > 0
 	    && pd->bio_queue_size >= pd->write_congestion_on) {
-		set_bdi_congested(&q->backing_dev_info, BLK_RW_ASYNC);
+		set_bdi_congested(q->backing_dev_info, BLK_RW_ASYNC);
 		do {
 			spin_unlock(&pd->lock);
 			congestion_wait(BLK_RW_ASYNC, HZ);
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 36d2b9f4e836..82afc56a5614 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4524,7 +4524,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
 	q->limits.discard_zeroes_data = 1;
 
 	if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
-		q->backing_dev_info.capabilities |= BDI_CAP_STABLE_WRITES;
+		q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
 
 	disk->queue = q;
 
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 76d20875503c..15f2a2c58e97 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1009,7 +1009,7 @@ static int cached_dev_congested(void *data, int bits)
 	struct request_queue *q = bdev_get_queue(dc->bdev);
 	int ret = 0;
 
-	if (bdi_congested(&q->backing_dev_info, bits))
+	if (bdi_congested(q->backing_dev_info, bits))
 		return 1;
 
 	if (cached_dev_get(dc)) {
@@ -1018,7 +1018,7 @@ static int cached_dev_congested(void *data, int bits)
 
 		for_each_cache(ca, d->c, i) {
 			q = bdev_get_queue(ca->bdev);
-			ret |= bdi_congested(&q->backing_dev_info, bits);
+			ret |= bdi_congested(q->backing_dev_info, bits);
 		}
 
 		cached_dev_put(dc);
@@ -1032,7 +1032,7 @@ void bch_cached_dev_request_init(struct cached_dev *dc)
 	struct gendisk *g = dc->disk.disk;
 
 	g->queue->make_request_fn		= cached_dev_make_request;
-	g->queue->backing_dev_info.congested_fn = cached_dev_congested;
+	g->queue->backing_dev_info->congested_fn = cached_dev_congested;
 	dc->disk.cache_miss			= cached_dev_cache_miss;
 	dc->disk.ioctl				= cached_dev_ioctl;
 }
@@ -1125,7 +1125,7 @@ static int flash_dev_congested(void *data, int bits)
 
 	for_each_cache(ca, d->c, i) {
 		q = bdev_get_queue(ca->bdev);
-		ret |= bdi_congested(&q->backing_dev_info, bits);
+		ret |= bdi_congested(q->backing_dev_info, bits);
 	}
 
 	return ret;
@@ -1136,7 +1136,7 @@ void bch_flash_dev_request_init(struct bcache_device *d)
 	struct gendisk *g = d->disk;
 
 	g->queue->make_request_fn		= flash_dev_make_request;
-	g->queue->backing_dev_info.congested_fn = flash_dev_congested;
+	g->queue->backing_dev_info->congested_fn = flash_dev_congested;
 	d->cache_miss				= flash_dev_cache_miss;
 	d->ioctl				= flash_dev_ioctl;
 }
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 3a19cbc8b230..85e3f21c2514 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -807,7 +807,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 	blk_queue_make_request(q, NULL);
 	d->disk->queue			= q;
 	q->queuedata			= d;
-	q->backing_dev_info.congested_data = d;
+	q->backing_dev_info->congested_data = d;
 	q->limits.max_hw_sectors	= UINT_MAX;
 	q->limits.max_sectors		= UINT_MAX;
 	q->limits.max_segment_size	= UINT_MAX;
@@ -1132,9 +1132,9 @@ static int cached_dev_init(struct cached_dev *dc, unsigned block_size)
 	set_capacity(dc->disk.disk,
 		     dc->bdev->bd_part->nr_sects - dc->sb.data_offset);
 
-	dc->disk.disk->queue->backing_dev_info.ra_pages =
-		max(dc->disk.disk->queue->backing_dev_info.ra_pages,
-		    q->backing_dev_info.ra_pages);
+	dc->disk.disk->queue->backing_dev_info->ra_pages =
+		max(dc->disk.disk->queue->backing_dev_info->ra_pages,
+		    q->backing_dev_info->ra_pages);
 
 	bch_cached_dev_request_init(dc);
 	bch_cached_dev_writeback_init(dc);
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index e04c61e0839e..fa255d66ca2f 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2291,7 +2291,7 @@ static void do_waker(struct work_struct *ws)
 static int is_congested(struct dm_dev *dev, int bdi_bits)
 {
 	struct request_queue *q = bdev_get_queue(dev->bdev);
-	return bdi_congested(&q->backing_dev_info, bdi_bits);
+	return bdi_congested(q->backing_dev_info, bdi_bits);
 }
 
 static int cache_is_congested(struct dm_target_callbacks *cb, int bdi_bits)
diff --git a/drivers/md/dm-era-target.c b/drivers/md/dm-era-target.c
index bf2b2676cb8a..9fab33b113c4 100644
--- a/drivers/md/dm-era-target.c
+++ b/drivers/md/dm-era-target.c
@@ -1379,7 +1379,7 @@ static void stop_worker(struct era *era)
 static int dev_is_congested(struct dm_dev *dev, int bdi_bits)
 {
 	struct request_queue *q = bdev_get_queue(dev->bdev);
-	return bdi_congested(&q->backing_dev_info, bdi_bits);
+	return bdi_congested(q->backing_dev_info, bdi_bits);
 }
 
 static int era_is_congested(struct dm_target_callbacks *cb, int bdi_bits)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 0a427de23ed2..3ad16d9c9d5a 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1750,7 +1750,7 @@ int dm_table_any_congested(struct dm_table *t, int bdi_bits)
 		char b[BDEVNAME_SIZE];
 
 		if (likely(q))
-			r |= bdi_congested(&q->backing_dev_info, bdi_bits);
+			r |= bdi_congested(q->backing_dev_info, bdi_bits);
 		else
 			DMWARN_LIMIT("%s: any_congested: nonexistent device %s",
 				     dm_device_name(t->md),
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index d1c05c12a9db..95dfa3904975 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2714,7 +2714,7 @@ static int pool_is_congested(struct dm_target_callbacks *cb, int bdi_bits)
 		return 1;
 
 	q = bdev_get_queue(pt->data_dev->bdev);
-	return bdi_congested(&q->backing_dev_info, bdi_bits);
+	return bdi_congested(q->backing_dev_info, bdi_bits);
 }
 
 static void requeue_bios(struct pool *pool)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5664f3..db934b1dba9d 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1314,7 +1314,7 @@ static int dm_any_congested(void *congested_data, int bdi_bits)
 			 * With request-based DM we only need to check the
 			 * top-level queue for congestion.
 			 */
-			r = md->queue->backing_dev_info.wb.state & bdi_bits;
+			r = md->queue->backing_dev_info->wb.state & bdi_bits;
 		} else {
 			map = dm_get_live_table_fast(md);
 			if (map)
@@ -1397,7 +1397,7 @@ void dm_init_md_queue(struct mapped_device *md)
 	 * - must do so here (in alloc_dev callchain) before queue is used
 	 */
 	md->queue->queuedata = md;
-	md->queue->backing_dev_info.congested_data = md;
+	md->queue->backing_dev_info->congested_data = md;
 }
 
 void dm_init_normal_md_queue(struct mapped_device *md)
@@ -1408,7 +1408,7 @@ void dm_init_normal_md_queue(struct mapped_device *md)
 	/*
 	 * Initialize aspects of queue that aren't relevant for blk-mq
 	 */
-	md->queue->backing_dev_info.congested_fn = dm_any_congested;
+	md->queue->backing_dev_info->congested_fn = dm_any_congested;
 	blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
 }
 
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 5975c9915684..f1c7bbac31a5 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -62,7 +62,7 @@ static int linear_congested(struct mddev *mddev, int bits)
 
 	for (i = 0; i < mddev->raid_disks && !ret ; i++) {
 		struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev);
-		ret |= bdi_congested(&q->backing_dev_info, bits);
+		ret |= bdi_congested(q->backing_dev_info, bits);
 	}
 
 	return ret;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 82821ee0d57f..ede3b2ad45db 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5341,8 +5341,8 @@ int md_run(struct mddev *mddev)
 			queue_flag_set_unlocked(QUEUE_FLAG_NONROT, mddev->queue);
 		else
 			queue_flag_clear_unlocked(QUEUE_FLAG_NONROT, mddev->queue);
-		mddev->queue->backing_dev_info.congested_data = mddev;
-		mddev->queue->backing_dev_info.congested_fn = md_congested;
+		mddev->queue->backing_dev_info->congested_data = mddev;
+		mddev->queue->backing_dev_info->congested_fn = md_congested;
 	}
 	if (pers->sync_request) {
 		if (mddev->kobj.sd &&
@@ -5699,7 +5699,7 @@ static int do_md_stop(struct mddev *mddev, int mode,
 
 		__md_stop_writes(mddev);
 		__md_stop(mddev);
-		mddev->queue->backing_dev_info.congested_fn = NULL;
+		mddev->queue->backing_dev_info->congested_fn = NULL;
 
 		/* tell userspace to handle 'inactive' */
 		sysfs_notify_dirent_safe(mddev->sysfs_state);
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index aa8c4e5c1ee2..d457afa672d5 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -169,7 +169,7 @@ static int multipath_congested(struct mddev *mddev, int bits)
 		if (rdev && !test_bit(Faulty, &rdev->flags)) {
 			struct request_queue *q = bdev_get_queue(rdev->bdev);
 
-			ret |= bdi_congested(&q->backing_dev_info, bits);
+			ret |= bdi_congested(q->backing_dev_info, bits);
 			/* Just like multipath_map, we just check the
 			 * first available device
 			 */
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index a162fedeb51a..1ec22cdac8cb 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -36,7 +36,7 @@ static int raid0_congested(struct mddev *mddev, int bits)
 	for (i = 0; i < raid_disks && !ret ; i++) {
 		struct request_queue *q = bdev_get_queue(devlist[i]->bdev);
 
-		ret |= bdi_congested(&q->backing_dev_info, bits);
+		ret |= bdi_congested(q->backing_dev_info, bits);
 	}
 	return ret;
 }
@@ -415,8 +415,8 @@ static int raid0_run(struct mddev *mddev)
 		 */
 		int stripe = mddev->raid_disks *
 			(mddev->chunk_sectors << 9) / PAGE_SIZE;
-		if (mddev->queue->backing_dev_info.ra_pages < 2* stripe)
-			mddev->queue->backing_dev_info.ra_pages = 2* stripe;
+		if (mddev->queue->backing_dev_info->ra_pages < 2* stripe)
+			mddev->queue->backing_dev_info->ra_pages = 2* stripe;
 	}
 
 	dump_zones(mddev);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a1f3fbed9100..c5a128ec03fb 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -740,9 +740,9 @@ static int raid1_congested(struct mddev *mddev, int bits)
 			 * non-congested targets, it can be removed
 			 */
 			if ((bits & (1 << WB_async_congested)) || 1)
-				ret |= bdi_congested(&q->backing_dev_info, bits);
+				ret |= bdi_congested(q->backing_dev_info, bits);
 			else
-				ret &= bdi_congested(&q->backing_dev_info, bits);
+				ret &= bdi_congested(q->backing_dev_info, bits);
 		}
 	}
 	rcu_read_unlock();
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ab5e86209322..d15028f63095 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -860,7 +860,7 @@ static int raid10_congested(struct mddev *mddev, int bits)
 		if (rdev && !test_bit(Faulty, &rdev->flags)) {
 			struct request_queue *q = bdev_get_queue(rdev->bdev);
 
-			ret |= bdi_congested(&q->backing_dev_info, bits);
+			ret |= bdi_congested(q->backing_dev_info, bits);
 		}
 	}
 	rcu_read_unlock();
@@ -3806,8 +3806,8 @@ static int raid10_run(struct mddev *mddev)
 		 * maybe...
 		 */
 		stripe /= conf->geo.near_copies;
-		if (mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
-			mddev->queue->backing_dev_info.ra_pages = 2 * stripe;
+		if (mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
+			mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
 	}
 
 	if (md_integrity_register(mddev))
@@ -4608,8 +4608,8 @@ static void end_reshape(struct r10conf *conf)
 		int stripe = conf->geo.raid_disks *
 			((conf->mddev->chunk_sectors << 9) / PAGE_SIZE);
 		stripe /= conf->geo.near_copies;
-		if (conf->mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
-			conf->mddev->queue->backing_dev_info.ra_pages = 2 * stripe;
+		if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
+			conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
 	}
 	conf->fullsync = 0;
 }
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 06d7279bdd04..e075cdae5b5d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6262,10 +6262,10 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
 		mddev_suspend(mddev);
 		conf->skip_copy = new;
 		if (new)
-			mddev->queue->backing_dev_info.capabilities |=
+			mddev->queue->backing_dev_info->capabilities |=
 				BDI_CAP_STABLE_WRITES;
 		else
-			mddev->queue->backing_dev_info.capabilities &=
+			mddev->queue->backing_dev_info->capabilities &=
 				~BDI_CAP_STABLE_WRITES;
 		mddev_resume(mddev);
 	}
@@ -7084,8 +7084,8 @@ static int raid5_run(struct mddev *mddev)
 		int data_disks = conf->previous_raid_disks - conf->max_degraded;
 		int stripe = data_disks *
 			((mddev->chunk_sectors << 9) / PAGE_SIZE);
-		if (mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
-			mddev->queue->backing_dev_info.ra_pages = 2 * stripe;
+		if (mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
+			mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
 
 		chunk_size = mddev->chunk_sectors << 9;
 		blk_queue_io_min(mddev->queue, chunk_size);
@@ -7694,8 +7694,8 @@ static void end_reshape(struct r5conf *conf)
 			int data_disks = conf->raid_disks - conf->max_degraded;
 			int stripe = data_disks * ((conf->chunk_sectors << 9)
 						   / PAGE_SIZE);
-			if (conf->mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
-				conf->mddev->queue->backing_dev_info.ra_pages = 2 * stripe;
+			if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
+				conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
 		}
 	}
 }
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index a34308df927f..c0e5b9a8bf5f 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1226,7 +1226,7 @@ static int set_gfs2_super(struct super_block *s, void *data)
 	 * We set the bdi here to the queue backing, file systems can
 	 * overwrite this in ->fill_super()
 	 */
-	s->s_bdi = &bdev_get_queue(s->s_bdev)->backing_dev_info;
+	s->s_bdi = bdev_get_queue(s->s_bdev)->backing_dev_info;
 	return 0;
 }
 
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 12eeae62a2b1..e1872f36147f 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1068,7 +1068,7 @@ nilfs_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_time_gran = 1;
 	sb->s_max_links = NILFS_LINK_MAX;
 
-	sb->s_bdi = &bdev_get_queue(sb->s_bdev)->backing_dev_info;
+	sb->s_bdi = bdev_get_queue(sb->s_bdev)->backing_dev_info;
 
 	err = load_nilfs(nilfs, sb);
 	if (err)
diff --git a/fs/super.c b/fs/super.c
index 1709ed029a2c..ea662b0e5e78 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1047,7 +1047,7 @@ static int set_bdev_super(struct super_block *s, void *data)
 	 * We set the bdi here to the queue backing, file systems can
 	 * overwrite this in ->fill_super()
 	 */
-	s->s_bdi = &bdev_get_queue(s->s_bdev)->backing_dev_info;
+	s->s_bdi = bdev_get_queue(s->s_bdev)->backing_dev_info;
 	return 0;
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 83695641bd5e..069e4a102a73 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -432,7 +432,8 @@ struct request_queue {
 	 */
 	struct delayed_work	delay_work;
 
-	struct backing_dev_info	backing_dev_info;
+	struct backing_dev_info	*backing_dev_info;
+	struct backing_dev_info	_backing_dev_info;
 
 	/*
 	 * The queue owner gets to use this for whatever they like.
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 290e8b7d3181..216449825859 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1988,11 +1988,11 @@ void laptop_mode_timer_fn(unsigned long data)
 	 * We want to write everything out, not just down to the dirty
 	 * threshold
 	 */
-	if (!bdi_has_dirty_io(&q->backing_dev_info))
+	if (!bdi_has_dirty_io(q->backing_dev_info))
 		return;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(wb, &q->backing_dev_info.wb_list, bdi_node)
+	list_for_each_entry_rcu(wb, &q->backing_dev_info->wb_list, bdi_node)
 		if (wb_has_dirty_io(wb))
 			wb_start_writeback(wb, nr_pages, true,
 					   WB_REASON_LAPTOP_TIMER);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-01-31 12:54 [PATCH 0/4 v2] BDI lifetime fix Jan Kara
  2017-01-31 12:54 ` [PATCH 1/4] block: Unhash block device inodes on gendisk destruction Jan Kara
  2017-01-31 12:54 ` [PATCH 2/4] block: Use pointer to backing_dev_info from request_queue Jan Kara
@ 2017-01-31 12:54 ` Jan Kara
  2017-02-01  9:50   ` Christoph Hellwig
  2017-01-31 12:54 ` [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev Jan Kara
  2017-02-06 14:48 ` [PATCH 0/4 v2] BDI lifetime fix Thiago Jung Bauermann
  4 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-01-31 12:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

Instead of storing backing_dev_info inside struct request_queue,
allocate it dynamically, reference count it, and free it when the last
reference is dropped. Currently only request_queue holds the reference
but in the following patch we add other users referencing
backing_dev_info.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-core.c                 | 12 +++++-------
 block/blk-sysfs.c                |  2 +-
 include/linux/backing-dev-defs.h |  2 ++
 include/linux/backing-dev.h      | 10 +++++++++-
 include/linux/blkdev.h           |  1 -
 mm/backing-dev.c                 | 37 ++++++++++++++++++++++++++++++++++++-
 6 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a9ff1b919ae7..545ccb4b96f3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -693,7 +693,6 @@ static void blk_rq_timed_out_timer(unsigned long data)
 struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 {
 	struct request_queue *q;
-	int err;
 
 	q = kmem_cache_alloc_node(blk_requestq_cachep,
 				gfp_mask | __GFP_ZERO, node_id);
@@ -708,17 +707,16 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (!q->bio_split)
 		goto fail_id;
 
-	q->backing_dev_info = &q->_backing_dev_info;
+	q->backing_dev_info = bdi_alloc_node(gfp_mask, node_id);
+	if (!q->backing_dev_info)
+		goto fail_split;
+
 	q->backing_dev_info->ra_pages =
 			(VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
 	q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
 	q->backing_dev_info->name = "block";
 	q->node = node_id;
 
-	err = bdi_init(q->backing_dev_info);
-	if (err)
-		goto fail_split;
-
 	setup_timer(&q->backing_dev_info->laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, (unsigned long) q);
 	setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
@@ -769,7 +767,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 fail_ref:
 	percpu_ref_exit(&q->q_usage_counter);
 fail_bdi:
-	bdi_destroy(q->backing_dev_info);
+	bdi_put(q->backing_dev_info);
 fail_split:
 	bioset_free(q->bio_split);
 fail_id:
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 64fb54c6b41c..4cbaa519ec2d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -799,7 +799,7 @@ static void blk_release_queue(struct kobject *kobj)
 		container_of(kobj, struct request_queue, kobj);
 
 	wbt_exit(q);
-	bdi_exit(q->backing_dev_info);
+	bdi_put(q->backing_dev_info);
 	blkcg_exit_queue(q);
 
 	if (q->elevator) {
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index e850e76acaaf..ad955817916d 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -10,6 +10,7 @@
 #include <linux/flex_proportions.h>
 #include <linux/timer.h>
 #include <linux/workqueue.h>
+#include <linux/kref.h>
 
 struct page;
 struct device;
@@ -144,6 +145,7 @@ struct backing_dev_info {
 
 	char *name;
 
+	struct kref refcnt;	/* Reference counter for the structure */
 	unsigned int capabilities; /* Device capabilities */
 	unsigned int min_ratio;
 	unsigned int max_ratio, max_prop_frac;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 43b93a947e61..efb6ca992d05 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -18,7 +18,14 @@
 #include <linux/slab.h>
 
 int __must_check bdi_init(struct backing_dev_info *bdi);
-void bdi_exit(struct backing_dev_info *bdi);
+
+static inline struct backing_dev_info *bdi_get(struct backing_dev_info *bdi)
+{
+	kref_get(&bdi->refcnt);
+	return bdi;
+}
+
+void bdi_put(struct backing_dev_info *bdi);
 
 __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
@@ -29,6 +36,7 @@ void bdi_unregister(struct backing_dev_info *bdi);
 
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
 void bdi_destroy(struct backing_dev_info *bdi);
+struct backing_dev_info *bdi_alloc_node(gfp_t gfp_mask, int node_id);
 
 void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
 			bool range_cyclic, enum wb_reason reason);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 069e4a102a73..de85701cc699 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -433,7 +433,6 @@ struct request_queue {
 	struct delayed_work	delay_work;
 
 	struct backing_dev_info	*backing_dev_info;
-	struct backing_dev_info	_backing_dev_info;
 
 	/*
 	 * The queue owner gets to use this for whatever they like.
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 3bfed5ab2475..db7d33a898a8 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -20,6 +20,8 @@ struct backing_dev_info noop_backing_dev_info = {
 };
 EXPORT_SYMBOL_GPL(noop_backing_dev_info);
 
+static struct kmem_cache *bdi_cachep;
+
 static struct class *bdi_class;
 
 /*
@@ -237,6 +239,9 @@ static __init int bdi_class_init(void)
 
 	bdi_class->dev_groups = bdi_dev_groups;
 	bdi_debug_init();
+
+	bdi_cachep = KMEM_CACHE(backing_dev_info, SLAB_PANIC);
+
 	return 0;
 }
 postcore_initcall(bdi_class_init);
@@ -776,6 +781,7 @@ int bdi_init(struct backing_dev_info *bdi)
 
 	bdi->dev = NULL;
 
+	kref_init(&bdi->refcnt);
 	bdi->min_ratio = 0;
 	bdi->max_ratio = 100;
 	bdi->max_prop_frac = FPROP_FRAC_BASE;
@@ -791,6 +797,21 @@ int bdi_init(struct backing_dev_info *bdi)
 }
 EXPORT_SYMBOL(bdi_init);
 
+struct backing_dev_info *bdi_alloc_node(gfp_t gfp_mask, int node_id)
+{
+	struct backing_dev_info *bdi;
+
+	bdi = kmem_cache_alloc_node(bdi_cachep, gfp_mask | __GFP_ZERO, node_id);
+	if (!bdi)
+		return NULL;
+
+	if (bdi_init(bdi)) {
+		kmem_cache_free(bdi_cachep, bdi);
+		return NULL;
+	}
+	return bdi;
+}
+
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...)
 {
@@ -871,12 +892,26 @@ void bdi_unregister(struct backing_dev_info *bdi)
 	}
 }
 
-void bdi_exit(struct backing_dev_info *bdi)
+static void bdi_exit(struct backing_dev_info *bdi)
 {
 	WARN_ON_ONCE(bdi->dev);
 	wb_exit(&bdi->wb);
 }
 
+static void release_bdi(struct kref *ref)
+{
+	struct backing_dev_info *bdi =
+			container_of(ref, struct backing_dev_info, refcnt);
+
+	bdi_exit(bdi);
+	kmem_cache_free(bdi_cachep, bdi);
+}
+
+void bdi_put(struct backing_dev_info *bdi)
+{
+	kref_put(&bdi->refcnt, release_bdi);
+}
+
 void bdi_destroy(struct backing_dev_info *bdi)
 {
 	bdi_unregister(bdi);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev
  2017-01-31 12:54 [PATCH 0/4 v2] BDI lifetime fix Jan Kara
                   ` (2 preceding siblings ...)
  2017-01-31 12:54 ` [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info Jan Kara
@ 2017-01-31 12:54 ` Jan Kara
  2017-02-01  9:53   ` Christoph Hellwig
  2017-02-01 19:25   ` Dan Williams
  2017-02-06 14:48 ` [PATCH 0/4 v2] BDI lifetime fix Thiago Jung Bauermann
  4 siblings, 2 replies; 22+ messages in thread
From: Jan Kara @ 2017-01-31 12:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

Currenly blk_get_backing_dev_info() is not safe to be called when the
block device is not open as bdev->bd_disk is NULL in that case. However
inode_to_bdi() uses this function and may be call called from flusher
worker or other writeback related functions without bdev being open
which leads to crashes such as:

[113031.075540] Unable to handle kernel paging request for data at address 0x00000000
[113031.075614] Faulting instruction address: 0xc0000000003692e0
0:mon> t
[c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590
[c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150
[c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450
[c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580
[c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590
[c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660
[c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130
[c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
--- Exception: 0  at 0000000000000000
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c0000000fb65f620]
    pc: c0000000003692e0: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000036cb6c: writeback_sb_inodes+0x30c/0x590
    sp: c0000000fb65f8a0
   msr: 800000010280b033
   dar: 0
 dsisr: 40000000
  current = 0xc0000001d69be400
  paca    = 0xc000000003480000   softe: 0        irq_happened: 0x01
    pid   = 18689, comm = kworker/u16:10

Fix the problem by grabbing reference to bdi on first open of the block
device and drop the reference only once the inode is evicted from
memory. This pins struct backing_dev_info in memory and thus fixes the
crashes.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-core.c   | 8 +++-----
 fs/block_dev.c     | 7 +++++++
 include/linux/fs.h | 1 +
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 545ccb4b96f3..84fabb51714a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -109,14 +109,12 @@ void blk_queue_congestion_threshold(struct request_queue *q)
  * @bdev:	device
  *
  * Locates the passed device's request queue and returns the address of its
- * backing_dev_info.  This function can only be called if @bdev is opened
- * and the return value is never NULL.
+ * backing_dev_info. The return value is never NULL however we may return
+ * &noop_backing_dev_info if the bdev is not currently open.
  */
 struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
 {
-	struct request_queue *q = bdev_get_queue(bdev);
-
-	return q->backing_dev_info;
+	return bdev->bd_bdi;
 }
 EXPORT_SYMBOL(blk_get_backing_dev_info);
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index ed6a34be7a1e..601b71b76d7f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -884,6 +884,8 @@ static void bdev_evict_inode(struct inode *inode)
 	spin_lock(&bdev_lock);
 	list_del_init(&bdev->bd_list);
 	spin_unlock(&bdev_lock);
+	if (bdev->bd_bdi != &noop_backing_dev_info)
+		bdi_put(bdev->bd_bdi);
 }
 
 static const struct super_operations bdev_sops = {
@@ -986,6 +988,7 @@ struct block_device *bdget(dev_t dev)
 		bdev->bd_contains = NULL;
 		bdev->bd_super = NULL;
 		bdev->bd_inode = inode;
+		bdev->bd_bdi = &noop_backing_dev_info;
 		bdev->bd_block_size = (1 << inode->i_blkbits);
 		bdev->bd_part_count = 0;
 		bdev->bd_invalidated = 0;
@@ -1542,6 +1545,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 		bdev->bd_disk = disk;
 		bdev->bd_queue = disk->queue;
 		bdev->bd_contains = bdev;
+		if (bdev->bd_bdi == &noop_backing_dev_info)
+			bdev->bd_bdi = bdi_get(disk->queue->backing_dev_info);
 
 		if (!partno) {
 			ret = -ENXIO;
@@ -1637,6 +1642,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 	bdev->bd_disk = NULL;
 	bdev->bd_part = NULL;
 	bdev->bd_queue = NULL;
+	bdi_put(bdev->bd_bdi);
+	bdev->bd_bdi = &noop_backing_dev_info;
 	if (bdev != bdev->bd_contains)
 		__blkdev_put(bdev->bd_contains, mode, 1);
 	bdev->bd_contains = NULL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 702cb6c50194..c930cbc19342 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -423,6 +423,7 @@ struct block_device {
 	int			bd_invalidated;
 	struct gendisk *	bd_disk;
 	struct request_queue *  bd_queue;
+	struct backing_dev_info *bd_bdi;
 	struct list_head	bd_list;
 	/*
 	 * Private data.  You must have bd_claim'ed the block_device
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] block: Unhash block device inodes on gendisk destruction
  2017-01-31 12:54 ` [PATCH 1/4] block: Unhash block device inodes on gendisk destruction Jan Kara
@ 2017-02-01  9:48   ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2017-02-01  9:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] block: Use pointer to backing_dev_info from request_queue
  2017-01-31 12:54 ` [PATCH 2/4] block: Use pointer to backing_dev_info from request_queue Jan Kara
@ 2017-02-01  9:48   ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2017-02-01  9:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-01-31 12:54 ` [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info Jan Kara
@ 2017-02-01  9:50   ` Christoph Hellwig
  2017-02-01 12:22     ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2017-02-01  9:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

On Tue, Jan 31, 2017 at 01:54:28PM +0100, Jan Kara wrote:
> Instead of storing backing_dev_info inside struct request_queue,
> allocate it dynamically, reference count it, and free it when the last
> reference is dropped. Currently only request_queue holds the reference
> but in the following patch we add other users referencing
> backing_dev_info.

Do we really need the separate slab cache?  Otherwise this looks
fine to me.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev
  2017-01-31 12:54 ` [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev Jan Kara
@ 2017-02-01  9:53   ` Christoph Hellwig
  2017-02-01 12:28     ` Jan Kara
  2017-02-01 19:25   ` Dan Williams
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2017-02-01  9:53 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

Looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

But can you also add another patch to kill off blk_get_backing_dev_info?
The direct dereference is short and cleaner.  Additionally the bt_bdi
field in XFS could go away, too.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-02-01  9:50   ` Christoph Hellwig
@ 2017-02-01 12:22     ` Jan Kara
  2017-02-01 22:45       ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-02-01 12:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Jens Axboe, linux-block, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

On Wed 01-02-17 01:50:07, Christoph Hellwig wrote:
> On Tue, Jan 31, 2017 at 01:54:28PM +0100, Jan Kara wrote:
> > Instead of storing backing_dev_info inside struct request_queue,
> > allocate it dynamically, reference count it, and free it when the last
> > reference is dropped. Currently only request_queue holds the reference
> > but in the following patch we add other users referencing
> > backing_dev_info.
> 
> Do we really need the separate slab cache?  Otherwise this looks
> fine to me.

Yeah, probably it is not worth it. I'll remove it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev
  2017-02-01  9:53   ` Christoph Hellwig
@ 2017-02-01 12:28     ` Jan Kara
  2017-02-01 12:52       ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-02-01 12:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Jens Axboe, linux-block, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

On Wed 01-02-17 01:53:20, Christoph Hellwig wrote:
> Looks fine:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> But can you also add another patch to kill off blk_get_backing_dev_info?
> The direct dereference is short and cleaner.  Additionally the bt_bdi
> field in XFS could go away, too.

OK, I'll do that. Another cleanup I was considering is to remove all other
embedded occurences of backing_dev_info and make the structure only
dynamically allocated. It would unify the handling of backing_dev_info and
allow us to make bdi_init(), bdi_destroy(), etc. static inside
mm/backing_dev.c. What do you think?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev
  2017-02-01 12:28     ` Jan Kara
@ 2017-02-01 12:52       ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2017-02-01 12:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

On Wed, Feb 01, 2017 at 01:28:14PM +0100, Jan Kara wrote:
> OK, I'll do that. Another cleanup I was considering is to remove all other
> embedded occurences of backing_dev_info and make the structure only
> dynamically allocated. It would unify the handling of backing_dev_info and
> allow us to make bdi_init(), bdi_destroy(), etc. static inside
> mm/backing_dev.c. What do you think?

Yes, that would be great.  I have vague memories trying to do that
a long time ago, but I don't remember why it didn't go anywhere.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev
  2017-01-31 12:54 ` [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev Jan Kara
  2017-02-01  9:53   ` Christoph Hellwig
@ 2017-02-01 19:25   ` Dan Williams
  1 sibling, 0 replies; 22+ messages in thread
From: Dan Williams @ 2017-02-01 19:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig,
	Thiago Jung Bauermann, Laurent Dufour

On Tue, Jan 31, 2017 at 4:54 AM, Jan Kara <jack@suse.cz> wrote:
> Currenly blk_get_backing_dev_info() is not safe to be called when the
> block device is not open as bdev->bd_disk is NULL in that case. However
> inode_to_bdi() uses this function and may be call called from flusher
> worker or other writeback related functions without bdev being open
> which leads to crashes such as:
>
> [113031.075540] Unable to handle kernel paging request for data at address 0x00000000
> [113031.075614] Faulting instruction address: 0xc0000000003692e0
> 0:mon> t
> [c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590
> [c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150
> [c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450
> [c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580
> [c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590
> [c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660
> [c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130
> [c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
> --- Exception: 0  at 0000000000000000
> 0:mon> e
> cpu 0x0: Vector: 300 (Data Access) at [c0000000fb65f620]
>     pc: c0000000003692e0: locked_inode_to_wb_and_lock_list+0x50/0x290
>     lr: c00000000036cb6c: writeback_sb_inodes+0x30c/0x590
>     sp: c0000000fb65f8a0
>    msr: 800000010280b033
>    dar: 0
>  dsisr: 40000000
>   current = 0xc0000001d69be400
>   paca    = 0xc000000003480000   softe: 0        irq_happened: 0x01
>     pid   = 18689, comm = kworker/u16:10
>
> Fix the problem by grabbing reference to bdi on first open of the block
> device and drop the reference only once the inode is evicted from
> memory. This pins struct backing_dev_info in memory and thus fixes the
> crashes.
>
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Signed-off-by: Jan Kara <jack@suse.cz>

Tested-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-02-01 12:22     ` Jan Kara
@ 2017-02-01 22:45       ` Jens Axboe
  2017-02-02 13:32         ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2017-02-01 22:45 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig
  Cc: linux-block, Dan Williams, Thiago Jung Bauermann, Laurent Dufour

On 02/01/2017 04:22 AM, Jan Kara wrote:
> On Wed 01-02-17 01:50:07, Christoph Hellwig wrote:
>> On Tue, Jan 31, 2017 at 01:54:28PM +0100, Jan Kara wrote:
>>> Instead of storing backing_dev_info inside struct request_queue,
>>> allocate it dynamically, reference count it, and free it when the last
>>> reference is dropped. Currently only request_queue holds the reference
>>> but in the following patch we add other users referencing
>>> backing_dev_info.
>>
>> Do we really need the separate slab cache?  Otherwise this looks
>> fine to me.
> 
> Yeah, probably it is not worth it. I'll remove it.

I agree on that, it should not be a hot path. Will you respin the series
after making this change? Would be great to get this queued up.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-02-01 22:45       ` Jens Axboe
@ 2017-02-02 13:32         ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-02-02 13:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jan Kara, Christoph Hellwig, linux-block, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

On Wed 01-02-17 14:45:20, Jens Axboe wrote:
> On 02/01/2017 04:22 AM, Jan Kara wrote:
> > On Wed 01-02-17 01:50:07, Christoph Hellwig wrote:
> >> On Tue, Jan 31, 2017 at 01:54:28PM +0100, Jan Kara wrote:
> >>> Instead of storing backing_dev_info inside struct request_queue,
> >>> allocate it dynamically, reference count it, and free it when the last
> >>> reference is dropped. Currently only request_queue holds the reference
> >>> but in the following patch we add other users referencing
> >>> backing_dev_info.
> >>
> >> Do we really need the separate slab cache?  Otherwise this looks
> >> fine to me.
> > 
> > Yeah, probably it is not worth it. I'll remove it.
> 
> I agree on that, it should not be a hot path. Will you respin the series
> after making this change? Would be great to get this queued up.

Yes, will send it later today. I was just waiting whether someone else does
not have more comments to the series.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4 v2] BDI lifetime fix
  2017-01-31 12:54 [PATCH 0/4 v2] BDI lifetime fix Jan Kara
                   ` (3 preceding siblings ...)
  2017-01-31 12:54 ` [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev Jan Kara
@ 2017-02-06 14:48 ` Thiago Jung Bauermann
  2017-02-06 15:26   ` Thiago Jung Bauermann
  4 siblings, 1 reply; 22+ messages in thread
From: Thiago Jung Bauermann @ 2017-02-06 14:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams, Laurent Dufour

Hello,

Am Dienstag, 31. Januar 2017, 13:54:25 BRST schrieb Jan Kara:
> this is a second version of the patch series that attempts to solve the
> problems with the life time of a backing_dev_info structure. Currently it
> lives inside request_queue structure and thus it gets destroyed as soon as
> request queue goes away. However the block device inode still stays around
> and thus inode_to_bdi() call on that inode (e.g. from flusher worker) may
> happen after request queue has been destroyed resulting in oops.
> 
> This patch set tries to solve these problems by making backing_dev_info
> independent structure referenced from block device inode. That makes sure
> inode_to_bdi() cannot ever oops. I gave some basic testing to the patches
> in KVM and on a real machine, Dan was running them with libnvdimm test suite
> which was previously triggering the oops and things look good. So they
> should be reasonably healthy. Laurent, if you can give these patches
> testing in your environment where you were triggering the oops, it would be
> nice.

I know you posted a v3, but we are seeing this crash on v2 and looking at
v3's changelog it doesn't seem it would make a difference:

6:mon> th
[c000000003e6b940] c00000000037d15c writeback_sb_inodes+0x30c/0x590
[c000000003e6ba50] c00000000037d4c4 __writeback_inodes_wb+0xe4/0x150
[c000000003e6bab0] c00000000037d91c wb_writeback+0x2fc/0x440
[c000000003e6bb80] c00000000037e778 wb_workfn+0x268/0x580
[c000000003e6bc90] c0000000000f3890 process_one_work+0x1e0/0x590
[c000000003e6bd20] c0000000000f3ce8 worker_thread+0xa8/0x660
[c000000003e6bdc0] c0000000000fd124 kthread+0x154/0x1a0
[c000000003e6be30] c00000000000b4e8 ret_from_kernel_thread+0x5c/0x74
--- Exception: 0  at 0000000000000000
6:mon> r
R00 = c00000000037d15c   R16 = c0000001fca60160
R01 = c000000003e6b8e0   R17 = c0000001fca600d8
R02 = c0000000014c3800   R18 = c0000001fca601c8
R03 = c0000001fca600d8   R19 = 0000000000000000
R04 = c0000000036478d0   R20 = 0000000000000000
R05 = 0000000000000000   R21 = c000000003e68000
R06 = 00000001fee70000   R22 = c0000001f49d17c0
R07 = 0001c6ce3a83dfca   R23 = c0000001f49d17a0
R08 = 0000000000000000   R24 = 0000000000000000
R09 = 0000000000000000   R25 = c0000001fca60160
R10 = 0000000080000006   R26 = 0000000000000000
R11 = c0000000fb627b68   R27 = 0000000000000000
R12 = 0000000000002200   R28 = 0000000000000001
R13 = c00000000fb83600   R29 = c0000001fca600d8
R14 = c0000000000fcfd8   R30 = c000000003e6bbe0
R15 = 0000000000000000   R31 = 0000000000000000
pc  = c0000000003799a0 locked_inode_to_wb_and_lock_list+0x50/0x290
cfar= c0000000005f5568 iowrite16+0x38/0xb0
lr  = c00000000037d15c writeback_sb_inodes+0x30c/0x590
msr = 800000000280b033   cr  = 24e62882
ctr = c00000000012c110   xer = 0000000000000000   trap =  300
dar = 0000000000000000   dsisr = 40000000
6:mon> sh
[312489.344110] INFO: rcu_sched detected stalls on CPUs/tasks:
[312489.396998] INFO: rcu_sched detected stalls on CPUs/tasks:
[312489.397003]         3-...: (4 ticks this GP) idle=59b/140000000000001/0 softirq=18323196/18323196 fqs=2
[312489.397005]         6-...: (1 GPs behind) idle=86f/140000000000001/0 softirq=18012373/18012374 fqs=2
[312489.397005]         (detected by 2, t=47863798 jiffies, g=9340524, c=9340523, q=170)
[312489.505361] rcu_sched kthread starved for 47863823 jiffies! g9340524 c9340523 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[312489.537334]         3-...: (26 ticks this GP) idle=59b/140000000000000/0 softirq=18323196/18323196 fqs=2
[312489.537395]         6-...: (1 GPs behind) idle=86f/140000000000001/0 softirq=18012373/18012374 fqs=2
[312489.537454]         (detected by 0, t=47863836 jiffies, g=9340524, c=9340523, q=170)
[312489.537528] rcu_sched kthread starved for 47863832 jiffies! g9340524 c9340523 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
[312489.672967] Unable to handle kernel paging request for data at address 0x00000000
[312489.673028] Faulting instruction address: 0xc0000000003799a0
cpu 0x6: Vector: 300 (Data Access) at [c000000003e6b660]
    pc: c0000000003799a0: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000037d15c: writeback_sb_inodes+0x30c/0x590
    sp: c000000003e6b8e0
   msr: 800000000280b433
   dar: 0
 dsisr: 40000000
  current = 0xc000000003646e00
  paca    = 0xc00000000fb83600   softe: 0        irq_happened: 0x01
    pid   = 8569, comm = kworker/u16:5
Linux version 4.10.0-rc3jankarav2+ (bauermann@u1604le) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #3 SMP Wed Feb 1 13:22:47 BRST 2017
enter ? for help
6:mon>                      

It took more than a day under I/O stress test to crash, so it seems to be a
hard to hit race condition. PC is at:

$ addr2line -e /usr/lib/debug/vmlinux-4.10.0-rc3jankarav2+ c0000000003799a0
wb_get at /home/bauermann/src/linux/./include/linux/backing-dev-defs.h:218
 (inlined by) locked_inode_to_wb_and_lock_list at /home/bauermann/src/linux/fs/fs-writeback.c:281

Which is:

216 static inline void wb_get(struct bdi_writeback *wb)
217 {
218         if (wb != &wb->bdi->wb)
219                 percpu_ref_get(&wb->refcnt);
220 }

So it looks like wb->bdi is NULL.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4 v2] BDI lifetime fix
  2017-02-06 14:48 ` [PATCH 0/4 v2] BDI lifetime fix Thiago Jung Bauermann
@ 2017-02-06 15:26   ` Thiago Jung Bauermann
  2017-02-07 12:33     ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Thiago Jung Bauermann @ 2017-02-06 15:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams, Laurent Dufour

Am Montag, 6. Februar 2017, 12:48:42 BRST schrieb Thiago Jung Bauermann:
> 216 static inline void wb_get(struct bdi_writeback *wb)
> 217 {
> 218         if (wb != &wb->bdi->wb)
> 219                 percpu_ref_get(&wb->refcnt);
> 220 }
> 
> So it looks like wb->bdi is NULL.

Sorry, looking a little deeper, it's actually wb which is NULL:

./include/linux/backing-dev.h:
371             return inode->i_wb;
   0xc00000000037999c <+76>:    ld      r31,256(r29)

./include/linux/backing-dev-defs.h:
218             if (wb != &wb->bdi->wb)
   0xc0000000003799a0 <+80>:    ld      r9,0(r31)
   0xc0000000003799a4 <+84>:    addi    r9,r9,88
   0xc0000000003799a8 <+88>:    cmpld   cr7,r31,r9
   0xc0000000003799ac <+92>:    beq     cr7,0xc0000000003799e0 
<locked_inode_to_wb_and_lock_list+144>

We can see above that inode->i_wb is in r31, and the machine crashed at 
0xc0000000003799a0 so it was trying to dereference wb and crashed.
r31 is NULL in the crash information.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4 v2] BDI lifetime fix
  2017-02-06 15:26   ` Thiago Jung Bauermann
@ 2017-02-07 12:33     ` Jan Kara
  2017-02-07 17:21       ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-02-07 12:33 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig,
	Dan Williams, Laurent Dufour, Tejun Heo

On Mon 06-02-17 13:26:53, Thiago Jung Bauermann wrote:
> Am Montag, 6. Februar 2017, 12:48:42 BRST schrieb Thiago Jung Bauermann:
> > 216 static inline void wb_get(struct bdi_writeback *wb)
> > 217 {
> > 218         if (wb != &wb->bdi->wb)
> > 219                 percpu_ref_get(&wb->refcnt);
> > 220 }
> > 
> > So it looks like wb->bdi is NULL.
> 
> Sorry, looking a little deeper, it's actually wb which is NULL:
> 
> ./include/linux/backing-dev.h:
> 371             return inode->i_wb;
>    0xc00000000037999c <+76>:    ld      r31,256(r29)
> 
> ./include/linux/backing-dev-defs.h:
> 218             if (wb != &wb->bdi->wb)
>    0xc0000000003799a0 <+80>:    ld      r9,0(r31)
>    0xc0000000003799a4 <+84>:    addi    r9,r9,88
>    0xc0000000003799a8 <+88>:    cmpld   cr7,r31,r9
>    0xc0000000003799ac <+92>:    beq     cr7,0xc0000000003799e0 
> <locked_inode_to_wb_and_lock_list+144>
> 
> We can see above that inode->i_wb is in r31, and the machine crashed at 
> 0xc0000000003799a0 so it was trying to dereference wb and crashed.
> r31 is NULL in the crash information.

Thanks for report and the analysis. After some looking into the code I see
where the problem is. Writeback code assumes inode->i_wb can never become
invalid once it is set however we still call inode_detach_wb() from
__blkdev_put(). So in a way this is a different problem but closely
related.

It seems to me that instead of calling inode_detach_wb() in __blkdev_put()
we may just switch blkdev inode to bdi->wb (it is now guaranteed to stay
around). That way bdi_unregister() can complete (destroying all writeback
structures except for bdi->wb) while block device inode can still live with
a valid i_wb structure.

CCed Tejun who is more familar with this code to verify my thoughts.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4 v2] BDI lifetime fix
  2017-02-07 12:33     ` Jan Kara
@ 2017-02-07 17:21       ` Tejun Heo
  2017-02-08  7:51         ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2017-02-07 17:21 UTC (permalink / raw)
  To: Jan Kara
  Cc: Thiago Jung Bauermann, Jens Axboe, linux-block,
	Christoph Hellwig, Dan Williams, Laurent Dufour

Hello,

On Tue, Feb 07, 2017 at 01:33:31PM +0100, Jan Kara wrote:
> > We can see above that inode->i_wb is in r31, and the machine crashed at 
> > 0xc0000000003799a0 so it was trying to dereference wb and crashed.
> > r31 is NULL in the crash information.
> 
> Thanks for report and the analysis. After some looking into the code I see
> where the problem is. Writeback code assumes inode->i_wb can never become
> invalid once it is set however we still call inode_detach_wb() from
> __blkdev_put(). So in a way this is a different problem but closely
> related.

Heh, it feels like we're chasing our own tails.

> It seems to me that instead of calling inode_detach_wb() in __blkdev_put()
> we may just switch blkdev inode to bdi->wb (it is now guaranteed to stay
> around). That way bdi_unregister() can complete (destroying all writeback
> structures except for bdi->wb) while block device inode can still live with
> a valid i_wb structure.

So, the problem there would be synchronizing get_wb against the
transition.  We can do that and inode_switch_wbs_work_fn() already
does it but it is a bit nasty.

I'm getting a bit confused here, so the reason we added
inode_detach_wb() in __blkdev_put() was because the root wb might go
away because it's embedded in the bdi which is embedded in the
request_queue which is put and may be released by put_disk().

Are you saying that we changed the behavior so that bdi->wb stays
around?  If so, we can just remove the inode_detach_wb() call, no?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4 v2] BDI lifetime fix
  2017-02-07 17:21       ` Tejun Heo
@ 2017-02-08  7:51         ` Jan Kara
  2017-02-08 10:23           ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-02-08  7:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jan Kara, Thiago Jung Bauermann, Jens Axboe, linux-block,
	Christoph Hellwig, Dan Williams, Laurent Dufour

On Tue 07-02-17 12:21:01, Tejun Heo wrote:
> Hello,
> 
> On Tue, Feb 07, 2017 at 01:33:31PM +0100, Jan Kara wrote:
> > > We can see above that inode->i_wb is in r31, and the machine crashed at 
> > > 0xc0000000003799a0 so it was trying to dereference wb and crashed.
> > > r31 is NULL in the crash information.
> > 
> > Thanks for report and the analysis. After some looking into the code I see
> > where the problem is. Writeback code assumes inode->i_wb can never become
> > invalid once it is set however we still call inode_detach_wb() from
> > __blkdev_put(). So in a way this is a different problem but closely
> > related.
> 
> Heh, it feels like we're chasing our own tails.

Pretty much. I went through the history of bdi registration and
unregistration to understand various constraints and various different
reasons keep pushing that around and always something breaks...

> > It seems to me that instead of calling inode_detach_wb() in __blkdev_put()
> > we may just switch blkdev inode to bdi->wb (it is now guaranteed to stay
> > around). That way bdi_unregister() can complete (destroying all writeback
> > structures except for bdi->wb) while block device inode can still live with
> > a valid i_wb structure.
> 
> So, the problem there would be synchronizing get_wb against the
> transition.  We can do that and inode_switch_wbs_work_fn() already
> does it but it is a bit nasty.

Yeah, I have prototyped that and it is relatively simple although we have
to use synchronize_rcu() to be sure unlocked users of i_wb are done before
switching and that is somewhat ugly. So I'm looking for ways to avoid the
switching as well. Especially since from high-level POV the switching
should not be necessary. Everything is going away and there is no real work
to be done. Just we may be unlucky enough that e.g. flusher is looking
whether there's some work to do and we remove stuff under its hands. So
switching seems like a bit of an overkill.

> I'm getting a bit confused here, so the reason we added
> inode_detach_wb() in __blkdev_put() was because the root wb might go
> away because it's embedded in the bdi which is embedded in the
> request_queue which is put and may be released by put_disk().
> 
> Are you saying that we changed the behavior so that bdi->wb stays
> around?  If so, we can just remove the inode_detach_wb() call, no?

Yes, my patches (currently in linux-block) make bdi->wb stay around as long
as the block device inode. However things are complicated by the fact that
these days bdev_inode->i_wb may be pointing even to non-root wb_writeback
structure. If that happens and we don't call inode_detach_wb(),
bdi_unregister() will block waiting for reference count on wb_writeback to
drop to zero which happens only once bdev inode is evicted from inode cache
which may be far far in the future.

Now I think we can move bdi_unregister() into del_gendisk() (where it IMHO
belongs anyway as a counterpart to device_add_disk() in which we call
bdi_register()) and shutdown the block device inode there before calling
bdi_unregister(). But I'm still figuring out whether it will not break
something else because the code has lots of interactions...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4 v2] BDI lifetime fix
  2017-02-08  7:51         ` Jan Kara
@ 2017-02-08 10:23           ` Jan Kara
  2017-02-08 16:00             ` Dan Williams
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-02-08 10:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jan Kara, Thiago Jung Bauermann, Jens Axboe, linux-block,
	Christoph Hellwig, Dan Williams, Laurent Dufour

On Wed 08-02-17 08:51:42, Jan Kara wrote:
> On Tue 07-02-17 12:21:01, Tejun Heo wrote:
> > Hello,
> > 
> > On Tue, Feb 07, 2017 at 01:33:31PM +0100, Jan Kara wrote:
> > > > We can see above that inode->i_wb is in r31, and the machine crashed at 
> > > > 0xc0000000003799a0 so it was trying to dereference wb and crashed.
> > > > r31 is NULL in the crash information.
> > > 
> > > Thanks for report and the analysis. After some looking into the code I see
> > > where the problem is. Writeback code assumes inode->i_wb can never become
> > > invalid once it is set however we still call inode_detach_wb() from
> > > __blkdev_put(). So in a way this is a different problem but closely
> > > related.
> > 
> > Heh, it feels like we're chasing our own tails.
> 
> Pretty much. I went through the history of bdi registration and
> unregistration to understand various constraints and various different
> reasons keep pushing that around and always something breaks...
> 
> > > It seems to me that instead of calling inode_detach_wb() in __blkdev_put()
> > > we may just switch blkdev inode to bdi->wb (it is now guaranteed to stay
> > > around). That way bdi_unregister() can complete (destroying all writeback
> > > structures except for bdi->wb) while block device inode can still live with
> > > a valid i_wb structure.
> > 
> > So, the problem there would be synchronizing get_wb against the
> > transition.  We can do that and inode_switch_wbs_work_fn() already
> > does it but it is a bit nasty.
> 
> Yeah, I have prototyped that and it is relatively simple although we have
> to use synchronize_rcu() to be sure unlocked users of i_wb are done before
> switching and that is somewhat ugly. So I'm looking for ways to avoid the
> switching as well. Especially since from high-level POV the switching
> should not be necessary. Everything is going away and there is no real work
> to be done. Just we may be unlucky enough that e.g. flusher is looking
> whether there's some work to do and we remove stuff under its hands. So
> switching seems like a bit of an overkill.
> 
> > I'm getting a bit confused here, so the reason we added
> > inode_detach_wb() in __blkdev_put() was because the root wb might go
> > away because it's embedded in the bdi which is embedded in the
> > request_queue which is put and may be released by put_disk().
> > 
> > Are you saying that we changed the behavior so that bdi->wb stays
> > around?  If so, we can just remove the inode_detach_wb() call, no?
> 
> Yes, my patches (currently in linux-block) make bdi->wb stay around as long
> as the block device inode. However things are complicated by the fact that
> these days bdev_inode->i_wb may be pointing even to non-root wb_writeback
> structure. If that happens and we don't call inode_detach_wb(),
> bdi_unregister() will block waiting for reference count on wb_writeback to
> drop to zero which happens only once bdev inode is evicted from inode cache
> which may be far far in the future.
> 
> Now I think we can move bdi_unregister() into del_gendisk() (where it IMHO
> belongs anyway as a counterpart to device_add_disk() in which we call
> bdi_register()) and shutdown the block device inode there before calling
> bdi_unregister(). But I'm still figuring out whether it will not break
> something else because the code has lots of interactions...

More news from device shutdown world ;): I was looking more into how device
shutdown works. I was wondering what happens when device gets hot-removed
and how do we shutdown stuff. If I tracked the callback maze correctly, when
we remove scsi disk, we do __scsi_remove_device() -> device_del() ->
bus_remove_device() -> device_release_driver() -> sd_remove() ->
del_gendisk(). We also have __scsi_remove_device() -> blk_cleanup_queue()
-> bdi_unregister() shortly after the previous happening
<DETOUR BEGIN>
This ordering seems to be the real culprit of the bdi name reuse problems
Omar has reported? Same as described in commit 6cd18e711dd8 for MD BTW and
Dan's patch could be IMHO replaced by a move of bdi_unregister() from
blk_cleanup_queue() to del_gendisk() where it logically belongs as a
counterpart of device_add_disk(). I'll test that.
<DETOUR END>

__scsi_remove_device() is also called when the device was hot-removed. At
that point the bdev can still be open and in active use and its i_wb can
point to some non-root wb_writeback struct. Thus bdi_unregister() will
block waiting for that wb_writeback to get released and thus SCSI device
removal will block basically intefinitely (at least until fs on top of bdev
gets unmounted). I believe this is a bug and __scsi_remove_device() is
expected to finish regardless of upper layers still using the bdev. So to
fix this I don't think we can really avoid the switching of bdev inode from
non-root wb_writeback structure to bdi->wb.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4 v2] BDI lifetime fix
  2017-02-08 10:23           ` Jan Kara
@ 2017-02-08 16:00             ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2017-02-08 16:00 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tejun Heo, Thiago Jung Bauermann, Jens Axboe, linux-block,
	Christoph Hellwig, Laurent Dufour

On Wed, Feb 8, 2017 at 2:23 AM, Jan Kara <jack@suse.cz> wrote:
> On Wed 08-02-17 08:51:42, Jan Kara wrote:
[..]
> <DETOUR BEGIN>
> This ordering seems to be the real culprit of the bdi name reuse problems
> Omar has reported? Same as described in commit 6cd18e711dd8 for MD BTW and
> Dan's patch could be IMHO replaced by a move of bdi_unregister() from
> blk_cleanup_queue() to del_gendisk() where it logically belongs as a
> counterpart of device_add_disk(). I'll test that.
> <DETOUR END>

I'd support that change. This is is why I called my fix "brute-force",
it just accommodated the existing problematic ordering.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-02-08 16:00 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-31 12:54 [PATCH 0/4 v2] BDI lifetime fix Jan Kara
2017-01-31 12:54 ` [PATCH 1/4] block: Unhash block device inodes on gendisk destruction Jan Kara
2017-02-01  9:48   ` Christoph Hellwig
2017-01-31 12:54 ` [PATCH 2/4] block: Use pointer to backing_dev_info from request_queue Jan Kara
2017-02-01  9:48   ` Christoph Hellwig
2017-01-31 12:54 ` [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info Jan Kara
2017-02-01  9:50   ` Christoph Hellwig
2017-02-01 12:22     ` Jan Kara
2017-02-01 22:45       ` Jens Axboe
2017-02-02 13:32         ` Jan Kara
2017-01-31 12:54 ` [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev Jan Kara
2017-02-01  9:53   ` Christoph Hellwig
2017-02-01 12:28     ` Jan Kara
2017-02-01 12:52       ` Christoph Hellwig
2017-02-01 19:25   ` Dan Williams
2017-02-06 14:48 ` [PATCH 0/4 v2] BDI lifetime fix Thiago Jung Bauermann
2017-02-06 15:26   ` Thiago Jung Bauermann
2017-02-07 12:33     ` Jan Kara
2017-02-07 17:21       ` Tejun Heo
2017-02-08  7:51         ` Jan Kara
2017-02-08 10:23           ` Jan Kara
2017-02-08 16:00             ` Dan Williams

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.