All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4 RFC] BDI lifetime fix
@ 2017-01-26 17:45 Jan Kara
  2017-01-26 17:45 ` [PATCH 1/4] block: Unhash block device inodes on gendisk destruction Jan Kara
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Jan Kara @ 2017-01-26 17:45 UTC (permalink / raw)
  To: linux-block
  Cc: Christoph Hellwig, Jens Axboe, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

Hello,

this patch series attempts to solve the problems with the life time of a
backing_dev_info structure. Currently it lives inside request_queue structure
and thus it gets destroyed as soon as request queue goes away. However
the block device inode still stays around and thus inode_to_bdi() call on
that inode (e.g. from flusher worker) may happen after request queue has been
destroyed resulting in oops.

This patch set tries to solve these problems by making backing_dev_info
independent structure referenced from block device inode. That makes sure
inode_to_bdi() cannot ever oops. The patches are lightly tested for now
(they boot, basic tests with adding & removing loop devices seem to do what
I'd expect them to do ;). If someone is able to reproduce crashes on bdi
when device goes away, please test these patches.

I'd also appreciate if people had a look whether the approach I took looks
sensible.

								Honza

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] block: Unhash block device inodes on gendisk destruction
  2017-01-26 17:45 [PATCH 0/4 RFC] BDI lifetime fix Jan Kara
@ 2017-01-26 17:45 ` Jan Kara
  2017-01-26 17:45 ` [PATCH 2/4] block: Use pointer to backing_dev_info from request_queue Jan Kara
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2017-01-26 17:45 UTC (permalink / raw)
  To: linux-block
  Cc: Christoph Hellwig, Jens Axboe, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

Currently, block device inodes stay around after corresponding gendisk
hash died until memory reclaim finds them and frees them. Since we will
make block device inode pin the bdi, we want to free the block device
inode as soon as the device goes away so that bdi does not stay around
unnecessarily. Furthermore we need to avoid issues when new device with
the same major,minor pair gets created since reusing the bdi structure
would be rather difficult in this case.

Unhashing block device inode on gendisk destruction nicely deals with
these problems. Once last block device inode reference is dropped (which
may be directly in del_gendisk()), the inode gets evicted. Furthermore if
the major,minor pair gets reallocated, we are guaranteed to get new
block device inode even if old block device inode is not yet evicted and
thus we avoid issues with possible reuse of bdi.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c      |  2 ++
 fs/block_dev.c     | 15 +++++++++++++++
 include/linux/fs.h |  1 +
 3 files changed, 18 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index fcd6d4fae657..f2f22d0e8e14 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -648,6 +648,8 @@ void del_gendisk(struct gendisk *disk)
 	disk_part_iter_init(&piter, disk,
 			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
 	while ((part = disk_part_iter_next(&piter))) {
+		bdev_unhash_inode(MKDEV(disk->major,
+					disk->first_minor + part->partno));
 		invalidate_partition(disk, part->partno);
 		delete_partition(disk, part->partno);
 	}
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 5db5d1340d69..ed6a34be7a1e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -954,6 +954,21 @@ static int bdev_set(struct inode *inode, void *data)
 
 static LIST_HEAD(all_bdevs);
 
+/*
+ * If there is a bdev inode for this device, unhash it so that it gets evicted
+ * as soon as last inode reference is dropped.
+ */
+void bdev_unhash_inode(dev_t dev)
+{
+	struct inode *inode;
+
+	inode = ilookup5(blockdev_superblock, hash(dev), bdev_test, &dev);
+	if (inode) {
+		remove_inode_hash(inode);
+		iput(inode);
+	}
+}
+
 struct block_device *bdget(dev_t dev)
 {
 	struct block_device *bdev;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2ba074328894..702cb6c50194 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2342,6 +2342,7 @@ extern struct kmem_cache *names_cachep;
 #ifdef CONFIG_BLOCK
 extern int register_blkdev(unsigned int, const char *);
 extern void unregister_blkdev(unsigned int, const char *);
+extern void bdev_unhash_inode(dev_t dev);
 extern struct block_device *bdget(dev_t);
 extern struct block_device *bdgrab(struct block_device *bdev);
 extern void bd_set_size(struct block_device *, loff_t size);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] block: Use pointer to backing_dev_info from request_queue
  2017-01-26 17:45 [PATCH 0/4 RFC] BDI lifetime fix Jan Kara
  2017-01-26 17:45 ` [PATCH 1/4] block: Unhash block device inodes on gendisk destruction Jan Kara
@ 2017-01-26 17:45 ` Jan Kara
  2017-01-26 17:45 ` [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info Jan Kara
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2017-01-26 17:45 UTC (permalink / raw)
  To: linux-block
  Cc: Christoph Hellwig, Jens Axboe, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

We will want to have struct backing_dev_info allocated separately from
struct request_queue. As the first step add pointer to backing_dev_info
to request_queue and convert all users touching it. No functional
changes in this patch.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-cgroup.c             |  6 +++---
 block/blk-core.c               | 27 ++++++++++++++-------------
 block/blk-integrity.c          |  4 ++--
 block/blk-settings.c           |  2 +-
 block/blk-sysfs.c              |  8 ++++----
 block/blk-wbt.c                |  8 ++++----
 block/genhd.c                  |  2 +-
 drivers/block/aoe/aoeblk.c     |  4 ++--
 drivers/block/drbd/drbd_main.c |  6 +++---
 drivers/block/drbd/drbd_nl.c   | 12 +++++++-----
 drivers/block/drbd/drbd_proc.c |  2 +-
 drivers/block/drbd/drbd_req.c  |  2 +-
 drivers/block/pktcdvd.c        |  4 ++--
 drivers/block/rbd.c            |  2 +-
 drivers/md/bcache/request.c    | 10 +++++-----
 drivers/md/bcache/super.c      |  8 ++++----
 drivers/md/dm-cache-target.c   |  2 +-
 drivers/md/dm-era-target.c     |  2 +-
 drivers/md/dm-table.c          |  2 +-
 drivers/md/dm-thin.c           |  2 +-
 drivers/md/dm.c                |  6 +++---
 drivers/md/linear.c            |  2 +-
 drivers/md/md.c                |  6 +++---
 drivers/md/multipath.c         |  2 +-
 drivers/md/raid0.c             |  6 +++---
 drivers/md/raid1.c             |  4 ++--
 drivers/md/raid10.c            | 10 +++++-----
 drivers/md/raid5.c             | 12 ++++++------
 fs/gfs2/ops_fstype.c           |  2 +-
 fs/nilfs2/super.c              |  2 +-
 fs/super.c                     |  2 +-
 include/linux/blkdev.h         |  3 ++-
 mm/page-writeback.c            |  4 ++--
 33 files changed, 90 insertions(+), 86 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 8ba0af780e88..d673a69b61b4 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -184,7 +184,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 		goto err_free_blkg;
 	}
 
-	wb_congested = wb_congested_get_create(&q->backing_dev_info,
+	wb_congested = wb_congested_get_create(q->backing_dev_info,
 					       blkcg->css.id,
 					       GFP_NOWAIT | __GFP_NOWARN);
 	if (!wb_congested) {
@@ -469,8 +469,8 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
 const char *blkg_dev_name(struct blkcg_gq *blkg)
 {
 	/* some drivers (floppy) instantiate a queue w/o disk registered */
-	if (blkg->q->backing_dev_info.dev)
-		return dev_name(blkg->q->backing_dev_info.dev);
+	if (blkg->q->backing_dev_info->dev)
+		return dev_name(blkg->q->backing_dev_info->dev);
 	return NULL;
 }
 EXPORT_SYMBOL_GPL(blkg_dev_name);
diff --git a/block/blk-core.c b/block/blk-core.c
index 61ba08c58b64..a9ff1b919ae7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -74,7 +74,7 @@ static void blk_clear_congested(struct request_list *rl, int sync)
 	 * flip its congestion state for events on other blkcgs.
 	 */
 	if (rl == &rl->q->root_rl)
-		clear_wb_congested(rl->q->backing_dev_info.wb.congested, sync);
+		clear_wb_congested(rl->q->backing_dev_info->wb.congested, sync);
 #endif
 }
 
@@ -85,7 +85,7 @@ static void blk_set_congested(struct request_list *rl, int sync)
 #else
 	/* see blk_clear_congested() */
 	if (rl == &rl->q->root_rl)
-		set_wb_congested(rl->q->backing_dev_info.wb.congested, sync);
+		set_wb_congested(rl->q->backing_dev_info->wb.congested, sync);
 #endif
 }
 
@@ -116,7 +116,7 @@ struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 
-	return &q->backing_dev_info;
+	return q->backing_dev_info;
 }
 EXPORT_SYMBOL(blk_get_backing_dev_info);
 
@@ -584,7 +584,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	blk_flush_integrity();
 
 	/* @q won't process any more request, flush async actions */
-	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
+	del_timer_sync(&q->backing_dev_info->laptop_mode_wb_timer);
 	blk_sync_queue(q);
 
 	if (q->mq_ops)
@@ -596,7 +596,7 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
-	bdi_unregister(&q->backing_dev_info);
+	bdi_unregister(q->backing_dev_info);
 
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
@@ -708,17 +708,18 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (!q->bio_split)
 		goto fail_id;
 
-	q->backing_dev_info.ra_pages =
+	q->backing_dev_info = &q->_backing_dev_info;
+	q->backing_dev_info->ra_pages =
 			(VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
-	q->backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK;
-	q->backing_dev_info.name = "block";
+	q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
+	q->backing_dev_info->name = "block";
 	q->node = node_id;
 
-	err = bdi_init(&q->backing_dev_info);
+	err = bdi_init(q->backing_dev_info);
 	if (err)
 		goto fail_split;
 
-	setup_timer(&q->backing_dev_info.laptop_mode_wb_timer,
+	setup_timer(&q->backing_dev_info->laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, (unsigned long) q);
 	setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
 	INIT_LIST_HEAD(&q->queue_head);
@@ -768,7 +769,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 fail_ref:
 	percpu_ref_exit(&q->q_usage_counter);
 fail_bdi:
-	bdi_destroy(&q->backing_dev_info);
+	bdi_destroy(q->backing_dev_info);
 fail_split:
 	bioset_free(q->bio_split);
 fail_id:
@@ -1196,7 +1197,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
 	 * disturb iosched and blkcg but weird is bettern than dead.
 	 */
 	printk_ratelimited(KERN_WARNING "%s: dev %s: request aux data allocation failed, iosched may be disturbed\n",
-			   __func__, dev_name(q->backing_dev_info.dev));
+			   __func__, dev_name(q->backing_dev_info->dev));
 
 	rq->rq_flags &= ~RQF_ELVPRIV;
 	rq->elv.icq = NULL;
@@ -2696,7 +2697,7 @@ void blk_finish_request(struct request *req, int error)
 	BUG_ON(blk_queued_rq(req));
 
 	if (unlikely(laptop_mode) && req->cmd_type == REQ_TYPE_FS)
-		laptop_io_completion(&req->q->backing_dev_info);
+		laptop_io_completion(req->q->backing_dev_info);
 
 	blk_delete_timer(req);
 
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index d69c5c79f98e..9f0ff5ba4f84 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -443,10 +443,10 @@ void blk_integrity_revalidate(struct gendisk *disk)
 		return;
 
 	if (bi->profile)
-		disk->queue->backing_dev_info.capabilities |=
+		disk->queue->backing_dev_info->capabilities |=
 			BDI_CAP_STABLE_WRITES;
 	else
-		disk->queue->backing_dev_info.capabilities &=
+		disk->queue->backing_dev_info->capabilities &=
 			~BDI_CAP_STABLE_WRITES;
 }
 
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 529e55f52a03..6eb19bcbf3cb 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -253,7 +253,7 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
 	max_sectors = min_not_zero(max_hw_sectors, limits->max_dev_sectors);
 	max_sectors = min_t(unsigned int, max_sectors, BLK_DEF_MAX_SECTORS);
 	limits->max_sectors = max_sectors;
-	q->backing_dev_info.io_pages = max_sectors >> (PAGE_SHIFT - 9);
+	q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
 }
 EXPORT_SYMBOL(blk_queue_max_hw_sectors);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1dbce057592d..64fb54c6b41c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -89,7 +89,7 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
 
 static ssize_t queue_ra_show(struct request_queue *q, char *page)
 {
-	unsigned long ra_kb = q->backing_dev_info.ra_pages <<
+	unsigned long ra_kb = q->backing_dev_info->ra_pages <<
 					(PAGE_SHIFT - 10);
 
 	return queue_var_show(ra_kb, (page));
@@ -104,7 +104,7 @@ queue_ra_store(struct request_queue *q, const char *page, size_t count)
 	if (ret < 0)
 		return ret;
 
-	q->backing_dev_info.ra_pages = ra_kb >> (PAGE_SHIFT - 10);
+	q->backing_dev_info->ra_pages = ra_kb >> (PAGE_SHIFT - 10);
 
 	return ret;
 }
@@ -236,7 +236,7 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 
 	spin_lock_irq(q->queue_lock);
 	q->limits.max_sectors = max_sectors_kb << 1;
-	q->backing_dev_info.io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
+	q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
 	spin_unlock_irq(q->queue_lock);
 
 	return ret;
@@ -799,7 +799,7 @@ static void blk_release_queue(struct kobject *kobj)
 		container_of(kobj, struct request_queue, kobj);
 
 	wbt_exit(q);
-	bdi_exit(&q->backing_dev_info);
+	bdi_exit(q->backing_dev_info);
 	blkcg_exit_queue(q);
 
 	if (q->elevator) {
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index f0a9c07b4c7a..1aedb1f7ee0c 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -96,7 +96,7 @@ static void wb_timestamp(struct rq_wb *rwb, unsigned long *var)
  */
 static bool wb_recent_wait(struct rq_wb *rwb)
 {
-	struct bdi_writeback *wb = &rwb->queue->backing_dev_info.wb;
+	struct bdi_writeback *wb = &rwb->queue->backing_dev_info->wb;
 
 	return time_before(jiffies, wb->dirty_sleep + HZ);
 }
@@ -279,7 +279,7 @@ enum {
 
 static int __latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
 {
-	struct backing_dev_info *bdi = &rwb->queue->backing_dev_info;
+	struct backing_dev_info *bdi = rwb->queue->backing_dev_info;
 	u64 thislat;
 
 	/*
@@ -339,7 +339,7 @@ static int latency_exceeded(struct rq_wb *rwb)
 
 static void rwb_trace_step(struct rq_wb *rwb, const char *msg)
 {
-	struct backing_dev_info *bdi = &rwb->queue->backing_dev_info;
+	struct backing_dev_info *bdi = rwb->queue->backing_dev_info;
 
 	trace_wbt_step(bdi, msg, rwb->scale_step, rwb->cur_win_nsec,
 			rwb->wb_background, rwb->wb_normal, rwb->wb_max);
@@ -423,7 +423,7 @@ static void wb_timer_fn(unsigned long data)
 
 	status = latency_exceeded(rwb);
 
-	trace_wbt_timer(&rwb->queue->backing_dev_info, status, rwb->scale_step,
+	trace_wbt_timer(rwb->queue->backing_dev_info, status, rwb->scale_step,
 			inflight);
 
 	/*
diff --git a/block/genhd.c b/block/genhd.c
index f2f22d0e8e14..d9ccd42f3675 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -613,7 +613,7 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 	disk_alloc_events(disk);
 
 	/* Register BDI before referencing it from bdev */
-	bdi = &disk->queue->backing_dev_info;
+	bdi = disk->queue->backing_dev_info;
 	bdi_register_owner(bdi, disk_to_dev(disk));
 
 	blk_register_region(disk_devt(disk), disk->minors, NULL,
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index ec9d8610b25f..027b876370bc 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -396,8 +396,8 @@ aoeblk_gdalloc(void *vp)
 	WARN_ON(d->gd);
 	WARN_ON(d->flags & DEVFL_UP);
 	blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-	q->backing_dev_info.name = "aoe";
-	q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_SIZE;
+	q->backing_dev_info->name = "aoe";
+	q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
 	d->bufpool = mp;
 	d->blkq = gd->queue = q;
 	q->queuedata = d;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 83482721bc01..d305f05be648 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2462,7 +2462,7 @@ static int drbd_congested(void *congested_data, int bdi_bits)
 
 	if (get_ldev(device)) {
 		q = bdev_get_queue(device->ldev->backing_bdev);
-		r = bdi_congested(&q->backing_dev_info, bdi_bits);
+		r = bdi_congested(q->backing_dev_info, bdi_bits);
 		put_ldev(device);
 		if (r)
 			reason = 'b';
@@ -2834,8 +2834,8 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
 	/* we have no partitions. we contain only ourselves. */
 	device->this_bdev->bd_contains = device->this_bdev;
 
-	q->backing_dev_info.congested_fn = drbd_congested;
-	q->backing_dev_info.congested_data = device;
+	q->backing_dev_info->congested_fn = drbd_congested;
+	q->backing_dev_info->congested_data = device;
 
 	blk_queue_make_request(q, drbd_make_request);
 	blk_queue_write_cache(q, true, true);
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index f35db29cac76..908c704e20aa 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1328,11 +1328,13 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
 	if (b) {
 		blk_queue_stack_limits(q, b);
 
-		if (q->backing_dev_info.ra_pages != b->backing_dev_info.ra_pages) {
+		if (q->backing_dev_info->ra_pages !=
+		    b->backing_dev_info->ra_pages) {
 			drbd_info(device, "Adjusting my ra_pages to backing device's (%lu -> %lu)\n",
-				 q->backing_dev_info.ra_pages,
-				 b->backing_dev_info.ra_pages);
-			q->backing_dev_info.ra_pages = b->backing_dev_info.ra_pages;
+				 q->backing_dev_info->ra_pages,
+				 b->backing_dev_info->ra_pages);
+			q->backing_dev_info->ra_pages =
+						b->backing_dev_info->ra_pages;
 		}
 	}
 	fixup_discard_if_not_supported(q);
@@ -3345,7 +3347,7 @@ static void device_to_statistics(struct device_statistics *s,
 		s->dev_disk_flags = md->flags;
 		q = bdev_get_queue(device->ldev->backing_bdev);
 		s->dev_lower_blocked =
-			bdi_congested(&q->backing_dev_info,
+			bdi_congested(q->backing_dev_info,
 				      (1 << WB_async_congested) |
 				      (1 << WB_sync_congested));
 		put_ldev(device);
diff --git a/drivers/block/drbd/drbd_proc.c b/drivers/block/drbd/drbd_proc.c
index be2b93fd2c11..8378142f7a55 100644
--- a/drivers/block/drbd/drbd_proc.c
+++ b/drivers/block/drbd/drbd_proc.c
@@ -288,7 +288,7 @@ static int drbd_seq_show(struct seq_file *seq, void *v)
 			seq_printf(seq, "%2d: cs:Unconfigured\n", i);
 		} else {
 			/* reset device->congestion_reason */
-			bdi_rw_congested(&device->rq_queue->backing_dev_info);
+			bdi_rw_congested(device->rq_queue->backing_dev_info);
 
 			nc = rcu_dereference(first_peer_device(device)->connection->net_conf);
 			wp = nc ? nc->wire_protocol - DRBD_PROT_A + 'A' : ' ';
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index de279fe4e4fd..cb6bdb75d52d 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -938,7 +938,7 @@ static bool remote_due_to_read_balancing(struct drbd_device *device, sector_t se
 
 	switch (rbm) {
 	case RB_CONGESTED_REMOTE:
-		bdi = &device->ldev->backing_bdev->bd_disk->queue->backing_dev_info;
+		bdi = device->ldev->backing_bdev->bd_disk->queue->backing_dev_info;
 		return bdi_read_congested(bdi);
 	case RB_LEAST_PENDING:
 		return atomic_read(&device->local_cnt) >
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 1b94c1ca5c5f..f735c1e48549 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1243,7 +1243,7 @@ static int pkt_handle_queue(struct pktcdvd_device *pd)
 	 		&& pd->bio_queue_size <= pd->write_congestion_off);
 	spin_unlock(&pd->lock);
 	if (wakeup) {
-		clear_bdi_congested(&pd->disk->queue->backing_dev_info,
+		clear_bdi_congested(pd->disk->queue->backing_dev_info,
 					BLK_RW_ASYNC);
 	}
 
@@ -2370,7 +2370,7 @@ static void pkt_make_request_write(struct request_queue *q, struct bio *bio)
 	spin_lock(&pd->lock);
 	if (pd->write_congestion_on > 0
 	    && pd->bio_queue_size >= pd->write_congestion_on) {
-		set_bdi_congested(&q->backing_dev_info, BLK_RW_ASYNC);
+		set_bdi_congested(q->backing_dev_info, BLK_RW_ASYNC);
 		do {
 			spin_unlock(&pd->lock);
 			congestion_wait(BLK_RW_ASYNC, HZ);
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 36d2b9f4e836..82afc56a5614 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4524,7 +4524,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
 	q->limits.discard_zeroes_data = 1;
 
 	if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
-		q->backing_dev_info.capabilities |= BDI_CAP_STABLE_WRITES;
+		q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
 
 	disk->queue = q;
 
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 76d20875503c..15f2a2c58e97 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1009,7 +1009,7 @@ static int cached_dev_congested(void *data, int bits)
 	struct request_queue *q = bdev_get_queue(dc->bdev);
 	int ret = 0;
 
-	if (bdi_congested(&q->backing_dev_info, bits))
+	if (bdi_congested(q->backing_dev_info, bits))
 		return 1;
 
 	if (cached_dev_get(dc)) {
@@ -1018,7 +1018,7 @@ static int cached_dev_congested(void *data, int bits)
 
 		for_each_cache(ca, d->c, i) {
 			q = bdev_get_queue(ca->bdev);
-			ret |= bdi_congested(&q->backing_dev_info, bits);
+			ret |= bdi_congested(q->backing_dev_info, bits);
 		}
 
 		cached_dev_put(dc);
@@ -1032,7 +1032,7 @@ void bch_cached_dev_request_init(struct cached_dev *dc)
 	struct gendisk *g = dc->disk.disk;
 
 	g->queue->make_request_fn		= cached_dev_make_request;
-	g->queue->backing_dev_info.congested_fn = cached_dev_congested;
+	g->queue->backing_dev_info->congested_fn = cached_dev_congested;
 	dc->disk.cache_miss			= cached_dev_cache_miss;
 	dc->disk.ioctl				= cached_dev_ioctl;
 }
@@ -1125,7 +1125,7 @@ static int flash_dev_congested(void *data, int bits)
 
 	for_each_cache(ca, d->c, i) {
 		q = bdev_get_queue(ca->bdev);
-		ret |= bdi_congested(&q->backing_dev_info, bits);
+		ret |= bdi_congested(q->backing_dev_info, bits);
 	}
 
 	return ret;
@@ -1136,7 +1136,7 @@ void bch_flash_dev_request_init(struct bcache_device *d)
 	struct gendisk *g = d->disk;
 
 	g->queue->make_request_fn		= flash_dev_make_request;
-	g->queue->backing_dev_info.congested_fn = flash_dev_congested;
+	g->queue->backing_dev_info->congested_fn = flash_dev_congested;
 	d->cache_miss				= flash_dev_cache_miss;
 	d->ioctl				= flash_dev_ioctl;
 }
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 3a19cbc8b230..85e3f21c2514 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -807,7 +807,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 	blk_queue_make_request(q, NULL);
 	d->disk->queue			= q;
 	q->queuedata			= d;
-	q->backing_dev_info.congested_data = d;
+	q->backing_dev_info->congested_data = d;
 	q->limits.max_hw_sectors	= UINT_MAX;
 	q->limits.max_sectors		= UINT_MAX;
 	q->limits.max_segment_size	= UINT_MAX;
@@ -1132,9 +1132,9 @@ static int cached_dev_init(struct cached_dev *dc, unsigned block_size)
 	set_capacity(dc->disk.disk,
 		     dc->bdev->bd_part->nr_sects - dc->sb.data_offset);
 
-	dc->disk.disk->queue->backing_dev_info.ra_pages =
-		max(dc->disk.disk->queue->backing_dev_info.ra_pages,
-		    q->backing_dev_info.ra_pages);
+	dc->disk.disk->queue->backing_dev_info->ra_pages =
+		max(dc->disk.disk->queue->backing_dev_info->ra_pages,
+		    q->backing_dev_info->ra_pages);
 
 	bch_cached_dev_request_init(dc);
 	bch_cached_dev_writeback_init(dc);
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index e04c61e0839e..fa255d66ca2f 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2291,7 +2291,7 @@ static void do_waker(struct work_struct *ws)
 static int is_congested(struct dm_dev *dev, int bdi_bits)
 {
 	struct request_queue *q = bdev_get_queue(dev->bdev);
-	return bdi_congested(&q->backing_dev_info, bdi_bits);
+	return bdi_congested(q->backing_dev_info, bdi_bits);
 }
 
 static int cache_is_congested(struct dm_target_callbacks *cb, int bdi_bits)
diff --git a/drivers/md/dm-era-target.c b/drivers/md/dm-era-target.c
index bf2b2676cb8a..9fab33b113c4 100644
--- a/drivers/md/dm-era-target.c
+++ b/drivers/md/dm-era-target.c
@@ -1379,7 +1379,7 @@ static void stop_worker(struct era *era)
 static int dev_is_congested(struct dm_dev *dev, int bdi_bits)
 {
 	struct request_queue *q = bdev_get_queue(dev->bdev);
-	return bdi_congested(&q->backing_dev_info, bdi_bits);
+	return bdi_congested(q->backing_dev_info, bdi_bits);
 }
 
 static int era_is_congested(struct dm_target_callbacks *cb, int bdi_bits)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 0a427de23ed2..3ad16d9c9d5a 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1750,7 +1750,7 @@ int dm_table_any_congested(struct dm_table *t, int bdi_bits)
 		char b[BDEVNAME_SIZE];
 
 		if (likely(q))
-			r |= bdi_congested(&q->backing_dev_info, bdi_bits);
+			r |= bdi_congested(q->backing_dev_info, bdi_bits);
 		else
 			DMWARN_LIMIT("%s: any_congested: nonexistent device %s",
 				     dm_device_name(t->md),
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index d1c05c12a9db..95dfa3904975 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2714,7 +2714,7 @@ static int pool_is_congested(struct dm_target_callbacks *cb, int bdi_bits)
 		return 1;
 
 	q = bdev_get_queue(pt->data_dev->bdev);
-	return bdi_congested(&q->backing_dev_info, bdi_bits);
+	return bdi_congested(q->backing_dev_info, bdi_bits);
 }
 
 static void requeue_bios(struct pool *pool)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5664f3..db934b1dba9d 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1314,7 +1314,7 @@ static int dm_any_congested(void *congested_data, int bdi_bits)
 			 * With request-based DM we only need to check the
 			 * top-level queue for congestion.
 			 */
-			r = md->queue->backing_dev_info.wb.state & bdi_bits;
+			r = md->queue->backing_dev_info->wb.state & bdi_bits;
 		} else {
 			map = dm_get_live_table_fast(md);
 			if (map)
@@ -1397,7 +1397,7 @@ void dm_init_md_queue(struct mapped_device *md)
 	 * - must do so here (in alloc_dev callchain) before queue is used
 	 */
 	md->queue->queuedata = md;
-	md->queue->backing_dev_info.congested_data = md;
+	md->queue->backing_dev_info->congested_data = md;
 }
 
 void dm_init_normal_md_queue(struct mapped_device *md)
@@ -1408,7 +1408,7 @@ void dm_init_normal_md_queue(struct mapped_device *md)
 	/*
 	 * Initialize aspects of queue that aren't relevant for blk-mq
 	 */
-	md->queue->backing_dev_info.congested_fn = dm_any_congested;
+	md->queue->backing_dev_info->congested_fn = dm_any_congested;
 	blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
 }
 
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 5975c9915684..f1c7bbac31a5 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -62,7 +62,7 @@ static int linear_congested(struct mddev *mddev, int bits)
 
 	for (i = 0; i < mddev->raid_disks && !ret ; i++) {
 		struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev);
-		ret |= bdi_congested(&q->backing_dev_info, bits);
+		ret |= bdi_congested(q->backing_dev_info, bits);
 	}
 
 	return ret;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 82821ee0d57f..ede3b2ad45db 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5341,8 +5341,8 @@ int md_run(struct mddev *mddev)
 			queue_flag_set_unlocked(QUEUE_FLAG_NONROT, mddev->queue);
 		else
 			queue_flag_clear_unlocked(QUEUE_FLAG_NONROT, mddev->queue);
-		mddev->queue->backing_dev_info.congested_data = mddev;
-		mddev->queue->backing_dev_info.congested_fn = md_congested;
+		mddev->queue->backing_dev_info->congested_data = mddev;
+		mddev->queue->backing_dev_info->congested_fn = md_congested;
 	}
 	if (pers->sync_request) {
 		if (mddev->kobj.sd &&
@@ -5699,7 +5699,7 @@ static int do_md_stop(struct mddev *mddev, int mode,
 
 		__md_stop_writes(mddev);
 		__md_stop(mddev);
-		mddev->queue->backing_dev_info.congested_fn = NULL;
+		mddev->queue->backing_dev_info->congested_fn = NULL;
 
 		/* tell userspace to handle 'inactive' */
 		sysfs_notify_dirent_safe(mddev->sysfs_state);
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index aa8c4e5c1ee2..d457afa672d5 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -169,7 +169,7 @@ static int multipath_congested(struct mddev *mddev, int bits)
 		if (rdev && !test_bit(Faulty, &rdev->flags)) {
 			struct request_queue *q = bdev_get_queue(rdev->bdev);
 
-			ret |= bdi_congested(&q->backing_dev_info, bits);
+			ret |= bdi_congested(q->backing_dev_info, bits);
 			/* Just like multipath_map, we just check the
 			 * first available device
 			 */
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index a162fedeb51a..1ec22cdac8cb 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -36,7 +36,7 @@ static int raid0_congested(struct mddev *mddev, int bits)
 	for (i = 0; i < raid_disks && !ret ; i++) {
 		struct request_queue *q = bdev_get_queue(devlist[i]->bdev);
 
-		ret |= bdi_congested(&q->backing_dev_info, bits);
+		ret |= bdi_congested(q->backing_dev_info, bits);
 	}
 	return ret;
 }
@@ -415,8 +415,8 @@ static int raid0_run(struct mddev *mddev)
 		 */
 		int stripe = mddev->raid_disks *
 			(mddev->chunk_sectors << 9) / PAGE_SIZE;
-		if (mddev->queue->backing_dev_info.ra_pages < 2* stripe)
-			mddev->queue->backing_dev_info.ra_pages = 2* stripe;
+		if (mddev->queue->backing_dev_info->ra_pages < 2* stripe)
+			mddev->queue->backing_dev_info->ra_pages = 2* stripe;
 	}
 
 	dump_zones(mddev);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a1f3fbed9100..c5a128ec03fb 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -740,9 +740,9 @@ static int raid1_congested(struct mddev *mddev, int bits)
 			 * non-congested targets, it can be removed
 			 */
 			if ((bits & (1 << WB_async_congested)) || 1)
-				ret |= bdi_congested(&q->backing_dev_info, bits);
+				ret |= bdi_congested(q->backing_dev_info, bits);
 			else
-				ret &= bdi_congested(&q->backing_dev_info, bits);
+				ret &= bdi_congested(q->backing_dev_info, bits);
 		}
 	}
 	rcu_read_unlock();
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ab5e86209322..d15028f63095 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -860,7 +860,7 @@ static int raid10_congested(struct mddev *mddev, int bits)
 		if (rdev && !test_bit(Faulty, &rdev->flags)) {
 			struct request_queue *q = bdev_get_queue(rdev->bdev);
 
-			ret |= bdi_congested(&q->backing_dev_info, bits);
+			ret |= bdi_congested(q->backing_dev_info, bits);
 		}
 	}
 	rcu_read_unlock();
@@ -3806,8 +3806,8 @@ static int raid10_run(struct mddev *mddev)
 		 * maybe...
 		 */
 		stripe /= conf->geo.near_copies;
-		if (mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
-			mddev->queue->backing_dev_info.ra_pages = 2 * stripe;
+		if (mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
+			mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
 	}
 
 	if (md_integrity_register(mddev))
@@ -4608,8 +4608,8 @@ static void end_reshape(struct r10conf *conf)
 		int stripe = conf->geo.raid_disks *
 			((conf->mddev->chunk_sectors << 9) / PAGE_SIZE);
 		stripe /= conf->geo.near_copies;
-		if (conf->mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
-			conf->mddev->queue->backing_dev_info.ra_pages = 2 * stripe;
+		if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
+			conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
 	}
 	conf->fullsync = 0;
 }
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 06d7279bdd04..e075cdae5b5d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6262,10 +6262,10 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
 		mddev_suspend(mddev);
 		conf->skip_copy = new;
 		if (new)
-			mddev->queue->backing_dev_info.capabilities |=
+			mddev->queue->backing_dev_info->capabilities |=
 				BDI_CAP_STABLE_WRITES;
 		else
-			mddev->queue->backing_dev_info.capabilities &=
+			mddev->queue->backing_dev_info->capabilities &=
 				~BDI_CAP_STABLE_WRITES;
 		mddev_resume(mddev);
 	}
@@ -7084,8 +7084,8 @@ static int raid5_run(struct mddev *mddev)
 		int data_disks = conf->previous_raid_disks - conf->max_degraded;
 		int stripe = data_disks *
 			((mddev->chunk_sectors << 9) / PAGE_SIZE);
-		if (mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
-			mddev->queue->backing_dev_info.ra_pages = 2 * stripe;
+		if (mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
+			mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
 
 		chunk_size = mddev->chunk_sectors << 9;
 		blk_queue_io_min(mddev->queue, chunk_size);
@@ -7694,8 +7694,8 @@ static void end_reshape(struct r5conf *conf)
 			int data_disks = conf->raid_disks - conf->max_degraded;
 			int stripe = data_disks * ((conf->chunk_sectors << 9)
 						   / PAGE_SIZE);
-			if (conf->mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
-				conf->mddev->queue->backing_dev_info.ra_pages = 2 * stripe;
+			if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
+				conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
 		}
 	}
 }
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index a34308df927f..c0e5b9a8bf5f 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1226,7 +1226,7 @@ static int set_gfs2_super(struct super_block *s, void *data)
 	 * We set the bdi here to the queue backing, file systems can
 	 * overwrite this in ->fill_super()
 	 */
-	s->s_bdi = &bdev_get_queue(s->s_bdev)->backing_dev_info;
+	s->s_bdi = bdev_get_queue(s->s_bdev)->backing_dev_info;
 	return 0;
 }
 
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 12eeae62a2b1..e1872f36147f 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1068,7 +1068,7 @@ nilfs_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_time_gran = 1;
 	sb->s_max_links = NILFS_LINK_MAX;
 
-	sb->s_bdi = &bdev_get_queue(sb->s_bdev)->backing_dev_info;
+	sb->s_bdi = bdev_get_queue(sb->s_bdev)->backing_dev_info;
 
 	err = load_nilfs(nilfs, sb);
 	if (err)
diff --git a/fs/super.c b/fs/super.c
index 1709ed029a2c..ea662b0e5e78 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1047,7 +1047,7 @@ static int set_bdev_super(struct super_block *s, void *data)
 	 * We set the bdi here to the queue backing, file systems can
 	 * overwrite this in ->fill_super()
 	 */
-	s->s_bdi = &bdev_get_queue(s->s_bdev)->backing_dev_info;
+	s->s_bdi = bdev_get_queue(s->s_bdev)->backing_dev_info;
 	return 0;
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 83695641bd5e..069e4a102a73 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -432,7 +432,8 @@ struct request_queue {
 	 */
 	struct delayed_work	delay_work;
 
-	struct backing_dev_info	backing_dev_info;
+	struct backing_dev_info	*backing_dev_info;
+	struct backing_dev_info	_backing_dev_info;
 
 	/*
 	 * The queue owner gets to use this for whatever they like.
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 290e8b7d3181..216449825859 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1988,11 +1988,11 @@ void laptop_mode_timer_fn(unsigned long data)
 	 * We want to write everything out, not just down to the dirty
 	 * threshold
 	 */
-	if (!bdi_has_dirty_io(&q->backing_dev_info))
+	if (!bdi_has_dirty_io(q->backing_dev_info))
 		return;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(wb, &q->backing_dev_info.wb_list, bdi_node)
+	list_for_each_entry_rcu(wb, &q->backing_dev_info->wb_list, bdi_node)
 		if (wb_has_dirty_io(wb))
 			wb_start_writeback(wb, nr_pages, true,
 					   WB_REASON_LAPTOP_TIMER);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-01-26 17:45 [PATCH 0/4 RFC] BDI lifetime fix Jan Kara
  2017-01-26 17:45 ` [PATCH 1/4] block: Unhash block device inodes on gendisk destruction Jan Kara
  2017-01-26 17:45 ` [PATCH 2/4] block: Use pointer to backing_dev_info from request_queue Jan Kara
@ 2017-01-26 17:45 ` Jan Kara
  2017-01-26 20:41   ` Dan Williams
  2017-01-26 17:45 ` [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev Jan Kara
  2017-01-27  6:15 ` [PATCH 0/4 RFC] BDI lifetime fix Dan Williams
  4 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2017-01-26 17:45 UTC (permalink / raw)
  To: linux-block
  Cc: Christoph Hellwig, Jens Axboe, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

Instead of storing backing_dev_info inside struct request_queue,
allocate it dynamically, reference count it, and free it when the last
reference is dropped. Currently only request_queue holds the reference
but in the following patch we add other users referencing
backing_dev_info.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-core.c                 | 10 ++++------
 block/blk-sysfs.c                |  2 +-
 include/linux/backing-dev-defs.h |  4 +++-
 include/linux/backing-dev.h      | 10 +++++++++-
 include/linux/blkdev.h           |  1 -
 mm/backing-dev.c                 | 35 +++++++++++++++++++++++++++++++----
 6 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a9ff1b919ae7..5613d3e0821e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -693,7 +693,6 @@ static void blk_rq_timed_out_timer(unsigned long data)
 struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 {
 	struct request_queue *q;
-	int err;
 
 	q = kmem_cache_alloc_node(blk_requestq_cachep,
 				gfp_mask | __GFP_ZERO, node_id);
@@ -708,17 +707,16 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (!q->bio_split)
 		goto fail_id;
 
-	q->backing_dev_info = &q->_backing_dev_info;
+	q->backing_dev_info = bdi_alloc_node(gfp_mask, node_id);
+	if (!q->backing_dev_info)
+		goto fail_split;
+
 	q->backing_dev_info->ra_pages =
 			(VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
 	q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
 	q->backing_dev_info->name = "block";
 	q->node = node_id;
 
-	err = bdi_init(q->backing_dev_info);
-	if (err)
-		goto fail_split;
-
 	setup_timer(&q->backing_dev_info->laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, (unsigned long) q);
 	setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 64fb54c6b41c..4cbaa519ec2d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -799,7 +799,7 @@ static void blk_release_queue(struct kobject *kobj)
 		container_of(kobj, struct request_queue, kobj);
 
 	wbt_exit(q);
-	bdi_exit(q->backing_dev_info);
+	bdi_put(q->backing_dev_info);
 	blkcg_exit_queue(q);
 
 	if (q->elevator) {
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index e850e76acaaf..4282f21b1611 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -144,7 +144,9 @@ struct backing_dev_info {
 
 	char *name;
 
-	unsigned int capabilities; /* Device capabilities */
+	atomic_t refcnt;	/* Reference counter for the structure */
+	unsigned int capabilities:31;	/* Device capabilities */
+	unsigned int free_on_put:1;	/* Structure will be freed on last bdi_put() */
 	unsigned int min_ratio;
 	unsigned int max_ratio, max_prop_frac;
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 43b93a947e61..c1601c23391a 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -18,7 +18,14 @@
 #include <linux/slab.h>
 
 int __must_check bdi_init(struct backing_dev_info *bdi);
-void bdi_exit(struct backing_dev_info *bdi);
+
+static inline struct backing_dev_info *bdi_get(struct backing_dev_info *bdi)
+{
+	atomic_inc(&bdi->refcnt);
+	return bdi;
+}
+
+void bdi_put(struct backing_dev_info *bdi);
 
 __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
@@ -29,6 +36,7 @@ void bdi_unregister(struct backing_dev_info *bdi);
 
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
 void bdi_destroy(struct backing_dev_info *bdi);
+struct backing_dev_info *bdi_alloc_node(gfp_t gfp_mask, int node_id);
 
 void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
 			bool range_cyclic, enum wb_reason reason);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 069e4a102a73..de85701cc699 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -433,7 +433,6 @@ struct request_queue {
 	struct delayed_work	delay_work;
 
 	struct backing_dev_info	*backing_dev_info;
-	struct backing_dev_info	_backing_dev_info;
 
 	/*
 	 * The queue owner gets to use this for whatever they like.
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 3bfed5ab2475..14196e43b9ca 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -20,6 +20,8 @@ struct backing_dev_info noop_backing_dev_info = {
 };
 EXPORT_SYMBOL_GPL(noop_backing_dev_info);
 
+static struct kmem_cache *bdi_cachep;
+
 static struct class *bdi_class;
 
 /*
@@ -237,6 +239,9 @@ static __init int bdi_class_init(void)
 
 	bdi_class->dev_groups = bdi_dev_groups;
 	bdi_debug_init();
+
+	bdi_cachep = KMEM_CACHE(backing_dev_info, SLAB_PANIC);
+
 	return 0;
 }
 postcore_initcall(bdi_class_init);
@@ -776,6 +781,7 @@ int bdi_init(struct backing_dev_info *bdi)
 
 	bdi->dev = NULL;
 
+	atomic_set(&bdi->refcnt, 1);
 	bdi->min_ratio = 0;
 	bdi->max_ratio = 100;
 	bdi->max_prop_frac = FPROP_FRAC_BASE;
@@ -791,6 +797,23 @@ int bdi_init(struct backing_dev_info *bdi)
 }
 EXPORT_SYMBOL(bdi_init);
 
+struct backing_dev_info *bdi_alloc_node(gfp_t gfp_mask, int node_id)
+{
+	struct backing_dev_info *bdi;
+
+	bdi = kmem_cache_alloc_node(bdi_cachep, gfp_mask | __GFP_ZERO, node_id);
+	if (!bdi)
+		return NULL;
+
+	if (bdi_init(bdi)) {
+		kmem_cache_free(bdi_cachep, bdi);
+		return NULL;
+	}
+	bdi->free_on_put = 1;
+
+	return bdi;
+}
+
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...)
 {
@@ -871,16 +894,20 @@ void bdi_unregister(struct backing_dev_info *bdi)
 	}
 }
 
-void bdi_exit(struct backing_dev_info *bdi)
+void bdi_put(struct backing_dev_info *bdi)
 {
-	WARN_ON_ONCE(bdi->dev);
-	wb_exit(&bdi->wb);
+	if (atomic_dec_and_test(&bdi->refcnt)) {
+		WARN_ON_ONCE(bdi->dev);
+		wb_exit(&bdi->wb);
+		if (bdi->free_on_put)
+			kmem_cache_free(bdi_cachep, bdi);
+	}
 }
 
 void bdi_destroy(struct backing_dev_info *bdi)
 {
 	bdi_unregister(bdi);
-	bdi_exit(bdi);
+	bdi_put(bdi);
 }
 EXPORT_SYMBOL(bdi_destroy);
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev
  2017-01-26 17:45 [PATCH 0/4 RFC] BDI lifetime fix Jan Kara
                   ` (2 preceding siblings ...)
  2017-01-26 17:45 ` [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info Jan Kara
@ 2017-01-26 17:45 ` Jan Kara
  2017-01-27  6:15 ` [PATCH 0/4 RFC] BDI lifetime fix Dan Williams
  4 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2017-01-26 17:45 UTC (permalink / raw)
  To: linux-block
  Cc: Christoph Hellwig, Jens Axboe, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

Currenly blk_get_backing_dev_info() is not safe to be called when the
block device is not open as bdev->bd_disk is NULL in that case. However
inode_to_bdi() uses this function and may be call called from flusher
worker or other writeback related functions without bdev being open
which leads to crashes such as:

[113031.075540] Unable to handle kernel paging request for data at address 0x00000000
[113031.075614] Faulting instruction address: 0xc0000000003692e0
0:mon> t
[c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590
[c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150
[c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450
[c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580
[c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590
[c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660
[c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130
[c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
--- Exception: 0  at 0000000000000000
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c0000000fb65f620]
    pc: c0000000003692e0: locked_inode_to_wb_and_lock_list+0x50/0x290
    lr: c00000000036cb6c: writeback_sb_inodes+0x30c/0x590
    sp: c0000000fb65f8a0
   msr: 800000010280b033
   dar: 0
 dsisr: 40000000
  current = 0xc0000001d69be400
  paca    = 0xc000000003480000   softe: 0        irq_happened: 0x01
    pid   = 18689, comm = kworker/u16:10

Fix the problem by grabbing reference to bdi on first open of the block
device and drop the reference only once the inode is evicted from
memory. This pins struct backing_dev_info in memory and thus fixes the
crashes.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-core.c   | 8 +++-----
 fs/block_dev.c     | 7 +++++++
 include/linux/fs.h | 1 +
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5613d3e0821e..34056a37361c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -109,14 +109,12 @@ void blk_queue_congestion_threshold(struct request_queue *q)
  * @bdev:	device
  *
  * Locates the passed device's request queue and returns the address of its
- * backing_dev_info.  This function can only be called if @bdev is opened
- * and the return value is never NULL.
+ * backing_dev_info. The return value is never NULL however we may return
+ * &noop_backing_dev_info if the bdev is not currently open.
  */
 struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
 {
-	struct request_queue *q = bdev_get_queue(bdev);
-
-	return q->backing_dev_info;
+	return bdev->bd_bdi;
 }
 EXPORT_SYMBOL(blk_get_backing_dev_info);
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index ed6a34be7a1e..601b71b76d7f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -884,6 +884,8 @@ static void bdev_evict_inode(struct inode *inode)
 	spin_lock(&bdev_lock);
 	list_del_init(&bdev->bd_list);
 	spin_unlock(&bdev_lock);
+	if (bdev->bd_bdi != &noop_backing_dev_info)
+		bdi_put(bdev->bd_bdi);
 }
 
 static const struct super_operations bdev_sops = {
@@ -986,6 +988,7 @@ struct block_device *bdget(dev_t dev)
 		bdev->bd_contains = NULL;
 		bdev->bd_super = NULL;
 		bdev->bd_inode = inode;
+		bdev->bd_bdi = &noop_backing_dev_info;
 		bdev->bd_block_size = (1 << inode->i_blkbits);
 		bdev->bd_part_count = 0;
 		bdev->bd_invalidated = 0;
@@ -1542,6 +1545,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 		bdev->bd_disk = disk;
 		bdev->bd_queue = disk->queue;
 		bdev->bd_contains = bdev;
+		if (bdev->bd_bdi == &noop_backing_dev_info)
+			bdev->bd_bdi = bdi_get(disk->queue->backing_dev_info);
 
 		if (!partno) {
 			ret = -ENXIO;
@@ -1637,6 +1642,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 	bdev->bd_disk = NULL;
 	bdev->bd_part = NULL;
 	bdev->bd_queue = NULL;
+	bdi_put(bdev->bd_bdi);
+	bdev->bd_bdi = &noop_backing_dev_info;
 	if (bdev != bdev->bd_contains)
 		__blkdev_put(bdev->bd_contains, mode, 1);
 	bdev->bd_contains = NULL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 702cb6c50194..c930cbc19342 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -423,6 +423,7 @@ struct block_device {
 	int			bd_invalidated;
 	struct gendisk *	bd_disk;
 	struct request_queue *  bd_queue;
+	struct backing_dev_info *bd_bdi;
 	struct list_head	bd_list;
 	/*
 	 * Private data.  You must have bd_claim'ed the block_device
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-01-26 17:45 ` [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info Jan Kara
@ 2017-01-26 20:41   ` Dan Williams
  2017-01-30 12:47     ` Jan Kara
  0 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2017-01-26 20:41 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-block, Christoph Hellwig, Jens Axboe,
	Thiago Jung Bauermann, Laurent Dufour

On Thu, Jan 26, 2017 at 9:45 AM, Jan Kara <jack@suse.cz> wrote:
> Instead of storing backing_dev_info inside struct request_queue,
> allocate it dynamically, reference count it, and free it when the last
> reference is dropped. Currently only request_queue holds the reference
> but in the following patch we add other users referencing
> backing_dev_info.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
[..]
> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> index e850e76acaaf..4282f21b1611 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -144,7 +144,9 @@ struct backing_dev_info {
>
>         char *name;
>
> -       unsigned int capabilities; /* Device capabilities */
> +       atomic_t refcnt;        /* Reference counter for the structure */
> +       unsigned int capabilities:31;   /* Device capabilities */
> +       unsigned int free_on_put:1;     /* Structure will be freed on last bdi_put() */
>         unsigned int min_ratio;
>         unsigned int max_ratio, max_prop_frac;
>

Any reason to not just use struct kref for this? The "free on final
put" should be implicit. In other words, if there is a path that would
ever clear this flag, then it really should just be holding its own
reference.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4 RFC] BDI lifetime fix
  2017-01-26 17:45 [PATCH 0/4 RFC] BDI lifetime fix Jan Kara
                   ` (3 preceding siblings ...)
  2017-01-26 17:45 ` [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev Jan Kara
@ 2017-01-27  6:15 ` Dan Williams
  2017-01-27 19:49   ` Omar Sandoval
                     ` (2 more replies)
  4 siblings, 3 replies; 17+ messages in thread
From: Dan Williams @ 2017-01-27  6:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-block, Christoph Hellwig, Jens Axboe,
	Thiago Jung Bauermann, Laurent Dufour

On Thu, Jan 26, 2017 at 9:45 AM, Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> this patch series attempts to solve the problems with the life time of a
> backing_dev_info structure. Currently it lives inside request_queue structure
> and thus it gets destroyed as soon as request queue goes away. However
> the block device inode still stays around and thus inode_to_bdi() call on
> that inode (e.g. from flusher worker) may happen after request queue has been
> destroyed resulting in oops.
>
> This patch set tries to solve these problems by making backing_dev_info
> independent structure referenced from block device inode. That makes sure
> inode_to_bdi() cannot ever oops. The patches are lightly tested for now
> (they boot, basic tests with adding & removing loop devices seem to do what
> I'd expect them to do ;). If someone is able to reproduce crashes on bdi
> when device goes away, please test these patches.

This survives a several runs of the libnvdimm unit tests which stress
del_gendisk() and blk_cleanup_queue(). I'll keep testing since the
failure was intermittent, but this is looking good.

> I'd also appreciate if people had a look whether the approach I took looks
> sensible.

Looks sensible, just the kref comment.

I also don't see a need to try to tag on the bdi device name reuse
into this series. I'm wondering if we can handle that separately with
device_rename(bdi->dev, ...) when we know scsi is done with the old
bdi but it has not finished being deleted

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4 RFC] BDI lifetime fix
  2017-01-27  6:15 ` [PATCH 0/4 RFC] BDI lifetime fix Dan Williams
@ 2017-01-27 19:49   ` Omar Sandoval
  2017-01-30 17:19   ` Jan Kara
  2017-01-31 11:20   ` Jan Kara
  2 siblings, 0 replies; 17+ messages in thread
From: Omar Sandoval @ 2017-01-27 19:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, linux-block, Christoph Hellwig, Jens Axboe,
	Thiago Jung Bauermann, Laurent Dufour

[-- Attachment #1: Type: text/plain, Size: 1787 bytes --]

On Thu, Jan 26, 2017 at 10:15:06PM -0800, Dan Williams wrote:
> On Thu, Jan 26, 2017 at 9:45 AM, Jan Kara <jack@suse.cz> wrote:
> > Hello,
> >
> > this patch series attempts to solve the problems with the life time of a
> > backing_dev_info structure. Currently it lives inside request_queue structure
> > and thus it gets destroyed as soon as request queue goes away. However
> > the block device inode still stays around and thus inode_to_bdi() call on
> > that inode (e.g. from flusher worker) may happen after request queue has been
> > destroyed resulting in oops.
> >
> > This patch set tries to solve these problems by making backing_dev_info
> > independent structure referenced from block device inode. That makes sure
> > inode_to_bdi() cannot ever oops. The patches are lightly tested for now
> > (they boot, basic tests with adding & removing loop devices seem to do what
> > I'd expect them to do ;). If someone is able to reproduce crashes on bdi
> > when device goes away, please test these patches.
> 
> This survives a several runs of the libnvdimm unit tests which stress
> del_gendisk() and blk_cleanup_queue(). I'll keep testing since the
> failure was intermittent, but this is looking good.
> 
> > I'd also appreciate if people had a look whether the approach I took looks
> > sensible.
> 
> Looks sensible, just the kref comment.
> 
> I also don't see a need to try to tag on the bdi device name reuse
> into this series. I'm wondering if we can handle that separately with
> device_rename(bdi->dev, ...) when we know scsi is done with the old
> bdi but it has not finished being deleted

What's the status of the device name issue? We're hitting it a lot here.
It's really easy to reproduce with scsi_debug, script attached. I'd be
happy to test out any patches.

[-- Attachment #2: stress_test_scsi_debug.sh --]
[-- Type: application/x-sh, Size: 1445 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-01-26 20:41   ` Dan Williams
@ 2017-01-30 12:47     ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2017-01-30 12:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, linux-block, Christoph Hellwig, Jens Axboe,
	Thiago Jung Bauermann, Laurent Dufour

On Thu 26-01-17 12:41:30, Dan Williams wrote:
> On Thu, Jan 26, 2017 at 9:45 AM, Jan Kara <jack@suse.cz> wrote:
> > Instead of storing backing_dev_info inside struct request_queue,
> > allocate it dynamically, reference count it, and free it when the last
> > reference is dropped. Currently only request_queue holds the reference
> > but in the following patch we add other users referencing
> > backing_dev_info.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> [..]
> > diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> > index e850e76acaaf..4282f21b1611 100644
> > --- a/include/linux/backing-dev-defs.h
> > +++ b/include/linux/backing-dev-defs.h
> > @@ -144,7 +144,9 @@ struct backing_dev_info {
> >
> >         char *name;
> >
> > -       unsigned int capabilities; /* Device capabilities */
> > +       atomic_t refcnt;        /* Reference counter for the structure */
> > +       unsigned int capabilities:31;   /* Device capabilities */
> > +       unsigned int free_on_put:1;     /* Structure will be freed on last bdi_put() */
> >         unsigned int min_ratio;
> >         unsigned int max_ratio, max_prop_frac;
> >
> 
> Any reason to not just use struct kref for this? The "free on final
> put" should be implicit. In other words, if there is a path that would
> ever clear this flag, then it really should just be holding its own
> reference.

The free_on_put is there for static incarnations of this structure...
However yes, I can have a look into grabbing the reference on module init
for these cases so that we can avoid the flag.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4 RFC] BDI lifetime fix
  2017-01-27  6:15 ` [PATCH 0/4 RFC] BDI lifetime fix Dan Williams
  2017-01-27 19:49   ` Omar Sandoval
@ 2017-01-30 17:19   ` Jan Kara
  2017-01-30 17:27     ` Dan Williams
  2017-01-31 11:20   ` Jan Kara
  2 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2017-01-30 17:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, linux-block, Christoph Hellwig, Jens Axboe,
	Thiago Jung Bauermann, Laurent Dufour

On Thu 26-01-17 22:15:06, Dan Williams wrote:
> On Thu, Jan 26, 2017 at 9:45 AM, Jan Kara <jack@suse.cz> wrote:
> > Hello,
> >
> > this patch series attempts to solve the problems with the life time of a
> > backing_dev_info structure. Currently it lives inside request_queue structure
> > and thus it gets destroyed as soon as request queue goes away. However
> > the block device inode still stays around and thus inode_to_bdi() call on
> > that inode (e.g. from flusher worker) may happen after request queue has been
> > destroyed resulting in oops.
> >
> > This patch set tries to solve these problems by making backing_dev_info
> > independent structure referenced from block device inode. That makes sure
> > inode_to_bdi() cannot ever oops. The patches are lightly tested for now
> > (they boot, basic tests with adding & removing loop devices seem to do what
> > I'd expect them to do ;). If someone is able to reproduce crashes on bdi
> > when device goes away, please test these patches.
> 
> This survives a several runs of the libnvdimm unit tests which stress
> del_gendisk() and blk_cleanup_queue(). I'll keep testing since the
> failure was intermittent, but this is looking good.

Thanks for testing!

> > I'd also appreciate if people had a look whether the approach I took looks
> > sensible.
> 
> Looks sensible, just the kref comment.
> 
> I also don't see a need to try to tag on the bdi device name reuse
> into this series. I'm wondering if we can handle that separately with
> device_rename(bdi->dev, ...) when we know scsi is done with the old
> bdi but it has not finished being deleted

Do you mean I should not speak about it in the changelog? The problems I
have are not as much with reusing device *name* here (and resulting sysfs
conflicts) but rather a major:minor number pair which results in reusing
block device inode and we are not prepared for that since the bdi
associated with that inode may be already unregistered and reusing it would
be difficult.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4 RFC] BDI lifetime fix
  2017-01-30 17:19   ` Jan Kara
@ 2017-01-30 17:27     ` Dan Williams
  0 siblings, 0 replies; 17+ messages in thread
From: Dan Williams @ 2017-01-30 17:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-block, Christoph Hellwig, Jens Axboe,
	Thiago Jung Bauermann, Laurent Dufour

On Mon, Jan 30, 2017 at 9:19 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 26-01-17 22:15:06, Dan Williams wrote:
>> On Thu, Jan 26, 2017 at 9:45 AM, Jan Kara <jack@suse.cz> wrote:
>> > Hello,
>> >
>> > this patch series attempts to solve the problems with the life time of a
>> > backing_dev_info structure. Currently it lives inside request_queue structure
>> > and thus it gets destroyed as soon as request queue goes away. However
>> > the block device inode still stays around and thus inode_to_bdi() call on
>> > that inode (e.g. from flusher worker) may happen after request queue has been
>> > destroyed resulting in oops.
>> >
>> > This patch set tries to solve these problems by making backing_dev_info
>> > independent structure referenced from block device inode. That makes sure
>> > inode_to_bdi() cannot ever oops. The patches are lightly tested for now
>> > (they boot, basic tests with adding & removing loop devices seem to do what
>> > I'd expect them to do ;). If someone is able to reproduce crashes on bdi
>> > when device goes away, please test these patches.
>>
>> This survives a several runs of the libnvdimm unit tests which stress
>> del_gendisk() and blk_cleanup_queue(). I'll keep testing since the
>> failure was intermittent, but this is looking good.
>
> Thanks for testing!
>
>> > I'd also appreciate if people had a look whether the approach I took looks
>> > sensible.
>>
>> Looks sensible, just the kref comment.
>>
>> I also don't see a need to try to tag on the bdi device name reuse
>> into this series. I'm wondering if we can handle that separately with
>> device_rename(bdi->dev, ...) when we know scsi is done with the old
>> bdi but it has not finished being deleted
>
> Do you mean I should not speak about it in the changelog? The problems I
> have are not as much with reusing device *name* here (and resulting sysfs
> conflicts) but rather a major:minor number pair which results in reusing
> block device inode and we are not prepared for that since the bdi
> associated with that inode may be already unregistered and reusing it would
> be difficult.

No, sorry for the confusion, changelog is fine.

I think others were expecting that this series addressed the
"duplicate bdi device" warnings, and I was clarifying that this series
is for something else.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4 RFC] BDI lifetime fix
  2017-01-27  6:15 ` [PATCH 0/4 RFC] BDI lifetime fix Dan Williams
  2017-01-27 19:49   ` Omar Sandoval
  2017-01-30 17:19   ` Jan Kara
@ 2017-01-31 11:20   ` Jan Kara
  2 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2017-01-31 11:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, linux-block, Christoph Hellwig, Jens Axboe,
	Thiago Jung Bauermann, Laurent Dufour

On Thu 26-01-17 22:15:06, Dan Williams wrote:
> On Thu, Jan 26, 2017 at 9:45 AM, Jan Kara <jack@suse.cz> wrote:
> > Hello,
> >
> > this patch series attempts to solve the problems with the life time of a
> > backing_dev_info structure. Currently it lives inside request_queue structure
> > and thus it gets destroyed as soon as request queue goes away. However
> > the block device inode still stays around and thus inode_to_bdi() call on
> > that inode (e.g. from flusher worker) may happen after request queue has been
> > destroyed resulting in oops.
> >
> > This patch set tries to solve these problems by making backing_dev_info
> > independent structure referenced from block device inode. That makes sure
> > inode_to_bdi() cannot ever oops. The patches are lightly tested for now
> > (they boot, basic tests with adding & removing loop devices seem to do what
> > I'd expect them to do ;). If someone is able to reproduce crashes on bdi
> > when device goes away, please test these patches.
> 
> This survives a several runs of the libnvdimm unit tests which stress
> del_gendisk() and blk_cleanup_queue(). I'll keep testing since the
> failure was intermittent, but this is looking good.

Can I add your Tested-by tag?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-02-01 22:45       ` Jens Axboe
@ 2017-02-02 13:32         ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2017-02-02 13:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jan Kara, Christoph Hellwig, linux-block, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

On Wed 01-02-17 14:45:20, Jens Axboe wrote:
> On 02/01/2017 04:22 AM, Jan Kara wrote:
> > On Wed 01-02-17 01:50:07, Christoph Hellwig wrote:
> >> On Tue, Jan 31, 2017 at 01:54:28PM +0100, Jan Kara wrote:
> >>> Instead of storing backing_dev_info inside struct request_queue,
> >>> allocate it dynamically, reference count it, and free it when the last
> >>> reference is dropped. Currently only request_queue holds the reference
> >>> but in the following patch we add other users referencing
> >>> backing_dev_info.
> >>
> >> Do we really need the separate slab cache?  Otherwise this looks
> >> fine to me.
> > 
> > Yeah, probably it is not worth it. I'll remove it.
> 
> I agree on that, it should not be a hot path. Will you respin the series
> after making this change? Would be great to get this queued up.

Yes, will send it later today. I was just waiting whether someone else does
not have more comments to the series.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-02-01 12:22     ` Jan Kara
@ 2017-02-01 22:45       ` Jens Axboe
  2017-02-02 13:32         ` Jan Kara
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2017-02-01 22:45 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig
  Cc: linux-block, Dan Williams, Thiago Jung Bauermann, Laurent Dufour

On 02/01/2017 04:22 AM, Jan Kara wrote:
> On Wed 01-02-17 01:50:07, Christoph Hellwig wrote:
>> On Tue, Jan 31, 2017 at 01:54:28PM +0100, Jan Kara wrote:
>>> Instead of storing backing_dev_info inside struct request_queue,
>>> allocate it dynamically, reference count it, and free it when the last
>>> reference is dropped. Currently only request_queue holds the reference
>>> but in the following patch we add other users referencing
>>> backing_dev_info.
>>
>> Do we really need the separate slab cache?  Otherwise this looks
>> fine to me.
> 
> Yeah, probably it is not worth it. I'll remove it.

I agree on that, it should not be a hot path. Will you respin the series
after making this change? Would be great to get this queued up.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-02-01  9:50   ` Christoph Hellwig
@ 2017-02-01 12:22     ` Jan Kara
  2017-02-01 22:45       ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2017-02-01 12:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Jens Axboe, linux-block, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

On Wed 01-02-17 01:50:07, Christoph Hellwig wrote:
> On Tue, Jan 31, 2017 at 01:54:28PM +0100, Jan Kara wrote:
> > Instead of storing backing_dev_info inside struct request_queue,
> > allocate it dynamically, reference count it, and free it when the last
> > reference is dropped. Currently only request_queue holds the reference
> > but in the following patch we add other users referencing
> > backing_dev_info.
> 
> Do we really need the separate slab cache?  Otherwise this looks
> fine to me.

Yeah, probably it is not worth it. I'll remove it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-01-31 12:54 ` [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info Jan Kara
@ 2017-02-01  9:50   ` Christoph Hellwig
  2017-02-01 12:22     ` Jan Kara
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2017-02-01  9:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour

On Tue, Jan 31, 2017 at 01:54:28PM +0100, Jan Kara wrote:
> Instead of storing backing_dev_info inside struct request_queue,
> allocate it dynamically, reference count it, and free it when the last
> reference is dropped. Currently only request_queue holds the reference
> but in the following patch we add other users referencing
> backing_dev_info.

Do we really need the separate slab cache?  Otherwise this looks
fine to me.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info
  2017-01-31 12:54 [PATCH 0/4 v2] " Jan Kara
@ 2017-01-31 12:54 ` Jan Kara
  2017-02-01  9:50   ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2017-01-31 12:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Laurent Dufour, Jan Kara

Instead of storing backing_dev_info inside struct request_queue,
allocate it dynamically, reference count it, and free it when the last
reference is dropped. Currently only request_queue holds the reference
but in the following patch we add other users referencing
backing_dev_info.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-core.c                 | 12 +++++-------
 block/blk-sysfs.c                |  2 +-
 include/linux/backing-dev-defs.h |  2 ++
 include/linux/backing-dev.h      | 10 +++++++++-
 include/linux/blkdev.h           |  1 -
 mm/backing-dev.c                 | 37 ++++++++++++++++++++++++++++++++++++-
 6 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a9ff1b919ae7..545ccb4b96f3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -693,7 +693,6 @@ static void blk_rq_timed_out_timer(unsigned long data)
 struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 {
 	struct request_queue *q;
-	int err;
 
 	q = kmem_cache_alloc_node(blk_requestq_cachep,
 				gfp_mask | __GFP_ZERO, node_id);
@@ -708,17 +707,16 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (!q->bio_split)
 		goto fail_id;
 
-	q->backing_dev_info = &q->_backing_dev_info;
+	q->backing_dev_info = bdi_alloc_node(gfp_mask, node_id);
+	if (!q->backing_dev_info)
+		goto fail_split;
+
 	q->backing_dev_info->ra_pages =
 			(VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
 	q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
 	q->backing_dev_info->name = "block";
 	q->node = node_id;
 
-	err = bdi_init(q->backing_dev_info);
-	if (err)
-		goto fail_split;
-
 	setup_timer(&q->backing_dev_info->laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, (unsigned long) q);
 	setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
@@ -769,7 +767,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 fail_ref:
 	percpu_ref_exit(&q->q_usage_counter);
 fail_bdi:
-	bdi_destroy(q->backing_dev_info);
+	bdi_put(q->backing_dev_info);
 fail_split:
 	bioset_free(q->bio_split);
 fail_id:
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 64fb54c6b41c..4cbaa519ec2d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -799,7 +799,7 @@ static void blk_release_queue(struct kobject *kobj)
 		container_of(kobj, struct request_queue, kobj);
 
 	wbt_exit(q);
-	bdi_exit(q->backing_dev_info);
+	bdi_put(q->backing_dev_info);
 	blkcg_exit_queue(q);
 
 	if (q->elevator) {
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index e850e76acaaf..ad955817916d 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -10,6 +10,7 @@
 #include <linux/flex_proportions.h>
 #include <linux/timer.h>
 #include <linux/workqueue.h>
+#include <linux/kref.h>
 
 struct page;
 struct device;
@@ -144,6 +145,7 @@ struct backing_dev_info {
 
 	char *name;
 
+	struct kref refcnt;	/* Reference counter for the structure */
 	unsigned int capabilities; /* Device capabilities */
 	unsigned int min_ratio;
 	unsigned int max_ratio, max_prop_frac;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 43b93a947e61..efb6ca992d05 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -18,7 +18,14 @@
 #include <linux/slab.h>
 
 int __must_check bdi_init(struct backing_dev_info *bdi);
-void bdi_exit(struct backing_dev_info *bdi);
+
+static inline struct backing_dev_info *bdi_get(struct backing_dev_info *bdi)
+{
+	kref_get(&bdi->refcnt);
+	return bdi;
+}
+
+void bdi_put(struct backing_dev_info *bdi);
 
 __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
@@ -29,6 +36,7 @@ void bdi_unregister(struct backing_dev_info *bdi);
 
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
 void bdi_destroy(struct backing_dev_info *bdi);
+struct backing_dev_info *bdi_alloc_node(gfp_t gfp_mask, int node_id);
 
 void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
 			bool range_cyclic, enum wb_reason reason);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 069e4a102a73..de85701cc699 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -433,7 +433,6 @@ struct request_queue {
 	struct delayed_work	delay_work;
 
 	struct backing_dev_info	*backing_dev_info;
-	struct backing_dev_info	_backing_dev_info;
 
 	/*
 	 * The queue owner gets to use this for whatever they like.
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 3bfed5ab2475..db7d33a898a8 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -20,6 +20,8 @@ struct backing_dev_info noop_backing_dev_info = {
 };
 EXPORT_SYMBOL_GPL(noop_backing_dev_info);
 
+static struct kmem_cache *bdi_cachep;
+
 static struct class *bdi_class;
 
 /*
@@ -237,6 +239,9 @@ static __init int bdi_class_init(void)
 
 	bdi_class->dev_groups = bdi_dev_groups;
 	bdi_debug_init();
+
+	bdi_cachep = KMEM_CACHE(backing_dev_info, SLAB_PANIC);
+
 	return 0;
 }
 postcore_initcall(bdi_class_init);
@@ -776,6 +781,7 @@ int bdi_init(struct backing_dev_info *bdi)
 
 	bdi->dev = NULL;
 
+	kref_init(&bdi->refcnt);
 	bdi->min_ratio = 0;
 	bdi->max_ratio = 100;
 	bdi->max_prop_frac = FPROP_FRAC_BASE;
@@ -791,6 +797,21 @@ int bdi_init(struct backing_dev_info *bdi)
 }
 EXPORT_SYMBOL(bdi_init);
 
+struct backing_dev_info *bdi_alloc_node(gfp_t gfp_mask, int node_id)
+{
+	struct backing_dev_info *bdi;
+
+	bdi = kmem_cache_alloc_node(bdi_cachep, gfp_mask | __GFP_ZERO, node_id);
+	if (!bdi)
+		return NULL;
+
+	if (bdi_init(bdi)) {
+		kmem_cache_free(bdi_cachep, bdi);
+		return NULL;
+	}
+	return bdi;
+}
+
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...)
 {
@@ -871,12 +892,26 @@ void bdi_unregister(struct backing_dev_info *bdi)
 	}
 }
 
-void bdi_exit(struct backing_dev_info *bdi)
+static void bdi_exit(struct backing_dev_info *bdi)
 {
 	WARN_ON_ONCE(bdi->dev);
 	wb_exit(&bdi->wb);
 }
 
+static void release_bdi(struct kref *ref)
+{
+	struct backing_dev_info *bdi =
+			container_of(ref, struct backing_dev_info, refcnt);
+
+	bdi_exit(bdi);
+	kmem_cache_free(bdi_cachep, bdi);
+}
+
+void bdi_put(struct backing_dev_info *bdi)
+{
+	kref_put(&bdi->refcnt, release_bdi);
+}
+
 void bdi_destroy(struct backing_dev_info *bdi)
 {
 	bdi_unregister(bdi);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-02-02 13:32 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-26 17:45 [PATCH 0/4 RFC] BDI lifetime fix Jan Kara
2017-01-26 17:45 ` [PATCH 1/4] block: Unhash block device inodes on gendisk destruction Jan Kara
2017-01-26 17:45 ` [PATCH 2/4] block: Use pointer to backing_dev_info from request_queue Jan Kara
2017-01-26 17:45 ` [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info Jan Kara
2017-01-26 20:41   ` Dan Williams
2017-01-30 12:47     ` Jan Kara
2017-01-26 17:45 ` [PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev Jan Kara
2017-01-27  6:15 ` [PATCH 0/4 RFC] BDI lifetime fix Dan Williams
2017-01-27 19:49   ` Omar Sandoval
2017-01-30 17:19   ` Jan Kara
2017-01-30 17:27     ` Dan Williams
2017-01-31 11:20   ` Jan Kara
2017-01-31 12:54 [PATCH 0/4 v2] " Jan Kara
2017-01-31 12:54 ` [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info Jan Kara
2017-02-01  9:50   ` Christoph Hellwig
2017-02-01 12:22     ` Jan Kara
2017-02-01 22:45       ` Jens Axboe
2017-02-02 13:32         ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.