linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] bdi: fix use-after-free for bdi device
@ 2020-02-26 11:18 Yufen Yu
  2020-02-26 11:18 ` [PATCH v2 1/7] blk-wbt: use bdi_dev_name() to get device name Yufen Yu
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Yufen Yu @ 2020-02-26 11:18 UTC (permalink / raw)
  To: axboe, linux-block, linux-fsdevel; +Cc: tj, jack, bvanassche, tytso

Hi, all 

We have reported a use-after-free crash for bdi device in
__blkg_prfill_rwstat() (see Patch #3). The bug is caused by printing
device kobj->name while the device and kobj->name has been freed by
bdi_unregister().

In fact, commit 68f23b8906 "memcg: fix a crash in wb_workfn when
a device disappears" has tried to address the issue, but the code
is till somewhat racy after that commit.

In this patchset, we try to protect device lifetime with RCU, avoiding
the device been freed when others used.

A way which maybe fix the problem is copy device name into special
memory (as discussed in [0]), but that is also need lock protect.

[0] https://lore.kernel.org/linux-block/20200219125505.GP16121@quack2.suse.cz/

V1:
  https://www.spinics.net/lists/linux-block/msg49693.html
  Add a new spinlock and copy kobj->name into caller buffer.
  Or using synchronize_rcu() to wait until reader complete.

Yufen Yu (7):
  blk-wbt: use bdi_dev_name() to get device name
  fs/ceph: use bdi_dev_name() to get device name
  bdi: protect device lifetime with RCU
  bdi: create a new function bdi_get_dev_name()
  bfq: fix potential kernel crash when print dev err info
  memcg: fix crash in wb_workfn when bdi unregister
  blk-wbt: replace bdi_dev_name() with bdi_get_dev_name()

 block/bfq-iosched.c              |  7 +++--
 block/blk-cgroup.c               |  8 ++++--
 block/genhd.c                    |  4 +--
 fs/ceph/debugfs.c                |  2 +-
 fs/ext4/super.c                  |  2 +-
 fs/fs-writeback.c                |  4 ++-
 include/linux/backing-dev-defs.h |  8 +++++-
 include/linux/backing-dev.h      | 31 +++++++++++++++++++--
 include/trace/events/wbt.h       |  8 +++---
 include/trace/events/writeback.h | 38 ++++++++++++--------------
 mm/backing-dev.c                 | 59 +++++++++++++++++++++++++++++++++-------
 11 files changed, 124 insertions(+), 47 deletions(-)

-- 
2.16.2.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/7] blk-wbt: use bdi_dev_name() to get device name
  2020-02-26 11:18 [PATCH v2 0/7] bdi: fix use-after-free for bdi device Yufen Yu
@ 2020-02-26 11:18 ` Yufen Yu
  2020-02-26 11:18 ` [PATCH v2 2/7] fs/ceph: " Yufen Yu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Yufen Yu @ 2020-02-26 11:18 UTC (permalink / raw)
  To: axboe, linux-block, linux-fsdevel; +Cc: tj, jack, bvanassche, tytso

Use the common interface bdi_dev_name() to get device name.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 include/trace/events/wbt.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/wbt.h b/include/trace/events/wbt.h
index 37342a13c9cb..9996420d7ec4 100644
--- a/include/trace/events/wbt.h
+++ b/include/trace/events/wbt.h
@@ -33,7 +33,7 @@ TRACE_EVENT(wbt_stat,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, dev_name(bdi->dev),
+		strlcpy(__entry->name, bdi_dev_name(bdi),
 			ARRAY_SIZE(__entry->name));
 		__entry->rmean		= stat[0].mean;
 		__entry->rmin		= stat[0].min;
@@ -68,7 +68,7 @@ TRACE_EVENT(wbt_lat,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, dev_name(bdi->dev),
+		strlcpy(__entry->name, bdi_dev_name(bdi),
 			ARRAY_SIZE(__entry->name));
 		__entry->lat = div_u64(lat, 1000);
 	),
@@ -105,7 +105,7 @@ TRACE_EVENT(wbt_step,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, dev_name(bdi->dev),
+		strlcpy(__entry->name, bdi_dev_name(bdi),
 			ARRAY_SIZE(__entry->name));
 		__entry->msg	= msg;
 		__entry->step	= step;
@@ -141,7 +141,7 @@ TRACE_EVENT(wbt_timer,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, dev_name(bdi->dev),
+		strlcpy(__entry->name, bdi_dev_name(bdi),
 			ARRAY_SIZE(__entry->name));
 		__entry->status		= status;
 		__entry->step		= step;
-- 
2.16.2.dirty


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/7] fs/ceph: use bdi_dev_name() to get device name
  2020-02-26 11:18 [PATCH v2 0/7] bdi: fix use-after-free for bdi device Yufen Yu
  2020-02-26 11:18 ` [PATCH v2 1/7] blk-wbt: use bdi_dev_name() to get device name Yufen Yu
@ 2020-02-26 11:18 ` Yufen Yu
  2020-02-26 11:18 ` [PATCH v2 3/7] bdi: protect device lifetime with RCU Yufen Yu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Yufen Yu @ 2020-02-26 11:18 UTC (permalink / raw)
  To: axboe, linux-block, linux-fsdevel; +Cc: tj, jack, bvanassche, tytso

Use the common interface bdi_dev_name() to get device name.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 fs/ceph/debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index fb7cabd98e7b..1f8f3b631adb 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -271,7 +271,7 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 				    &congestion_kb_fops);
 
 	snprintf(name, sizeof(name), "../../bdi/%s",
-		 dev_name(fsc->sb->s_bdi->dev));
+		 bdi_dev_name(fsc->sb->s_bdi));
 	fsc->debugfs_bdi =
 		debugfs_create_symlink("bdi",
 				       fsc->client->debugfs_dir,
-- 
2.16.2.dirty


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-02-26 11:18 [PATCH v2 0/7] bdi: fix use-after-free for bdi device Yufen Yu
  2020-02-26 11:18 ` [PATCH v2 1/7] blk-wbt: use bdi_dev_name() to get device name Yufen Yu
  2020-02-26 11:18 ` [PATCH v2 2/7] fs/ceph: " Yufen Yu
@ 2020-02-26 11:18 ` Yufen Yu
  2020-03-04 17:05   ` Tejun Heo
  2020-02-26 11:18 ` [PATCH v2 4/7] bdi: create a new function bdi_get_dev_name() Yufen Yu
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Yufen Yu @ 2020-02-26 11:18 UTC (permalink / raw)
  To: axboe, linux-block, linux-fsdevel; +Cc: tj, jack, bvanassche, tytso

We reported a kernel crash:

[201962.639350] Call trace:
[201962.644403]  string+0x28/0xa0
[201962.650501]  vsnprintf+0x5f0/0x748
[201962.657472]  seq_vprintf+0x70/0x98
[201962.664442]  seq_printf+0x7c/0xa0
[201962.671238]  __blkg_prfill_rwstat+0x84/0x128
[201962.679949]  blkg_prfill_rwstat_field+0x94/0xc0
[201962.689182]  blkcg_print_blkgs+0xcc/0x140
[201962.697370]  blkg_print_stat_bytes+0x4c/0x60
[201962.706083]  cgroup_seqfile_show+0x58/0xc0
[201962.714446]  kernfs_seq_show+0x44/0x50
[201962.722112]  seq_read+0xd4/0x4a8
[201962.728732]  kernfs_fop_read+0x16c/0x218
[201962.736748]  __vfs_read+0x60/0x188
[201962.743717]  vfs_read+0x94/0x150
[201962.750338]  ksys_read+0x6c/0xd8
[201962.756958]  __arm64_sys_read+0x24/0x30
[201962.764800]  el0_svc_common+0x78/0x130
[201962.772466]  el0_svc_handler+0x38/0x78
[201962.780131]  el0_svc+0x8/0xc

__blkg_prfill_rwstat() tries to get the device name by 'bdi->dev',
while the device and kobj->name has been freed by bdi_unregister().
Then, blkg_dev_name() will return an invalid name pointer, resulting
in crash on string(). The race as following:

CPU1                          CPU2
blkg_print_stat_bytes
                              __scsi_remove_device
                              del_gendisk
                                bdi_unregister

                                put_device(bdi->dev)
                                  kfree(bdi->dev)
__blkg_prfill_rwstat
  blkg_dev_name
    //use the freed bdi->dev
    dev_name(blkg->q->backing_dev_info->dev)
                                bdi->dev = NULL

We fix the bug by protecting the device lifetime with RCU.
call_rcu() will free the device until all of the readers complete.

Since both of blkg_dev_name() and device name have been protected
by rcu_read_lock/unlock(). Thus, we don't need to do it again.
Just use rcu_dereference() to fetch rcu_dev here.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/bfq-iosched.c              |  2 +-
 block/blk-cgroup.c               |  8 ++++--
 block/genhd.c                    |  4 +--
 fs/ext4/super.c                  |  2 +-
 include/linux/backing-dev-defs.h |  8 +++++-
 include/linux/backing-dev.h      |  4 +--
 mm/backing-dev.c                 | 59 +++++++++++++++++++++++++++++++++-------
 7 files changed, 68 insertions(+), 19 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 8c436abfaf14..00904611b8e4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4978,7 +4978,7 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
 	ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
 	switch (ioprio_class) {
 	default:
-		dev_err(bfqq->bfqd->queue->backing_dev_info->dev,
+		dev_err(&bfqq->bfqd->queue->backing_dev_info->rcu_dev->dev,
 			"bfq: bad prio class %d\n", ioprio_class);
 		/* fall through */
 	case IOPRIO_CLASS_NONE:
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a229b94d5390..7ab1af3925c5 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -494,9 +494,13 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
 
 const char *blkg_dev_name(struct blkcg_gq *blkg)
 {
+	struct bdi_rcu_device *rcu_dev;
+
+	rcu_dev = rcu_dereference(blkg->q->backing_dev_info->rcu_dev);
+
 	/* some drivers (floppy) instantiate a queue w/o disk registered */
-	if (blkg->q->backing_dev_info->dev)
-		return dev_name(blkg->q->backing_dev_info->dev);
+	if (rcu_dev)
+		return dev_name(&rcu_dev->dev);
 	return NULL;
 }
 
diff --git a/block/genhd.c b/block/genhd.c
index ff6268970ddc..bd406b0fb471 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -668,9 +668,9 @@ static void register_disk(struct device *parent, struct gendisk *disk,
 		kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD);
 	disk_part_iter_exit(&piter);
 
-	if (disk->queue->backing_dev_info->dev) {
+	if (disk->queue->backing_dev_info->rcu_dev) {
 		err = sysfs_create_link(&ddev->kobj,
-			  &disk->queue->backing_dev_info->dev->kobj,
+			  &disk->queue->backing_dev_info->rcu_dev->dev.kobj,
 			  "bdi");
 		WARN_ON(err);
 	}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ff1b764b0c0e..7781773f9f77 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -388,7 +388,7 @@ static int block_device_ejected(struct super_block *sb)
 	struct inode *bd_inode = sb->s_bdev->bd_inode;
 	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
 
-	return bdi->dev == NULL;
+	return bdi->rcu_dev == NULL;
 }
 
 static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 4fc87dee005a..3d2294c428c1 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -13,6 +13,7 @@
 #include <linux/workqueue.h>
 #include <linux/kref.h>
 #include <linux/refcount.h>
+#include <linux/device.h>
 
 struct page;
 struct device;
@@ -185,6 +186,11 @@ struct bdi_writeback {
 #endif
 };
 
+struct bdi_rcu_device {
+	struct device dev;
+	struct rcu_head rcu_head;
+};
+
 struct backing_dev_info {
 	u64 id;
 	struct rb_node rb_node; /* keyed by ->id */
@@ -219,7 +225,7 @@ struct backing_dev_info {
 #endif
 	wait_queue_head_t wb_waitq;
 
-	struct device *dev;
+	struct bdi_rcu_device *rcu_dev;
 	struct device *owner;
 
 	struct timer_list laptop_mode_wb_timer;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index f88197c1ffc2..67e429b203a1 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -509,9 +509,9 @@ extern const char *bdi_unknown_name;
 
 static inline const char *bdi_dev_name(struct backing_dev_info *bdi)
 {
-	if (!bdi || !bdi->dev)
+	if (!bdi || !bdi->rcu_dev)
 		return bdi_unknown_name;
-	return dev_name(bdi->dev);
+	return dev_name(&bdi->rcu_dev->dev);
 }
 
 #endif	/* _LINUX_BACKING_DEV_H */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 62f05f605fb5..b29c0ad43477 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -850,7 +850,7 @@ static int bdi_init(struct backing_dev_info *bdi)
 {
 	int ret;
 
-	bdi->dev = NULL;
+	bdi->rcu_dev = NULL;
 
 	kref_init(&bdi->refcnt);
 	bdi->min_ratio = 0;
@@ -930,20 +930,44 @@ struct backing_dev_info *bdi_get_by_id(u64 id)
 	return bdi;
 }
 
+static void bdi_device_release(struct device *dev)
+{
+	struct bdi_rcu_device *rcu_dev = container_of(dev,
+			struct bdi_rcu_device, dev);
+	kfree(rcu_dev);
+}
+
 int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args)
 {
 	struct device *dev;
 	struct rb_node *parent, **p;
+	struct bdi_rcu_device *rcu_dev;
+	int retval = -ENODEV;
 
-	if (bdi->dev)	/* The driver needs to use separate queues per device */
+	/* The driver needs to use separate queues per device */
+	if (bdi->rcu_dev)
 		return 0;
 
-	dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args);
-	if (IS_ERR(dev))
-		return PTR_ERR(dev);
+	rcu_dev = kzalloc(sizeof(struct bdi_rcu_device), GFP_KERNEL);
+	if (!rcu_dev)
+		return -ENOMEM;
+
+	/* initialize device */
+	dev = &rcu_dev->dev;
+	device_initialize(dev);
+	dev->class = bdi_class;
+	dev->release = bdi_device_release;
+	dev_set_drvdata(dev, (void *)bdi);
+	retval = kobject_set_name_vargs(&dev->kobj, fmt, args);
+	if (retval)
+		goto error;
+
+	retval = device_add(dev);
+	if (retval)
+		goto error;
 
 	cgwb_bdi_register(bdi);
-	bdi->dev = dev;
+	bdi->rcu_dev = rcu_dev;
 
 	bdi_debug_register(bdi, dev_name(dev));
 	set_bit(WB_registered, &bdi->wb.state);
@@ -962,6 +986,10 @@ int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args)
 
 	trace_writeback_bdi_register(bdi);
 	return 0;
+
+error:
+	put_device(&rcu_dev->dev);
+	return retval;
 }
 EXPORT_SYMBOL(bdi_register_va);
 
@@ -1005,17 +1033,28 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi)
 	synchronize_rcu_expedited();
 }
 
+static void bdi_put_device_rcu(struct rcu_head *rcu)
+{
+	struct bdi_rcu_device *rcu_dev = container_of(rcu,
+			struct bdi_rcu_device, rcu_head);
+	put_device(&rcu_dev->dev);
+}
+
 void bdi_unregister(struct backing_dev_info *bdi)
 {
+	struct bdi_rcu_device *rcu_dev = bdi->rcu_dev;
+
 	/* make sure nobody finds us on the bdi_list anymore */
 	bdi_remove_from_list(bdi);
 	wb_shutdown(&bdi->wb);
 	cgwb_bdi_unregister(bdi);
 
-	if (bdi->dev) {
+	if (rcu_dev) {
 		bdi_debug_unregister(bdi);
-		device_unregister(bdi->dev);
-		bdi->dev = NULL;
+		get_device(&rcu_dev->dev);
+		device_unregister(&rcu_dev->dev);
+		rcu_assign_pointer(bdi->rcu_dev, NULL);
+		call_rcu(&rcu_dev->rcu_head, bdi_put_device_rcu);
 	}
 
 	if (bdi->owner) {
@@ -1031,7 +1070,7 @@ static void release_bdi(struct kref *ref)
 
 	if (test_bit(WB_registered, &bdi->wb.state))
 		bdi_unregister(bdi);
-	WARN_ON_ONCE(bdi->dev);
+	WARN_ON_ONCE(bdi->rcu_dev);
 	wb_exit(&bdi->wb);
 	cgwb_bdi_exit(bdi);
 	kfree(bdi);
-- 
2.16.2.dirty


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/7] bdi: create a new function bdi_get_dev_name()
  2020-02-26 11:18 [PATCH v2 0/7] bdi: fix use-after-free for bdi device Yufen Yu
                   ` (2 preceding siblings ...)
  2020-02-26 11:18 ` [PATCH v2 3/7] bdi: protect device lifetime with RCU Yufen Yu
@ 2020-02-26 11:18 ` Yufen Yu
  2020-02-26 11:18 ` [PATCH v2 5/7] bfq: fix potential kernel crash when print dev err info Yufen Yu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Yufen Yu @ 2020-02-26 11:18 UTC (permalink / raw)
  To: axboe, linux-block, linux-fsdevel; +Cc: tj, jack, bvanassche, tytso

We prepare a new function bdi_get_dev_name() to copy device
kobj->name into buffer passed by caller. The function is covered
by RCU. Thus, caller can access ->dev and copy integral device name.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 include/linux/backing-dev.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 67e429b203a1..89d1cb7923f5 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -514,4 +514,29 @@ static inline const char *bdi_dev_name(struct backing_dev_info *bdi)
 	return dev_name(&bdi->rcu_dev->dev);
 }
 
+/**
+ * bdi_get_dev_name - copy bdi device name into buffer
+ * @bdi: target bdi
+ * @dname: Where to copy the device name to
+ * @len: size of destination buffer
+ */
+static inline void bdi_get_dev_name(struct backing_dev_info *bdi,
+			char *dname, int len)
+{
+	struct bdi_rcu_device *rcu_dev;
+
+	if (!bdi) {
+		strlcpy(dname, bdi_unknown_name, len);
+		return;
+	}
+
+	rcu_read_lock();
+
+	rcu_dev = rcu_dereference(bdi->rcu_dev);
+	strlcpy(dname, rcu_dev ? dev_name(&rcu_dev->dev) :
+			bdi_unknown_name, len);
+
+	rcu_read_unlock();
+}
+
 #endif	/* _LINUX_BACKING_DEV_H */
-- 
2.16.2.dirty


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 5/7] bfq: fix potential kernel crash when print dev err info
  2020-02-26 11:18 [PATCH v2 0/7] bdi: fix use-after-free for bdi device Yufen Yu
                   ` (3 preceding siblings ...)
  2020-02-26 11:18 ` [PATCH v2 4/7] bdi: create a new function bdi_get_dev_name() Yufen Yu
@ 2020-02-26 11:18 ` Yufen Yu
  2020-02-26 11:18 ` [PATCH v2 6/7] memcg: fix crash in wb_workfn when bdi unregister Yufen Yu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Yufen Yu @ 2020-02-26 11:18 UTC (permalink / raw)
  To: axboe, linux-block, linux-fsdevel; +Cc: tj, jack, bvanassche, tytso

We use bdi_get_dev_name() to get device name, avoiding
use-after-free or NULL pointer reference for ->dev.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/bfq-iosched.c         | 7 +++++--
 include/linux/backing-dev.h | 2 ++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 00904611b8e4..8d41783d8e77 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -123,6 +123,7 @@
 #include <linux/ioprio.h>
 #include <linux/sbitmap.h>
 #include <linux/delay.h>
+#include <linux/backing-dev.h>
 
 #include "blk.h"
 #include "blk-mq.h"
@@ -4971,6 +4972,7 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
 	struct task_struct *tsk = current;
 	int ioprio_class;
 	struct bfq_data *bfqd = bfqq->bfqd;
+	char dname[BDI_DEV_NAME_LEN];
 
 	if (!bfqd)
 		return;
@@ -4978,8 +4980,9 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
 	ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
 	switch (ioprio_class) {
 	default:
-		dev_err(&bfqq->bfqd->queue->backing_dev_info->rcu_dev->dev,
-			"bfq: bad prio class %d\n", ioprio_class);
+		bdi_get_dev_name(bfqq->bfqd->queue->backing_dev_info,
+				dname, BDI_DEV_NAME_LEN);
+		pr_err("bdi %s: bfq: bad prio class %d\n", dname, ioprio_class);
 		/* fall through */
 	case IOPRIO_CLASS_NONE:
 		/*
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 89d1cb7923f5..291db069f7da 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -19,6 +19,8 @@
 #include <linux/backing-dev-defs.h>
 #include <linux/slab.h>
 
+#define BDI_DEV_NAME_LEN       32
+
 static inline struct backing_dev_info *bdi_get(struct backing_dev_info *bdi)
 {
 	kref_get(&bdi->refcnt);
-- 
2.16.2.dirty


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 6/7] memcg: fix crash in wb_workfn when bdi unregister
  2020-02-26 11:18 [PATCH v2 0/7] bdi: fix use-after-free for bdi device Yufen Yu
                   ` (4 preceding siblings ...)
  2020-02-26 11:18 ` [PATCH v2 5/7] bfq: fix potential kernel crash when print dev err info Yufen Yu
@ 2020-02-26 11:18 ` Yufen Yu
  2020-02-26 11:18 ` [PATCH v2 7/7] blk-wbt: replace bdi_dev_name() with bdi_get_dev_name() Yufen Yu
  2020-03-04 17:29 ` [PATCH v2 0/7] bdi: fix use-after-free for bdi device Greg KH
  7 siblings, 0 replies; 22+ messages in thread
From: Yufen Yu @ 2020-02-26 11:18 UTC (permalink / raw)
  To: axboe, linux-block, linux-fsdevel; +Cc: tj, jack, bvanassche, tytso

To fix crash in wb_workfn when bdi_unreigster(), commit 68f23b89067f
("memcg: fix a crash in wb_workfn when a device disappears") has
created bdi_dev_name() to handle bdi->dev beening NULL. But, bdi->dev
can be freed after bdi_dev_name() return successly, which may cause
use-after-free for dev or kobj->name.

After protecting bdi->dev lifetimes by rcu lock, we can use
bdi_get_dev_name() to copy name safely.

Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 fs/fs-writeback.c                |  4 +++-
 include/trace/events/writeback.h | 38 +++++++++++++++++---------------------
 2 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 76ac9c7d32ec..b8098009e0dc 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2062,8 +2062,10 @@ void wb_workfn(struct work_struct *work)
 	struct bdi_writeback *wb = container_of(to_delayed_work(work),
 						struct bdi_writeback, dwork);
 	long pages_written;
+	char dname[BDI_DEV_NAME_LEN];
 
-	set_worker_desc("flush-%s", bdi_dev_name(wb->bdi));
+	bdi_get_dev_name(wb->bdi, dname, BDI_DEV_NAME_LEN);
+	set_worker_desc("flush-%s", dname);
 	current->flags |= PF_SWAPWRITE;
 
 	if (likely(!current_is_workqueue_rescuer() ||
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index d94def25e4dc..4767a5ab5e36 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -66,9 +66,8 @@ DECLARE_EVENT_CLASS(writeback_page_template,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name,
-			    bdi_dev_name(mapping ? inode_to_bdi(mapping->host) :
-					 NULL), 32);
+		bdi_get_dev_name(mapping ? inode_to_bdi(mapping->host) : NULL,
+				__entry->name, 32);
 		__entry->ino = mapping ? mapping->host->i_ino : 0;
 		__entry->index = page->index;
 	),
@@ -111,7 +110,7 @@ DECLARE_EVENT_CLASS(writeback_dirty_inode_template,
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 
 		/* may be called for files on pseudo FSes w/ unregistered bdi */
-		strscpy_pad(__entry->name, bdi_dev_name(bdi), 32);
+		bdi_get_dev_name(bdi, __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->state		= inode->i_state;
 		__entry->flags		= flags;
@@ -192,7 +191,7 @@ TRACE_EVENT(inode_foreign_history,
 	),
 
 	TP_fast_assign(
-		strncpy(__entry->name, bdi_dev_name(inode_to_bdi(inode)), 32);
+		bdi_get_dev_name(inode_to_bdi(inode), __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->cgroup_ino	= __trace_wbc_assign_cgroup(wbc);
 		__entry->history	= history;
@@ -221,7 +220,7 @@ TRACE_EVENT(inode_switch_wbs,
 	),
 
 	TP_fast_assign(
-		strncpy(__entry->name,	bdi_dev_name(old_wb->bdi), 32);
+		bdi_get_dev_name(old_wb->bdi, __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->old_cgroup_ino	= __trace_wb_assign_cgroup(old_wb);
 		__entry->new_cgroup_ino	= __trace_wb_assign_cgroup(new_wb);
@@ -254,7 +253,7 @@ TRACE_EVENT(track_foreign_dirty,
 		struct address_space *mapping = page_mapping(page);
 		struct inode *inode = mapping ? mapping->host : NULL;
 
-		strncpy(__entry->name,	bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->name, 32);
 		__entry->bdi_id		= wb->bdi->id;
 		__entry->ino		= inode ? inode->i_ino : 0;
 		__entry->memcg_id	= wb->memcg_css->id;
@@ -287,7 +286,7 @@ TRACE_EVENT(flush_foreign,
 	),
 
 	TP_fast_assign(
-		strncpy(__entry->name,	bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->name, 32);
 		__entry->cgroup_ino	= __trace_wb_assign_cgroup(wb);
 		__entry->frn_bdi_id	= frn_bdi_id;
 		__entry->frn_memcg_id	= frn_memcg_id;
@@ -316,8 +315,7 @@ DECLARE_EVENT_CLASS(writeback_write_inode_template,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name,
-			    bdi_dev_name(inode_to_bdi(inode)), 32);
+		bdi_get_dev_name(inode_to_bdi(inode), __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->sync_mode	= wbc->sync_mode;
 		__entry->cgroup_ino	= __trace_wbc_assign_cgroup(wbc);
@@ -360,7 +358,7 @@ DECLARE_EVENT_CLASS(writeback_work_class,
 		__field(ino_t, cgroup_ino)
 	),
 	TP_fast_assign(
-		strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->name, 32);
 		__entry->nr_pages = work->nr_pages;
 		__entry->sb_dev = work->sb ? work->sb->s_dev : 0;
 		__entry->sync_mode = work->sync_mode;
@@ -413,7 +411,7 @@ DECLARE_EVENT_CLASS(writeback_class,
 		__field(ino_t, cgroup_ino)
 	),
 	TP_fast_assign(
-		strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->name, 32);
 		__entry->cgroup_ino = __trace_wb_assign_cgroup(wb);
 	),
 	TP_printk("bdi %s: cgroup_ino=%lu",
@@ -435,7 +433,7 @@ TRACE_EVENT(writeback_bdi_register,
 		__array(char, name, 32)
 	),
 	TP_fast_assign(
-		strscpy_pad(__entry->name, bdi_dev_name(bdi), 32);
+		bdi_get_dev_name(bdi, __entry->name, 32);
 	),
 	TP_printk("bdi %s",
 		__entry->name
@@ -460,7 +458,7 @@ DECLARE_EVENT_CLASS(wbc_class,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name, bdi_dev_name(bdi), 32);
+		bdi_get_dev_name(bdi, __entry->name, 32);
 		__entry->nr_to_write	= wbc->nr_to_write;
 		__entry->pages_skipped	= wbc->pages_skipped;
 		__entry->sync_mode	= wbc->sync_mode;
@@ -511,7 +509,7 @@ TRACE_EVENT(writeback_queue_io,
 	),
 	TP_fast_assign(
 		unsigned long *older_than_this = work->older_than_this;
-		strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->name, 32);
 		__entry->older	= older_than_this ?  *older_than_this : 0;
 		__entry->age	= older_than_this ?
 				  (jiffies - *older_than_this) * 1000 / HZ : -1;
@@ -597,7 +595,7 @@ TRACE_EVENT(bdi_dirty_ratelimit,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->bdi, bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->bdi, 32);
 		__entry->write_bw	= KBps(wb->write_bandwidth);
 		__entry->avg_write_bw	= KBps(wb->avg_write_bandwidth);
 		__entry->dirty_rate	= KBps(dirty_rate);
@@ -662,7 +660,7 @@ TRACE_EVENT(balance_dirty_pages,
 
 	TP_fast_assign(
 		unsigned long freerun = (thresh + bg_thresh) / 2;
-		strscpy_pad(__entry->bdi, bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->bdi, 32);
 
 		__entry->limit		= global_wb_domain.dirty_limit;
 		__entry->setpoint	= (global_wb_domain.dirty_limit +
@@ -722,8 +720,7 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name,
-			    bdi_dev_name(inode_to_bdi(inode)), 32);
+		bdi_get_dev_name(inode_to_bdi(inode), __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->state		= inode->i_state;
 		__entry->dirtied_when	= inode->dirtied_when;
@@ -796,8 +793,7 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name,
-			    bdi_dev_name(inode_to_bdi(inode)), 32);
+		bdi_get_dev_name(inode_to_bdi(inode), __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->state		= inode->i_state;
 		__entry->dirtied_when	= inode->dirtied_when;
-- 
2.16.2.dirty


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 7/7] blk-wbt: replace bdi_dev_name() with bdi_get_dev_name()
  2020-02-26 11:18 [PATCH v2 0/7] bdi: fix use-after-free for bdi device Yufen Yu
                   ` (5 preceding siblings ...)
  2020-02-26 11:18 ` [PATCH v2 6/7] memcg: fix crash in wb_workfn when bdi unregister Yufen Yu
@ 2020-02-26 11:18 ` Yufen Yu
  2020-03-04 17:29 ` [PATCH v2 0/7] bdi: fix use-after-free for bdi device Greg KH
  7 siblings, 0 replies; 22+ messages in thread
From: Yufen Yu @ 2020-02-26 11:18 UTC (permalink / raw)
  To: axboe, linux-block, linux-fsdevel; +Cc: tj, jack, bvanassche, tytso

wb_timer_fn() can be called after bdi unregister. So, we get bdi
device name by bdi_get_dev_name() to avoid use-after-free or null
pointer deference for ->dev.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 include/trace/events/wbt.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/wbt.h b/include/trace/events/wbt.h
index 9996420d7ec4..3fbf15565962 100644
--- a/include/trace/events/wbt.h
+++ b/include/trace/events/wbt.h
@@ -33,7 +33,7 @@ TRACE_EVENT(wbt_stat,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, bdi_dev_name(bdi),
+		bdi_get_dev_name(bdi, __entry->name,
 			ARRAY_SIZE(__entry->name));
 		__entry->rmean		= stat[0].mean;
 		__entry->rmin		= stat[0].min;
@@ -68,7 +68,7 @@ TRACE_EVENT(wbt_lat,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, bdi_dev_name(bdi),
+		bdi_get_dev_name(bdi, __entry->name,
 			ARRAY_SIZE(__entry->name));
 		__entry->lat = div_u64(lat, 1000);
 	),
@@ -105,7 +105,7 @@ TRACE_EVENT(wbt_step,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, bdi_dev_name(bdi),
+		bdi_get_dev_name(bdi, __entry->name,
 			ARRAY_SIZE(__entry->name));
 		__entry->msg	= msg;
 		__entry->step	= step;
@@ -141,7 +141,7 @@ TRACE_EVENT(wbt_timer,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, bdi_dev_name(bdi),
+		bdi_get_dev_name(bdi, __entry->name,
 			ARRAY_SIZE(__entry->name));
 		__entry->status		= status;
 		__entry->step		= step;
-- 
2.16.2.dirty


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-02-26 11:18 ` [PATCH v2 3/7] bdi: protect device lifetime with RCU Yufen Yu
@ 2020-03-04 17:05   ` Tejun Heo
  2020-03-04 17:22     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2020-03-04 17:05 UTC (permalink / raw)
  To: Yufen Yu
  Cc: axboe, linux-block, linux-fsdevel, jack, bvanassche, tytso,
	Greg Kroah-Hartman

Hello,

It might be better to put this patch at the end rather than in the
middle so that when this patch is applied things are actually fixed.

> +struct bdi_rcu_device {
> +	struct device dev;
> +	struct rcu_head rcu_head;
> +};

(cc'ing Greg)

Greg, block layer switches association between backing_device_info and
its struct device and needs to protect it with RCU. Yufen did so by
introducing a wrapping struct around struct device like above. Do you
think it'd make sense to just embed rcu_head into struct device and
let put_device() to RCU release by default?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-03-04 17:05   ` Tejun Heo
@ 2020-03-04 17:22     ` Greg Kroah-Hartman
  2020-03-04 17:23       ` Greg Kroah-Hartman
  2020-03-04 18:50       ` Tejun Heo
  0 siblings, 2 replies; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-03-04 17:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yufen Yu, axboe, linux-block, linux-fsdevel, jack, bvanassche, tytso

On Wed, Mar 04, 2020 at 12:05:43PM -0500, Tejun Heo wrote:
> Hello,
> 
> It might be better to put this patch at the end rather than in the
> middle so that when this patch is applied things are actually fixed.
> 
> > +struct bdi_rcu_device {
> > +	struct device dev;
> > +	struct rcu_head rcu_head;
> > +};
> 
> (cc'ing Greg)
> 
> Greg, block layer switches association between backing_device_info and
> its struct device and needs to protect it with RCU. Yufen did so by
> introducing a wrapping struct around struct device like above. Do you
> think it'd make sense to just embed rcu_head into struct device and
> let put_device() to RCU release by default?

Ugh, I was dreading the fact that this day might sometime come...

In theory, the reference counting for struct device shouldn't need to
use rcu at all, right?  what is driving the need to use rcu for
backing_device_info?  Are these being destroyed/used so often that rcu
really is the best solution and the existing reference counting doesn't
work properly?

Some context is needed here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-03-04 17:22     ` Greg Kroah-Hartman
@ 2020-03-04 17:23       ` Greg Kroah-Hartman
  2020-03-04 18:50       ` Tejun Heo
  1 sibling, 0 replies; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-03-04 17:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yufen Yu, axboe, linux-block, linux-fsdevel, jack, bvanassche, tytso

On Wed, Mar 04, 2020 at 06:22:21PM +0100, Greg Kroah-Hartman wrote:
> On Wed, Mar 04, 2020 at 12:05:43PM -0500, Tejun Heo wrote:
> > Hello,
> > 
> > It might be better to put this patch at the end rather than in the
> > middle so that when this patch is applied things are actually fixed.
> > 
> > > +struct bdi_rcu_device {
> > > +	struct device dev;
> > > +	struct rcu_head rcu_head;
> > > +};
> > 
> > (cc'ing Greg)
> > 
> > Greg, block layer switches association between backing_device_info and
> > its struct device and needs to protect it with RCU. Yufen did so by
> > introducing a wrapping struct around struct device like above. Do you
> > think it'd make sense to just embed rcu_head into struct device and
> > let put_device() to RCU release by default?
> 
> Ugh, I was dreading the fact that this day might sometime come...
> 
> In theory, the reference counting for struct device shouldn't need to
> use rcu at all, right?  what is driving the need to use rcu for
> backing_device_info?  Are these being destroyed/used so often that rcu
> really is the best solution and the existing reference counting doesn't
> work properly?
> 
> Some context is needed here.

Ah, would help if I just read the whole patch series, that helps...

Something is odd if kobj->name is the problem here, let me look at the
patches in full.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/7] bdi: fix use-after-free for bdi device
  2020-02-26 11:18 [PATCH v2 0/7] bdi: fix use-after-free for bdi device Yufen Yu
                   ` (6 preceding siblings ...)
  2020-02-26 11:18 ` [PATCH v2 7/7] blk-wbt: replace bdi_dev_name() with bdi_get_dev_name() Yufen Yu
@ 2020-03-04 17:29 ` Greg KH
  2020-03-04 18:57   ` Tejun Heo
  2020-03-04 19:02   ` Theodore Y. Ts'o
  7 siblings, 2 replies; 22+ messages in thread
From: Greg KH @ 2020-03-04 17:29 UTC (permalink / raw)
  To: Yufen Yu; +Cc: axboe, linux-block, linux-fsdevel, tj, jack, bvanassche, tytso

On Wed, Feb 26, 2020 at 07:18:44PM +0800, Yufen Yu wrote:
> Hi, all 
> 
> We have reported a use-after-free crash for bdi device in
> __blkg_prfill_rwstat() (see Patch #3). The bug is caused by printing
> device kobj->name while the device and kobj->name has been freed by
> bdi_unregister().

How does that happen?  Who has access to a kobject without also having
the reference count incremented at the same time?  Is this through sysfs
or somewhere within the kernel itself?

> In fact, commit 68f23b8906 "memcg: fix a crash in wb_workfn when
> a device disappears" has tried to address the issue, but the code
> is till somewhat racy after that commit.

That commit is really odd, and I think is just papering over the real
issue, which is shown in the changelog for that commit.

A kobject can be unregistered, like bdi_unregister() does, even if there
are active references for it.  But someone needs to also go around and
decrement those references in order for things to be properly freed.

It feels like the use of struct device (and by virtue of that, struct
kobject and really a kref) here is not being done correctly at all.

The rule should be, "whenever you pass a pointer to a device off, the
reference count is incremented".  Somehow that is not happening here and
RCU is not going to solve the issue really, it's only going to delay the
problem from showing up until much later.

> In this patchset, we try to protect device lifetime with RCU, avoiding
> the device been freed when others used.

The struct device refcount should be all that is needed, don't use RCU
just to "delay freeing this object until some later time because someone
else might have a pointer to id".  That's ripe for disaster.

> A way which maybe fix the problem is copy device name into special
> memory (as discussed in [0]), but that is also need lock protect.

Hah, all that is needed is the name here?  That's sad.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-03-04 17:22     ` Greg Kroah-Hartman
  2020-03-04 17:23       ` Greg Kroah-Hartman
@ 2020-03-04 18:50       ` Tejun Heo
  2020-03-04 19:10         ` Theodore Y. Ts'o
  2020-03-04 20:05         ` Greg Kroah-Hartman
  1 sibling, 2 replies; 22+ messages in thread
From: Tejun Heo @ 2020-03-04 18:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Yufen Yu, axboe, linux-block, linux-fsdevel, jack, bvanassche, tytso

Hello,

On Wed, Mar 04, 2020 at 06:22:21PM +0100, Greg Kroah-Hartman wrote:
> Ugh, I was dreading the fact that this day might sometime come...
> 
> In theory, the reference counting for struct device shouldn't need to
> use rcu at all, right?  what is driving the need to use rcu for

Lifetime rules in block layer are kinda nebulous. Some of it comes
from the fact that some objects are reused. Instead of the usual,
create-use-release, they get repurposed to be associated with
something else. When looking at such an object from some paths, we
don't necessarily have ownership of all of the members.

> backing_device_info?  Are these being destroyed/used so often that rcu
> really is the best solution and the existing reference counting doesn't
> work properly?

It's more that there are entry points which can only ensure that just
the top level object is valid and the member objects might be going or
coming as we're looking at it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/7] bdi: fix use-after-free for bdi device
  2020-03-04 17:29 ` [PATCH v2 0/7] bdi: fix use-after-free for bdi device Greg KH
@ 2020-03-04 18:57   ` Tejun Heo
  2020-03-04 20:07     ` Greg KH
  2020-03-04 19:02   ` Theodore Y. Ts'o
  1 sibling, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2020-03-04 18:57 UTC (permalink / raw)
  To: Greg KH
  Cc: Yufen Yu, axboe, linux-block, linux-fsdevel, jack, bvanassche, tytso

Hey, Greg.

On Wed, Mar 04, 2020 at 06:29:07PM +0100, Greg KH wrote:
> How does that happen?  Who has access to a kobject without also having
> the reference count incremented at the same time?  Is this through sysfs
> or somewhere within the kernel itself?

Hopefully, this part was addressed in the other reply.

> The struct device refcount should be all that is needed, don't use RCU
> just to "delay freeing this object until some later time because someone
> else might have a pointer to id".  That's ripe for disaster.

I think it's an idiomatic use of rcu given the circumstances. Whether
the circumstances are reasonable is totally debatable.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/7] bdi: fix use-after-free for bdi device
  2020-03-04 17:29 ` [PATCH v2 0/7] bdi: fix use-after-free for bdi device Greg KH
  2020-03-04 18:57   ` Tejun Heo
@ 2020-03-04 19:02   ` Theodore Y. Ts'o
  1 sibling, 0 replies; 22+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-04 19:02 UTC (permalink / raw)
  To: Greg KH; +Cc: Yufen Yu, axboe, linux-block, linux-fsdevel, tj, jack, bvanassche

On Wed, Mar 04, 2020 at 06:29:07PM +0100, Greg KH wrote:
> The rule should be, "whenever you pass a pointer to a device off, the
> reference count is incremented".  Somehow that is not happening here and
> RCU is not going to solve the issue really, it's only going to delay the
> problem from showing up until much later.
> ...
> The struct device refcount should be all that is needed, don't use RCU
> just to "delay freeing this object until some later time because someone
> else might have a pointer to id".  That's ripe for disaster.

I agree that this is a better fix than trying to continue to paper
over the problem.

That being said, I also think it would be better if we could *also*
send a notification to the file system (or device mapper) when a block
device has disappeared, so we can set a flag in struct super
indicating, "this is an ex-device" so that we don't have to have
potentially hundreds of I/O errors clogging up the console and/or any
error notification ifrastructure we might want to add in the future,
as we attempt to send I/O to a device is not coming back.  This would
allow us to short-circuit things like writeback, instead of letting
everything drain via pointless io_submits sending bios that will never
go anywhere useful.

					- Ted

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-03-04 18:50       ` Tejun Heo
@ 2020-03-04 19:10         ` Theodore Y. Ts'o
  2020-03-04 19:15           ` Tejun Heo
  2020-03-04 20:05         ` Greg Kroah-Hartman
  1 sibling, 1 reply; 22+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-04 19:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Yufen Yu, axboe, linux-block, linux-fsdevel,
	jack, bvanassche

On Wed, Mar 04, 2020 at 01:50:56PM -0500, Tejun Heo wrote:
> 
> Lifetime rules in block layer are kinda nebulous. Some of it comes
> from the fact that some objects are reused. Instead of the usual,
> create-use-release, they get repurposed to be associated with
> something else. When looking at such an object from some paths, we
> don't necessarily have ownership of all of the members.

I wonder if the current rules should be better documented, and that
perhaps we should revisit some of them so we can tighten them down?
For things that are likely to be long-lived, such as anything
corresponding to a bdi or block device, perhaps it would be better if
the lifetime rules can be made tighter?  The cost of needing to
release and reallocate longer lived objects is going to be negligible,
and benefits of improving code readability, reliability, and
robuestness might be well worth it.

     		  	       	   	       - Ted

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-03-04 19:10         ` Theodore Y. Ts'o
@ 2020-03-04 19:15           ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2020-03-04 19:15 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Greg Kroah-Hartman, Yufen Yu, axboe, linux-block, linux-fsdevel,
	jack, bvanassche

Hello,

On Wed, Mar 04, 2020 at 02:10:26PM -0500, Theodore Y. Ts'o wrote:
> On Wed, Mar 04, 2020 at 01:50:56PM -0500, Tejun Heo wrote:
> > Lifetime rules in block layer are kinda nebulous. Some of it comes
> > from the fact that some objects are reused. Instead of the usual,
> > create-use-release, they get repurposed to be associated with
> > something else. When looking at such an object from some paths, we
> > don't necessarily have ownership of all of the members.
> 
> I wonder if the current rules should be better documented, and that
> perhaps we should revisit some of them so we can tighten them down?

Oh yeah, that'd be nice for sure. We've been papering over stuff
constantly for probably over a decade now. It'd be really nice if we
could clean the house and have sane nominal lifetime rules for block
objects.

> For things that are likely to be long-lived, such as anything
> corresponding to a bdi or block device, perhaps it would be better if
> the lifetime rules can be made tighter?  The cost of needing to
> release and reallocate longer lived objects is going to be negligible,
> and benefits of improving code readability, reliability, and
> robuestness might be well worth it.

I full-heartedly agree. It's just a lot of historical accumulation and
not a lot of manpower directed at cleaning it up.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-03-04 18:50       ` Tejun Heo
  2020-03-04 19:10         ` Theodore Y. Ts'o
@ 2020-03-04 20:05         ` Greg Kroah-Hartman
  2020-03-05  1:22           ` Tejun Heo
  1 sibling, 1 reply; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-03-04 20:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yufen Yu, axboe, linux-block, linux-fsdevel, jack, bvanassche, tytso

On Wed, Mar 04, 2020 at 01:50:56PM -0500, Tejun Heo wrote:
> Hello,
> 
> On Wed, Mar 04, 2020 at 06:22:21PM +0100, Greg Kroah-Hartman wrote:
> > Ugh, I was dreading the fact that this day might sometime come...
> > 
> > In theory, the reference counting for struct device shouldn't need to
> > use rcu at all, right?  what is driving the need to use rcu for
> 
> Lifetime rules in block layer are kinda nebulous. Some of it comes
> from the fact that some objects are reused. Instead of the usual,
> create-use-release, they get repurposed to be associated with
> something else. When looking at such an object from some paths, we
> don't necessarily have ownership of all of the members.

That's horrid, it's not like block devices are on some "fast path" for
tear-down, we should do it correctly.

> > backing_device_info?  Are these being destroyed/used so often that rcu
> > really is the best solution and the existing reference counting doesn't
> > work properly?
> 
> It's more that there are entry points which can only ensure that just
> the top level object is valid and the member objects might be going or
> coming as we're looking at it.

That's not ok, a "member object" can only be valid if you have a
reference to it.  If you remove the object, you then drop the reference,
shouldn't that be the correct thing to do?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/7] bdi: fix use-after-free for bdi device
  2020-03-04 18:57   ` Tejun Heo
@ 2020-03-04 20:07     ` Greg KH
  0 siblings, 0 replies; 22+ messages in thread
From: Greg KH @ 2020-03-04 20:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yufen Yu, axboe, linux-block, linux-fsdevel, jack, bvanassche, tytso

On Wed, Mar 04, 2020 at 01:57:39PM -0500, Tejun Heo wrote:
> Hey, Greg.
> 
> On Wed, Mar 04, 2020 at 06:29:07PM +0100, Greg KH wrote:
> > How does that happen?  Who has access to a kobject without also having
> > the reference count incremented at the same time?  Is this through sysfs
> > or somewhere within the kernel itself?
> 
> Hopefully, this part was addressed in the other reply.

Yes, thanks.

> > The struct device refcount should be all that is needed, don't use RCU
> > just to "delay freeing this object until some later time because someone
> > else might have a pointer to id".  That's ripe for disaster.
> 
> I think it's an idiomatic use of rcu given the circumstances. Whether
> the circumstances are reasonable is totally debatable.

They are not reasonable :)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-03-04 20:05         ` Greg Kroah-Hartman
@ 2020-03-05  1:22           ` Tejun Heo
  2020-03-06 16:25             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2020-03-05  1:22 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Yufen Yu, axboe, linux-block, linux-fsdevel, jack, bvanassche, tytso

Hello,

On Wed, Mar 04, 2020 at 09:05:59PM +0100, Greg Kroah-Hartman wrote:
> > Lifetime rules in block layer are kinda nebulous. Some of it comes
> > from the fact that some objects are reused. Instead of the usual,
> > create-use-release, they get repurposed to be associated with
> > something else. When looking at such an object from some paths, we
> > don't necessarily have ownership of all of the members.
> 
> That's horrid, it's not like block devices are on some "fast path" for
> tear-down, we should do it correctly.

Yeah, it got retrofitted umpteenth times from the really early days. I
don't think much of it is intentionally designed to be this way.

> > > backing_device_info?  Are these being destroyed/used so often that rcu
> > > really is the best solution and the existing reference counting doesn't
> > > work properly?
> > 
> > It's more that there are entry points which can only ensure that just
> > the top level object is valid and the member objects might be going or
> > coming as we're looking at it.
> 
> That's not ok, a "member object" can only be valid if you have a
> reference to it.  If you remove the object, you then drop the reference,
> shouldn't that be the correct thing to do?

I mean, it depends. There are two layers of objects and the top level
object has two stacked lifetime rules. The "active" usage pins
everything as usual. The "shallower" usage only has full access to the
top level and when it reaches down into members it needs a different
mechanism to ensure its validity. Given a clean slate, I don't think
we'd go for this design for these objects but the usage isn't
fundamentally broken.

Idk, for the problem at hand, the choice is between patching it up by
copying the name and RCU protecting ->dev access at least for now.
Both are nasty in their own ways but copying does have a smaller blast
radius. So, copy for now?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-03-05  1:22           ` Tejun Heo
@ 2020-03-06 16:25             ` Greg Kroah-Hartman
  2020-03-07  9:13               ` Yufen Yu
  0 siblings, 1 reply; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-03-06 16:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yufen Yu, axboe, linux-block, linux-fsdevel, jack, bvanassche, tytso

On Wed, Mar 04, 2020 at 08:22:11PM -0500, Tejun Heo wrote:
> Hello,
> 
> On Wed, Mar 04, 2020 at 09:05:59PM +0100, Greg Kroah-Hartman wrote:
> > > Lifetime rules in block layer are kinda nebulous. Some of it comes
> > > from the fact that some objects are reused. Instead of the usual,
> > > create-use-release, they get repurposed to be associated with
> > > something else. When looking at such an object from some paths, we
> > > don't necessarily have ownership of all of the members.
> > 
> > That's horrid, it's not like block devices are on some "fast path" for
> > tear-down, we should do it correctly.
> 
> Yeah, it got retrofitted umpteenth times from the really early days. I
> don't think much of it is intentionally designed to be this way.
> 
> > > > backing_device_info?  Are these being destroyed/used so often that rcu
> > > > really is the best solution and the existing reference counting doesn't
> > > > work properly?
> > > 
> > > It's more that there are entry points which can only ensure that just
> > > the top level object is valid and the member objects might be going or
> > > coming as we're looking at it.
> > 
> > That's not ok, a "member object" can only be valid if you have a
> > reference to it.  If you remove the object, you then drop the reference,
> > shouldn't that be the correct thing to do?
> 
> I mean, it depends. There are two layers of objects and the top level
> object has two stacked lifetime rules. The "active" usage pins
> everything as usual. The "shallower" usage only has full access to the
> top level and when it reaches down into members it needs a different
> mechanism to ensure its validity. Given a clean slate, I don't think
> we'd go for this design for these objects but the usage isn't
> fundamentally broken.
> 
> Idk, for the problem at hand, the choice is between patching it up by
> copying the name and RCU protecting ->dev access at least for now.
> Both are nasty in their own ways but copying does have a smaller blast
> radius. So, copy for now?

Yes, copy for now, don't mess with RCU and the struct device lifetime,
that is not going to solve anything.

I'll put the "fix the lifetime rules in the block layer" on my todo
list, at the bottom :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/7] bdi: protect device lifetime with RCU
  2020-03-06 16:25             ` Greg Kroah-Hartman
@ 2020-03-07  9:13               ` Yufen Yu
  0 siblings, 0 replies; 22+ messages in thread
From: Yufen Yu @ 2020-03-07  9:13 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tejun Heo
  Cc: axboe, linux-block, linux-fsdevel, jack, bvanassche, tytso



On 2020/3/7 0:25, Greg Kroah-Hartman wrote:
> On Wed, Mar 04, 2020 at 08:22:11PM -0500, Tejun Heo wrote:
>> Hello,
>>
>> On Wed, Mar 04, 2020 at 09:05:59PM +0100, Greg Kroah-Hartman wrote:
>>>> Lifetime rules in block layer are kinda nebulous. Some of it comes
>>>> from the fact that some objects are reused. Instead of the usual,
>>>> create-use-release, they get repurposed to be associated with
>>>> something else. When looking at such an object from some paths, we
>>>> don't necessarily have ownership of all of the members.
>>>
>>> That's horrid, it's not like block devices are on some "fast path" for
>>> tear-down, we should do it correctly.
>>
>> Yeah, it got retrofitted umpteenth times from the really early days. I
>> don't think much of it is intentionally designed to be this way.
>>
>>>>> backing_device_info?  Are these being destroyed/used so often that rcu
>>>>> really is the best solution and the existing reference counting doesn't
>>>>> work properly?
>>>>
>>>> It's more that there are entry points which can only ensure that just
>>>> the top level object is valid and the member objects might be going or
>>>> coming as we're looking at it.
>>>
>>> That's not ok, a "member object" can only be valid if you have a
>>> reference to it.  If you remove the object, you then drop the reference,
>>> shouldn't that be the correct thing to do?
>>
>> I mean, it depends. There are two layers of objects and the top level
>> object has two stacked lifetime rules. The "active" usage pins
>> everything as usual. The "shallower" usage only has full access to the
>> top level and when it reaches down into members it needs a different
>> mechanism to ensure its validity. Given a clean slate, I don't think
>> we'd go for this design for these objects but the usage isn't
>> fundamentally broken.
>>
>> Idk, for the problem at hand, the choice is between patching it up by
>> copying the name and RCU protecting ->dev access at least for now.
>> Both are nasty in their own ways but copying does have a smaller blast
>> radius. So, copy for now?
> 
> Yes, copy for now, don't mess with RCU and the struct device lifetime,
> that is not going to solve anything.

Okay, I will try to rework the fix by copying the name. Thanks so much for all
response and suggestion.

Thanks,
Yufen


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-03-07  9:13 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-26 11:18 [PATCH v2 0/7] bdi: fix use-after-free for bdi device Yufen Yu
2020-02-26 11:18 ` [PATCH v2 1/7] blk-wbt: use bdi_dev_name() to get device name Yufen Yu
2020-02-26 11:18 ` [PATCH v2 2/7] fs/ceph: " Yufen Yu
2020-02-26 11:18 ` [PATCH v2 3/7] bdi: protect device lifetime with RCU Yufen Yu
2020-03-04 17:05   ` Tejun Heo
2020-03-04 17:22     ` Greg Kroah-Hartman
2020-03-04 17:23       ` Greg Kroah-Hartman
2020-03-04 18:50       ` Tejun Heo
2020-03-04 19:10         ` Theodore Y. Ts'o
2020-03-04 19:15           ` Tejun Heo
2020-03-04 20:05         ` Greg Kroah-Hartman
2020-03-05  1:22           ` Tejun Heo
2020-03-06 16:25             ` Greg Kroah-Hartman
2020-03-07  9:13               ` Yufen Yu
2020-02-26 11:18 ` [PATCH v2 4/7] bdi: create a new function bdi_get_dev_name() Yufen Yu
2020-02-26 11:18 ` [PATCH v2 5/7] bfq: fix potential kernel crash when print dev err info Yufen Yu
2020-02-26 11:18 ` [PATCH v2 6/7] memcg: fix crash in wb_workfn when bdi unregister Yufen Yu
2020-02-26 11:18 ` [PATCH v2 7/7] blk-wbt: replace bdi_dev_name() with bdi_get_dev_name() Yufen Yu
2020-03-04 17:29 ` [PATCH v2 0/7] bdi: fix use-after-free for bdi device Greg KH
2020-03-04 18:57   ` Tejun Heo
2020-03-04 20:07     ` Greg KH
2020-03-04 19:02   ` Theodore Y. Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).