linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/4] bdi: fix use-after-free for bdi device
@ 2020-03-23 13:22 Yufen Yu
  2020-03-23 13:22 ` [PATCH v3 1/4] bdi: use bdi_dev_name() to get device name Yufen Yu
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Yufen Yu @ 2020-03-23 13:22 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: tj, jack, bvanassche, tytso, gregkh

Hi, all 

We have reported a use-after-free crash for bdi device in __blkg_prfill_rwstat().
The bug is caused by printing device kobj->name while the device and kobj->name
has been freed by bdi_unregister().

In fact, commit 68f23b8906 "memcg: fix a crash in wb_workfn when a device disappears"
has tried to address the issue, but the code is till somewhat racy after that commit.

In this patchset, we try to protect bdi->dev with spinlock, and copy device name
into buffer, avoiding use-after-free. 

V2:
  https://www.spinics.net/lists/linux-fsdevel/msg163206.html
  Rry to protect device lifetime with RCU.

V1:
  https://www.spinics.net/lists/linux-block/msg49693.html
  Add a new spinlock and copy kobj->name into caller buffer.
  Or using synchronize_rcu() to wait until reader complete.

Yufen Yu (4):
  bdi: use bdi_dev_name() to get device name
  bdi: add new bdi_get_dev_name()
  bdi: replace bdi_dev_name() with bdi_get_dev_name()
  bdi: protect bdi->dev with spinlock

 block/bfq-iosched.c              |  6 +++--
 block/blk-cgroup-rwstat.c        |  6 +++--
 block/blk-cgroup.c               | 19 +++++-----------
 block/blk-iocost.c               | 14 +++++++-----
 block/blk-iolatency.c            |  5 +++--
 block/blk-throttle.c             |  6 +++--
 fs/ceph/debugfs.c                |  2 +-
 fs/fs-writeback.c                |  4 +++-
 include/linux/backing-dev-defs.h |  1 +
 include/linux/backing-dev.h      | 23 +++++++++++++++++++
 include/linux/blk-cgroup.h       |  1 -
 include/trace/events/wbt.h       |  8 +++----
 include/trace/events/writeback.h | 38 ++++++++++++++------------------
 mm/backing-dev.c                 |  9 ++++++--
 14 files changed, 85 insertions(+), 57 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/4] bdi: use bdi_dev_name() to get device name
  2020-03-23 13:22 [PATCH v3 0/4] bdi: fix use-after-free for bdi device Yufen Yu
@ 2020-03-23 13:22 ` Yufen Yu
  2020-03-23 16:52   ` Christoph Hellwig
  2020-03-23 13:22 ` [PATCH v3 2/4] bdi: add new bdi_get_dev_name() Yufen Yu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Yufen Yu @ 2020-03-23 13:22 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: tj, jack, bvanassche, tytso, gregkh

Use the common interface bdi_dev_name() to get device name.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/bfq-iosched.c        | 5 +++--
 block/blk-cgroup.c         | 2 +-
 fs/ceph/debugfs.c          | 2 +-
 include/trace/events/wbt.h | 8 ++++----
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 8c436abfaf14..94261b7d7181 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4978,8 +4978,9 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
 	ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
 	switch (ioprio_class) {
 	default:
-		dev_err(bfqq->bfqd->queue->backing_dev_info->dev,
-			"bfq: bad prio class %d\n", ioprio_class);
+		pr_err("bdi %s: bfq: bad prio class %d\n",
+				bdi_dev_name(bfqq->bfqd->queue->backing_dev_info),
+				ioprio_class);
 		/* fall through */
 	case IOPRIO_CLASS_NONE:
 		/*
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a229b94d5390..be92405c6547 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -496,7 +496,7 @@ const char *blkg_dev_name(struct blkcg_gq *blkg)
 {
 	/* some drivers (floppy) instantiate a queue w/o disk registered */
 	if (blkg->q->backing_dev_info->dev)
-		return dev_name(blkg->q->backing_dev_info->dev);
+		return bdi_dev_name(blkg->q->backing_dev_info);
 	return NULL;
 }
 
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index fb7cabd98e7b..1f8f3b631adb 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -271,7 +271,7 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 				    &congestion_kb_fops);
 
 	snprintf(name, sizeof(name), "../../bdi/%s",
-		 dev_name(fsc->sb->s_bdi->dev));
+		 bdi_dev_name(fsc->sb->s_bdi));
 	fsc->debugfs_bdi =
 		debugfs_create_symlink("bdi",
 				       fsc->client->debugfs_dir,
diff --git a/include/trace/events/wbt.h b/include/trace/events/wbt.h
index 37342a13c9cb..9996420d7ec4 100644
--- a/include/trace/events/wbt.h
+++ b/include/trace/events/wbt.h
@@ -33,7 +33,7 @@ TRACE_EVENT(wbt_stat,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, dev_name(bdi->dev),
+		strlcpy(__entry->name, bdi_dev_name(bdi),
 			ARRAY_SIZE(__entry->name));
 		__entry->rmean		= stat[0].mean;
 		__entry->rmin		= stat[0].min;
@@ -68,7 +68,7 @@ TRACE_EVENT(wbt_lat,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, dev_name(bdi->dev),
+		strlcpy(__entry->name, bdi_dev_name(bdi),
 			ARRAY_SIZE(__entry->name));
 		__entry->lat = div_u64(lat, 1000);
 	),
@@ -105,7 +105,7 @@ TRACE_EVENT(wbt_step,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, dev_name(bdi->dev),
+		strlcpy(__entry->name, bdi_dev_name(bdi),
 			ARRAY_SIZE(__entry->name));
 		__entry->msg	= msg;
 		__entry->step	= step;
@@ -141,7 +141,7 @@ TRACE_EVENT(wbt_timer,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, dev_name(bdi->dev),
+		strlcpy(__entry->name, bdi_dev_name(bdi),
 			ARRAY_SIZE(__entry->name));
 		__entry->status		= status;
 		__entry->step		= step;
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/4] bdi: add new bdi_get_dev_name()
  2020-03-23 13:22 [PATCH v3 0/4] bdi: fix use-after-free for bdi device Yufen Yu
  2020-03-23 13:22 ` [PATCH v3 1/4] bdi: use bdi_dev_name() to get device name Yufen Yu
@ 2020-03-23 13:22 ` Yufen Yu
  2020-03-23 13:22 ` [PATCH v3 3/4] bdi: replace bdi_dev_name() with bdi_get_dev_name() Yufen Yu
  2020-03-23 13:22 ` [PATCH v3 4/4] bdi: protect bdi->dev with spinlock Yufen Yu
  3 siblings, 0 replies; 9+ messages in thread
From: Yufen Yu @ 2020-03-23 13:22 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: tj, jack, bvanassche, tytso, gregkh

We add this new function to copy device name into buffer.
This is prepare for following patch.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 include/linux/backing-dev.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index f88197c1ffc2..82d2401fec37 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -19,6 +19,8 @@
 #include <linux/backing-dev-defs.h>
 #include <linux/slab.h>
 
+#define BDI_DEV_NAME_LEN	32
+
 static inline struct backing_dev_info *bdi_get(struct backing_dev_info *bdi)
 {
 	kref_get(&bdi->refcnt);
@@ -514,4 +516,21 @@ static inline const char *bdi_dev_name(struct backing_dev_info *bdi)
 	return dev_name(bdi->dev);
 }
 
+/**
+ * bdi_get_dev_name - copy bdi device name into buffer
+ * @bdi: target bdi
+ * @dname: Where to copy the device name to
+ * @len: size of destination buffer
+ */
+static inline char *bdi_get_dev_name(struct backing_dev_info *bdi,
+			char *dname, int len)
+{
+	if (!bdi || !bdi->dev) {
+		strlcpy(dname, bdi_unknown_name, len);
+		return NULL;
+	}
+
+	strlcpy(dname, dev_name(bdi->dev), len);
+	return dname;
+}
 #endif	/* _LINUX_BACKING_DEV_H */
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 3/4] bdi: replace bdi_dev_name() with bdi_get_dev_name()
  2020-03-23 13:22 [PATCH v3 0/4] bdi: fix use-after-free for bdi device Yufen Yu
  2020-03-23 13:22 ` [PATCH v3 1/4] bdi: use bdi_dev_name() to get device name Yufen Yu
  2020-03-23 13:22 ` [PATCH v3 2/4] bdi: add new bdi_get_dev_name() Yufen Yu
@ 2020-03-23 13:22 ` Yufen Yu
  2020-03-23 16:54   ` Christoph Hellwig
  2020-03-23 13:22 ` [PATCH v3 4/4] bdi: protect bdi->dev with spinlock Yufen Yu
  3 siblings, 1 reply; 9+ messages in thread
From: Yufen Yu @ 2020-03-23 13:22 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: tj, jack, bvanassche, tytso, gregkh

Since kobj->name can be freed by bdi_unregister(), we try to copy
the name into buffer rather than return name pointer. This patch
is prepare for following patch to fix use-after-free for bdi->dev.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/bfq-iosched.c              |  7 +++---
 block/blk-cgroup-rwstat.c        |  6 +++--
 block/blk-cgroup.c               | 19 +++++-----------
 block/blk-iocost.c               | 14 +++++++-----
 block/blk-iolatency.c            |  5 +++--
 block/blk-throttle.c             |  6 +++--
 fs/fs-writeback.c                |  4 +++-
 include/linux/blk-cgroup.h       |  1 -
 include/trace/events/wbt.h       |  8 +++----
 include/trace/events/writeback.h | 38 ++++++++++++++------------------
 10 files changed, 53 insertions(+), 55 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 94261b7d7181..a8a67a95006e 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4971,6 +4971,7 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
 	struct task_struct *tsk = current;
 	int ioprio_class;
 	struct bfq_data *bfqd = bfqq->bfqd;
+	char dname[BDI_DEV_NAME_LEN];
 
 	if (!bfqd)
 		return;
@@ -4978,9 +4979,9 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
 	ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
 	switch (ioprio_class) {
 	default:
-		pr_err("bdi %s: bfq: bad prio class %d\n",
-				bdi_dev_name(bfqq->bfqd->queue->backing_dev_info),
-				ioprio_class);
+		bdi_get_dev_name(bfqq->bfqd->queue->backing_dev_info, dname,
+				BDI_DEV_NAME_LEN);
+		pr_err("bdi %s: bfq: bad prio class %d\n", dname, ioprio_class);
 		/* fall through */
 	case IOPRIO_CLASS_NONE:
 		/*
diff --git a/block/blk-cgroup-rwstat.c b/block/blk-cgroup-rwstat.c
index 85d5790ac49b..2dc260802228 100644
--- a/block/blk-cgroup-rwstat.c
+++ b/block/blk-cgroup-rwstat.c
@@ -4,6 +4,7 @@
  * Do not use in new code.
  */
 #include "blk-cgroup-rwstat.h"
+#include <linux/backing-dev.h>
 
 int blkg_rwstat_init(struct blkg_rwstat *rwstat, gfp_t gfp)
 {
@@ -49,11 +50,12 @@ u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
 		[BLKG_RWSTAT_ASYNC]	= "Async",
 		[BLKG_RWSTAT_DISCARD]	= "Discard",
 	};
-	const char *dname = blkg_dev_name(pd->blkg);
+	char dname[BDI_DEV_NAME_LEN];
 	u64 v;
 	int i;
 
-	if (!dname)
+	if (!bdi_get_dev_name(pd->blkg->q->backing_dev_info, dname,
+				BDI_DEV_NAME_LEN))
 		return 0;
 
 	for (i = 0; i < BLKG_RWSTAT_NR; i++)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index be92405c6547..a072ca6141ca 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -492,14 +492,6 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
 	return 0;
 }
 
-const char *blkg_dev_name(struct blkcg_gq *blkg)
-{
-	/* some drivers (floppy) instantiate a queue w/o disk registered */
-	if (blkg->q->backing_dev_info->dev)
-		return bdi_dev_name(blkg->q->backing_dev_info);
-	return NULL;
-}
-
 /**
  * blkcg_print_blkgs - helper for printing per-blkg data
  * @sf: seq_file to print to
@@ -551,9 +543,10 @@ EXPORT_SYMBOL_GPL(blkcg_print_blkgs);
  */
 u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v)
 {
-	const char *dname = blkg_dev_name(pd->blkg);
+	char dname[BDI_DEV_NAME_LEN];
 
-	if (!dname)
+	if (!bdi_get_dev_name(pd->blkg->q->backing_dev_info, dname,
+				BDI_DEV_NAME_LEN))
 		return 0;
 
 	seq_printf(sf, "%s %llu\n", dname, (unsigned long long)v);
@@ -749,7 +742,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 
 	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
 		struct blkg_iostat_set *bis = &blkg->iostat;
-		const char *dname;
+		char dname[BDI_DEV_NAME_LEN];
 		char *buf;
 		u64 rbytes, wbytes, rios, wios, dbytes, dios;
 		size_t size = seq_get_buf(sf, &buf), off = 0;
@@ -762,8 +755,8 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 		if (!blkg->online)
 			goto skip;
 
-		dname = blkg_dev_name(blkg);
-		if (!dname)
+		if (!bdi_get_dev_name(blkg->q->backing_dev_info, dname,
+					BDI_DEV_NAME_LEN))
 			goto skip;
 
 		/*
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 27ca68621137..4231d65915c8 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2047,10 +2047,11 @@ static void ioc_pd_free(struct blkg_policy_data *pd)
 static u64 ioc_weight_prfill(struct seq_file *sf, struct blkg_policy_data *pd,
 			     int off)
 {
-	const char *dname = blkg_dev_name(pd->blkg);
+	char dname[BDI_DEV_NAME_LEN];
 	struct ioc_gq *iocg = pd_to_iocg(pd);
 
-	if (dname && iocg->cfg_weight)
+	if (bdi_get_dev_name(pd->blkg, dname, BDI_DEV_NAME_LEN) &&
+			iocg->cfg_weight)
 		seq_printf(sf, "%s %u\n", dname, iocg->cfg_weight);
 	return 0;
 }
@@ -2133,10 +2134,11 @@ static ssize_t ioc_weight_write(struct kernfs_open_file *of, char *buf,
 static u64 ioc_qos_prfill(struct seq_file *sf, struct blkg_policy_data *pd,
 			  int off)
 {
-	const char *dname = blkg_dev_name(pd->blkg);
+	char dname[BDI_DEV_NAME_LEN];
 	struct ioc *ioc = pd_to_iocg(pd)->ioc;
 
-	if (!dname)
+	if (!bdi_get_dev_name(pd->blkg->q->backing_dev_info, dname,
+				BDI_DEV_NAME_LEN))
 		return 0;
 
 	seq_printf(sf, "%s enable=%d ctrl=%s rpct=%u.%02u rlat=%u wpct=%u.%02u wlat=%u min=%u.%02u max=%u.%02u\n",
@@ -2304,11 +2306,11 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 static u64 ioc_cost_model_prfill(struct seq_file *sf,
 				 struct blkg_policy_data *pd, int off)
 {
-	const char *dname = blkg_dev_name(pd->blkg);
+	char dname[BDI_DEV_NAME_LEN];
 	struct ioc *ioc = pd_to_iocg(pd)->ioc;
 	u64 *u = ioc->params.i_lcoefs;
 
-	if (!dname)
+	if (!bdi_get_dev_name(pd->blkg, dname, BDI_DEV_NAME_LEN))
 		return 0;
 
 	seq_printf(sf, "%s ctrl=%s model=linear "
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index c128d50cb410..2ab49167aea1 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -868,9 +868,10 @@ static u64 iolatency_prfill_limit(struct seq_file *sf,
 				  struct blkg_policy_data *pd, int off)
 {
 	struct iolatency_grp *iolat = pd_to_lat(pd);
-	const char *dname = blkg_dev_name(pd->blkg);
+	char dname[BDI_DEV_NAME_LEN];
 
-	if (!dname || !iolat->min_lat_nsec)
+	if (!bdi_get_dev_name(pd->blkg, dname, BDI_DEV_NAME_LEN)
+			|| !iolat->min_lat_nsec)
 		return 0;
 	seq_printf(sf, "%s target=%llu\n",
 		   dname, div_u64(iolat->min_lat_nsec, NSEC_PER_USEC));
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 98233c9c65a8..f77dc93d4e83 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -13,6 +13,7 @@
 #include <linux/blk-cgroup.h>
 #include "blk.h"
 #include "blk-cgroup-rwstat.h"
+#include <linux/backing-dev.h>
 
 /* Max dispatch from a group in 1 round */
 static int throtl_grp_quantum = 8;
@@ -1560,14 +1561,15 @@ static u64 tg_prfill_limit(struct seq_file *sf, struct blkg_policy_data *pd,
 			 int off)
 {
 	struct throtl_grp *tg = pd_to_tg(pd);
-	const char *dname = blkg_dev_name(pd->blkg);
+	char dname[BDI_DEV_NAME_LEN];
 	char bufs[4][21] = { "max", "max", "max", "max" };
 	u64 bps_dft;
 	unsigned int iops_dft;
 	char idle_time[26] = "";
 	char latency_time[26] = "";
 
-	if (!dname)
+	if (!bdi_get_dev_name(pd->blkg->q->backing_dev_info, dname,
+				BDI_DEV_NAME_LEN))
 		return 0;
 
 	if (off == LIMIT_LOW) {
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 76ac9c7d32ec..8d36c256560c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2062,8 +2062,10 @@ void wb_workfn(struct work_struct *work)
 	struct bdi_writeback *wb = container_of(to_delayed_work(work),
 						struct bdi_writeback, dwork);
 	long pages_written;
+	char dname[BDI_DEV_NAME_LEN];
 
-	set_worker_desc("flush-%s", bdi_dev_name(wb->bdi));
+	set_worker_desc("flush-%s",
+			bdi_get_dev_name(wb->bdi, dname, BDI_DEV_NAME_LEN));
 	current->flags |= PF_SWAPWRITE;
 
 	if (likely(!current_is_workqueue_rescuer() ||
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index e4a6949fd171..384b3343d5f4 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -198,7 +198,6 @@ int blkcg_activate_policy(struct request_queue *q,
 void blkcg_deactivate_policy(struct request_queue *q,
 			     const struct blkcg_policy *pol);
 
-const char *blkg_dev_name(struct blkcg_gq *blkg);
 void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg,
 		       u64 (*prfill)(struct seq_file *,
 				     struct blkg_policy_data *, int),
diff --git a/include/trace/events/wbt.h b/include/trace/events/wbt.h
index 9996420d7ec4..3fbf15565962 100644
--- a/include/trace/events/wbt.h
+++ b/include/trace/events/wbt.h
@@ -33,7 +33,7 @@ TRACE_EVENT(wbt_stat,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, bdi_dev_name(bdi),
+		bdi_get_dev_name(bdi, __entry->name,
 			ARRAY_SIZE(__entry->name));
 		__entry->rmean		= stat[0].mean;
 		__entry->rmin		= stat[0].min;
@@ -68,7 +68,7 @@ TRACE_EVENT(wbt_lat,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, bdi_dev_name(bdi),
+		bdi_get_dev_name(bdi, __entry->name,
 			ARRAY_SIZE(__entry->name));
 		__entry->lat = div_u64(lat, 1000);
 	),
@@ -105,7 +105,7 @@ TRACE_EVENT(wbt_step,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, bdi_dev_name(bdi),
+		bdi_get_dev_name(bdi, __entry->name,
 			ARRAY_SIZE(__entry->name));
 		__entry->msg	= msg;
 		__entry->step	= step;
@@ -141,7 +141,7 @@ TRACE_EVENT(wbt_timer,
 	),
 
 	TP_fast_assign(
-		strlcpy(__entry->name, bdi_dev_name(bdi),
+		bdi_get_dev_name(bdi, __entry->name,
 			ARRAY_SIZE(__entry->name));
 		__entry->status		= status;
 		__entry->step		= step;
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index d94def25e4dc..4767a5ab5e36 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -66,9 +66,8 @@ DECLARE_EVENT_CLASS(writeback_page_template,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name,
-			    bdi_dev_name(mapping ? inode_to_bdi(mapping->host) :
-					 NULL), 32);
+		bdi_get_dev_name(mapping ? inode_to_bdi(mapping->host) : NULL,
+				__entry->name, 32);
 		__entry->ino = mapping ? mapping->host->i_ino : 0;
 		__entry->index = page->index;
 	),
@@ -111,7 +110,7 @@ DECLARE_EVENT_CLASS(writeback_dirty_inode_template,
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 
 		/* may be called for files on pseudo FSes w/ unregistered bdi */
-		strscpy_pad(__entry->name, bdi_dev_name(bdi), 32);
+		bdi_get_dev_name(bdi, __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->state		= inode->i_state;
 		__entry->flags		= flags;
@@ -192,7 +191,7 @@ TRACE_EVENT(inode_foreign_history,
 	),
 
 	TP_fast_assign(
-		strncpy(__entry->name, bdi_dev_name(inode_to_bdi(inode)), 32);
+		bdi_get_dev_name(inode_to_bdi(inode), __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->cgroup_ino	= __trace_wbc_assign_cgroup(wbc);
 		__entry->history	= history;
@@ -221,7 +220,7 @@ TRACE_EVENT(inode_switch_wbs,
 	),
 
 	TP_fast_assign(
-		strncpy(__entry->name,	bdi_dev_name(old_wb->bdi), 32);
+		bdi_get_dev_name(old_wb->bdi, __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->old_cgroup_ino	= __trace_wb_assign_cgroup(old_wb);
 		__entry->new_cgroup_ino	= __trace_wb_assign_cgroup(new_wb);
@@ -254,7 +253,7 @@ TRACE_EVENT(track_foreign_dirty,
 		struct address_space *mapping = page_mapping(page);
 		struct inode *inode = mapping ? mapping->host : NULL;
 
-		strncpy(__entry->name,	bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->name, 32);
 		__entry->bdi_id		= wb->bdi->id;
 		__entry->ino		= inode ? inode->i_ino : 0;
 		__entry->memcg_id	= wb->memcg_css->id;
@@ -287,7 +286,7 @@ TRACE_EVENT(flush_foreign,
 	),
 
 	TP_fast_assign(
-		strncpy(__entry->name,	bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->name, 32);
 		__entry->cgroup_ino	= __trace_wb_assign_cgroup(wb);
 		__entry->frn_bdi_id	= frn_bdi_id;
 		__entry->frn_memcg_id	= frn_memcg_id;
@@ -316,8 +315,7 @@ DECLARE_EVENT_CLASS(writeback_write_inode_template,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name,
-			    bdi_dev_name(inode_to_bdi(inode)), 32);
+		bdi_get_dev_name(inode_to_bdi(inode), __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->sync_mode	= wbc->sync_mode;
 		__entry->cgroup_ino	= __trace_wbc_assign_cgroup(wbc);
@@ -360,7 +358,7 @@ DECLARE_EVENT_CLASS(writeback_work_class,
 		__field(ino_t, cgroup_ino)
 	),
 	TP_fast_assign(
-		strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->name, 32);
 		__entry->nr_pages = work->nr_pages;
 		__entry->sb_dev = work->sb ? work->sb->s_dev : 0;
 		__entry->sync_mode = work->sync_mode;
@@ -413,7 +411,7 @@ DECLARE_EVENT_CLASS(writeback_class,
 		__field(ino_t, cgroup_ino)
 	),
 	TP_fast_assign(
-		strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->name, 32);
 		__entry->cgroup_ino = __trace_wb_assign_cgroup(wb);
 	),
 	TP_printk("bdi %s: cgroup_ino=%lu",
@@ -435,7 +433,7 @@ TRACE_EVENT(writeback_bdi_register,
 		__array(char, name, 32)
 	),
 	TP_fast_assign(
-		strscpy_pad(__entry->name, bdi_dev_name(bdi), 32);
+		bdi_get_dev_name(bdi, __entry->name, 32);
 	),
 	TP_printk("bdi %s",
 		__entry->name
@@ -460,7 +458,7 @@ DECLARE_EVENT_CLASS(wbc_class,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name, bdi_dev_name(bdi), 32);
+		bdi_get_dev_name(bdi, __entry->name, 32);
 		__entry->nr_to_write	= wbc->nr_to_write;
 		__entry->pages_skipped	= wbc->pages_skipped;
 		__entry->sync_mode	= wbc->sync_mode;
@@ -511,7 +509,7 @@ TRACE_EVENT(writeback_queue_io,
 	),
 	TP_fast_assign(
 		unsigned long *older_than_this = work->older_than_this;
-		strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->name, 32);
 		__entry->older	= older_than_this ?  *older_than_this : 0;
 		__entry->age	= older_than_this ?
 				  (jiffies - *older_than_this) * 1000 / HZ : -1;
@@ -597,7 +595,7 @@ TRACE_EVENT(bdi_dirty_ratelimit,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->bdi, bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->bdi, 32);
 		__entry->write_bw	= KBps(wb->write_bandwidth);
 		__entry->avg_write_bw	= KBps(wb->avg_write_bandwidth);
 		__entry->dirty_rate	= KBps(dirty_rate);
@@ -662,7 +660,7 @@ TRACE_EVENT(balance_dirty_pages,
 
 	TP_fast_assign(
 		unsigned long freerun = (thresh + bg_thresh) / 2;
-		strscpy_pad(__entry->bdi, bdi_dev_name(wb->bdi), 32);
+		bdi_get_dev_name(wb->bdi, __entry->bdi, 32);
 
 		__entry->limit		= global_wb_domain.dirty_limit;
 		__entry->setpoint	= (global_wb_domain.dirty_limit +
@@ -722,8 +720,7 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name,
-			    bdi_dev_name(inode_to_bdi(inode)), 32);
+		bdi_get_dev_name(inode_to_bdi(inode), __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->state		= inode->i_state;
 		__entry->dirtied_when	= inode->dirtied_when;
@@ -796,8 +793,7 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name,
-			    bdi_dev_name(inode_to_bdi(inode)), 32);
+		bdi_get_dev_name(inode_to_bdi(inode), __entry->name, 32);
 		__entry->ino		= inode->i_ino;
 		__entry->state		= inode->i_state;
 		__entry->dirtied_when	= inode->dirtied_when;
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 4/4] bdi: protect bdi->dev with spinlock
  2020-03-23 13:22 [PATCH v3 0/4] bdi: fix use-after-free for bdi device Yufen Yu
                   ` (2 preceding siblings ...)
  2020-03-23 13:22 ` [PATCH v3 3/4] bdi: replace bdi_dev_name() with bdi_get_dev_name() Yufen Yu
@ 2020-03-23 13:22 ` Yufen Yu
  2020-03-23 14:04   ` Greg KH
  3 siblings, 1 reply; 9+ messages in thread
From: Yufen Yu @ 2020-03-23 13:22 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: tj, jack, bvanassche, tytso, gregkh

We reported a kernel crash in __blkg_prfill_rwstat() for use-after-free
the bdi->dev. It tries to get the device name by 'bdi->dev', while the
device and kobj->name has been freed by bdi_unregister(). Then,
blkg_dev_name() will return an invalid name pointer, resulting in crash
on string(). The race as following:

CPU1                          CPU2
blkg_print_stat_bytes
                              __scsi_remove_device
                              del_gendisk
                                bdi_unregister

                                put_device(bdi->dev)
                                  kfree(bdi->dev)
__blkg_prfill_rwstat
  blkg_dev_name
    //use the freed bdi->dev
    dev_name(blkg->q->backing_dev_info->dev)
                                bdi->dev = NULL

To fix the bug, we add a new spinlock for bdi to protect the device.

In addition, to fix crash in wb_workfn when bdi_unreigster(), commit
68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears")
has created bdi_dev_name() to handle bdi->dev beening NULL. But, bdi->dev
can be freed after bdi_dev_name() return successly, which may cause
use-after-free for dev or kobj->name. After replacing bdi_dev_name()
with bdi_get_dev_name() and protect bdi->dev by spinlock, we can fix
the bug thoroughly.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 include/linux/backing-dev-defs.h | 1 +
 include/linux/backing-dev.h      | 4 ++++
 mm/backing-dev.c                 | 9 +++++++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 4fc87dee005a..c98dac4a7953 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -227,6 +227,7 @@ struct backing_dev_info {
 #ifdef CONFIG_DEBUG_FS
 	struct dentry *debug_dir;
 #endif
+	spinlock_t lock;		/* protects dev */
 };
 
 enum {
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 82d2401fec37..1c0e2d0d6236 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -525,12 +525,16 @@ static inline const char *bdi_dev_name(struct backing_dev_info *bdi)
 static inline char *bdi_get_dev_name(struct backing_dev_info *bdi,
 			char *dname, int len)
 {
+	spin_lock_irq(&bdi->lock);
 	if (!bdi || !bdi->dev) {
+		spin_unlock_irq(&bdi->lock);
 		strlcpy(dname, bdi_unknown_name, len);
 		return NULL;
 	}
 
 	strlcpy(dname, dev_name(bdi->dev), len);
+	spin_unlock_irq(&bdi->lock);
+
 	return dname;
 }
 #endif	/* _LINUX_BACKING_DEV_H */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 62f05f605fb5..aa9ba7dcc2d9 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -859,6 +859,7 @@ static int bdi_init(struct backing_dev_info *bdi)
 	INIT_LIST_HEAD(&bdi->bdi_list);
 	INIT_LIST_HEAD(&bdi->wb_list);
 	init_waitqueue_head(&bdi->wb_waitq);
+	spin_lock_init(&bdi->lock);
 
 	ret = cgwb_bdi_init(bdi);
 
@@ -1007,15 +1008,19 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi)
 
 void bdi_unregister(struct backing_dev_info *bdi)
 {
+	struct device *dev = bdi->dev;
+
 	/* make sure nobody finds us on the bdi_list anymore */
 	bdi_remove_from_list(bdi);
 	wb_shutdown(&bdi->wb);
 	cgwb_bdi_unregister(bdi);
 
-	if (bdi->dev) {
+	if (dev) {
 		bdi_debug_unregister(bdi);
-		device_unregister(bdi->dev);
+		spin_lock_irq(&bdi->lock);
 		bdi->dev = NULL;
+		spin_unlock_irq(&bdi->lock);
+		device_unregister(dev);
 	}
 
 	if (bdi->owner) {
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 4/4] bdi: protect bdi->dev with spinlock
  2020-03-23 13:22 ` [PATCH v3 4/4] bdi: protect bdi->dev with spinlock Yufen Yu
@ 2020-03-23 14:04   ` Greg KH
  2020-03-23 14:10     ` Yufen Yu
  0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2020-03-23 14:04 UTC (permalink / raw)
  To: Yufen Yu; +Cc: axboe, linux-block, tj, jack, bvanassche, tytso

On Mon, Mar 23, 2020 at 09:22:54PM +0800, Yufen Yu wrote:
> diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
> index 82d2401fec37..1c0e2d0d6236 100644
> --- a/include/linux/backing-dev.h
> +++ b/include/linux/backing-dev.h
> @@ -525,12 +525,16 @@ static inline const char *bdi_dev_name(struct backing_dev_info *bdi)
>  static inline char *bdi_get_dev_name(struct backing_dev_info *bdi,
>  			char *dname, int len)
>  {
> +	spin_lock_irq(&bdi->lock);
>  	if (!bdi || !bdi->dev) {
> +		spin_unlock_irq(&bdi->lock);

You can't test for (!bdi) right after you accessed bdi->lock :(


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 4/4] bdi: protect bdi->dev with spinlock
  2020-03-23 14:04   ` Greg KH
@ 2020-03-23 14:10     ` Yufen Yu
  0 siblings, 0 replies; 9+ messages in thread
From: Yufen Yu @ 2020-03-23 14:10 UTC (permalink / raw)
  To: Greg KH; +Cc: axboe, linux-block, tj, jack, bvanassche, tytso



On 2020/3/23 22:04, Greg KH wrote:
> On Mon, Mar 23, 2020 at 09:22:54PM +0800, Yufen Yu wrote:
>> diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
>> index 82d2401fec37..1c0e2d0d6236 100644
>> --- a/include/linux/backing-dev.h
>> +++ b/include/linux/backing-dev.h
>> @@ -525,12 +525,16 @@ static inline const char *bdi_dev_name(struct backing_dev_info *bdi)
>>   static inline char *bdi_get_dev_name(struct backing_dev_info *bdi,
>>   			char *dname, int len)
>>   {
>> +	spin_lock_irq(&bdi->lock);
>>   	if (!bdi || !bdi->dev) {
>> +		spin_unlock_irq(&bdi->lock);
> 
> You can't test for (!bdi) right after you accessed bdi->lock :(

Sorry for this coding error.

Thanks,
Yufen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/4] bdi: use bdi_dev_name() to get device name
  2020-03-23 13:22 ` [PATCH v3 1/4] bdi: use bdi_dev_name() to get device name Yufen Yu
@ 2020-03-23 16:52   ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2020-03-23 16:52 UTC (permalink / raw)
  To: Yufen Yu; +Cc: axboe, linux-block, tj, jack, bvanassche, tytso, gregkh

On Mon, Mar 23, 2020 at 09:22:51PM +0800, Yufen Yu wrote:
> Use the common interface bdi_dev_name() to get device name.
> 
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 3/4] bdi: replace bdi_dev_name() with bdi_get_dev_name()
  2020-03-23 13:22 ` [PATCH v3 3/4] bdi: replace bdi_dev_name() with bdi_get_dev_name() Yufen Yu
@ 2020-03-23 16:54   ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2020-03-23 16:54 UTC (permalink / raw)
  To: Yufen Yu; +Cc: axboe, linux-block, tj, jack, bvanassche, tytso, gregkh

On Mon, Mar 23, 2020 at 09:22:53PM +0800, Yufen Yu wrote:
> Since kobj->name can be freed by bdi_unregister(), we try to copy
> the name into buffer rather than return name pointer. This patch
> is prepare for following patch to fix use-after-free for bdi->dev.

Well, most of these should have a bdi reference, because if they didn't
you couldn't copy out anyway.  I think you want to audit the callsites
and see who "leaks" the pointer and only copy in those cases.  And then
preferably send one patch per broken caller.  I'm also not really sure
if we need the new helper.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-03-23 16:54 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-23 13:22 [PATCH v3 0/4] bdi: fix use-after-free for bdi device Yufen Yu
2020-03-23 13:22 ` [PATCH v3 1/4] bdi: use bdi_dev_name() to get device name Yufen Yu
2020-03-23 16:52   ` Christoph Hellwig
2020-03-23 13:22 ` [PATCH v3 2/4] bdi: add new bdi_get_dev_name() Yufen Yu
2020-03-23 13:22 ` [PATCH v3 3/4] bdi: replace bdi_dev_name() with bdi_get_dev_name() Yufen Yu
2020-03-23 16:54   ` Christoph Hellwig
2020-03-23 13:22 ` [PATCH v3 4/4] bdi: protect bdi->dev with spinlock Yufen Yu
2020-03-23 14:04   ` Greg KH
2020-03-23 14:10     ` Yufen Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).