linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/3] block: address blktrace use-after-free
@ 2020-04-01 23:59 Luis Chamberlain
  2020-04-02  0:00 ` [RFC 1/3] block: move main block debugfs initialization to its own file Luis Chamberlain
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-01 23:59 UTC (permalink / raw)
  To: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel, Luis Chamberlain

Upstream kernel.org korg#205713 contends that there is a UAF in
the core debugfs debugfs_remove() function, and has gone through
pushing for a CVE for this, CVE-2019-19770.

If correct then parent dentries are not positive, and this would
have implications far beyond this bug report. Thankfully, upon review
with Nicolai, he wasn't buying it. His suspicions that this was just
a blktrace issue were spot on, and this patch series demonstrates
that, provides a reproducer, and provides a solution to the issue.

We there would like to contend CVE-2019-19770 as invalid. The
implications suggested are not correct, and this issue is only
triggerable with root, by shooting yourself on the foot by misuing
blktrace.

If you want this on a git tree, you can get it from linux-next
20200401-blktrace-fix-uaf branch [2].

Wider review, testing, and rants are appreciated.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=205713
[1] https://nvd.nist.gov/vuln/detail/CVE-2019-19770
[2] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20200401-blktrace-fix-uaf

Luis Chamberlain (3):
  block: move main block debugfs initialization to its own file
  blktrace: fix debugfs use after free
  block: avoid deferral of blk_release_queue() work

 block/Makefile               |  1 +
 block/blk-core.c             |  9 +--------
 block/blk-debugfs.c          | 27 +++++++++++++++++++++++++++
 block/blk-mq-debugfs.c       |  5 -----
 block/blk-sysfs.c            | 21 ++++++++-------------
 block/blk.h                  | 17 +++++++++++++++++
 include/linux/blktrace_api.h |  1 -
 kernel/trace/blktrace.c      | 19 ++++++++-----------
 8 files changed, 62 insertions(+), 38 deletions(-)
 create mode 100644 block/blk-debugfs.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC 1/3] block: move main block debugfs initialization to its own file
  2020-04-01 23:59 [RFC 0/3] block: address blktrace use-after-free Luis Chamberlain
@ 2020-04-02  0:00 ` Luis Chamberlain
  2020-04-05  3:12   ` Bart Van Assche
  2020-04-02  0:00 ` [RFC 2/3] blktrace: fix debugfs use after free Luis Chamberlain
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-02  0:00 UTC (permalink / raw)
  To: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel,
	Luis Chamberlain, Bart Van Assche, Omar Sandoval,
	Hannes Reinecke, Michal Hocko

Single and multiqeueue block devices share some debugfs code. By
moving this into its own file it makes it easier to expand and audit
this shared code.

This patch contains no functional changes.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/Makefile      |  1 +
 block/blk-core.c    |  9 +--------
 block/blk-debugfs.c | 15 +++++++++++++++
 block/blk.h         |  7 +++++++
 4 files changed, 24 insertions(+), 8 deletions(-)
 create mode 100644 block/blk-debugfs.c

diff --git a/block/Makefile b/block/Makefile
index 206b96e9387f..1d3ab20505d8 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
 			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o
 
+obj-$(CONFIG_DEBUG_FS)		+= blk-debugfs.o
 obj-$(CONFIG_BOUNCE)		+= bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
 obj-$(CONFIG_BLK_DEV_BSG)	+= bsg.o
diff --git a/block/blk-core.c b/block/blk-core.c
index 7e4a1da0715e..5aaae7a1b338 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -48,10 +48,6 @@
 #include "blk-pm.h"
 #include "blk-rq-qos.h"
 
-#ifdef CONFIG_DEBUG_FS
-struct dentry *blk_debugfs_root;
-#endif
-
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap);
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap);
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete);
@@ -1796,10 +1792,7 @@ int __init blk_dev_init(void)
 
 	blk_requestq_cachep = kmem_cache_create("request_queue",
 			sizeof(struct request_queue), 0, SLAB_PANIC, NULL);
-
-#ifdef CONFIG_DEBUG_FS
-	blk_debugfs_root = debugfs_create_dir("block", NULL);
-#endif
+	blk_debugfs_register();
 
 	return 0;
 }
diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c
new file mode 100644
index 000000000000..634dea4b1507
--- /dev/null
+++ b/block/blk-debugfs.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Shared debugfs mq / non-mq functionality
+ */
+#include <linux/kernel.h>
+#include <linux/blkdev.h>
+#include <linux/debugfs.h>
+
+struct dentry *blk_debugfs_root;
+
+void blk_debugfs_register(void)
+{
+	blk_debugfs_root = debugfs_create_dir("block", NULL);
+}
diff --git a/block/blk.h b/block/blk.h
index 0a94ec68af32..86a66b614f08 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -487,5 +487,12 @@ struct request_queue *__blk_alloc_queue(int node_id);
 int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		bool *same_page);
+#ifdef CONFIG_DEBUG_FS
+void blk_debugfs_register(void);
+#else
+static inline void blk_debugfs_register(void)
+{
+}
+#endif /* CONFIG_DEBUG_FS */
 
 #endif /* BLK_INTERNAL_H */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-01 23:59 [RFC 0/3] block: address blktrace use-after-free Luis Chamberlain
  2020-04-02  0:00 ` [RFC 1/3] block: move main block debugfs initialization to its own file Luis Chamberlain
@ 2020-04-02  0:00 ` Luis Chamberlain
  2020-04-02  1:57   ` Eric Sandeen
  2020-04-05  3:39   ` Bart Van Assche
  2020-04-02  0:00 ` [RFC 3/3] block: avoid deferral of blk_release_queue() work Luis Chamberlain
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-02  0:00 UTC (permalink / raw)
  To: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel,
	Luis Chamberlain, Bart Van Assche, Omar Sandoval,
	Hannes Reinecke, Michal Hocko, syzbot+603294af2d01acfdd6da

On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
Omar fixed the original blktrace code for multiqueue use. This however
left in place a possible crash, if you happen to abuse blktrace in a way
it was not intended.

Namely, if you loop adding a device, setup the blktrace with BLKTRACESETUP,
forget to BLKTRACETEARDOWN, and then just remove the device you end up
with a panic:

[  107.193134] debugfs: Directory 'loop0' with parent 'block' already present!
[  107.254615] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[  107.258785] #PF: supervisor write access in kernel mode
[  107.262035] #PF: error_code(0x0002) - not-present page
[  107.264106] PGD 0 P4D 0
[  107.264404] Oops: 0002 [#1] SMP NOPTI
[  107.264803] CPU: 8 PID: 674 Comm: kworker/8:2 Tainted: G            E 5.6.0-rc7-next-20200327 #1
[  107.265712] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[  107.266553] Workqueue: events __blk_release_queue
[  107.267051] RIP: 0010:down_write+0x15/0x40
[  107.267488] Code: eb ca e8 ee a5 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f  65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d
[  107.269300] RSP: 0018:ffff9927c06efda8 EFLAGS: 00010246
[  107.269841] RAX: 0000000000000000 RBX: ffff8be7e73b0600 RCX: ffffff8100000000
[  107.270559] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[  107.271281] RBP: 00000000000000a0 R08: ffff8be7ebc80fa8 R09: ffff8be7ebc80fa8
[  107.272001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  107.272722] R13: ffff8be7efc30400 R14: ffff8be7e0571200 R15: 00000000000000a0
[  107.273475] FS:  0000000000000000(0000) GS:ffff8be7efc00000(0000) knlGS:0000000000000000
[  107.274346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  107.274968] CR2: 00000000000000a0 CR3: 000000042abee003 CR4: 0000000000360ee0
[  107.275710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  107.276465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  107.277214] Call Trace:
[  107.277532]  simple_recursive_removal+0x4e/0x2e0
[  107.278049]  ? debugfs_remove+0x60/0x60
[  107.278493]  debugfs_remove+0x40/0x60
[  107.278922]  blk_trace_free+0xd/0x50
[  107.279339]  __blk_trace_remove+0x27/0x40
[  107.279797]  blk_trace_shutdown+0x30/0x40
[  107.280256]  __blk_release_queue+0xab/0x110
[  107.280734]  process_one_work+0x1b4/0x380
[  107.281194]  worker_thread+0x50/0x3c0
[  107.281622]  kthread+0xf9/0x130
[  107.281994]  ? process_one_work+0x380/0x380
[  107.282467]  ? kthread_park+0x90/0x90
[  107.282895]  ret_from_fork+0x1f/0x40
[  107.283316] Modules linked in: loop(E) <etc>
[  107.288562] CR2: 00000000000000a0
[  107.288957] ---[ end trace b885d243d441bbce ]---

This splat happens to be very similar to the one reported via
kernel.org korg#205713, only that korg#205713 was for v4.19.83
and the above now includes the simple_recursive_removal() introduced
via commit a3d1e7eb5abe ("simple_recursive_removal(): kernel-side rm
-rf for ramfs-style filesystems") merged on v5.6.

korg#205713 then was used to create CVE-2019-19770 and claims that
the bug is in a use-after-free in the debugfs core code. The
implications of this being a generic UAF on debugfs would be
much more severe, as it would imply parent dentries can sometimes
not be possitive, which is something claim is not possible.

It turns out that the issue actually is a mis-use of debugfs for
the multiqueue case, and the fragile nature of how we free the
directory used to keep track of blktrace debugfs files. Omar's
commit assumed the parent directory would be kept with
debugfs_lookup() but this is not the case, only the dentry is
kept around. We also special-case a solution for multiqueue
given that for multiqueue code we always instantiate the debugfs
directory for the request queue. We were leaving it only to chance,
if someone happens to use blktrace, on single queue block devices
for the respective debugfs directory be created.

We can fix the UAF by simply using a debugfs directory which is
always created for singlequeue and multiqueue block devices. This
simplifies the code considerably, with the only penalty now being
that we're always creating the request queue directory debugfs
directory for the block device on singlequeue block devices.

The UAF then is not a core debugfs issue, but instead a mis-use of
debugfs, and this issue can only be triggered if you are root, and
mis-use blktrace.

This issue can be reproduced with break-blktrace [2] using:

  break-blktrace -c 10 -d

This patch fixes this issue. Note that there is also another
respective UAF but from the ioctl path [3], this should also fix
that issue.

This patch then also contends the severity of CVE-2019-19770 as
this issue is only possible using root to shoot yourself in the
foot by also misuing blktrace.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=205713
[1] https://nvd.nist.gov/vuln/detail/CVE-2019-19770
[2] https://github.com/mcgrof/break-blktrace
[3] https://lore.kernel.org/lkml/000000000000ec635b059f752700@google.com/

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/blk-debugfs.c          | 12 ++++++++++++
 block/blk-mq-debugfs.c       |  5 -----
 block/blk-sysfs.c            |  3 +++
 block/blk.h                  | 10 ++++++++++
 include/linux/blktrace_api.h |  1 -
 kernel/trace/blktrace.c      | 19 ++++++++-----------
 6 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c
index 634dea4b1507..a8b343e758e4 100644
--- a/block/blk-debugfs.c
+++ b/block/blk-debugfs.c
@@ -13,3 +13,15 @@ void blk_debugfs_register(void)
 {
 	blk_debugfs_root = debugfs_create_dir("block", NULL);
 }
+
+void blk_q_debugfs_register(struct request_queue *q)
+{
+	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
+					    blk_debugfs_root);
+}
+
+void blk_q_debugfs_unregister(struct request_queue *q)
+{
+	debugfs_remove_recursive(q->debugfs_dir);
+	q->debugfs_dir = NULL;
+}
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b3f2ba483992..bda9378eab90 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -823,9 +823,6 @@ void blk_mq_debugfs_register(struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	int i;
 
-	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
-					    blk_debugfs_root);
-
 	debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
 
 	/*
@@ -856,9 +853,7 @@ void blk_mq_debugfs_register(struct request_queue *q)
 
 void blk_mq_debugfs_unregister(struct request_queue *q)
 {
-	debugfs_remove_recursive(q->debugfs_dir);
 	q->sched_debugfs_dir = NULL;
-	q->debugfs_dir = NULL;
 }
 
 static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index fca9b158f4a0..20f20b0fa0b9 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -895,6 +895,7 @@ static void __blk_release_queue(struct work_struct *work)
 
 	blk_trace_shutdown(q);
 
+	blk_q_debugfs_unregister(q);
 	if (queue_is_mq(q))
 		blk_mq_debugfs_unregister(q);
 
@@ -975,6 +976,8 @@ int blk_register_queue(struct gendisk *disk)
 		goto unlock;
 	}
 
+	blk_q_debugfs_register(q);
+
 	if (queue_is_mq(q)) {
 		__blk_mq_register_dev(dev, q);
 		blk_mq_debugfs_register(q);
diff --git a/block/blk.h b/block/blk.h
index 86a66b614f08..b86123a2d74f 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -489,10 +489,20 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 		bool *same_page);
 #ifdef CONFIG_DEBUG_FS
 void blk_debugfs_register(void);
+void blk_q_debugfs_register(struct request_queue *q);
+void blk_q_debugfs_unregister(struct request_queue *q);
 #else
 static inline void blk_debugfs_register(void)
 {
 }
+
+static inline void blk_q_debugfs_register(struct request_queue *q)
+{
+}
+
+static inline void blk_q_debugfs_unregister(struct request_queue *q)
+{
+}
 #endif /* CONFIG_DEBUG_FS */
 
 #endif /* BLK_INTERNAL_H */
diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index 3b6ff5902edc..eb6db276e293 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -22,7 +22,6 @@ struct blk_trace {
 	u64 end_lba;
 	u32 pid;
 	u32 dev;
-	struct dentry *dir;
 	struct dentry *dropped_file;
 	struct dentry *msg_file;
 	struct list_head running_list;
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index ca39dc3230cb..15086227592f 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -311,7 +311,6 @@ static void blk_trace_free(struct blk_trace *bt)
 	debugfs_remove(bt->msg_file);
 	debugfs_remove(bt->dropped_file);
 	relay_close(bt->rchan);
-	debugfs_remove(bt->dir);
 	free_percpu(bt->sequence);
 	free_percpu(bt->msg_data);
 	kfree(bt);
@@ -476,7 +475,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 			      struct blk_user_trace_setup *buts)
 {
 	struct blk_trace *bt = NULL;
-	struct dentry *dir = NULL;
 	int ret;
 
 	if (!buts->buf_size || !buts->buf_nr)
@@ -485,6 +483,9 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	if (!blk_debugfs_root)
 		return -ENOENT;
 
+	if (!q->debugfs_dir)
+		return -ENOENT;
+
 	strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
 	buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';
 
@@ -509,21 +510,19 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 
 	ret = -ENOENT;
 
-	dir = debugfs_lookup(buts->name, blk_debugfs_root);
-	if (!dir)
-		bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
-
 	bt->dev = dev;
 	atomic_set(&bt->dropped, 0);
 	INIT_LIST_HEAD(&bt->running_list);
 
 	ret = -EIO;
-	bt->dropped_file = debugfs_create_file("dropped", 0444, dir, bt,
+	bt->dropped_file = debugfs_create_file("dropped", 0444,
+					       q->debugfs_dir, bt,
 					       &blk_dropped_fops);
 
-	bt->msg_file = debugfs_create_file("msg", 0222, dir, bt, &blk_msg_fops);
+	bt->msg_file = debugfs_create_file("msg", 0222, q->debugfs_dir,
+					   bt, &blk_msg_fops);
 
-	bt->rchan = relay_open("trace", dir, buts->buf_size,
+	bt->rchan = relay_open("trace", q->debugfs_dir, buts->buf_size,
 				buts->buf_nr, &blk_relay_callbacks, bt);
 	if (!bt->rchan)
 		goto err;
@@ -551,8 +550,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 
 	ret = 0;
 err:
-	if (dir && !bt->dir)
-		dput(dir);
 	if (ret)
 		blk_trace_free(bt);
 	return ret;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC 3/3] block: avoid deferral of blk_release_queue() work
  2020-04-01 23:59 [RFC 0/3] block: address blktrace use-after-free Luis Chamberlain
  2020-04-02  0:00 ` [RFC 1/3] block: move main block debugfs initialization to its own file Luis Chamberlain
  2020-04-02  0:00 ` [RFC 2/3] blktrace: fix debugfs use after free Luis Chamberlain
@ 2020-04-02  0:00 ` Luis Chamberlain
  2020-04-02  3:39   ` Bart Van Assche
  2020-04-02  7:44 ` [RFC 0/3] block: address blktrace use-after-free Greg KH
  2020-04-03  8:19 ` Ming Lei
  4 siblings, 1 reply; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-02  0:00 UTC (permalink / raw)
  To: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel,
	Luis Chamberlain, Bart Van Assche, Omar Sandoval,
	Hannes Reinecke, Michal Hocko

Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") moved
the blk_release_queue() into a workqueue after a splat floated around
with some work here which could sleep in blk_exit_rl().

On recent commit db6d9952356 ("block: remove request_list code") though
Jens Axboe removed this code, now merged since v5.0. We no longer have
to defer this work.

By doing this we also avoid failing to detach / attach a block
device with a BLKTRACESETUP. This issue can be reproduced with
break-blktrace [0] using:

  break-blktrace -c 10 -d -s

The kernel does not crash without this commit, it just fails to
create the block device because the prior block device removal
deferred work is pending. After this commit we can use the above
flaky use of blktrace without an issue.

[0] https://github.com/mcgrof/break-blktrace

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Suggested-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/blk-sysfs.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 20f20b0fa0b9..f159b40899ee 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -862,8 +862,8 @@ static void blk_exit_queue(struct request_queue *q)
 
 
 /**
- * __blk_release_queue - release a request queue
- * @work: pointer to the release_work member of the request queue to be released
+ * blk_release_queue - release a request queue
+ * @kojb: pointer to the kobj representing the request queue
  *
  * Description:
  *     This function is called when a block device is being unregistered. The
@@ -873,9 +873,10 @@ static void blk_exit_queue(struct request_queue *q)
  *     of the request queue reaches zero, blk_release_queue is called to release
  *     all allocated resources of the request queue.
  */
-static void __blk_release_queue(struct work_struct *work)
+static void blk_release_queue(struct kobject *kobj)
 {
-	struct request_queue *q = container_of(work, typeof(*q), release_work);
+	struct request_queue *q =
+		container_of(kobj, struct request_queue, kobj);
 
 	if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
 		blk_stat_remove_callback(q, q->poll_cb);
@@ -905,15 +906,6 @@ static void __blk_release_queue(struct work_struct *work)
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
 
-static void blk_release_queue(struct kobject *kobj)
-{
-	struct request_queue *q =
-		container_of(kobj, struct request_queue, kobj);
-
-	INIT_WORK(&q->release_work, __blk_release_queue);
-	schedule_work(&q->release_work);
-}
-
 static const struct sysfs_ops queue_sysfs_ops = {
 	.show	= queue_attr_show,
 	.store	= queue_attr_store,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-02  0:00 ` [RFC 2/3] blktrace: fix debugfs use after free Luis Chamberlain
@ 2020-04-02  1:57   ` Eric Sandeen
  2020-04-02 16:14     ` Luis Chamberlain
  2020-04-05  3:39   ` Bart Van Assche
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Sandeen @ 2020-04-02  1:57 UTC (permalink / raw)
  To: Luis Chamberlain, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel,
	Bart Van Assche, Omar Sandoval, Hannes Reinecke, Michal Hocko,
	syzbot+603294af2d01acfdd6da

On 4/1/20 7:00 PM, Luis Chamberlain wrote:
> On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
> Omar fixed the original blktrace code for multiqueue use. This however
> left in place a possible crash, if you happen to abuse blktrace in a way
> it was not intended.
> 
> Namely, if you loop adding a device, setup the blktrace with BLKTRACESETUP,
> forget to BLKTRACETEARDOWN, and then just remove the device you end up
> with a panic:

Weird, I swear I tested that and didn't hit it, but ...
 

> This issue can be reproduced with break-blktrace [2] using:
> 
>   break-blktrace -c 10 -d

+ -s, right?
 
> This patch fixes this issue. Note that there is also another
> respective UAF but from the ioctl path [3], this should also fix
> that issue.
> 
> This patch then also contends the severity of CVE-2019-19770 as
> this issue is only possible using root to shoot yourself in the
> foot by also misuing blktrace.
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=205713
> [1] https://nvd.nist.gov/vuln/detail/CVE-2019-19770
> [2] https://github.com/mcgrof/break-blktrace

I verified that this does reproduce the exact same KASAN splat on
kernel 4.19.83 as reported in the original bug, thanks!

-Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 3/3] block: avoid deferral of blk_release_queue() work
  2020-04-02  0:00 ` [RFC 3/3] block: avoid deferral of blk_release_queue() work Luis Chamberlain
@ 2020-04-02  3:39   ` Bart Van Assche
  2020-04-02 14:49     ` Nicolai Stange
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2020-04-02  3:39 UTC (permalink / raw)
  To: Luis Chamberlain, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel, Omar Sandoval,
	Hannes Reinecke, Michal Hocko

On 2020-04-01 17:00, Luis Chamberlain wrote:
> Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") moved
> the blk_release_queue() into a workqueue after a splat floated around
> with some work here which could sleep in blk_exit_rl().
> 
> On recent commit db6d9952356 ("block: remove request_list code") though
> Jens Axboe removed this code, now merged since v5.0. We no longer have
> to defer this work.
> 
> By doing this we also avoid failing to detach / attach a block
> device with a BLKTRACESETUP. This issue can be reproduced with
> break-blktrace [0] using:
> 
>   break-blktrace -c 10 -d -s
> 
> The kernel does not crash without this commit, it just fails to
> create the block device because the prior block device removal
> deferred work is pending. After this commit we can use the above
> flaky use of blktrace without an issue.
> 
> [0] https://github.com/mcgrof/break-blktrace
> 
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Nicolai Stange <nstange@suse.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Suggested-by: Nicolai Stange <nstange@suse.de>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  block/blk-sysfs.c | 18 +++++-------------
>  1 file changed, 5 insertions(+), 13 deletions(-)
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 20f20b0fa0b9..f159b40899ee 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -862,8 +862,8 @@ static void blk_exit_queue(struct request_queue *q)
>  
>  
>  /**
> - * __blk_release_queue - release a request queue
> - * @work: pointer to the release_work member of the request queue to be released
> + * blk_release_queue - release a request queue
> + * @kojb: pointer to the kobj representing the request queue
>   *
>   * Description:
>   *     This function is called when a block device is being unregistered. The
> @@ -873,9 +873,10 @@ static void blk_exit_queue(struct request_queue *q)
>   *     of the request queue reaches zero, blk_release_queue is called to release
>   *     all allocated resources of the request queue.
>   */
> -static void __blk_release_queue(struct work_struct *work)
> +static void blk_release_queue(struct kobject *kobj)
>  {
> -	struct request_queue *q = container_of(work, typeof(*q), release_work);
> +	struct request_queue *q =
> +		container_of(kobj, struct request_queue, kobj);
>  
>  	if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
>  		blk_stat_remove_callback(q, q->poll_cb);
> @@ -905,15 +906,6 @@ static void __blk_release_queue(struct work_struct *work)
>  	call_rcu(&q->rcu_head, blk_free_queue_rcu);
>  }
>  
> -static void blk_release_queue(struct kobject *kobj)
> -{
> -	struct request_queue *q =
> -		container_of(kobj, struct request_queue, kobj);
> -
> -	INIT_WORK(&q->release_work, __blk_release_queue);
> -	schedule_work(&q->release_work);
> -}
> -
>  static const struct sysfs_ops queue_sysfs_ops = {
>  	.show	= queue_attr_show,
>  	.store	= queue_attr_store,

The description of this patch mentions a single blk_release_queue() call
that happened in the past from a context from which sleeping is not
allowed and from which sleeping is allowed today. Have all other
blk_release_queue() / blk_put_queue() calls been verified to see whether
none of these happens from a context from which sleeping is not allowed?

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/3] block: address blktrace use-after-free
  2020-04-01 23:59 [RFC 0/3] block: address blktrace use-after-free Luis Chamberlain
                   ` (2 preceding siblings ...)
  2020-04-02  0:00 ` [RFC 3/3] block: avoid deferral of blk_release_queue() work Luis Chamberlain
@ 2020-04-02  7:44 ` Greg KH
  2020-04-03  8:19 ` Ming Lei
  4 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2020-04-02  7:44 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: axboe, viro, rostedt, mingo, jack, ming.lei, nstange, mhocko,
	linux-block, linux-fsdevel, linux-kernel

On Wed, Apr 01, 2020 at 11:59:59PM +0000, Luis Chamberlain wrote:
> Upstream kernel.org korg#205713 contends that there is a UAF in
> the core debugfs debugfs_remove() function, and has gone through
> pushing for a CVE for this, CVE-2019-19770.

As you point out, that CVE is crazy and pointless, thanks for seeing
this through.

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 3/3] block: avoid deferral of blk_release_queue() work
  2020-04-02  3:39   ` Bart Van Assche
@ 2020-04-02 14:49     ` Nicolai Stange
  2020-04-06  9:11       ` Nicolai Stange
  2020-04-09 18:11       ` Luis Chamberlain
  0 siblings, 2 replies; 29+ messages in thread
From: Nicolai Stange @ 2020-04-02 14:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Luis Chamberlain, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange, mhocko, linux-block, linux-fsdevel,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

Bart Van Assche <bvanassche@acm.org> writes:

> On 2020-04-01 17:00, Luis Chamberlain wrote:
>> Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") moved
>> the blk_release_queue() into a workqueue after a splat floated around
>> with some work here which could sleep in blk_exit_rl().
>> 
>> On recent commit db6d9952356 ("block: remove request_list code") though
>> Jens Axboe removed this code, now merged since v5.0. We no longer have
>> to defer this work.
>> 
>> By doing this we also avoid failing to detach / attach a block
>> device with a BLKTRACESETUP. This issue can be reproduced with
>> break-blktrace [0] using:
>> 
>>   break-blktrace -c 10 -d -s
>> 
>> The kernel does not crash without this commit, it just fails to
>> create the block device because the prior block device removal
>> deferred work is pending. After this commit we can use the above
>> flaky use of blktrace without an issue.
>> 
>> [0] https://github.com/mcgrof/break-blktrace
>> 
>> Cc: Bart Van Assche <bvanassche@acm.org>
>> Cc: Omar Sandoval <osandov@fb.com>
>> Cc: Hannes Reinecke <hare@suse.com>
>> Cc: Nicolai Stange <nstange@suse.de>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Suggested-by: Nicolai Stange <nstange@suse.de>
>> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
>> ---
>>  block/blk-sysfs.c | 18 +++++-------------
>>  1 file changed, 5 insertions(+), 13 deletions(-)
>> 
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 20f20b0fa0b9..f159b40899ee 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -862,8 +862,8 @@ static void blk_exit_queue(struct request_queue *q)
>>  
>>  
>>  /**
>> - * __blk_release_queue - release a request queue
>> - * @work: pointer to the release_work member of the request queue to be released
>> + * blk_release_queue - release a request queue
>> + * @kojb: pointer to the kobj representing the request queue
>>   *
>>   * Description:
>>   *     This function is called when a block device is being unregistered. The
>> @@ -873,9 +873,10 @@ static void blk_exit_queue(struct request_queue *q)
>>   *     of the request queue reaches zero, blk_release_queue is called to release
>>   *     all allocated resources of the request queue.
>>   */
>> -static void __blk_release_queue(struct work_struct *work)
>> +static void blk_release_queue(struct kobject *kobj)
>>  {
>> -	struct request_queue *q = container_of(work, typeof(*q), release_work);
>> +	struct request_queue *q =
>> +		container_of(kobj, struct request_queue, kobj);
>>  
>>  	if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
>>  		blk_stat_remove_callback(q, q->poll_cb);
>> @@ -905,15 +906,6 @@ static void __blk_release_queue(struct work_struct *work)
>>  	call_rcu(&q->rcu_head, blk_free_queue_rcu);
>>  }
>>  
>> -static void blk_release_queue(struct kobject *kobj)
>> -{
>> -	struct request_queue *q =
>> -		container_of(kobj, struct request_queue, kobj);
>> -
>> -	INIT_WORK(&q->release_work, __blk_release_queue);
>> -	schedule_work(&q->release_work);
>> -}
>> -
>>  static const struct sysfs_ops queue_sysfs_ops = {
>>  	.show	= queue_attr_show,
>>  	.store	= queue_attr_store,
>
> The description of this patch mentions a single blk_release_queue() call
> that happened in the past from a context from which sleeping is not
> allowed and from which sleeping is allowed today. Have all other
> blk_release_queue() / blk_put_queue() calls been verified to see whether
> none of these happens from a context from which sleeping is not allowed?

I've just done this today and found the following potentially
problematic call paths to blk_put_queue().

1.) mem_cgroup_throttle_swaprate() takes a spinlock and
    calls blkcg_schedule_throttle()->blk_put_queue().

    Also note that AFAICS mem_cgroup_try_charge_delay() can be called
    with GFP_ATOMIC.

2.) scsi_unblock_requests() gets called from a lot of drivers and
    invoke blk_put_queue() through
    scsi_unblock_requests() -> scsi_run_host_queues() ->
    scsi_starved_list_run() -> blk_put_queue().

    Most call sites are fine, the ones which are not are:
    a.) pmcraid_complete_ioa_reset(). This gets assigned
        to struct pmcraid_cmd's ->cmd_done and later invoked
        under a spinlock.

    b.) qla82xx_fw_dump() and qla8044_fw_dump().
        These can potentially block w/o this patch already,
        because both invoke qla2x00_wait_for_chip_reset().

	However, they can get called from IRQ context. For example,
        qla82xx_intr_handler(), qla82xx_msix_default() and
        qla82xx_poll() call qla2x00_async_event(), which calls
        ->fw_dump().

	The aforementioned functions can also reach ->fw_dump() through
        qla24xx_process_response_queue()->qlt_handle_abts_recv()->qlt_response_pkt_all_vps()
        ->qlt_response_pkt()->qlt_handle_abts_completion()->qlt_chk_unresolv_exchg()
        -> ->fw_dump().

	But I'd consider this a problem with the driver -- either
	->fw_dump() can sleep and must not be called from IRQ context
        or they must not invoke qla2x00_wait_for_hba_ready().


(I can share the full analysis, but it's lengthy and contains nothing
 interesting except for what is listed above).


One final note though: If I'm not mistaken, then the final
blk_put_queue() can in principle block even today, simply by virtue of
the kernfs operations invoked through
kobject_put()->kobject_release()->kobject_cleanup()->kobject_del()
->sysfs_remove_dir()->kernfs_remove()->mutex_lock()?


Thanks,

Nicolai

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-02  1:57   ` Eric Sandeen
@ 2020-04-02 16:14     ` Luis Chamberlain
  0 siblings, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-02 16:14 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange,
	mhocko, linux-block, linux-fsdevel, linux-kernel,
	Bart Van Assche, Omar Sandoval, Hannes Reinecke, Michal Hocko,
	syzbot+603294af2d01acfdd6da

On Wed, Apr 01, 2020 at 08:57:42PM -0500, Eric Sandeen wrote:
> On 4/1/20 7:00 PM, Luis Chamberlain wrote:
> > On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
> > Omar fixed the original blktrace code for multiqueue use. This however
> > left in place a possible crash, if you happen to abuse blktrace in a way
> > it was not intended.
> > 
> > Namely, if you loop adding a device, setup the blktrace with BLKTRACESETUP,
> > forget to BLKTRACETEARDOWN, and then just remove the device you end up
> > with a panic:
> 
> Weird, I swear I tested that and didn't hit it, but ...

The real issue was also the dangling block device, a loop device proves
easy to test that. I'll note that another way to test this as well would
be to have a virtual machine with a block devices that goes in and out
via whatever virsh shenanigans to make a block device appear / disappear.

> > This issue can be reproduced with break-blktrace [2] using:
> > 
> >   break-blktrace -c 10 -d
> 
> + -s, right?

Yeap, sorry about that.

> > This patch fixes this issue. Note that there is also another
> > respective UAF but from the ioctl path [3], this should also fix
> > that issue.
> > 
> > This patch then also contends the severity of CVE-2019-19770 as
> > this issue is only possible using root to shoot yourself in the
> > foot by also misuing blktrace.
> > 
> > [0] https://bugzilla.kernel.org/show_bug.cgi?id=205713
> > [1] https://nvd.nist.gov/vuln/detail/CVE-2019-19770
> > [2] https://github.com/mcgrof/break-blktrace
> 
> I verified that this does reproduce the exact same KASAN splat on
> kernel 4.19.83 as reported in the original bug, thanks!

I just codified what Nicolai suspected, we should thank him :)

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/3] block: address blktrace use-after-free
  2020-04-01 23:59 [RFC 0/3] block: address blktrace use-after-free Luis Chamberlain
                   ` (3 preceding siblings ...)
  2020-04-02  7:44 ` [RFC 0/3] block: address blktrace use-after-free Greg KH
@ 2020-04-03  8:19 ` Ming Lei
  2020-04-03 14:06   ` Luis Chamberlain
                     ` (2 more replies)
  4 siblings, 3 replies; 29+ messages in thread
From: Ming Lei @ 2020-04-03  8:19 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, nstange, mhocko,
	linux-block, linux-fsdevel, linux-kernel, yukuai3

On Wed, Apr 01, 2020 at 11:59:59PM +0000, Luis Chamberlain wrote:
> Upstream kernel.org korg#205713 contends that there is a UAF in
> the core debugfs debugfs_remove() function, and has gone through
> pushing for a CVE for this, CVE-2019-19770.
> 
> If correct then parent dentries are not positive, and this would
> have implications far beyond this bug report. Thankfully, upon review
> with Nicolai, he wasn't buying it. His suspicions that this was just
> a blktrace issue were spot on, and this patch series demonstrates
> that, provides a reproducer, and provides a solution to the issue.
> 
> We there would like to contend CVE-2019-19770 as invalid. The
> implications suggested are not correct, and this issue is only
> triggerable with root, by shooting yourself on the foot by misuing
> blktrace.
> 
> If you want this on a git tree, you can get it from linux-next
> 20200401-blktrace-fix-uaf branch [2].
> 
> Wider review, testing, and rants are appreciated.
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=205713
> [1] https://nvd.nist.gov/vuln/detail/CVE-2019-19770
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20200401-blktrace-fix-uaf
> 
> Luis Chamberlain (3):
>   block: move main block debugfs initialization to its own file
>   blktrace: fix debugfs use after free
>   block: avoid deferral of blk_release_queue() work
> 
>  block/Makefile               |  1 +
>  block/blk-core.c             |  9 +--------
>  block/blk-debugfs.c          | 27 +++++++++++++++++++++++++++
>  block/blk-mq-debugfs.c       |  5 -----
>  block/blk-sysfs.c            | 21 ++++++++-------------
>  block/blk.h                  | 17 +++++++++++++++++
>  include/linux/blktrace_api.h |  1 -
>  kernel/trace/blktrace.c      | 19 ++++++++-----------
>  8 files changed, 62 insertions(+), 38 deletions(-)
>  create mode 100644 block/blk-debugfs.c

BTW, Yu Kuai posted one patch for this issue, looks that approach
is simpler:

https://lore.kernel.org/linux-block/20200324132315.22133-1-yukuai3@huawei.com/


Thanks,
Ming


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/3] block: address blktrace use-after-free
  2020-04-03  8:19 ` Ming Lei
@ 2020-04-03 14:06   ` Luis Chamberlain
  2020-04-03 14:13   ` Bart Van Assche
  2020-04-07  2:47   ` yukuai (C)
  2 siblings, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-03 14:06 UTC (permalink / raw)
  To: Ming Lei, yu kuai
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, nstange, mhocko,
	linux-block, linux-fsdevel, linux-kernel, yukuai3

On Fri, Apr 03, 2020 at 04:19:29PM +0800, Ming Lei wrote:
> On Wed, Apr 01, 2020 at 11:59:59PM +0000, Luis Chamberlain wrote:
> > Upstream kernel.org korg#205713 contends that there is a UAF in
> > the core debugfs debugfs_remove() function, and has gone through
> > pushing for a CVE for this, CVE-2019-19770.
> > 
> > If correct then parent dentries are not positive, and this would
> > have implications far beyond this bug report. Thankfully, upon review
> > with Nicolai, he wasn't buying it. His suspicions that this was just
> > a blktrace issue were spot on, and this patch series demonstrates
> > that, provides a reproducer, and provides a solution to the issue.
> > 
> > We there would like to contend CVE-2019-19770 as invalid. The
> > implications suggested are not correct, and this issue is only
> > triggerable with root, by shooting yourself on the foot by misuing
> > blktrace.
> > 
> > If you want this on a git tree, you can get it from linux-next
> > 20200401-blktrace-fix-uaf branch [2].
> > 
> > Wider review, testing, and rants are appreciated.
> > 
> > [0] https://bugzilla.kernel.org/show_bug.cgi?id=205713
> > [1] https://nvd.nist.gov/vuln/detail/CVE-2019-19770
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20200401-blktrace-fix-uaf
> > 
> > Luis Chamberlain (3):
> >   block: move main block debugfs initialization to its own file
> >   blktrace: fix debugfs use after free
> >   block: avoid deferral of blk_release_queue() work
> > 
> >  block/Makefile               |  1 +
> >  block/blk-core.c             |  9 +--------
> >  block/blk-debugfs.c          | 27 +++++++++++++++++++++++++++
> >  block/blk-mq-debugfs.c       |  5 -----
> >  block/blk-sysfs.c            | 21 ++++++++-------------
> >  block/blk.h                  | 17 +++++++++++++++++
> >  include/linux/blktrace_api.h |  1 -
> >  kernel/trace/blktrace.c      | 19 ++++++++-----------
> >  8 files changed, 62 insertions(+), 38 deletions(-)
> >  create mode 100644 block/blk-debugfs.c
> 
> BTW, Yu Kuai posted one patch for this issue, looks that approach
> is simpler:
> 
> https://lore.kernel.org/linux-block/20200324132315.22133-1-yukuai3@huawei.com/

I cannot see how renaming the possible target directory to a temporary
directory instead of unifying it for both SQ and MQ could be any
simpler. IMHO this keeps the mess and fragile nature of the issue.

The approach taken here unifies the directory we should use for both SQ
and MQ and makes the deferral issue a completely separate issue
addressed in the last patch.

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/3] block: address blktrace use-after-free
  2020-04-03  8:19 ` Ming Lei
  2020-04-03 14:06   ` Luis Chamberlain
@ 2020-04-03 14:13   ` Bart Van Assche
  2020-04-03 19:49     ` Luis Chamberlain
  2020-04-07  2:47   ` yukuai (C)
  2 siblings, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2020-04-03 14:13 UTC (permalink / raw)
  To: Ming Lei, Luis Chamberlain
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, nstange, mhocko,
	linux-block, linux-fsdevel, linux-kernel, yukuai3

On 2020-04-03 01:19, Ming Lei wrote:
> BTW, Yu Kuai posted one patch for this issue, looks that approach
> is simpler:
> 
> https://lore.kernel.org/linux-block/20200324132315.22133-1-yukuai3@huawei.com/

That approach is indeed simpler but at the expense of performing dynamic
memory allocation in a cleanup path, something I'm not enthusiast about.

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/3] block: address blktrace use-after-free
  2020-04-03 14:13   ` Bart Van Assche
@ 2020-04-03 19:49     ` Luis Chamberlain
  0 siblings, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-03 19:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Ming Lei, axboe, viro, gregkh, rostedt, mingo, jack, nstange,
	mhocko, linux-block, linux-fsdevel, linux-kernel, yukuai3

On Fri, Apr 03, 2020 at 07:13:47AM -0700, Bart Van Assche wrote:
> On 2020-04-03 01:19, Ming Lei wrote:
> > BTW, Yu Kuai posted one patch for this issue, looks that approach
> > is simpler:
> > 
> > https://lore.kernel.org/linux-block/20200324132315.22133-1-yukuai3@huawei.com/
> 
> That approach is indeed simpler but at the expense of performing dynamic
> memory allocation in a cleanup path, something I'm not enthusiast about.

I also think that its important to annotate that there are actually two
issues here. Not one. One is the misuse of debugfs, the other is how
the deferral exposed the misuse and complications of its misuse.

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 1/3] block: move main block debugfs initialization to its own file
  2020-04-02  0:00 ` [RFC 1/3] block: move main block debugfs initialization to its own file Luis Chamberlain
@ 2020-04-05  3:12   ` Bart Van Assche
  2020-04-06 14:23     ` Luis Chamberlain
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2020-04-05  3:12 UTC (permalink / raw)
  To: Luis Chamberlain, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel, Omar Sandoval,
	Hannes Reinecke, Michal Hocko

On 2020-04-01 17:00, Luis Chamberlain wrote:
> Single and multiqeueue block devices share some debugfs code. By
             ^^^^^^^^^^^
             multiqueue?
> moving this into its own file it makes it easier to expand and audit
> this shared code.

[ ... ]

> diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c
> new file mode 100644
> index 000000000000..634dea4b1507
> --- /dev/null
> +++ b/block/blk-debugfs.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Shared debugfs mq / non-mq functionality
> + */

The legacy block layer is gone, so not sure why the above comment refers
to non-mq?

> diff --git a/block/blk.h b/block/blk.h
> index 0a94ec68af32..86a66b614f08 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -487,5 +487,12 @@ struct request_queue *__blk_alloc_queue(int node_id);
>  int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
>  		struct page *page, unsigned int len, unsigned int offset,
>  		bool *same_page);
> +#ifdef CONFIG_DEBUG_FS
> +void blk_debugfs_register(void);
> +#else
> +static inline void blk_debugfs_register(void)
> +{
> +}
> +#endif /* CONFIG_DEBUG_FS */

Do we really need a new header file that only declares a single
function? How about adding the above into block/blk-mq-debugfs.h?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-02  0:00 ` [RFC 2/3] blktrace: fix debugfs use after free Luis Chamberlain
  2020-04-02  1:57   ` Eric Sandeen
@ 2020-04-05  3:39   ` Bart Van Assche
  2020-04-06  1:27     ` Eric Sandeen
  2020-04-06 15:14     ` Luis Chamberlain
  1 sibling, 2 replies; 29+ messages in thread
From: Bart Van Assche @ 2020-04-05  3:39 UTC (permalink / raw)
  To: Luis Chamberlain, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel, Omar Sandoval,
	Hannes Reinecke, Michal Hocko, syzbot+603294af2d01acfdd6da

On 2020-04-01 17:00, Luis Chamberlain wrote:
> korg#205713 then was used to create CVE-2019-19770 and claims that
> the bug is in a use-after-free in the debugfs core code. The
> implications of this being a generic UAF on debugfs would be
> much more severe, as it would imply parent dentries can sometimes
> not be possitive, which is something claim is not possible.
         ^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         positive?  is there perhaps a word missing here?

> It turns out that the issue actually is a mis-use of debugfs for
> the multiqueue case, and the fragile nature of how we free the
> directory used to keep track of blktrace debugfs files. Omar's
> commit assumed the parent directory would be kept with
> debugfs_lookup() but this is not the case, only the dentry is
> kept around. We also special-case a solution for multiqueue
> given that for multiqueue code we always instantiate the debugfs
> directory for the request queue. We were leaving it only to chance,
> if someone happens to use blktrace, on single queue block devices
> for the respective debugfs directory be created.

Since the legacy block layer is gone, the above explanation may have to
be rephrased.

> We can fix the UAF by simply using a debugfs directory which is
> always created for singlequeue and multiqueue block devices. This
> simplifies the code considerably, with the only penalty now being
> that we're always creating the request queue directory debugfs
> directory for the block device on singlequeue block devices.

Same comment here - the legacy block layer is gone. I think that today
all block drivers are either request-based and multiqueue or so-called
make_request drivers. See also the output of git grep -nHw
blk_alloc_queue for examples of the latter category.

> This patch then also contends the severity of CVE-2019-19770 as
> this issue is only possible using root to shoot yourself in the
> foot by also misuing blktrace.
               ^^^^^^^
               misusing?

> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index b3f2ba483992..bda9378eab90 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -823,9 +823,6 @@ void blk_mq_debugfs_register(struct request_queue *q)
>  	struct blk_mq_hw_ctx *hctx;
>  	int i;
>  
> -	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
> -					    blk_debugfs_root);
> -
>  	debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
>  
>  	/*

[ ... ]

>  static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index fca9b158f4a0..20f20b0fa0b9 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -895,6 +895,7 @@ static void __blk_release_queue(struct work_struct *work)
>  
>  	blk_trace_shutdown(q);
>  
> +	blk_q_debugfs_unregister(q);
>  	if (queue_is_mq(q))
>  		blk_mq_debugfs_unregister(q);

Does this patch change the behavior of the block layer from only
registering a debugfs directory for request-based block devices to
registering a debugfs directory for request-based and make_request based
block devices? Is that behavior change an intended behavior change?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-05  3:39   ` Bart Van Assche
@ 2020-04-06  1:27     ` Eric Sandeen
  2020-04-06  4:25       ` Bart Van Assche
  2020-04-06 15:14     ` Luis Chamberlain
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Sandeen @ 2020-04-06  1:27 UTC (permalink / raw)
  To: Bart Van Assche, Luis Chamberlain, axboe, viro, gregkh, rostedt,
	mingo, jack, ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel, Omar Sandoval,
	Hannes Reinecke, Michal Hocko, syzbot+603294af2d01acfdd6da

On 4/4/20 10:39 PM, Bart Van Assche wrote:
> On 2020-04-01 17:00, Luis Chamberlain wrote:
>> korg#205713 then was used to create CVE-2019-19770 and claims that
>> the bug is in a use-after-free in the debugfs core code. The
>> implications of this being a generic UAF on debugfs would be
>> much more severe, as it would imply parent dentries can sometimes
>> not be possitive, which is something claim is not possible.
>          ^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>          positive?  is there perhaps a word missing here?
> 
>> It turns out that the issue actually is a mis-use of debugfs for
>> the multiqueue case, and the fragile nature of how we free the
>> directory used to keep track of blktrace debugfs files. Omar's
>> commit assumed the parent directory would be kept with
>> debugfs_lookup() but this is not the case, only the dentry is
>> kept around. We also special-case a solution for multiqueue
>> given that for multiqueue code we always instantiate the debugfs
>> directory for the request queue. We were leaving it only to chance,
>> if someone happens to use blktrace, on single queue block devices
>> for the respective debugfs directory be created.
> 
> Since the legacy block layer is gone, the above explanation may have to
> be rephrased.
> 
>> We can fix the UAF by simply using a debugfs directory which is
>> always created for singlequeue and multiqueue block devices. This
>> simplifies the code considerably, with the only penalty now being
>> that we're always creating the request queue directory debugfs
>> directory for the block device on singlequeue block devices.
> 
> Same comment here - the legacy block layer is gone. I think that today
> all block drivers are either request-based and multiqueue or so-called
> make_request drivers. See also the output of git grep -nHw
> blk_alloc_queue for examples of the latter category.
> 
>> This patch then also contends the severity of CVE-2019-19770 as
>> this issue is only possible using root to shoot yourself in the
>> foot by also misuing blktrace.
>                ^^^^^^^
>                misusing?

Honestly I think the whole "misusing blktrace" narrative is not relevant
here; the kernel has to deal with whatever ioctls it receives, right.

The thing I can't figure out from reading the change log is

1) what the root cause of the problem is, and
2) how this patch fixes it?

>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>> index b3f2ba483992..bda9378eab90 100644
>> --- a/block/blk-mq-debugfs.c
>> +++ b/block/blk-mq-debugfs.c
>> @@ -823,9 +823,6 @@ void blk_mq_debugfs_register(struct request_queue *q)
>>  	struct blk_mq_hw_ctx *hctx;
>>  	int i;
>>  
>> -	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
>> -					    blk_debugfs_root);
>> -
>>  	debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
>>  
>>  	/*
> 
> [ ... ]
> 
>>  static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index fca9b158f4a0..20f20b0fa0b9 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -895,6 +895,7 @@ static void __blk_release_queue(struct work_struct *work)
>>  
>>  	blk_trace_shutdown(q);
>>  
>> +	blk_q_debugfs_unregister(q);
>>  	if (queue_is_mq(q))
>>  		blk_mq_debugfs_unregister(q);
> 
> Does this patch change the behavior of the block layer from only
> registering a debugfs directory for request-based block devices to
> registering a debugfs directory for request-based and make_request based
> block devices? Is that behavior change an intended behavior change?

Seems to be:

"This simplifies the code considerably, with the only penalty now being
that we're always creating the request queue directory debugfs
directory for the block device on singlequeue block devices."

?

-Eric


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-06  1:27     ` Eric Sandeen
@ 2020-04-06  4:25       ` Bart Van Assche
  2020-04-06  9:18         ` Nicolai Stange
  2020-04-06 14:29         ` Eric Sandeen
  0 siblings, 2 replies; 29+ messages in thread
From: Bart Van Assche @ 2020-04-06  4:25 UTC (permalink / raw)
  To: Eric Sandeen, Luis Chamberlain, axboe, viro, gregkh, rostedt,
	mingo, jack, ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel, Omar Sandoval,
	Hannes Reinecke, Michal Hocko, syzbot+603294af2d01acfdd6da

On 2020-04-05 18:27, Eric Sandeen wrote:
> The thing I can't figure out from reading the change log is
> 
> 1) what the root cause of the problem is, and
> 2) how this patch fixes it?

I think that the root cause is that do_blk_trace_setup() uses
debugfs_lookup() and that debugfs_lookup() may return a pointer
associated with a previous incarnation of the block device.
Additionally, I think the following changes fix that problem by using
q->debugfs_dir in the blktrace code instead of debugfs_lookup():

[ ... ]
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -311,7 +311,6 @@ static void blk_trace_free(struct blk_trace *bt)
 	debugfs_remove(bt->msg_file);
 	debugfs_remove(bt->dropped_file);
 	relay_close(bt->rchan);
-	debugfs_remove(bt->dir);
 	free_percpu(bt->sequence);
 	free_percpu(bt->msg_data);
 	kfree(bt);
[ ... ]
@@ -509,21 +510,19 @@ static int do_blk_trace_setup(struct request_queue
*q, char *name, dev_t dev,

 	ret = -ENOENT;

-	dir = debugfs_lookup(buts->name, blk_debugfs_root);
-	if (!dir)
-		bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
-
 	bt->dev = dev;
 	atomic_set(&bt->dropped, 0);
 	INIT_LIST_HEAD(&bt->running_list);

 	ret = -EIO;
-	bt->dropped_file = debugfs_create_file("dropped", 0444, dir, bt,
+	bt->dropped_file = debugfs_create_file("dropped", 0444,
+					       q->debugfs_dir, bt,
 					       &blk_dropped_fops);
[ ... ]

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 3/3] block: avoid deferral of blk_release_queue() work
  2020-04-02 14:49     ` Nicolai Stange
@ 2020-04-06  9:11       ` Nicolai Stange
  2020-04-09 18:11       ` Luis Chamberlain
  1 sibling, 0 replies; 29+ messages in thread
From: Nicolai Stange @ 2020-04-06  9:11 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Bart Van Assche, Luis Chamberlain, axboe, viro, gregkh, rostedt,
	mingo, jack, ming.lei, mhocko, linux-block, linux-fsdevel,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

Nicolai Stange <nstange@suse.de> writes:

> Bart Van Assche <bvanassche@acm.org> writes:
>
>> The description of this patch mentions a single blk_release_queue() call
>> that happened in the past from a context from which sleeping is not
>> allowed and from which sleeping is allowed today. Have all other
>> blk_release_queue() / blk_put_queue() calls been verified to see whether
>> none of these happens from a context from which sleeping is not allowed?
>
> I've just done this today and found the following potentially
> problematic call paths to blk_put_queue().
>
> 1.) mem_cgroup_throttle_swaprate() takes a spinlock and
>     calls blkcg_schedule_throttle()->blk_put_queue().
>
>     Also note that AFAICS mem_cgroup_try_charge_delay() can be called
>     with GFP_ATOMIC.
>
> 2.) scsi_unblock_requests() gets called from a lot of drivers and
>     invoke blk_put_queue() through
>     scsi_unblock_requests() -> scsi_run_host_queues() ->
>     scsi_starved_list_run() -> blk_put_queue().
>
>     Most call sites are fine, the ones which are not are:
>     a.) pmcraid_complete_ioa_reset(). This gets assigned
>         to struct pmcraid_cmd's ->cmd_done and later invoked
>         under a spinlock.
>
>     b.) qla82xx_fw_dump() and qla8044_fw_dump().
>         These can potentially block w/o this patch already,
>         because both invoke qla2x00_wait_for_chip_reset().
>
> 	However, they can get called from IRQ context. For example,
>         qla82xx_intr_handler(), qla82xx_msix_default() and
>         qla82xx_poll() call qla2x00_async_event(), which calls
>         ->fw_dump().
>
> 	The aforementioned functions can also reach ->fw_dump() through
>         qla24xx_process_response_queue()->qlt_handle_abts_recv()->qlt_response_pkt_all_vps()
>         ->qlt_response_pkt()->qlt_handle_abts_completion()->qlt_chk_unresolv_exchg()
>         -> ->fw_dump().
>
> 	But I'd consider this a problem with the driver -- either
> 	->fw_dump() can sleep and must not be called from IRQ context
>         or they must not invoke qla2x00_wait_for_hba_ready().
>
>
> (I can share the full analysis, but it's lengthy and contains nothing
>  interesting except for what is listed above).
>
>
> One final note though: If I'm not mistaken, then the final
> blk_put_queue() can in principle block even today, simply by virtue of
> the kernfs operations invoked through
> kobject_put()->kobject_release()->kobject_cleanup()->kobject_del()
> ->sysfs_remove_dir()->kernfs_remove()->mutex_lock()?\

That's wrong, I missed kobject_del() invocation issued from
blk_unregister_queue(). Thus, blk_put_queue() in its current
implementation won't ever block.

Thanks,

Nicolai

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-06  4:25       ` Bart Van Assche
@ 2020-04-06  9:18         ` Nicolai Stange
  2020-04-06 15:19           ` Luis Chamberlain
  2020-04-06 14:29         ` Eric Sandeen
  1 sibling, 1 reply; 29+ messages in thread
From: Nicolai Stange @ 2020-04-06  9:18 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Eric Sandeen, Luis Chamberlain, axboe, viro, gregkh, rostedt,
	mingo, jack, ming.lei, nstange, mhocko, linux-block,
	linux-fsdevel, linux-kernel, Omar Sandoval, Hannes Reinecke,
	Michal Hocko, syzbot+603294af2d01acfdd6da

Bart Van Assche <bvanassche@acm.org> writes:

> On 2020-04-05 18:27, Eric Sandeen wrote:
>> The thing I can't figure out from reading the change log is
>> 
>> 1) what the root cause of the problem is, and
>> 2) how this patch fixes it?
>
> I think that the root cause is that do_blk_trace_setup() uses
> debugfs_lookup() and that debugfs_lookup() may return a pointer
> associated with a previous incarnation of the block device.

That's correct, the debugfs_lookup() can find a previous incarnation's
dir of the same name which is about to get removed from a not yet
schedule work.

I.e. something like the following is possible:

  LOOP_CTL_DEL(loop0) /* schedule __blk_release_queue() work_struct */
  LOOP_CTL_ADD(loop0) /* debugfs_create_dir() from
		       * blk_mq_debugfs_register() fails with EEXIST
                       */
  BLKTRACE_SETUP(loop0) /* debugfs_lookup() finds the directory about to
			 * get deleted and blktrace files will be created
			 * thereunder.
			 */

  The work_struct gets scheduled and the debugfs dir debugfs_remove()ed
  recursively, which includes the blktrace files just created. blktrace's
  dentry pointers are now dangling and there will be a UAF when it
  attempts to delete those again.

Luis' patch [2/3] fixes the issue of the debugfs_lookup() from
blk_mq_debugfs_register() potentially returning an existing directory
associated with a previous block device incarnation of the same name and
thus, fixes the UAF.


However, the problem that the debugfs_create_dir() from
blk_mq_debugfs_register() in a sequence of
  LOOP_CTL_DEL(loop0)
  LOOP_CTL_ADD(loop0)
could silently fail still remains. The RFC patch [3/3] from Luis
attempts to address this issue by folding the delayed
__blk_release_queue() work back into blk_release_queue(), the release
handler associated with the queue kobject and executed from the final
blk_queue_put(). However, that's still no full solution, because the
kobject release handler can run asynchronously, long after
blk_unregister_queue() has returned (c.f. also
CONFIG_DEBUG_KOBJECT_RELEASE).

Note that I proposed this change here (internally) as a potential
cleanup, because I missed the kobject_del() from blk_unregister_queue()
and *wrongly* concluded that blk_queue_put() must be allowed to sleep
nowadays. However, that kobject_del() is in place and moreover, the
analysis requested by Bart (c.f. [1] in this thread) revealed that there
are indeed a couple of sites calling blk_queue_put() from atomic
context.

So I'd suggest to drop patch [3/3] from this series and modify this
patch [2/3] here to move the blk_q_debugfs_unregister(q) invocation from
__blk_release_queue() to blk_unregister_queue() instead.


> Additionally, I think the following changes fix that problem by using
> q->debugfs_dir in the blktrace code instead of debugfs_lookup():

That would fix the UAF, but !queue_is_mq() queues wouldn't get a debugfs
directory created for them by blktrace anymore?


Thanks,

Nicolai

[1] https://lkml.kernel.org/r/87o8saj62m.fsf@suse.de


> [ ... ]
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -311,7 +311,6 @@ static void blk_trace_free(struct blk_trace *bt)
>  	debugfs_remove(bt->msg_file);
>  	debugfs_remove(bt->dropped_file);
>  	relay_close(bt->rchan);
> -	debugfs_remove(bt->dir);
>  	free_percpu(bt->sequence);
>  	free_percpu(bt->msg_data);
>  	kfree(bt);
> [ ... ]
> @@ -509,21 +510,19 @@ static int do_blk_trace_setup(struct request_queue
> *q, char *name, dev_t dev,
>
>  	ret = -ENOENT;
>
> -	dir = debugfs_lookup(buts->name, blk_debugfs_root);
> -	if (!dir)
> -		bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
> -
>  	bt->dev = dev;
>  	atomic_set(&bt->dropped, 0);
>  	INIT_LIST_HEAD(&bt->running_list);
>
>  	ret = -EIO;
> -	bt->dropped_file = debugfs_create_file("dropped", 0444, dir, bt,
> +	bt->dropped_file = debugfs_create_file("dropped", 0444,
> +					       q->debugfs_dir, bt,
>  					       &blk_dropped_fops);
> [ ... ]
>
> Bart.

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 1/3] block: move main block debugfs initialization to its own file
  2020-04-05  3:12   ` Bart Van Assche
@ 2020-04-06 14:23     ` Luis Chamberlain
  0 siblings, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-06 14:23 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange,
	mhocko, linux-block, linux-fsdevel, linux-kernel, Omar Sandoval,
	Hannes Reinecke, Michal Hocko

On Sat, Apr 04, 2020 at 08:12:53PM -0700, Bart Van Assche wrote:
> On 2020-04-01 17:00, Luis Chamberlain wrote:
> > Single and multiqeueue block devices share some debugfs code. By
>              ^^^^^^^^^^^
>              multiqueue?
> > moving this into its own file it makes it easier to expand and audit
> > this shared code.
> 
> [ ... ]
> 
> > diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c
> > new file mode 100644
> > index 000000000000..634dea4b1507
> > --- /dev/null
> > +++ b/block/blk-debugfs.c
> > @@ -0,0 +1,15 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Shared debugfs mq / non-mq functionality
> > + */
> 
> The legacy block layer is gone, so not sure why the above comment refers
> to non-mq?

Will adjust the language, thanks.

> 
> > diff --git a/block/blk.h b/block/blk.h
> > index 0a94ec68af32..86a66b614f08 100644
> > --- a/block/blk.h
> > +++ b/block/blk.h
> > @@ -487,5 +487,12 @@ struct request_queue *__blk_alloc_queue(int node_id);
> >  int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
> >  		struct page *page, unsigned int len, unsigned int offset,
> >  		bool *same_page);
> > +#ifdef CONFIG_DEBUG_FS
> > +void blk_debugfs_register(void);
> > +#else
> > +static inline void blk_debugfs_register(void)
> > +{
> > +}
> > +#endif /* CONFIG_DEBUG_FS */
> 
> Do we really need a new header file that only declares a single
> function? How about adding the above into block/blk-mq-debugfs.h?

Moving forward rq->debugfs_dir will created when CONFIG_DEBUG_FS is
enabled to enable blktrace to use it. This creation won't depend on
CONFIG_BLK_DEBUG_FS, so we can definitely sprinkly the #ifdef
CONFIG_DEBUG_FS stuff in block/blk-mq-debugfs.h but it just didn't
seem the best place. Let me know.

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-06  4:25       ` Bart Van Assche
  2020-04-06  9:18         ` Nicolai Stange
@ 2020-04-06 14:29         ` Eric Sandeen
  2020-04-07  8:09           ` Luis Chamberlain
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Sandeen @ 2020-04-06 14:29 UTC (permalink / raw)
  To: Bart Van Assche, Luis Chamberlain, axboe, viro, gregkh, rostedt,
	mingo, jack, ming.lei, nstange
  Cc: mhocko, linux-block, linux-fsdevel, linux-kernel, Omar Sandoval,
	Hannes Reinecke, Michal Hocko, syzbot+603294af2d01acfdd6da

On 4/5/20 11:25 PM, Bart Van Assche wrote:
> On 2020-04-05 18:27, Eric Sandeen wrote:
>> The thing I can't figure out from reading the change log is
>>
>> 1) what the root cause of the problem is, and
>> 2) how this patch fixes it?
> 
> I think that the root cause is that do_blk_trace_setup() uses
> debugfs_lookup() and that debugfs_lookup() may return a pointer
> associated with a previous incarnation of the block device.
> Additionally, I think the following changes fix that problem by using
> q->debugfs_dir in the blktrace code instead of debugfs_lookup():

Yep, I gathered that from reading the patch, was just hoping for a commit log
that makes it clear.

> [ ... ]
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -311,7 +311,6 @@ static void blk_trace_free(struct blk_trace *bt)
>  	debugfs_remove(bt->msg_file);
>  	debugfs_remove(bt->dropped_file);
>  	relay_close(bt->rchan);
> -	debugfs_remove(bt->dir);
>  	free_percpu(bt->sequence);
>  	free_percpu(bt->msg_data);
>  	kfree(bt);
> [ ... ]
> @@ -509,21 +510,19 @@ static int do_blk_trace_setup(struct request_queue
> *q, char *name, dev_t dev,
> 
>  	ret = -ENOENT;
> 
> -	dir = debugfs_lookup(buts->name, blk_debugfs_root);
> -	if (!dir)
> -		bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
> -
>  	bt->dev = dev;
>  	atomic_set(&bt->dropped, 0);
>  	INIT_LIST_HEAD(&bt->running_list);
> 
>  	ret = -EIO;
> -	bt->dropped_file = debugfs_create_file("dropped", 0444, dir, bt,
> +	bt->dropped_file = debugfs_create_file("dropped", 0444,
> +					       q->debugfs_dir, bt,
>  					       &blk_dropped_fops);

One thing I'm not sure about, the block_trace *bt still points to a dentry that
could get torn down via debugfs_remove_recursive when the queue is released, right,
but could later be sent to blk_trace_free again?  And yet this does seem to fix the
use after free in my testing, so I must be missing something.

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-05  3:39   ` Bart Van Assche
  2020-04-06  1:27     ` Eric Sandeen
@ 2020-04-06 15:14     ` Luis Chamberlain
  1 sibling, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-06 15:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange,
	mhocko, linux-block, linux-fsdevel, linux-kernel, Omar Sandoval,
	Hannes Reinecke, Michal Hocko, syzbot+603294af2d01acfdd6da

On Sat, Apr 04, 2020 at 08:39:47PM -0700, Bart Van Assche wrote:
> On 2020-04-01 17:00, Luis Chamberlain wrote:
> > korg#205713 then was used to create CVE-2019-19770 and claims that
> > the bug is in a use-after-free in the debugfs core code. The
> > implications of this being a generic UAF on debugfs would be
> > much more severe, as it would imply parent dentries can sometimes
> > not be possitive, which is something claim is not possible.
>          ^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>          positive?  is there perhaps a word missing here?

Sorry yeah, this was supposed to say:

it would imply parent dentries can sometimes not be positive, which
is just not possible.

> > It turns out that the issue actually is a mis-use of debugfs for
> > the multiqueue case, and the fragile nature of how we free the
> > directory used to keep track of blktrace debugfs files. Omar's
> > commit assumed the parent directory would be kept with
> > debugfs_lookup() but this is not the case, only the dentry is
> > kept around. We also special-case a solution for multiqueue
> > given that for multiqueue code we always instantiate the debugfs
> > directory for the request queue. We were leaving it only to chance,
> > if someone happens to use blktrace, on single queue block devices
> > for the respective debugfs directory be created.
> 
> Since the legacy block layer is gone, the above explanation may have to
> be rephrased.

Will do.

> > We can fix the UAF by simply using a debugfs directory which is
> > always created for singlequeue and multiqueue block devices. This
> > simplifies the code considerably, with the only penalty now being
> > that we're always creating the request queue directory debugfs
> > directory for the block device on singlequeue block devices.
> 
> Same comment here - the legacy block layer is gone. I think that today
> all block drivers are either request-based and multiqueue or so-called
> make_request drivers. See also the output of git grep -nHw
> blk_alloc_queue for examples of the latter category.

Will adjust.

> > This patch then also contends the severity of CVE-2019-19770 as
> > this issue is only possible using root to shoot yourself in the
> > foot by also misuing blktrace.
>                ^^^^^^^
>                misusing?
> 
> > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> > index b3f2ba483992..bda9378eab90 100644
> > --- a/block/blk-mq-debugfs.c
> > +++ b/block/blk-mq-debugfs.c
> > @@ -823,9 +823,6 @@ void blk_mq_debugfs_register(struct request_queue *q)
> >  	struct blk_mq_hw_ctx *hctx;
> >  	int i;
> >  
> > -	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
> > -					    blk_debugfs_root);
> > -
> >  	debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
> >  
> >  	/*
> 
> [ ... ]
> 
> >  static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index fca9b158f4a0..20f20b0fa0b9 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -895,6 +895,7 @@ static void __blk_release_queue(struct work_struct *work)
> >  
> >  	blk_trace_shutdown(q);
> >  
> > +	blk_q_debugfs_unregister(q);
> >  	if (queue_is_mq(q))
> >  		blk_mq_debugfs_unregister(q);
> 
> Does this patch change the behavior of the block layer from only
> registering a debugfs directory for request-based block devices to
> registering a debugfs directory for request-based and make_request based
> block devices? Is that behavior change an intended behavior change?

Yes, specifically this was already done, however for request-based block
devices this was done upon init, and for make_request based devices this
was only instantiated *iff* blktrace was used at least once. It is
actually a bit difficult to see the later, given the rq->debugfs_dir was
not used per se for make_request based block devices, but instead
the debugfs_create_dir(buts->name, blk_debugfs_root) call was made
directly, which happens to end up being the same directory as
debugfs_create_dir(kobject_name(q->kobj.parent), blk_debugfs_root)
called on block/blk-mq-debugfs.c.

This changes the block layer so that the rq->debugfs_dir is always created
now if debugfs is enabled.

Note that blktrace already depends on debugfs. What was missing in this
patch too was this hunk:

--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -569,8 +569,10 @@ struct request_queue {
	struct list_head        tag_set_list;
	struct bio_set          bio_split;

-#ifdef CONFIG_BLK_DEBUG_FS
+#ifdef CONFIG_DEBUG_FS
	struct dentry           *debugfs_dir;
+#endif
+#ifdef CONFIG_BLK_DEBUG_FS
	struct dentry
	*sched_debugfs_dir;
	struct dentry
	*rqos_debugfs_dir;
#endif


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-06  9:18         ` Nicolai Stange
@ 2020-04-06 15:19           ` Luis Chamberlain
  2020-04-07  8:15             ` Luis Chamberlain
  0 siblings, 1 reply; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-06 15:19 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Bart Van Assche, Eric Sandeen, axboe, viro, gregkh, rostedt,
	mingo, jack, ming.lei, mhocko, linux-block, linux-fsdevel,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko,
	syzbot+603294af2d01acfdd6da

On Mon, Apr 06, 2020 at 11:18:13AM +0200, Nicolai Stange wrote:
> Bart Van Assche <bvanassche@acm.org> writes:

> So I'd suggest to drop patch [3/3] from this series and modify this
> patch [2/3] here to move the blk_q_debugfs_unregister(q) invocation from
> __blk_release_queue() to blk_unregister_queue() instead.

I'll take a stab.

> > Additionally, I think the following changes fix that problem by using
> > q->debugfs_dir in the blktrace code instead of debugfs_lookup():
> 
> That would fix the UAF, but !queue_is_mq() queues wouldn't get a debugfs
> directory created for them by blktrace anymore?

It would, it would just be done early on init as well, and it would now be
shared with the queue_is_mq() case.

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/3] block: address blktrace use-after-free
  2020-04-03  8:19 ` Ming Lei
  2020-04-03 14:06   ` Luis Chamberlain
  2020-04-03 14:13   ` Bart Van Assche
@ 2020-04-07  2:47   ` yukuai (C)
  2020-04-07 19:00     ` Luis Chamberlain
  2 siblings, 1 reply; 29+ messages in thread
From: yukuai (C) @ 2020-04-07  2:47 UTC (permalink / raw)
  To: Ming Lei, Luis Chamberlain
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, nstange, mhocko,
	linux-block, linux-fsdevel, linux-kernel

On 2020/4/3 16:19, Ming Lei wrote:

> BTW, Yu Kuai posted one patch for this issue, looks that approach
> is simpler:
> 
> https://lore.kernel.org/linux-block/20200324132315.22133-1-yukuai3@huawei.com/
> 
> 

I think the issue might not be fixed with the patch seires.

At first, I think there are two key points for the issure:
1. The final release of queue is delayed in a workqueue
2. The creation of 'q->debugfs_dir' might failed(only if 1 exist)
And if we can fix any of the above problem, the UAF issue will be fixed.
(BTW, I did not come up with a good idea for problem 1, and my approach
is for problem 2.)

The third patch "block: avoid deferral of blk_release_queue() work" is
not enough to fix problem 1:
a. if CONFIG_DEBUG_KOBJECT_RELEASE is enable:
static void kobject_release(struct kref *kref)
{
         struct kobject *kobj = container_of(kref, struct kobject, kref);
#ifdef CONFIG_DEBUG_KOBJECT_RELEASE
         unsigned long delay = HZ + HZ * (get_random_int() & 0x3);
         pr_info("kobject: '%s' (%p): %s, parent %p (delayed %ld)\n",
                 ┊kobject_name(kobj), kobj, __func__, kobj->parent, delay);
         INIT_DELAYED_WORK(&kobj->release, kobject_delayed_cleanup);

         schedule_delayed_work(&kobj->release, delay);
#else
         kobject_cleanup(kobj);
#endif
}
b. when 'kobject_put' is called from blk_cleanup_queue, can we make sure
it is the last reference?

Futhermore, I do understand the second patch fix the UAF problem by
using 'q->debugfs_dir' instead of 'q->blk_trace->dir', but the problem 2
still exist and need to be fixed.

Thanks!
Yu Kuai


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-06 14:29         ` Eric Sandeen
@ 2020-04-07  8:09           ` Luis Chamberlain
  0 siblings, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-07  8:09 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Bart Van Assche, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange, mhocko, linux-block, linux-fsdevel,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko,
	syzbot+603294af2d01acfdd6da

On Mon, Apr 06, 2020 at 09:29:40AM -0500, Eric Sandeen wrote:
> On 4/5/20 11:25 PM, Bart Van Assche wrote:
> > On 2020-04-05 18:27, Eric Sandeen wrote:
> >> The thing I can't figure out from reading the change log is
> >>
> >> 1) what the root cause of the problem is, and
> >> 2) how this patch fixes it?
> > 
> > I think that the root cause is that do_blk_trace_setup() uses
> > debugfs_lookup() and that debugfs_lookup() may return a pointer
> > associated with a previous incarnation of the block device.
> > Additionally, I think the following changes fix that problem by using
> > q->debugfs_dir in the blktrace code instead of debugfs_lookup():
> 
> Yep, I gathered that from reading the patch, was just hoping for a commit log
> that makes it clear.

Will try to be a bit more clear.

> > [ ... ]
> > --- a/kernel/trace/blktrace.c
> > +++ b/kernel/trace/blktrace.c
> > @@ -311,7 +311,6 @@ static void blk_trace_free(struct blk_trace *bt)
> >  	debugfs_remove(bt->msg_file);
> >  	debugfs_remove(bt->dropped_file);
> >  	relay_close(bt->rchan);
> > -	debugfs_remove(bt->dir);
> >  	free_percpu(bt->sequence);
> >  	free_percpu(bt->msg_data);
> >  	kfree(bt);
> > [ ... ]
> > @@ -509,21 +510,19 @@ static int do_blk_trace_setup(struct request_queue
> > *q, char *name, dev_t dev,
> > 
> >  	ret = -ENOENT;
> > 
> > -	dir = debugfs_lookup(buts->name, blk_debugfs_root);
> > -	if (!dir)
> > -		bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
> > -
> >  	bt->dev = dev;
> >  	atomic_set(&bt->dropped, 0);
> >  	INIT_LIST_HEAD(&bt->running_list);
> > 
> >  	ret = -EIO;
> > -	bt->dropped_file = debugfs_create_file("dropped", 0444, dir, bt,
> > +	bt->dropped_file = debugfs_create_file("dropped", 0444,
> > +					       q->debugfs_dir, bt,
> >  					       &blk_dropped_fops);
> 
> One thing I'm not sure about, the block_trace *bt still points to a dentry that
> could get torn down via debugfs_remove_recursive when the queue is released, right,

Yes, but let us be specific, what points to a dentry is the directory,
and then the files liked "dropped" above. In fact, as it turns out, the
relay can also be torn down for us through a debugfs_remove_recursive().

> but could later be sent to blk_trace_free again? 

This *should* not be possible, because the dentries for the blktrace
should not be removed prior to the request_queue being removed, because
the blktrace dentries are protected via a request_queue mutex,
q->blk_trace_mutex.

The blk_trace_free() free should therefore happen way early, before
the request_queue is removed. The only dentry left is the directory
on the q->debugfs_dir, and this should only be removed only later when
the request_queue is removed.

But, this does not happen today. For completeness below is a trace
of a real crash but with some extra information added, such as the
get_cpu() a call runs on, and any other interesting thing which we
might care for at the point a call is made.

Also remember debugfs_remove_recursive() is just a define to debugfs_remove().

I used:

	./break-blktrace -c 10 -d -s

So the teardown of the blktrace is skipped purposely.

The gist of it, is that upon the second loop, the LOOP_CTL_DEL for
loop0 schedules the __blk_release_queue() work, but it at least
gets to call blk_trace_free(), but in the meantime LOOP_CTL_ADD()
from the second loop gets called, without the prior work from the
LOOP_CTL_DEL having completed yet. By the time BLKTRACE_SETUP gets
called, the debugfs_lookup() is used, the directory is used (not
created). Then the old LOOP_CTL_DEL from the same second loop still
has to complete, and so it tries, and as you will see from the trace
below it calls debugfs_remove_recursive(q->debugfs_dir) prior to
the second loops' BLKTRACE_SETUP finishing. The dentries for the
files created for blktrace would be removed. Upon the third loop's
LOOP_CTL_DEL we end up trying to free these dentries again.

load loopback module                                                            
[   13.603371] == blk_mq_debugfs_register(12) start                             
[   13.604040] == blk_mq_debugfs_register(12) q->debugfs_dir created            
[   13.604934] == blk_mq_debugfs_register(12) end                               
[   13.627382] == blk_mq_debugfs_register(12) start                             
[   13.628041] == blk_mq_debugfs_register(12) q->debugfs_dir created            
[   13.629240] == blk_mq_debugfs_register(12) end                               
[   13.651667] == blk_mq_debugfs_register(12) start                             
[   13.652836] == blk_mq_debugfs_register(12) q->debugfs_dir created            
[   13.655107] == blk_mq_debugfs_register(12) end                               
[   13.684917] == blk_mq_debugfs_register(12) start                             
[   13.687876] == blk_mq_debugfs_register(12) q->debugfs_dir created            
[   13.691588] == blk_mq_debugfs_register(13) end                               
[   13.707320] == blk_mq_debugfs_register(13) start                             
[   13.707863] == blk_mq_debugfs_register(13) q->debugfs_dir created            
[   13.708856] == blk_mq_debugfs_register(13) end                               
[   13.735623] == blk_mq_debugfs_register(13) start                             
[   13.736656] == blk_mq_debugfs_register(13) q->debugfs_dir created            
[   13.738411] == blk_mq_debugfs_register(13) end                               
[   13.763326] == blk_mq_debugfs_register(13) start                             
[   13.763972] == blk_mq_debugfs_register(13) q->debugfs_dir created            
[   13.765167] == blk_mq_debugfs_register(13) end                               
[   13.779510] == blk_mq_debugfs_register(13) start                             
[   13.780522] == blk_mq_debugfs_register(13) q->debugfs_dir created            
[   13.782338] == blk_mq_debugfs_register(13) end                               
[   13.783521] loop: module loaded         

LOOP_CTL_DEL(loop0) #1                                                          
[   13.803550] = __blk_release_queue(4) start                                   
[   13.807772] == blk_trace_shutdown(4) start                                   
[   13.810749] == blk_trace_shutdown(4) end                                     
[   13.813437] = __blk_release_queue(4) calling blk_mq_debugfs_unregister()     
[   13.817593] ==== blk_mq_debugfs_unregister(4) begin                          
[   13.817621] ==== blk_mq_debugfs_unregister(4) debugfs_remove_recursive(q->debugfs_dir)
[   13.821203] ==== blk_mq_debugfs_unregister(4) end q->debugfs_dir is NULL     
[   13.826166] = __blk_release_queue(4) blk_mq_debugfs_unregister() end         
[   13.832992] = __blk_release_queue(4) end

LOOP_CTL_ADD(loop0) #1                                                          
[   13.843742] == blk_mq_debugfs_register(7) start                              
[   13.845569] == blk_mq_debugfs_register(7) q->debugfs_dir created             
[   13.848628] == blk_mq_debugfs_register(7) end

BLKTRACE_SETUP(loop0) #1                                                        
[   13.850924] == blk_trace_ioctl(7, BLKTRACESETUP) start                       
[   13.852852] === do_blk_trace_setup(7) start                                  
[   13.854580] === do_blk_trace_setup(7) creating directory                     
[   13.856620] === do_blk_trace_setup(7) using what debugfs_lookup() gave       
[   13.860635] === do_blk_trace_setup(7) end with ret: 0                        
[   13.862615] == blk_trace_ioctl(7, BLKTRACESETUP) end

LOOP_CTL_DEL(loop0) #2                                                          
[   13.883304] = __blk_release_queue(7) start                                   
[   13.885324] == blk_trace_shutdown(7) start                                   
[   13.887197] == blk_trace_shutdown(7) calling __blk_trace_remove()            
[   13.889807] == __blk_trace_remove(7) start                                   
[   13.891669] === blk_trace_cleanup(7) start                                   
[   13.911656] ====== blk_trace_free(7) start  

LOOP_CTL_ADD(loop0) #2                                                          
[   13.912709] == blk_mq_debugfs_register(2) start

---> From LOOP_CTL_DEL(loop0) #2                                                
[   13.915887] ====== blk_trace_free(7) end 

---> From LOOP_CTL_ADD(loop0) #2                                                
[   13.918359] debugfs: Directory 'loop0' with parent 'block' already present!  
[   13.926433] == blk_mq_debugfs_register(2) q->debugfs_dir created             
[   13.930373] == blk_mq_debugfs_register(2) end

BLKTRACE_SETUP(loop0) #2                                                        
[   13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start                       
[   13.936758] === do_blk_trace_setup(2) start                                  
[   13.938944] === do_blk_trace_setup(2) creating directory                     
[   13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave

---> From LOOP_CTL_DEL(loop0) #2                                                
[   13.971046] === blk_trace_cleanup(7) end                                     
[   13.973175] == __blk_trace_remove(7) end                                     
[   13.975352] == blk_trace_shutdown(7) end                                     
[   13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()     
[   13.980645] ==== blk_mq_debugfs_unregister(7) begin                          
[   13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
[   13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL     
[   13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end         
[   13.993155] = __blk_release_queue(7) end 

---> From BLKTRACE_SETUP(loop0) #2                                              
[   13.995928] === do_blk_trace_setup(2) end with ret: 0                        
[   13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end   

LOOP_CTL_DEL(loop0) #3                                                          
[   14.035119] = __blk_release_queue(2) start                                   
[   14.036925] == blk_trace_shutdown(2) start                                   
[   14.038518] == blk_trace_shutdown(2) calling __blk_trace_remove()            
[   14.040829] == __blk_trace_remove(2) start                                   
[   14.042413] === blk_trace_cleanup(2) start 

LOOP_CTL_ADD(loop0) #3                                                          
[   14.072522] == blk_mq_debugfs_register(6) start 

---> From LOOP_CTL_DEL(loop0) #3                                                
[   14.075151] ====== blk_trace_free(2) start 

---> From LOOP_CTL_ADD(loop0) #3                                                
[   14.075882] == blk_mq_debugfs_register(6) q->debugfs_dir created  

---> From LOOP_CTL_DEL(loop0) #3                                                
[   14.078624] BUG: kernel NULL pointer dereference, address: 00000000000000a0  
[   14.084332] == blk_mq_debugfs_register(6) end                                
[   14.086971] #PF: supervisor write access in kernel mode                      
[   14.086974] #PF: error_code(0x0002) - not-present page                       
[   14.086977] PGD 0 P4D 0                                                      
[   14.086984] Oops: 0002 [#1] SMP NOPTI                                        
[   14.086990] CPU: 2 PID: 287 Comm: kworker/2:2 Tainted: G            E 5.6.0-next-20200403+ #54
[   14.086991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[   14.087002] Workqueue: events __blk_release_queue                            
[   14.087011] RIP: 0010:down_write+0x15/0x40                                   
[   14.090300] == blk_trace_ioctl(6, BLKTRACESETUP) start                       
[   14.093277] Code: eb ca e8 3e 34 8d ff cc cc cc cc cc cc cc cc cc cc
cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00
00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d
[   14.093280] RSP: 0018:ffffc28a00533da8 EFLAGS: 00010246                      
[   14.093284] RAX: 0000000000000000 RBX: ffff9f7a24d07980 RCX: ffffff8100000000
[   14.093286] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[   14.093287] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000019
[   14.093289] R10: 0000000000000774 R11: 0000000000000000 R12: 0000000000000000
[   14.093291] R13: ffff9f7a2fab0400 R14: ffff9f7a21dd1140 R15: 00000000000000a0
[   14.093294] FS:  0000000000000000(0000) GS:ffff9f7a2fa80000(0000) knlGS:0000000000000000
[   14.093296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                
[   14.093298] CR2: 00000000000000a0 CR3: 00000004293d2003 CR4: 0000000000360ee0
[   14.093307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   14.093308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   14.093310] Call Trace:                                                      
[   14.093324]  simple_recursive_removal+0x4e/0x2e0                             
[   14.093330]  ? debugfs_remove+0x60/0x60                                      
[   14.093334]  debugfs_remove+0x40/0x60                                        
[   14.093339]  blk_trace_free+0x20/0x70                                        
[   14.093346]  __blk_trace_remove+0x54/0x90                                    
[   14.096704] === do_blk_trace_setup(6) start                                  
[   14.098534]  blk_trace_shutdown+0x74/0x80                                    
[   14.100958] === do_blk_trace_setup(6) creating directory                     
[   14.104575]  __blk_release_queue+0xbe/0x160                                  
[   14.104580]  process_one_work+0x1b4/0x380                                    
[   14.104585]  worker_thread+0x50/0x3c0                                        
[   14.104589]  kthread+0xf9/0x130            
[   14.104593]  ? process_one_work+0x380/0x380                                  
[   14.104596]  ? kthread_park+0x90/0x90                                        
[   14.104599]  ret_from_fork+0x1f/0x40                                         
[   14.104603] Modules linked in: loop(E) xfs(E) libcrc32c(E)
crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) joydev(E)
serio_raw(E) aesni_intel(E) glue_helper(E) virtio_balloon(E) evdev(E)
crypto_simd(E) pcspkr(E) cryptd(E) i6300esb(E) button(E) ip_tables(E)
x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E)
jbd2(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E)
ata_generic(E) uhci_hcd(E) ata_piix(E) ehci_hcd(E) nvme(E) libata(E)
crc32c_intel(E) usbcore(E) psmouse(E) nvme_core(E) virtio_pci(E)
scsi_mod(E) virtio_ring(E) t10_pi(E) virtio(E) i2c_piix4(E) floppy(E)
[   14.107400] === do_blk_trace_setup(6) using what debugfs_lookup()
gave       
[   14.108939] CR2: 00000000000000a0                                            
[   14.110589] === do_blk_trace_setup(6) end with ret: 0                        
[   14.111592] ---[ end trace 7a783b33b9614db9 ]---

> And yet this does seem to fix the
> use after free in my testing, so I must be missing something.

The fragility is in the use of the debugfs_lookup(), and since that
doesn't protect the files we create underneath it. We also dput()
right away, and even if we didn't, if you try to fix that you'll
soon see that this is just a hot mess with the debugfs directories
used.

It is much cleaner to have one directory which we don't have to
worry about for its life, the complexities around special casing
it for mq or not is what gets this thing to be sloppy.

One thing we could do to avoid conflicts with removal and a setup
is:

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 15086227592f..e3d4809a2964 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -701,6 +701,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
 	if (!q)
 		return -ENXIO;
 
+	if (work_pending(&q->release_work))
+		return -ENXIO;
+
 	mutex_lock(&q->blk_trace_mutex);
 
 	switch (cmd) {


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 2/3] blktrace: fix debugfs use after free
  2020-04-06 15:19           ` Luis Chamberlain
@ 2020-04-07  8:15             ` Luis Chamberlain
  0 siblings, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-07  8:15 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Bart Van Assche, Eric Sandeen, Jens Axboe, Al Viro,
	Greg Kroah-Hartman, Steven Rostedt, Ingo Molnar, Jan Kara,
	Ming Lei, Michal Hocko, linux-block, Linux FS Devel,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko,
	syzbot+603294af2d01acfdd6da

On Mon, Apr 6, 2020 at 9:19 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Mon, Apr 06, 2020 at 11:18:13AM +0200, Nicolai Stange wrote:
> > Bart Van Assche <bvanassche@acm.org> writes:
>
> > So I'd suggest to drop patch [3/3] from this series and modify this
> > patch [2/3] here to move the blk_q_debugfs_unregister(q) invocation from
> > __blk_release_queue() to blk_unregister_queue() instead.
>
> I'll take a stab.

That didn't work. I'll look for alternatives.

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/3] block: address blktrace use-after-free
  2020-04-07  2:47   ` yukuai (C)
@ 2020-04-07 19:00     ` Luis Chamberlain
  2020-04-09 20:59       ` Luis Chamberlain
  0 siblings, 1 reply; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-07 19:00 UTC (permalink / raw)
  To: yukuai (C)
  Cc: Ming Lei, axboe, viro, gregkh, rostedt, mingo, jack, nstange,
	mhocko, linux-block, linux-fsdevel, linux-kernel

On Tue, Apr 07, 2020 at 10:47:01AM +0800, yukuai (C) wrote:
> On 2020/4/3 16:19, Ming Lei wrote:
> 
> > BTW, Yu Kuai posted one patch for this issue, looks that approach
> > is simpler:
> > 
> > https://lore.kernel.org/linux-block/20200324132315.22133-1-yukuai3@huawei.com/
> > 
> > 
> 
> I think the issue might not be fixed with the patch seires.
> 
> At first, I think there are two key points for the issure:
> 1. The final release of queue is delayed in a workqueue
> 2. The creation of 'q->debugfs_dir' might failed(only if 1 exist)
> And if we can fix any of the above problem, the UAF issue will be fixed.
> (BTW, I did not come up with a good idea for problem 1, and my approach
> is for problem 2.)
> 
> The third patch "block: avoid deferral of blk_release_queue() work" is
> not enough to fix problem 1:
> a. if CONFIG_DEBUG_KOBJECT_RELEASE is enable:
> static void kobject_release(struct kref *kref)
> {
>         struct kobject *kobj = container_of(kref, struct kobject, kref);
> #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
>         unsigned long delay = HZ + HZ * (get_random_int() & 0x3);
>         pr_info("kobject: '%s' (%p): %s, parent %p (delayed %ld)\n",
>                 ┊kobject_name(kobj), kobj, __func__, kobj->parent, delay);
>         INIT_DELAYED_WORK(&kobj->release, kobject_delayed_cleanup);
> 
>         schedule_delayed_work(&kobj->release, delay);
> #else
>         kobject_cleanup(kobj);
> #endif
> }
> b. when 'kobject_put' is called from blk_cleanup_queue, can we make sure
> it is the last reference?

You are right, I think I know the fix for this now. Will run some more
tests.

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 3/3] block: avoid deferral of blk_release_queue() work
  2020-04-02 14:49     ` Nicolai Stange
  2020-04-06  9:11       ` Nicolai Stange
@ 2020-04-09 18:11       ` Luis Chamberlain
  1 sibling, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-09 18:11 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Bart Van Assche, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, mhocko, linux-block, linux-fsdevel, linux-kernel,
	Omar Sandoval, Hannes Reinecke, Michal Hocko

On Thu, Apr 02, 2020 at 04:49:37PM +0200, Nicolai Stange wrote:
> Bart Van Assche <bvanassche@acm.org> writes:
> 
> > On 2020-04-01 17:00, Luis Chamberlain wrote:
> > The description of this patch mentions a single blk_release_queue() call
> > that happened in the past from a context from which sleeping is not
> > allowed and from which sleeping is allowed today. Have all other
> > blk_release_queue() / blk_put_queue() calls been verified to see whether
> > none of these happens from a context from which sleeping is not allowed?
> 
> I've just done this today and found the following potentially
> problematic call paths to blk_put_queue().
> 
> 1.) mem_cgroup_throttle_swaprate() takes a spinlock and
>     calls blkcg_schedule_throttle()->blk_put_queue().
> 
>     Also note that AFAICS mem_cgroup_try_charge_delay() can be called
>     with GFP_ATOMIC.

I have a solution to this which would avoid having to deal with the
concern completely. I'll post in my follow up.

> 2.) scsi_unblock_requests() gets called from a lot of drivers and
>     invoke blk_put_queue() through
>     scsi_unblock_requests() -> scsi_run_host_queues() ->
>     scsi_starved_list_run() -> blk_put_queue().

sd_probe() calls device_add_disk(), and the scsi lib also has its
own refcounting for scsi, but unless you call sd_remove() you'll be
protecting the underlying block disk and request_queue, as sd_remove()
calls the del_gendisk() which would in call call blk_unregister_queue()
which calls the last blk_put_queue(). If sd_remove() can be called from
atomic context we can also fix this, and this should be evident how in
my next follow up series of patches.

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC 0/3] block: address blktrace use-after-free
  2020-04-07 19:00     ` Luis Chamberlain
@ 2020-04-09 20:59       ` Luis Chamberlain
  0 siblings, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2020-04-09 20:59 UTC (permalink / raw)
  To: yukuai (C)
  Cc: Ming Lei, Jens Axboe, Al Viro, Greg Kroah-Hartman,
	Steven Rostedt, Ingo Molnar, Jan Kara, Nicolai Stange,
	Michal Hocko, linux-block, Linux FS Devel, linux-kernel

On Tue, Apr 7, 2020 at 1:00 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Tue, Apr 07, 2020 at 10:47:01AM +0800, yukuai (C) wrote:
> > On 2020/4/3 16:19, Ming Lei wrote:
> >
> > > BTW, Yu Kuai posted one patch for this issue, looks that approach
> > > is simpler:
> > >
> > > https://lore.kernel.org/linux-block/20200324132315.22133-1-yukuai3@huawei.com/
> > >
> > >
> >
> > I think the issue might not be fixed with the patch seires.
> >
> > At first, I think there are two key points for the issure:
> > 1. The final release of queue is delayed in a workqueue
> > 2. The creation of 'q->debugfs_dir' might failed(only if 1 exist)
> > And if we can fix any of the above problem, the UAF issue will be fixed.
> > (BTW, I did not come up with a good idea for problem 1, and my approach
> > is for problem 2.)
> >
> > The third patch "block: avoid deferral of blk_release_queue() work" is
> > not enough to fix problem 1:
> > a. if CONFIG_DEBUG_KOBJECT_RELEASE is enable:
> > static void kobject_release(struct kref *kref)
> > {
> >         struct kobject *kobj = container_of(kref, struct kobject, kref);
> > #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
> >         unsigned long delay = HZ + HZ * (get_random_int() & 0x3);
> >         pr_info("kobject: '%s' (%p): %s, parent %p (delayed %ld)\n",
> >                 ┊kobject_name(kobj), kobj, __func__, kobj->parent, delay);
> >         INIT_DELAYED_WORK(&kobj->release, kobject_delayed_cleanup);
> >
> >         schedule_delayed_work(&kobj->release, delay);
> > #else
> >         kobject_cleanup(kobj);
> > #endif
> > }
> > b. when 'kobject_put' is called from blk_cleanup_queue, can we make sure
> > it is the last reference?
>
> You are right, I think I know the fix for this now. Will run some more
> tests.

Yeap, we were just not refcounting during blktrace. I'll send a fix as
part of the series.

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-04-09 20:59 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-01 23:59 [RFC 0/3] block: address blktrace use-after-free Luis Chamberlain
2020-04-02  0:00 ` [RFC 1/3] block: move main block debugfs initialization to its own file Luis Chamberlain
2020-04-05  3:12   ` Bart Van Assche
2020-04-06 14:23     ` Luis Chamberlain
2020-04-02  0:00 ` [RFC 2/3] blktrace: fix debugfs use after free Luis Chamberlain
2020-04-02  1:57   ` Eric Sandeen
2020-04-02 16:14     ` Luis Chamberlain
2020-04-05  3:39   ` Bart Van Assche
2020-04-06  1:27     ` Eric Sandeen
2020-04-06  4:25       ` Bart Van Assche
2020-04-06  9:18         ` Nicolai Stange
2020-04-06 15:19           ` Luis Chamberlain
2020-04-07  8:15             ` Luis Chamberlain
2020-04-06 14:29         ` Eric Sandeen
2020-04-07  8:09           ` Luis Chamberlain
2020-04-06 15:14     ` Luis Chamberlain
2020-04-02  0:00 ` [RFC 3/3] block: avoid deferral of blk_release_queue() work Luis Chamberlain
2020-04-02  3:39   ` Bart Van Assche
2020-04-02 14:49     ` Nicolai Stange
2020-04-06  9:11       ` Nicolai Stange
2020-04-09 18:11       ` Luis Chamberlain
2020-04-02  7:44 ` [RFC 0/3] block: address blktrace use-after-free Greg KH
2020-04-03  8:19 ` Ming Lei
2020-04-03 14:06   ` Luis Chamberlain
2020-04-03 14:13   ` Bart Van Assche
2020-04-03 19:49     ` Luis Chamberlain
2020-04-07  2:47   ` yukuai (C)
2020-04-07 19:00     ` Luis Chamberlain
2020-04-09 20:59       ` Luis Chamberlain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).