linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v2 0/5] blktrace: fix use after free
@ 2020-04-09 21:45 Luis Chamberlain
  2020-04-09 21:45 ` [RFC v2 1/5] block: move main block debugfs initialization to its own file Luis Chamberlain
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-09 21:45 UTC (permalink / raw)
  To: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Luis Chamberlain

This series fixes a use after free on block trace. This v2 adjusts the
commit log feedback from the first iteration, and also expands on the
series to include a few additional fixes which would be needed for us
to continue with a synchronous request_queue removal. The refcount added
for blktrace resolves the kobject issues pointed out by yukuai. Note
that CONFIG_DEBUG_KOBJECT_RELEASE purposely was added to create
situations where drivers misbehave, and so you should not use it to
test expected userspace behaviour, but just to catch possible kernel
issues. For details refer to the commit which introduced it, which
actually helps a bit more than just reading the kconfig description,
its commit was c817a67ecba7 ("kobject: delayed kobject release: help
find buggy drivers"). This series also fixes a small build issue
discovered by 0-day.

The QUEUE_FLAG_DEFER_REMOVAL flag is added as part of the last patch,
just in case for now. However, given creative use of refcounting
I don't think we need it anymore. An example use case of creative
use of refcounting is provided for mm/swapfile.

I've extended break-blktrace [0] with 3 test cases which now pass
for the most part:

run_0001.sh
run_0002.sh
run_0003.sh

The exception to this is when we get an EBUSY on loopback removal. This
only happens every now and then, and upon further investigation, I
suspect this is happening due to the same race Dave Chinner ran into
with using loopback devices and fstests, which made explicit loopback
destruction lazy via commit a1ecac3b0656 ("loop: Make explicit loop
device destruction lazy"). Further eyeballs on this are appreciated,
perhaps break-blktrace can be extended a bit to account for this.

After a bit of brushing up, I am considering just upstreaming this
as a self tests for blktrace, instead of keeping this out of tree.

Worth noting as well was that it seems odd we didn't consider the
userspace impact of commit dc9edc44de6c ("block: Fix a blk_exit_rl()
regression") merged on v4.12 moved, as that deferral work sure did
have an impact what userspace can expect upon device removal or races
on addition/removal. Its not clear if mentioning any of this on the
commit logs is worth it... Shouldn't have that deferral been a
userspace regression?

If you want this on a git tree you can find it on my 20200409-blktrace-fix-uaf
branch on kernel.org based on linux-next next-20200409.

Feedback, reviews, rants are all greatly appreciated.

[0] https://github.com/mcgrof/break-blktrace
[1] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20200409-blktrace-fix-uaf

Luis Chamberlain (5):
  block: move main block debugfs initialization to its own file
  blktrace: fix debugfs use after free
  blktrace: ref count the request_queue during ioctl
  mm/swapfile: refcount block and queue before using
    blkcg_schedule_throttle()
  block: revert back to synchronous request_queue removal

 block/Makefile               |  1 +
 block/blk-core.c             |  9 +-------
 block/blk-debugfs.c          | 27 ++++++++++++++++++++++
 block/blk-mq-debugfs.c       |  5 -----
 block/blk-sysfs.c            | 43 +++++++++++++++++++++++++++++-------
 block/blk.h                  | 17 ++++++++++++++
 include/linux/blkdev.h       |  7 +++++-
 include/linux/blktrace_api.h |  1 -
 kernel/trace/blktrace.c      | 25 ++++++++++++---------
 mm/swapfile.c                | 11 +++++++++
 10 files changed, 112 insertions(+), 34 deletions(-)
 create mode 100644 block/blk-debugfs.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC v2 1/5] block: move main block debugfs initialization to its own file
  2020-04-09 21:45 [RFC v2 0/5] blktrace: fix use after free Luis Chamberlain
@ 2020-04-09 21:45 ` Luis Chamberlain
  2020-04-10  2:48   ` Bart Van Assche
  2020-04-09 21:45 ` [RFC v2 2/5] blktrace: fix debugfs use after free Luis Chamberlain
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-09 21:45 UTC (permalink / raw)
  To: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Luis Chamberlain, Bart Van Assche, Omar Sandoval,
	Hannes Reinecke, Michal Hocko

Single and multiqeueue block devices share some debugfs code. By
moving this into its own file it makes it easier to expand and audit
this shared code.

This patch contains no functional changes.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: yu kuai <yukuai3@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/Makefile      |  1 +
 block/blk-core.c    |  9 +--------
 block/blk-debugfs.c | 15 +++++++++++++++
 block/blk.h         |  7 +++++++
 4 files changed, 24 insertions(+), 8 deletions(-)
 create mode 100644 block/blk-debugfs.c

diff --git a/block/Makefile b/block/Makefile
index 206b96e9387f..1d3ab20505d8 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
 			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o
 
+obj-$(CONFIG_DEBUG_FS)		+= blk-debugfs.o
 obj-$(CONFIG_BOUNCE)		+= bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
 obj-$(CONFIG_BLK_DEV_BSG)	+= bsg.o
diff --git a/block/blk-core.c b/block/blk-core.c
index 7e4a1da0715e..5aaae7a1b338 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -48,10 +48,6 @@
 #include "blk-pm.h"
 #include "blk-rq-qos.h"
 
-#ifdef CONFIG_DEBUG_FS
-struct dentry *blk_debugfs_root;
-#endif
-
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap);
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap);
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete);
@@ -1796,10 +1792,7 @@ int __init blk_dev_init(void)
 
 	blk_requestq_cachep = kmem_cache_create("request_queue",
 			sizeof(struct request_queue), 0, SLAB_PANIC, NULL);
-
-#ifdef CONFIG_DEBUG_FS
-	blk_debugfs_root = debugfs_create_dir("block", NULL);
-#endif
+	blk_debugfs_register();
 
 	return 0;
 }
diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c
new file mode 100644
index 000000000000..634dea4b1507
--- /dev/null
+++ b/block/blk-debugfs.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Shared debugfs mq / non-mq functionality
+ */
+#include <linux/kernel.h>
+#include <linux/blkdev.h>
+#include <linux/debugfs.h>
+
+struct dentry *blk_debugfs_root;
+
+void blk_debugfs_register(void)
+{
+	blk_debugfs_root = debugfs_create_dir("block", NULL);
+}
diff --git a/block/blk.h b/block/blk.h
index 0a94ec68af32..86a66b614f08 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -487,5 +487,12 @@ struct request_queue *__blk_alloc_queue(int node_id);
 int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		bool *same_page);
+#ifdef CONFIG_DEBUG_FS
+void blk_debugfs_register(void);
+#else
+static inline void blk_debugfs_register(void)
+{
+}
+#endif /* CONFIG_DEBUG_FS */
 
 #endif /* BLK_INTERNAL_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC v2 2/5] blktrace: fix debugfs use after free
  2020-04-09 21:45 [RFC v2 0/5] blktrace: fix use after free Luis Chamberlain
  2020-04-09 21:45 ` [RFC v2 1/5] block: move main block debugfs initialization to its own file Luis Chamberlain
@ 2020-04-09 21:45 ` Luis Chamberlain
  2020-04-10  2:52   ` Bart Van Assche
  2020-04-09 21:45 ` [RFC v2 3/5] blktrace: ref count the request_queue during ioctl Luis Chamberlain
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-09 21:45 UTC (permalink / raw)
  To: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Luis Chamberlain, Bart Van Assche, Omar Sandoval,
	Hannes Reinecke, Michal Hocko, syzbot+603294af2d01acfdd6da

On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
merged on v4.12 Omar fixed the original blktrace code for multiqueue use.
This however left in place a possible crash, if you happen to abuse blktrace
in a way it was not intended.

Namely, if you loop adding a device, setup the blktrace with BLKTRACESETUP,
forget to BLKTRACETEARDOWN, and then just remove the device you end up
with a panic:

[  107.193134] debugfs: Directory 'loop0' with parent 'block' already present!
[  107.254615] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[  107.258785] #PF: supervisor write access in kernel mode
[  107.262035] #PF: error_code(0x0002) - not-present page
[  107.264106] PGD 0 P4D 0
[  107.264404] Oops: 0002 [#1] SMP NOPTI
[  107.264803] CPU: 8 PID: 674 Comm: kworker/8:2 Tainted: G            E 5.6.0-rc7-next-20200327 #1
[  107.265712] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[  107.266553] Workqueue: events __blk_release_queue
[  107.267051] RIP: 0010:down_write+0x15/0x40
[  107.267488] Code: eb ca e8 ee a5 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f  65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d
[  107.269300] RSP: 0018:ffff9927c06efda8 EFLAGS: 00010246
[  107.269841] RAX: 0000000000000000 RBX: ffff8be7e73b0600 RCX: ffffff8100000000
[  107.270559] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[  107.271281] RBP: 00000000000000a0 R08: ffff8be7ebc80fa8 R09: ffff8be7ebc80fa8
[  107.272001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  107.272722] R13: ffff8be7efc30400 R14: ffff8be7e0571200 R15: 00000000000000a0
[  107.273475] FS:  0000000000000000(0000) GS:ffff8be7efc00000(0000) knlGS:0000000000000000
[  107.274346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  107.274968] CR2: 00000000000000a0 CR3: 000000042abee003 CR4: 0000000000360ee0
[  107.275710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  107.276465] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  107.277214] Call Trace:
[  107.277532]  simple_recursive_removal+0x4e/0x2e0
[  107.278049]  ? debugfs_remove+0x60/0x60
[  107.278493]  debugfs_remove+0x40/0x60
[  107.278922]  blk_trace_free+0xd/0x50
[  107.279339]  __blk_trace_remove+0x27/0x40
[  107.279797]  blk_trace_shutdown+0x30/0x40
[  107.280256]  __blk_release_queue+0xab/0x110
[  107.280734]  process_one_work+0x1b4/0x380
[  107.281194]  worker_thread+0x50/0x3c0
[  107.281622]  kthread+0xf9/0x130
[  107.281994]  ? process_one_work+0x380/0x380
[  107.282467]  ? kthread_park+0x90/0x90
[  107.282895]  ret_from_fork+0x1f/0x40
[  107.283316] Modules linked in: loop(E) <etc>
[  107.288562] CR2: 00000000000000a0
[  107.288957] ---[ end trace b885d243d441bbce ]---

This splat happens to be very similar to the one reported via
kernel.org korg#205713, only that korg#205713 was for v4.19.83
and the above now includes the simple_recursive_removal() introduced
via commit a3d1e7eb5abe ("simple_recursive_removal(): kernel-side rm
-rf for ramfs-style filesystems") merged on v5.6.

korg#205713 then was used to create CVE-2019-19770 and claims that
the bug is in a use-after-free in the debugfs core code. The
implications of this being a generic UAF on debugfs would be
much more severe, as it would imply parent dentries can sometimes
not be positive, which we hold by design is just not possible.

Below is the splat explained with a bit more details, explaining
what is happening in userspace, kernel, and a print of the CPU on,
which the code runs on:

load loopback module
[   13.603371] == blk_mq_debugfs_register(12) start
[   13.604040] == blk_mq_debugfs_register(12) q->debugfs_dir created
[   13.604934] == blk_mq_debugfs_register(12) end
[   13.627382] == blk_mq_debugfs_register(12) start
[   13.628041] == blk_mq_debugfs_register(12) q->debugfs_dir created
[   13.629240] == blk_mq_debugfs_register(12) end
[   13.651667] == blk_mq_debugfs_register(12) start
[   13.652836] == blk_mq_debugfs_register(12) q->debugfs_dir created
[   13.655107] == blk_mq_debugfs_register(12) end
[   13.684917] == blk_mq_debugfs_register(12) start
[   13.687876] == blk_mq_debugfs_register(12) q->debugfs_dir created
[   13.691588] == blk_mq_debugfs_register(13) end
[   13.707320] == blk_mq_debugfs_register(13) start
[   13.707863] == blk_mq_debugfs_register(13) q->debugfs_dir created
[   13.708856] == blk_mq_debugfs_register(13) end
[   13.735623] == blk_mq_debugfs_register(13) start
[   13.736656] == blk_mq_debugfs_register(13) q->debugfs_dir created
[   13.738411] == blk_mq_debugfs_register(13) end
[   13.763326] == blk_mq_debugfs_register(13) start
[   13.763972] == blk_mq_debugfs_register(13) q->debugfs_dir created
[   13.765167] == blk_mq_debugfs_register(13) end
[   13.779510] == blk_mq_debugfs_register(13) start
[   13.780522] == blk_mq_debugfs_register(13) q->debugfs_dir created
[   13.782338] == blk_mq_debugfs_register(13) end
[   13.783521] loop: module loaded

LOOP_CTL_DEL(loop0) #1
[   13.803550] = __blk_release_queue(4) start
[   13.807772] == blk_trace_shutdown(4) start
[   13.810749] == blk_trace_shutdown(4) end
[   13.813437] = __blk_release_queue(4) calling blk_mq_debugfs_unregister()
[   13.817593] ==== blk_mq_debugfs_unregister(4) begin
[   13.817621] ==== blk_mq_debugfs_unregister(4) debugfs_remove_recursive(q->debugfs_dir)
[   13.821203] ==== blk_mq_debugfs_unregister(4) end q->debugfs_dir is NULL
[   13.826166] = __blk_release_queue(4) blk_mq_debugfs_unregister() end
[   13.832992] = __blk_release_queue(4) end

LOOP_CTL_ADD(loop0) #1
[   13.843742] == blk_mq_debugfs_register(7) start
[   13.845569] == blk_mq_debugfs_register(7) q->debugfs_dir created
[   13.848628] == blk_mq_debugfs_register(7) end

BLKTRACE_SETUP(loop0) #1
[   13.850924] == blk_trace_ioctl(7, BLKTRACESETUP) start
[   13.852852] === do_blk_trace_setup(7) start
[   13.854580] === do_blk_trace_setup(7) creating directory
[   13.856620] === do_blk_trace_setup(7) using what debugfs_lookup() gave
[   13.860635] === do_blk_trace_setup(7) end with ret: 0
[   13.862615] == blk_trace_ioctl(7, BLKTRACESETUP) end

LOOP_CTL_DEL(loop0) #2
[   13.883304] = __blk_release_queue(7) start
[   13.885324] == blk_trace_shutdown(7) start
[   13.887197] == blk_trace_shutdown(7) calling __blk_trace_remove()
[   13.889807] == __blk_trace_remove(7) start
[   13.891669] === blk_trace_cleanup(7) start
[   13.911656] ====== blk_trace_free(7) start

LOOP_CTL_ADD(loop0) #2
[   13.912709] == blk_mq_debugfs_register(2) start

---> From LOOP_CTL_DEL(loop0) #2
[   13.915887] ====== blk_trace_free(7) end

---> From LOOP_CTL_ADD(loop0) #2
[   13.918359] debugfs: Directory 'loop0' with parent 'block' already present!
[   13.926433] == blk_mq_debugfs_register(2) q->debugfs_dir created
[   13.930373] == blk_mq_debugfs_register(2) end

BLKTRACE_SETUP(loop0) #2
[   13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
[   13.936758] === do_blk_trace_setup(2) start
[   13.938944] === do_blk_trace_setup(2) creating directory
[   13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave

---> From LOOP_CTL_DEL(loop0) #2
[   13.971046] === blk_trace_cleanup(7) end
[   13.973175] == __blk_trace_remove(7) end
[   13.975352] == blk_trace_shutdown(7) end
[   13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
[   13.980645] ==== blk_mq_debugfs_unregister(7) begin
[   13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
[   13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
[   13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
[   13.993155] = __blk_release_queue(7) end

---> From BLKTRACE_SETUP(loop0) #2
[   13.995928] === do_blk_trace_setup(2) end with ret: 0
[   13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end

LOOP_CTL_DEL(loop0) #3
[   14.035119] = __blk_release_queue(2) start
[   14.036925] == blk_trace_shutdown(2) start
[   14.038518] == blk_trace_shutdown(2) calling __blk_trace_remove()
[   14.040829] == __blk_trace_remove(2) start
[   14.042413] === blk_trace_cleanup(2) start

LOOP_CTL_ADD(loop0) #3
[   14.072522] == blk_mq_debugfs_register(6) start

---> From LOOP_CTL_DEL(loop0) #3
[   14.075151] ====== blk_trace_free(2) start

---> From LOOP_CTL_ADD(loop0) #3
[   14.075882] == blk_mq_debugfs_register(6) q->debugfs_dir created

---> From LOOP_CTL_DEL(loop0) #3
[   14.078624] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[   14.084332] == blk_mq_debugfs_register(6) end
[   14.086971] #PF: supervisor write access in kernel mode
[   14.086974] #PF: error_code(0x0002) - not-present page
[   14.086977] PGD 0 P4D 0
[   14.086984] Oops: 0002 [#1] SMP NOPTI
[   14.086990] CPU: 2 PID: 287 Comm: kworker/2:2 Tainted: G            E 5.6.0-next-20200403+ #54
[   14.086991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[   14.087002] Workqueue: events __blk_release_queue
[   14.087011] RIP: 0010:down_write+0x15/0x40
[   14.090300] == blk_trace_ioctl(6, BLKTRACESETUP) start
[   14.093277] Code: eb ca e8 3e 34 8d ff cc cc cc cc cc cc cc cc cc cc
cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00
00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d
[   14.093280] RSP: 0018:ffffc28a00533da8 EFLAGS: 00010246
[   14.093284] RAX: 0000000000000000 RBX: ffff9f7a24d07980 RCX: ffffff8100000000
[   14.093286] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[   14.093287] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000019
[   14.093289] R10: 0000000000000774 R11: 0000000000000000 R12: 0000000000000000
[   14.093291] R13: ffff9f7a2fab0400 R14: ffff9f7a21dd1140 R15: 00000000000000a0
[   14.093294] FS:  0000000000000000(0000) GS:ffff9f7a2fa80000(0000) knlGS:0000000000000000
[   14.093296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.093298] CR2: 00000000000000a0 CR3: 00000004293d2003 CR4: 0000000000360ee0
[   14.093307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   14.093308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   14.093310] Call Trace:
[   14.093324]  simple_recursive_removal+0x4e/0x2e0
[   14.093330]  ? debugfs_remove+0x60/0x60
[   14.093334]  debugfs_remove+0x40/0x60
[   14.093339]  blk_trace_free+0x20/0x70
[   14.093346]  __blk_trace_remove+0x54/0x90
[   14.096704] === do_blk_trace_setup(6) start
[   14.098534]  blk_trace_shutdown+0x74/0x80
[   14.100958] === do_blk_trace_setup(6) creating directory
[   14.104575]  __blk_release_queue+0xbe/0x160
[   14.104580]  process_one_work+0x1b4/0x380
[   14.104585]  worker_thread+0x50/0x3c0
[   14.104589]  kthread+0xf9/0x130
[   14.104593]  ? process_one_work+0x380/0x380
[   14.104596]  ? kthread_park+0x90/0x90
[   14.104599]  ret_from_fork+0x1f/0x40
[   14.104603] Modules linked in: loop(E) xfs(E) libcrc32c(E)
crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) joydev(E)
serio_raw(E) aesni_intel(E) glue_helper(E) virtio_balloon(E) evdev(E)
crypto_simd(E) pcspkr(E) cryptd(E) i6300esb(E) button(E) ip_tables(E)
x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E)
jbd2(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E)
ata_generic(E) uhci_hcd(E) ata_piix(E) ehci_hcd(E) nvme(E) libata(E)
crc32c_intel(E) usbcore(E) psmouse(E) nvme_core(E) virtio_pci(E)
scsi_mod(E) virtio_ring(E) t10_pi(E) virtio(E) i2c_piix4(E) floppy(E)
[   14.107400] === do_blk_trace_setup(6) using what debugfs_lookup() gave
[   14.108939] CR2: 00000000000000a0
[   14.110589] === do_blk_trace_setup(6) end with ret: 0
[   14.111592] ---[ end trace 7a783b33b9614db9 ]---

The root cause to this issue is that debugfs_lookup() can find a
previous incarnation's dir of the same name which is about to get
removed from a not yet schedule work.

We can fix the UAF by simply using a debugfs directory which moving
forward will always be accessible if debugfs is enabled, this way,
its allocated and avaialble always for both request-based block
drivers or make_request drivers (multiqueue) block drivers.

This simplifies the code considerably, with the only penalty now being
that we're always creating the request queue debugfs directory for the
request-based block device drivers.

The UAF then is not a core debugfs issue, but instead a misuse of
debugfs, and this issue can only be triggered if you are root, and
misuse blktrace.

This issue can be reproduced with break-blktrace [2] using:

  break-blktrace -c 10 -d -s

This patch fixes this issue. Note that there is also another
respective UAF but from the ioctl path [3], this should also fix
that issue.

This patch then also disputes the severity of CVE-2019-19770 as
this issue is only possible by being root and using blktrace.

It is not a core debugfs issue.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=205713
[1] https://nvd.nist.gov/vuln/detail/CVE-2019-19770
[2] https://github.com/mcgrof/break-blktrace
[3] https://lore.kernel.org/lkml/000000000000ec635b059f752700@google.com/

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: yu kuai <yukuai3@huawei.com>
Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com
Fiexes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/blk-debugfs.c          | 12 ++++++++++++
 block/blk-mq-debugfs.c       |  5 -----
 block/blk-sysfs.c            |  3 +++
 block/blk.h                  | 10 ++++++++++
 include/linux/blkdev.h       |  4 +++-
 include/linux/blktrace_api.h |  1 -
 kernel/trace/blktrace.c      | 19 ++++++++-----------
 7 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c
index 634dea4b1507..a8b343e758e4 100644
--- a/block/blk-debugfs.c
+++ b/block/blk-debugfs.c
@@ -13,3 +13,15 @@ void blk_debugfs_register(void)
 {
 	blk_debugfs_root = debugfs_create_dir("block", NULL);
 }
+
+void blk_q_debugfs_register(struct request_queue *q)
+{
+	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
+					    blk_debugfs_root);
+}
+
+void blk_q_debugfs_unregister(struct request_queue *q)
+{
+	debugfs_remove_recursive(q->debugfs_dir);
+	q->debugfs_dir = NULL;
+}
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b3f2ba483992..bda9378eab90 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -823,9 +823,6 @@ void blk_mq_debugfs_register(struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	int i;
 
-	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
-					    blk_debugfs_root);
-
 	debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
 
 	/*
@@ -856,9 +853,7 @@ void blk_mq_debugfs_register(struct request_queue *q)
 
 void blk_mq_debugfs_unregister(struct request_queue *q)
 {
-	debugfs_remove_recursive(q->debugfs_dir);
 	q->sched_debugfs_dir = NULL;
-	q->debugfs_dir = NULL;
 }
 
 static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index fca9b158f4a0..20f20b0fa0b9 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -895,6 +895,7 @@ static void __blk_release_queue(struct work_struct *work)
 
 	blk_trace_shutdown(q);
 
+	blk_q_debugfs_unregister(q);
 	if (queue_is_mq(q))
 		blk_mq_debugfs_unregister(q);
 
@@ -975,6 +976,8 @@ int blk_register_queue(struct gendisk *disk)
 		goto unlock;
 	}
 
+	blk_q_debugfs_register(q);
+
 	if (queue_is_mq(q)) {
 		__blk_mq_register_dev(dev, q);
 		blk_mq_debugfs_register(q);
diff --git a/block/blk.h b/block/blk.h
index 86a66b614f08..b86123a2d74f 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -489,10 +489,20 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
 		bool *same_page);
 #ifdef CONFIG_DEBUG_FS
 void blk_debugfs_register(void);
+void blk_q_debugfs_register(struct request_queue *q);
+void blk_q_debugfs_unregister(struct request_queue *q);
 #else
 static inline void blk_debugfs_register(void)
 {
 }
+
+static inline void blk_q_debugfs_register(struct request_queue *q)
+{
+}
+
+static inline void blk_q_debugfs_unregister(struct request_queue *q)
+{
+}
 #endif /* CONFIG_DEBUG_FS */
 
 #endif /* BLK_INTERNAL_H */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 32868fbedc9e..8b1cab52cef9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -569,8 +569,10 @@ struct request_queue {
 	struct list_head	tag_set_list;
 	struct bio_set		bio_split;
 
-#ifdef CONFIG_BLK_DEBUG_FS
+#ifdef CONFIG_DEBUG_FS
 	struct dentry		*debugfs_dir;
+#endif
+#ifdef CONFIG_BLK_DEBUG_FS
 	struct dentry		*sched_debugfs_dir;
 	struct dentry		*rqos_debugfs_dir;
 #endif
diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index 3b6ff5902edc..eb6db276e293 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -22,7 +22,6 @@ struct blk_trace {
 	u64 end_lba;
 	u32 pid;
 	u32 dev;
-	struct dentry *dir;
 	struct dentry *dropped_file;
 	struct dentry *msg_file;
 	struct list_head running_list;
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index ca39dc3230cb..15086227592f 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -311,7 +311,6 @@ static void blk_trace_free(struct blk_trace *bt)
 	debugfs_remove(bt->msg_file);
 	debugfs_remove(bt->dropped_file);
 	relay_close(bt->rchan);
-	debugfs_remove(bt->dir);
 	free_percpu(bt->sequence);
 	free_percpu(bt->msg_data);
 	kfree(bt);
@@ -476,7 +475,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 			      struct blk_user_trace_setup *buts)
 {
 	struct blk_trace *bt = NULL;
-	struct dentry *dir = NULL;
 	int ret;
 
 	if (!buts->buf_size || !buts->buf_nr)
@@ -485,6 +483,9 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	if (!blk_debugfs_root)
 		return -ENOENT;
 
+	if (!q->debugfs_dir)
+		return -ENOENT;
+
 	strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
 	buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';
 
@@ -509,21 +510,19 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 
 	ret = -ENOENT;
 
-	dir = debugfs_lookup(buts->name, blk_debugfs_root);
-	if (!dir)
-		bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
-
 	bt->dev = dev;
 	atomic_set(&bt->dropped, 0);
 	INIT_LIST_HEAD(&bt->running_list);
 
 	ret = -EIO;
-	bt->dropped_file = debugfs_create_file("dropped", 0444, dir, bt,
+	bt->dropped_file = debugfs_create_file("dropped", 0444,
+					       q->debugfs_dir, bt,
 					       &blk_dropped_fops);
 
-	bt->msg_file = debugfs_create_file("msg", 0222, dir, bt, &blk_msg_fops);
+	bt->msg_file = debugfs_create_file("msg", 0222, q->debugfs_dir,
+					   bt, &blk_msg_fops);
 
-	bt->rchan = relay_open("trace", dir, buts->buf_size,
+	bt->rchan = relay_open("trace", q->debugfs_dir, buts->buf_size,
 				buts->buf_nr, &blk_relay_callbacks, bt);
 	if (!bt->rchan)
 		goto err;
@@ -551,8 +550,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 
 	ret = 0;
 err:
-	if (dir && !bt->dir)
-		dput(dir);
 	if (ret)
 		blk_trace_free(bt);
 	return ret;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC v2 3/5] blktrace: ref count the request_queue during ioctl
  2020-04-09 21:45 [RFC v2 0/5] blktrace: fix use after free Luis Chamberlain
  2020-04-09 21:45 ` [RFC v2 1/5] block: move main block debugfs initialization to its own file Luis Chamberlain
  2020-04-09 21:45 ` [RFC v2 2/5] blktrace: fix debugfs use after free Luis Chamberlain
@ 2020-04-09 21:45 ` Luis Chamberlain
  2020-04-10  2:58   ` Bart Van Assche
  2020-04-09 21:45 ` [RFC v2 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle() Luis Chamberlain
  2020-04-09 21:45 ` [RFC v2 5/5] block: revert back to synchronous request_queue removal Luis Chamberlain
  4 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-09 21:45 UTC (permalink / raw)
  To: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Luis Chamberlain, Bart Van Assche, Omar Sandoval,
	Hannes Reinecke, Michal Hocko

Ensure that the reqest_queue ref counts the request_queue
during its full ioctl cycle. This avoids possible races against
removal, given blk_get_queue() also checks to ensure the queue
is not dying.

This small race is possible if you defer removal of the reqest_queue
and userspace fires off an ioctl for the device in the meantime.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: yu kuai <yukuai3@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/trace/blktrace.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 15086227592f..17e144d15779 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -701,6 +701,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
 	if (!q)
 		return -ENXIO;
 
+	if (!blk_get_queue(q))
+		return -ENXIO;
+
 	mutex_lock(&q->blk_trace_mutex);
 
 	switch (cmd) {
@@ -729,6 +732,9 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
 	}
 
 	mutex_unlock(&q->blk_trace_mutex);
+
+	blk_put_queue(q);
+
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC v2 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle()
  2020-04-09 21:45 [RFC v2 0/5] blktrace: fix use after free Luis Chamberlain
                   ` (2 preceding siblings ...)
  2020-04-09 21:45 ` [RFC v2 3/5] blktrace: ref count the request_queue during ioctl Luis Chamberlain
@ 2020-04-09 21:45 ` Luis Chamberlain
  2020-04-10  3:02   ` Bart Van Assche
  2020-04-09 21:45 ` [RFC v2 5/5] block: revert back to synchronous request_queue removal Luis Chamberlain
  4 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-09 21:45 UTC (permalink / raw)
  To: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Luis Chamberlain, Bart Van Assche, Omar Sandoval,
	Hannes Reinecke, Michal Hocko

block device are refcounted so to ensure once its final user goes away it
can be cleaned up by the lower layers properly. The block device's
request_queue structure is also refcounted, however if the last
blk_put_queue() is called under atomic context the block layer has
to defer removal.

By refcounting the block device during the use of blkcg_schedule_throttle(),
we ensure ensure two things:

1) the block device remains available during the call
2) we ensure avoid having to deal with the fact we're using the
   request_queue structure in atomic context, since the last
   blk_put_queue() will be called upon disk_release(), *after*
   our own bdput().

This means this code path is *not* going to remove the request_queue
structure, as we are ensuring some later upper layer disk_release()
will be the one to release the request_queue structure for us.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: yu kuai <yukuai3@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/swapfile.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6659ab563448..5f6f3a61b5c0 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3753,6 +3753,7 @@ static void free_swap_count_continuations(struct swap_info_struct *si)
 void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
 				  gfp_t gfp_mask)
 {
+	struct block_device *bdev;
 	struct swap_info_struct *si, *next;
 	if (!(gfp_mask & __GFP_IO) || !memcg)
 		return;
@@ -3771,8 +3772,18 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
 	plist_for_each_entry_safe(si, next, &swap_avail_heads[node],
 				  avail_lists[node]) {
 		if (si->bdev) {
+			bdev = bdgrab(si->bdev);
+			if (!bdev)
+				continue;
+			/*
+			 * By adding our own bdgrab() we ensure the queue
+			 * sticks around until disk_release(), and so we ensure
+			 * our release of the request_queue does not happen in
+			 * atomic context.
+			 */
 			blkcg_schedule_throttle(bdev_get_queue(si->bdev),
 						true);
+			bdput(bdev);
 			break;
 		}
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC v2 5/5] block: revert back to synchronous request_queue removal
  2020-04-09 21:45 [RFC v2 0/5] blktrace: fix use after free Luis Chamberlain
                   ` (3 preceding siblings ...)
  2020-04-09 21:45 ` [RFC v2 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle() Luis Chamberlain
@ 2020-04-09 21:45 ` Luis Chamberlain
  2020-04-10  3:12   ` Bart Van Assche
  4 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-09 21:45 UTC (permalink / raw)
  To: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Luis Chamberlain, Bart Van Assche, Omar Sandoval,
	Hannes Reinecke, Michal Hocko

Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") merged on
v4.12 moved the work behind blk_release_queue() into a workqueue after a
splat floated around which indicated some work on blk_release_queue()
could sleep in blk_exit_rl(). This splat would be possible when a driver
called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue()
as its final call) from an atomic context.

blk_put_queue() puts decrements the refcount for the request_queue
kobject, and upon reaching 0 blk_release_queue() is called. Although
blk_exit_rl() is now removed through commit db6d9952356 ("block: remove
request_list code"), we reserve the right to be able to sleep within
blk_release_queue() context. There should be little reason to
defer removal from atomic context these days, as you can always just
increase your block device's reference count even in atomic context and
leave the removal for the request_queue to the upper layers later.
However if you really need to defer removal of the request_queue, you can
set the queue flag QUEUE_FLAG_DEFER_REMOVAL now.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: yu kuai <yukuai3@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/blk-sysfs.c      | 40 ++++++++++++++++++++++++++++++++--------
 include/linux/blkdev.h |  3 +++
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 20f20b0fa0b9..2ae8c39c88ef 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -860,10 +860,9 @@ static void blk_exit_queue(struct request_queue *q)
 	bdi_put(q->backing_dev_info);
 }
 
-
 /**
- * __blk_release_queue - release a request queue
- * @work: pointer to the release_work member of the request queue to be released
+ * blk_release_queue_sync- release a request queue
+ * @q: pointer to the request queue to be released
  *
  * Description:
  *     This function is called when a block device is being unregistered. The
@@ -872,11 +871,27 @@ static void blk_exit_queue(struct request_queue *q)
  *     the reference counter of the request queue. Once the reference counter
  *     of the request queue reaches zero, blk_release_queue is called to release
  *     all allocated resources of the request queue.
+ *
+ *     There are two approaches to releasing the request queue, by default
+ *     we reserve the right to sleep on release and so release is synchronous.
+ *     If you know the path under which blk_cleanup_queue() or your last
+ *     blk_put_queue() is called can be called in atomic context you want to
+ *     ensure to defer the removal by setting the QUEUE_FLAG_DEFER_REMOVAL
+ *     flag as follows upon initialization:
+ *
+ *     blk_queue_flag_set(QUEUE_FLAG_DEFER_REMOVAL, q)
+ *
+ *     Note that deferring removal may have implications for userspace. An
+ *     example is if you are using an ioctl to allow removal of a block device,
+ *     and the kernel returns immediately even though the device may only
+ *     disappear after the full removal is completed.
+ *
+ *     You should also be able to work around this by just increasing the
+ *     refcount for the block device instead during your atomic operation,
+ *     and so QUEUE_FLAG_DEFER_REMOVAL should almost never be required.
  */
-static void __blk_release_queue(struct work_struct *work)
+static void blk_release_queue_sync(struct request_queue *q)
 {
-	struct request_queue *q = container_of(work, typeof(*q), release_work);
-
 	if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
 		blk_stat_remove_callback(q, q->poll_cb);
 	blk_stat_free_callback(q->poll_cb);
@@ -905,13 +920,22 @@ static void __blk_release_queue(struct work_struct *work)
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
 
+void __blk_release_queue(struct work_struct *work)
+{
+	struct request_queue *q = container_of(work, typeof(*q), release_work);
+
+	blk_release_queue_sync(q);
+}
+
 static void blk_release_queue(struct kobject *kobj)
 {
 	struct request_queue *q =
 		container_of(kobj, struct request_queue, kobj);
 
-	INIT_WORK(&q->release_work, __blk_release_queue);
-	schedule_work(&q->release_work);
+	if (blk_queue_defer_removal(q))
+		schedule_work(&q->release_work);
+	else
+		blk_release_queue_sync(q);
 }
 
 static const struct sysfs_ops queue_sysfs_ops = {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8b1cab52cef9..46fee1ef92e3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -614,6 +614,7 @@ struct request_queue {
 #define QUEUE_FLAG_PCI_P2PDMA	25	/* device supports PCI p2p requests */
 #define QUEUE_FLAG_ZONE_RESETALL 26	/* supports Zone Reset All */
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
+#define QUEUE_FLAG_DEFER_REMOVAL 28	/* defer queue removal */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP))
@@ -648,6 +649,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #else
 #define blk_queue_rq_alloc_time(q)	false
 #endif
+#define blk_queue_defer_removal(q) \
+	test_bit(QUEUE_FLAG_DEFER_REMOVAL, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC v2 1/5] block: move main block debugfs initialization to its own file
  2020-04-09 21:45 ` [RFC v2 1/5] block: move main block debugfs initialization to its own file Luis Chamberlain
@ 2020-04-10  2:48   ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2020-04-10  2:48 UTC (permalink / raw)
  To: Luis Chamberlain, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

On 2020-04-09 14:45, Luis Chamberlain wrote:
> Single and multiqeueue block devices share some debugfs code. By
> moving this into its own file it makes it easier to expand and audit
> this shared code.

Christoph doesn't like the name "single queue block devices".
Additionally, the legacy (single queue) block layer is gone. Should the
description of this patch perhaps refer to make_request-based drivers
and request-based drivers?

> +/*
> + * Shared debugfs mq / non-mq functionality
> + */

Shared request-based / make_request-based functionality?

Otherwise this patch looks good to me.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 2/5] blktrace: fix debugfs use after free
  2020-04-09 21:45 ` [RFC v2 2/5] blktrace: fix debugfs use after free Luis Chamberlain
@ 2020-04-10  2:52   ` Bart Van Assche
  2020-04-10 19:58     ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2020-04-10  2:52 UTC (permalink / raw)
  To: Luis Chamberlain, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko,
	syzbot+603294af2d01acfdd6da

On 2020-04-09 14:45, Luis Chamberlain wrote:
> +void blk_q_debugfs_register(struct request_queue *q)
> +{
> +	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
> +					    blk_debugfs_root);
> +}
> +
> +void blk_q_debugfs_unregister(struct request_queue *q)
> +{
> +	debugfs_remove_recursive(q->debugfs_dir);
> +	q->debugfs_dir = NULL;
> +}

There are no other functions in the block layer that start with the
prefix blk_q_. How about changing that prefix into blk_?

> -#ifdef CONFIG_BLK_DEBUG_FS
> +#ifdef CONFIG_DEBUG_FS
>  	struct dentry		*debugfs_dir;
> +#endif

Please add a comment above 'debugfs_dir' that it is used not only by the
code in block/blk-*debugfs.c but also by the code in
kernel/trace/blktrace.c. Otherwise this patch looks good to me.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 3/5] blktrace: ref count the request_queue during ioctl
  2020-04-09 21:45 ` [RFC v2 3/5] blktrace: ref count the request_queue during ioctl Luis Chamberlain
@ 2020-04-10  2:58   ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2020-04-10  2:58 UTC (permalink / raw)
  To: Luis Chamberlain, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

On 2020-04-09 14:45, Luis Chamberlain wrote:
> Ensure that the reqest_queue ref counts the request_queue
                  ^^^^^^^^^^^^ ^^^^^^^^^^
                request_queue? refcounts?
> during its full ioctl cycle. This avoids possible races against
> removal, given blk_get_queue() also checks to ensure the queue
> is not dying.
> 
> This small race is possible if you defer removal of the reqest_queue
                                                          ^^^^^^^^^^^^
                                                          request_queue?
> and userspace fires off an ioctl for the device in the meantime.

Anyway:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle()
  2020-04-09 21:45 ` [RFC v2 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle() Luis Chamberlain
@ 2020-04-10  3:02   ` Bart Van Assche
  2020-04-10 14:34     ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2020-04-10  3:02 UTC (permalink / raw)
  To: Luis Chamberlain, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

On 2020-04-09 14:45, Luis Chamberlain wrote:
>  		if (si->bdev) {
> +			bdev = bdgrab(si->bdev);
> +			if (!bdev)
> +				continue;
> +			/*
> +			 * By adding our own bdgrab() we ensure the queue
> +			 * sticks around until disk_release(), and so we ensure
> +			 * our release of the request_queue does not happen in
> +			 * atomic context.
> +			 */
>  			blkcg_schedule_throttle(bdev_get_queue(si->bdev),
>  						true);

How about changing the si->bdev argument of blkcg_schedule_throttle()
into bdev?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 5/5] block: revert back to synchronous request_queue removal
  2020-04-09 21:45 ` [RFC v2 5/5] block: revert back to synchronous request_queue removal Luis Chamberlain
@ 2020-04-10  3:12   ` Bart Van Assche
  2020-04-10 14:34     ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2020-04-10  3:12 UTC (permalink / raw)
  To: Luis Chamberlain, axboe, viro, gregkh, rostedt, mingo, jack,
	ming.lei, nstange, akpm
  Cc: mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

On 2020-04-09 14:45, Luis Chamberlain wrote:
> blk_put_queue() puts decrements the refcount for the request_queue
                  ^^^^
        can this word be left out?

> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 8b1cab52cef9..46fee1ef92e3 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -614,6 +614,7 @@ struct request_queue {
>  #define QUEUE_FLAG_PCI_P2PDMA	25	/* device supports PCI p2p requests */
>  #define QUEUE_FLAG_ZONE_RESETALL 26	/* supports Zone Reset All */
>  #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
> +#define QUEUE_FLAG_DEFER_REMOVAL 28	/* defer queue removal */
>  
>  #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
>  				 (1 << QUEUE_FLAG_SAME_COMP))
> @@ -648,6 +649,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>  #else
>  #define blk_queue_rq_alloc_time(q)	false
>  #endif
> +#define blk_queue_defer_removal(q) \
> +	test_bit(QUEUE_FLAG_DEFER_REMOVAL, &(q)->queue_flags)

Since blk_queue_defer_removal() has no callers the code that depends on
QUEUE_FLAG_DEFER_REMOVAL to be set will be subject to bitrot. It would
make me happy if the QUEUE_FLAG_DEFER_REMOVAL flag and the code that
depends on that flag would be removed.

Please add a might_sleep() call in blk_put_queue() since with this patch
applied it is no longer allowed to call blk_put_queue() from atomic context.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 5/5] block: revert back to synchronous request_queue removal
  2020-04-10  3:12   ` Bart Van Assche
@ 2020-04-10 14:34     ` Luis Chamberlain
  2020-04-10 20:50       ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-10 14:34 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange,
	akpm, mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

On Thu, Apr 09, 2020 at 08:12:21PM -0700, Bart Van Assche wrote:
> On 2020-04-09 14:45, Luis Chamberlain wrote:
> > blk_put_queue() puts decrements the refcount for the request_queue
>                   ^^^^
>         can this word be left out?

Sure.

> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 8b1cab52cef9..46fee1ef92e3 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -614,6 +614,7 @@ struct request_queue {
> >  #define QUEUE_FLAG_PCI_P2PDMA	25	/* device supports PCI p2p requests */
> >  #define QUEUE_FLAG_ZONE_RESETALL 26	/* supports Zone Reset All */
> >  #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
> > +#define QUEUE_FLAG_DEFER_REMOVAL 28	/* defer queue removal */
> >  
> >  #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
> >  				 (1 << QUEUE_FLAG_SAME_COMP))
> > @@ -648,6 +649,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
> >  #else
> >  #define blk_queue_rq_alloc_time(q)	false
> >  #endif
> > +#define blk_queue_defer_removal(q) \
> > +	test_bit(QUEUE_FLAG_DEFER_REMOVAL, &(q)->queue_flags)
> 
> Since blk_queue_defer_removal() has no callers the code that depends on
> QUEUE_FLAG_DEFER_REMOVAL to be set will be subject to bitrot. It would
> make me happy if the QUEUE_FLAG_DEFER_REMOVAL flag and the code that
> depends on that flag would be removed.

Sure thing.

Feedback on the cover letter thread patch 0/5 about whether or not to
consider userspace impact changes on these changes should be detailed on
the commit log would be useful.

> Please add a might_sleep() call in blk_put_queue() since with this patch
> applied it is no longer allowed to call blk_put_queue() from atomic context.

Sure thing.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle()
  2020-04-10  3:02   ` Bart Van Assche
@ 2020-04-10 14:34     ` Luis Chamberlain
  0 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-10 14:34 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange,
	akpm, mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

On Thu, Apr 09, 2020 at 08:02:59PM -0700, Bart Van Assche wrote:
> On 2020-04-09 14:45, Luis Chamberlain wrote:
> >  		if (si->bdev) {
> > +			bdev = bdgrab(si->bdev);
> > +			if (!bdev)
> > +				continue;
> > +			/*
> > +			 * By adding our own bdgrab() we ensure the queue
> > +			 * sticks around until disk_release(), and so we ensure
> > +			 * our release of the request_queue does not happen in
> > +			 * atomic context.
> > +			 */
> >  			blkcg_schedule_throttle(bdev_get_queue(si->bdev),
> >  						true);
> 
> How about changing the si->bdev argument of blkcg_schedule_throttle()
> into bdev?

Sure, thanks!

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 2/5] blktrace: fix debugfs use after free
  2020-04-10  2:52   ` Bart Van Assche
@ 2020-04-10 19:58     ` Luis Chamberlain
  2020-04-11 23:09       ` Bart Van Assche
  0 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-10 19:58 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange,
	akpm, mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko,
	syzbot+603294af2d01acfdd6da

On Thu, Apr 09, 2020 at 07:52:59PM -0700, Bart Van Assche wrote:
> On 2020-04-09 14:45, Luis Chamberlain wrote:
> > +void blk_q_debugfs_register(struct request_queue *q)
> > +{
> > +	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
> > +					    blk_debugfs_root);
> > +}
> > +
> > +void blk_q_debugfs_unregister(struct request_queue *q)
> > +{
> > +	debugfs_remove_recursive(q->debugfs_dir);
> > +	q->debugfs_dir = NULL;
> > +}
> 
> There are no other functions in the block layer that start with the
> prefix blk_q_. How about changing that prefix into blk_?

I the first patch already introduced blk_debugfs_register(), so I have
now changed the above to:

blk_debugfs_common_register()
blk_debugfs_common_unregister()

Let me know if something else is preferred.

> > -#ifdef CONFIG_BLK_DEBUG_FS
> > +#ifdef CONFIG_DEBUG_FS
> >  	struct dentry		*debugfs_dir;
> > +#endif
> 
> Please add a comment above 'debugfs_dir' that it is used not only by the
> code in block/blk-*debugfs.c but also by the code in
> kernel/trace/blktrace.c. Otherwise this patch looks good to me.

Sure, I'll do that.

In the future, after this patch set I'll follow up with another series
to clean that header file to make it easier to expand on proper
documentaiton with kdoc.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 5/5] block: revert back to synchronous request_queue removal
  2020-04-10 14:34     ` Luis Chamberlain
@ 2020-04-10 20:50       ` Luis Chamberlain
  2020-04-10 21:27         ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-10 20:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Al Viro, Greg Kroah-Hartman, Steven Rostedt,
	Ingo Molnar, Jan Kara, Ming Lei, Nicolai Stange, Andrew Morton,
	Michal Hocko, yu kuai, linux-block, Linux FS Devel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

On Fri, Apr 10, 2020 at 8:34 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> On Thu, Apr 09, 2020 at 08:12:21PM -0700, Bart Van Assche wrote:
> > Please add a might_sleep() call in blk_put_queue() since with this patch
> > applied it is no longer allowed to call blk_put_queue() from atomic context.
>
> Sure thing.

On second though, I don't think blk_put_queue() would be the right
place for might_sleep(), given we really only care about the *last*
refcount decrement to 0. So I'll move it to blk_release_queue().
Granted, at that point we are too late, and we'd get a splat about
this issue *iff* we really sleep. So yeah, I do suppose that forcing
this check there still makes sense.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 5/5] block: revert back to synchronous request_queue removal
  2020-04-10 20:50       ` Luis Chamberlain
@ 2020-04-10 21:27         ` Luis Chamberlain
  2020-04-11 23:21           ` Bart Van Assche
  0 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-10 21:27 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Al Viro, Greg Kroah-Hartman, Steven Rostedt,
	Ingo Molnar, Jan Kara, Ming Lei, Nicolai Stange, Andrew Morton,
	Michal Hocko, yu kuai, linux-block, Linux FS Devel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

On Fri, Apr 10, 2020 at 2:50 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Fri, Apr 10, 2020 at 8:34 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > On Thu, Apr 09, 2020 at 08:12:21PM -0700, Bart Van Assche wrote:
> > > Please add a might_sleep() call in blk_put_queue() since with this patch
> > > applied it is no longer allowed to call blk_put_queue() from atomic context.
> >
> > Sure thing.
>
> On second though, I don't think blk_put_queue() would be the right
> place for might_sleep(), given we really only care about the *last*
> refcount decrement to 0. So I'll move it to blk_release_queue().
> Granted, at that point we are too late, and we'd get a splat about
> this issue *iff* we really sleep. So yeah, I do suppose that forcing
> this check there still makes sense.

I'll add might_sleep() to both blk_release_queue() *and* blk_cleanup_queue().

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 2/5] blktrace: fix debugfs use after free
  2020-04-10 19:58     ` Luis Chamberlain
@ 2020-04-11 23:09       ` Bart Van Assche
  2020-04-14  3:32         ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2020-04-11 23:09 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange,
	akpm, mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko,
	syzbot+603294af2d01acfdd6da

On 2020-04-10 12:58, Luis Chamberlain wrote:
> On Thu, Apr 09, 2020 at 07:52:59PM -0700, Bart Van Assche wrote:
>> On 2020-04-09 14:45, Luis Chamberlain wrote:
>>> +void blk_q_debugfs_register(struct request_queue *q)
>>> +{
>>> +	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
>>> +					    blk_debugfs_root);
>>> +}
>>> +
>>> +void blk_q_debugfs_unregister(struct request_queue *q)
>>> +{
>>> +	debugfs_remove_recursive(q->debugfs_dir);
>>> +	q->debugfs_dir = NULL;
>>> +}
>>
>> There are no other functions in the block layer that start with the
>> prefix blk_q_. How about changing that prefix into blk_?
> 
> I the first patch already introduced blk_debugfs_register(), so I have
> now changed the above to:
> 
> blk_debugfs_common_register()
> blk_debugfs_common_unregister()
> 
> Let me know if something else is preferred.

I just realized that the "q" in "blk_q_" probably refers to the word
"queue"? How about renaming these funtions into
blk_queue_debugfs_register/unregister()?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 5/5] block: revert back to synchronous request_queue removal
  2020-04-10 21:27         ` Luis Chamberlain
@ 2020-04-11 23:21           ` Bart Van Assche
  2020-04-14  3:38             ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2020-04-11 23:21 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Jens Axboe, Al Viro, Greg Kroah-Hartman, Steven Rostedt,
	Ingo Molnar, Jan Kara, Ming Lei, Nicolai Stange, Andrew Morton,
	Michal Hocko, yu kuai, linux-block, Linux FS Devel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

On 2020-04-10 14:27, Luis Chamberlain wrote:
> On Fri, Apr 10, 2020 at 2:50 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>>
>> On Fri, Apr 10, 2020 at 8:34 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
>>> On Thu, Apr 09, 2020 at 08:12:21PM -0700, Bart Van Assche wrote:
>>>> Please add a might_sleep() call in blk_put_queue() since with this patch
>>>> applied it is no longer allowed to call blk_put_queue() from atomic context.
>>>
>>> Sure thing.
>>
>> On second though, I don't think blk_put_queue() would be the right
>> place for might_sleep(), given we really only care about the *last*
>> refcount decrement to 0. So I'll move it to blk_release_queue().
>> Granted, at that point we are too late, and we'd get a splat about
>> this issue *iff* we really sleep. So yeah, I do suppose that forcing
>> this check there still makes sense.
> 
> I'll add might_sleep() to both blk_release_queue() *and* blk_cleanup_queue().

Since there is already an unconditional mutex_lock() call in
blk_cleanup_queue(), do we really need to add a might_sleep() call in
blk_cleanup_queue()?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 2/5] blktrace: fix debugfs use after free
  2020-04-11 23:09       ` Bart Van Assche
@ 2020-04-14  3:32         ` Luis Chamberlain
  0 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-14  3:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, viro, gregkh, rostedt, mingo, jack, ming.lei, nstange,
	akpm, mhocko, yukuai3, linux-block, linux-fsdevel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko,
	syzbot+603294af2d01acfdd6da

On Sat, Apr 11, 2020 at 04:09:13PM -0700, Bart Van Assche wrote:
> On 2020-04-10 12:58, Luis Chamberlain wrote:
> > On Thu, Apr 09, 2020 at 07:52:59PM -0700, Bart Van Assche wrote:
> >> On 2020-04-09 14:45, Luis Chamberlain wrote:
> >>> +void blk_q_debugfs_register(struct request_queue *q)
> >>> +{
> >>> +	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
> >>> +					    blk_debugfs_root);
> >>> +}
> >>> +
> >>> +void blk_q_debugfs_unregister(struct request_queue *q)
> >>> +{
> >>> +	debugfs_remove_recursive(q->debugfs_dir);
> >>> +	q->debugfs_dir = NULL;
> >>> +}
> >>
> >> There are no other functions in the block layer that start with the
> >> prefix blk_q_. How about changing that prefix into blk_?
> > 
> > I the first patch already introduced blk_debugfs_register(), so I have
> > now changed the above to:
> > 
> > blk_debugfs_common_register()
> > blk_debugfs_common_unregister()
> > 
> > Let me know if something else is preferred.
> 
> I just realized that the "q" in "blk_q_" probably refers to the word
> "queue"? How about renaming these funtions into
> blk_queue_debugfs_register/unregister()?

Sure.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 5/5] block: revert back to synchronous request_queue removal
  2020-04-11 23:21           ` Bart Van Assche
@ 2020-04-14  3:38             ` Luis Chamberlain
  0 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2020-04-14  3:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Al Viro, Greg Kroah-Hartman, Steven Rostedt,
	Ingo Molnar, Jan Kara, Ming Lei, Nicolai Stange, Andrew Morton,
	Michal Hocko, yu kuai, linux-block, Linux FS Devel, linux-mm,
	linux-kernel, Omar Sandoval, Hannes Reinecke, Michal Hocko

On Sat, Apr 11, 2020 at 04:21:17PM -0700, Bart Van Assche wrote:
> On 2020-04-10 14:27, Luis Chamberlain wrote:
> > On Fri, Apr 10, 2020 at 2:50 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >>
> >> On Fri, Apr 10, 2020 at 8:34 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >>> On Thu, Apr 09, 2020 at 08:12:21PM -0700, Bart Van Assche wrote:
> >>>> Please add a might_sleep() call in blk_put_queue() since with this patch
> >>>> applied it is no longer allowed to call blk_put_queue() from atomic context.
> >>>
> >>> Sure thing.
> >>
> >> On second though, I don't think blk_put_queue() would be the right
> >> place for might_sleep(), given we really only care about the *last*
> >> refcount decrement to 0. So I'll move it to blk_release_queue().
> >> Granted, at that point we are too late, and we'd get a splat about
> >> this issue *iff* we really sleep. So yeah, I do suppose that forcing
> >> this check there still makes sense.
> > 
> > I'll add might_sleep() to both blk_release_queue() *and* blk_cleanup_queue().
> 
> Since there is already an unconditional mutex_lock() call in
> blk_cleanup_queue(), do we really need to add a might_sleep() call in
> blk_cleanup_queue()?

You are right, mutex_lock() already has a might_sleep() sprinkled on it.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-04-14  3:38 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-09 21:45 [RFC v2 0/5] blktrace: fix use after free Luis Chamberlain
2020-04-09 21:45 ` [RFC v2 1/5] block: move main block debugfs initialization to its own file Luis Chamberlain
2020-04-10  2:48   ` Bart Van Assche
2020-04-09 21:45 ` [RFC v2 2/5] blktrace: fix debugfs use after free Luis Chamberlain
2020-04-10  2:52   ` Bart Van Assche
2020-04-10 19:58     ` Luis Chamberlain
2020-04-11 23:09       ` Bart Van Assche
2020-04-14  3:32         ` Luis Chamberlain
2020-04-09 21:45 ` [RFC v2 3/5] blktrace: ref count the request_queue during ioctl Luis Chamberlain
2020-04-10  2:58   ` Bart Van Assche
2020-04-09 21:45 ` [RFC v2 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle() Luis Chamberlain
2020-04-10  3:02   ` Bart Van Assche
2020-04-10 14:34     ` Luis Chamberlain
2020-04-09 21:45 ` [RFC v2 5/5] block: revert back to synchronous request_queue removal Luis Chamberlain
2020-04-10  3:12   ` Bart Van Assche
2020-04-10 14:34     ` Luis Chamberlain
2020-04-10 20:50       ` Luis Chamberlain
2020-04-10 21:27         ` Luis Chamberlain
2020-04-11 23:21           ` Bart Van Assche
2020-04-14  3:38             ` Luis Chamberlain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).