linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Luis Chamberlain <mcgrof@kernel.org>,
	syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com,
	Christoph Hellwig <hch@lst.de>,
	Bart Van Assche <bvanassche@acm.org>,
	Omar Sandoval <osandov@fb.com>, Hannes Reinecke <hare@suse.com>,
	Nicolai Stange <nstange@suse.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	yu kuai <yukuai3@huawei.com>, Jens Axboe <axboe@kernel.dk>,
	Sasha Levin <sashal@kernel.org>,
	linux-block@vger.kernel.org
Subject: [PATCH AUTOSEL 5.7 15/58] blktrace: fix debugfs use after free
Date: Sat,  8 Aug 2020 19:36:41 -0400	[thread overview]
Message-ID: <20200808233724.3618168-15-sashal@kernel.org> (raw)
In-Reply-To: <20200808233724.3618168-1-sashal@kernel.org>

From: Luis Chamberlain <mcgrof@kernel.org>

[ Upstream commit bad8e64fb19d3a0de5e564d9a7271c31bd684369 ]

On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
merged on v4.12 Omar fixed the original blktrace code for request-based
drivers (multiqueue). This however left in place a possible crash, if you
happen to abuse blktrace while racing to remove / add a device.

We used to use asynchronous removal of the request_queue, and with that
the issue was easier to reproduce. Now that we have reverted to
synchronous removal of the request_queue, the issue is still possible to
reproduce, its however just a bit more difficult.

We essentially run two instances of break-blktrace which add/remove
a loop device, and setup a blktrace and just never tear the blktrace
down. We do this twice in parallel. This is easily reproduced with the
script run_0004.sh from break-blktrace [0].

We can end up with two types of panics each reflecting where we
race, one a failed blktrace setup:

[  252.426751] debugfs: Directory 'loop0' with parent 'block' already present!
[  252.432265] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[  252.436592] #PF: supervisor write access in kernel mode
[  252.439822] #PF: error_code(0x0002) - not-present page
[  252.442967] PGD 0 P4D 0
[  252.444656] Oops: 0002 [#1] SMP NOPTI
[  252.446972] CPU: 10 PID: 1153 Comm: break-blktrace Tainted: G            E     5.7.0-rc2-next-20200420+ #164
[  252.452673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[  252.456343] RIP: 0010:down_write+0x15/0x40
[  252.458146] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc
               cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00
               00 00 <f0> 48 0f b1 55 00 75 0f 48 8b 04 25 c0 8b 01 00 48 89
               45 08 5d
[  252.463638] RSP: 0018:ffffa626415abcc8 EFLAGS: 00010246
[  252.464950] RAX: 0000000000000000 RBX: ffff958c25f0f5c0 RCX: ffffff8100000000
[  252.466727] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[  252.468482] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000001
[  252.470014] R10: 0000000000000000 R11: ffff958d1f9227ff R12: 0000000000000000
[  252.471473] R13: ffff958c25ea5380 R14: ffffffff8cce15f1 R15: 00000000000000a0
[  252.473346] FS:  00007f2e69dee540(0000) GS:ffff958c2fc80000(0000) knlGS:0000000000000000
[  252.475225] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  252.476267] CR2: 00000000000000a0 CR3: 0000000427d10004 CR4: 0000000000360ee0
[  252.477526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  252.478776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  252.479866] Call Trace:
[  252.480322]  simple_recursive_removal+0x4e/0x2e0
[  252.481078]  ? debugfs_remove+0x60/0x60
[  252.481725]  ? relay_destroy_buf+0x77/0xb0
[  252.482662]  debugfs_remove+0x40/0x60
[  252.483518]  blk_remove_buf_file_callback+0x5/0x10
[  252.484328]  relay_close_buf+0x2e/0x60
[  252.484930]  relay_open+0x1ce/0x2c0
[  252.485520]  do_blk_trace_setup+0x14f/0x2b0
[  252.486187]  __blk_trace_setup+0x54/0xb0
[  252.486803]  blk_trace_ioctl+0x90/0x140
[  252.487423]  ? do_sys_openat2+0x1ab/0x2d0
[  252.488053]  blkdev_ioctl+0x4d/0x260
[  252.488636]  block_ioctl+0x39/0x40
[  252.489139]  ksys_ioctl+0x87/0xc0
[  252.489675]  __x64_sys_ioctl+0x16/0x20
[  252.490380]  do_syscall_64+0x52/0x180
[  252.491032]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

And the other on the device removal:

[  128.528940] debugfs: Directory 'loop0' with parent 'block' already present!
[  128.615325] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[  128.619537] #PF: supervisor write access in kernel mode
[  128.622700] #PF: error_code(0x0002) - not-present page
[  128.625842] PGD 0 P4D 0
[  128.627585] Oops: 0002 [#1] SMP NOPTI
[  128.629871] CPU: 12 PID: 544 Comm: break-blktrace Tainted: G            E     5.7.0-rc2-next-20200420+ #164
[  128.635595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[  128.640471] RIP: 0010:down_write+0x15/0x40
[  128.643041] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc
               cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00
               00 00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89
               45 08 5d
[  128.650180] RSP: 0018:ffffa9c3c05ebd78 EFLAGS: 00010246
[  128.651820] RAX: 0000000000000000 RBX: ffff8ae9a6370240 RCX: ffffff8100000000
[  128.653942] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[  128.655720] RBP: 00000000000000a0 R08: 0000000000000002 R09: ffff8ae9afd2d3d0
[  128.657400] R10: 0000000000000056 R11: 0000000000000000 R12: 0000000000000000
[  128.659099] R13: 0000000000000000 R14: 0000000000000003 R15: 00000000000000a0
[  128.660500] FS:  00007febfd995540(0000) GS:ffff8ae9afd00000(0000) knlGS:0000000000000000
[  128.662204] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  128.663426] CR2: 00000000000000a0 CR3: 0000000420042003 CR4: 0000000000360ee0
[  128.664776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  128.666022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  128.667282] Call Trace:
[  128.667801]  simple_recursive_removal+0x4e/0x2e0
[  128.668663]  ? debugfs_remove+0x60/0x60
[  128.669368]  debugfs_remove+0x40/0x60
[  128.669985]  blk_trace_free+0xd/0x50
[  128.670593]  __blk_trace_remove+0x27/0x40
[  128.671274]  blk_trace_shutdown+0x30/0x40
[  128.671935]  blk_release_queue+0x95/0xf0
[  128.672589]  kobject_put+0xa5/0x1b0
[  128.673188]  disk_release+0xa2/0xc0
[  128.673786]  device_release+0x28/0x80
[  128.674376]  kobject_put+0xa5/0x1b0
[  128.674915]  loop_remove+0x39/0x50 [loop]
[  128.675511]  loop_control_ioctl+0x113/0x130 [loop]
[  128.676199]  ksys_ioctl+0x87/0xc0
[  128.676708]  __x64_sys_ioctl+0x16/0x20
[  128.677274]  do_syscall_64+0x52/0x180
[  128.677823]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

The common theme here is:

debugfs: Directory 'loop0' with parent 'block' already present

This crash happens because of how blktrace uses the debugfs directory
where it places its files. Upon init we always create the same directory
which would be needed by blktrace but we only do this for make_request
drivers (multiqueue) block drivers. When you race a removal of these
devices with a blktrace setup you end up in a situation where the
make_request recursive debugfs removal will sweep away the blktrace
files and then later blktrace will also try to remove individual
dentries which are already NULL. The inverse is also possible and hence
the two types of use after frees.

We don't create the block debugfs directory on init for these types of
block devices:

  * request-based block driver block devices
  * every possible partition
  * scsi-generic

And so, this race should in theory only be possible with make_request
drivers.

We can fix the UAF by simply re-using the debugfs directory for
make_request drivers (multiqueue) and only creating the ephemeral
directory for the other type of block devices. The new clarifications
on relying on the q->blk_trace_mutex *and* also checking for q->blk_trace
*prior* to processing a blktrace ensures the debugfs directories are
only created if no possible directory name clashes are possible.

This goes tested with:

  o nvme partitions
  o ISCSI with tgt, and blktracing against scsi-generic with:
    o block
    o tape
    o cdrom
    o media changer
  o blktests

This patch is part of the work which disputes the severity of
CVE-2019-19770 which shows this issue is not a core debugfs issue, but
a misuse of debugfs within blktace.

Fixes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: yu kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/trace/blktrace.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 085fceca33774..ac59476c77ae0 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -520,10 +520,18 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	if (!bt->msg_data)
 		goto err;
 
-	ret = -ENOENT;
-
-	dir = debugfs_lookup(buts->name, blk_debugfs_root);
-	if (!dir)
+#ifdef CONFIG_BLK_DEBUG_FS
+	/*
+	 * When tracing whole make_request drivers (multiqueue) block devices,
+	 * reuse the existing debugfs directory created by the block layer on
+	 * init. For request-based block devices, all partitions block devices,
+	 * and scsi-generic block devices we create a temporary new debugfs
+	 * directory that will be removed once the trace ends.
+	 */
+	if (queue_is_mq(q) && bdev && bdev == bdev->bd_contains)
+		dir = q->debugfs_dir;
+	else
+#endif
 		bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
 
 	bt->dev = dev;
@@ -564,8 +572,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 
 	ret = 0;
 err:
-	if (dir && !bt->dir)
-		dput(dir);
 	if (ret)
 		blk_trace_free(bt);
 	return ret;
-- 
2.25.1


  parent reply	other threads:[~2020-08-08 23:37 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-08 23:36 [PATCH AUTOSEL 5.7 01/58] ARM: dts: stm32: fix uart7_pins_a comments in stm32mp15-pinctrl Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 02/58] x86/mce/inject: Fix a wrong assignment of i_mce.status Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 03/58] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 04/58] x86, sched: Bail out of frequency invariance if turbo frequency is unknown Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 05/58] x86, sched: Bail out of frequency invariance if turbo_freq/base_freq gives 0 Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 06/58] sched/fair: Fix NOHZ next idle balance Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 07/58] sched: correct SD_flags returned by tl->sd_flags() Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 08/58] arm64: dts: rockchip: fix rk3368-lion gmac reset gpio Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 09/58] arm64: dts: rockchip: fix rk3399-puma vcc5v0-host gpio Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 10/58] arm64: dts: rockchip: fix rk3399-puma gmac reset gpio Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 11/58] EDAC: Fix reference count leaks Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 12/58] crc-t10dif: Fix potential crypto notify dead-lock Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 13/58] arm64: dts: qcom: msm8916: Replace invalid bias-pull-none property Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 14/58] memory: tegra: Fix an error handling path in tegra186_emc_probe() Sasha Levin
2020-08-08 23:36 ` Sasha Levin [this message]
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 16/58] crypto: ccree - fix resource leak on error path Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 17/58] ARM: exynos: MCPM: Restore big.LITTLE cpuidle support Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 18/58] rcu/tree: Repeat the monitor if any free channel is busy Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 19/58] firmware: arm_scmi: Fix SCMI genpd domain probing Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 20/58] arm64: dts: sun50i-pinephone: dldo4 must not be >= 1.8V Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 21/58] arm64: dts: exynos: Fix silent hang after boot on Espresso Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 22/58] sched/uclamp: Fix initialization of struct uclamp_rq Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 23/58] crypto: qat - allow xts requests not multiple of block Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 24/58] clk: scmi: Fix min and max rate when registering clocks with discrete rates Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 25/58] m68k: mac: Don't send IOP message until channel is idle Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 26/58] m68k: mac: Fix IOP status/control register writes Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 27/58] platform/x86: intel-hid: Fix return value check in check_acpi_dev() Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 28/58] platform/x86: intel-vbtn: " Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 29/58] ARM: dts: gose: Fix ports node name for adv7180 Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 30/58] arm64: dts: renesas: Fix SD Card/eMMC interface device node names Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 31/58] ARM: dts: gose: Fix ports node name for adv7612 Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 32/58] ARM: at91: pm: add missing put_device() call in at91_pm_sram_init() Sasha Levin
2020-08-08 23:36 ` [PATCH AUTOSEL 5.7 33/58] ARM: dts: exynos: Disable frequency scaling for FSYS bus on Odroid XU3 family Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 34/58] reset: intel: fix a compile warning about REG_OFFSET redefined Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 35/58] ARM: dts: sunxi: bananapi-m2-plus-v1.2: Add regulator supply to all CPU cores Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 36/58] ARM: dts: sunxi: bananapi-m2-plus-v1.2: Fix CPU supply voltages Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 37/58] spi: lantiq: fix: Rx overflow error in full duplex mode Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 38/58] tpm: Require that all digests are present in TCG_PCR_EVENT2 structures Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 39/58] recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64 Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 40/58] regulator: fix memory leak on error path of regulator_register() Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 41/58] io_uring: fix sq array offset calculation Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 42/58] arm64: dts: meson: fix mmc0 tuning error on Khadas VIM3 Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 43/58] soc: qcom: pdr: Reorder the PD state indication ack Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 44/58] spi: rockchip: Fix error in SPI slave pio read Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 45/58] ARM: socfpga: PM: add missing put_device() call in socfpga_setup_ocram_self_refresh() Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 46/58] iocost: Fix check condition of iocg abs_vdebt Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 47/58] scripts/selinux/mdp: fix initial SID handling Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 48/58] irqchip/ti-sci-inta: Fix return value about devm_ioremap_resource() Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 49/58] seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 50/58] md: raid0/linear: fix dereference before null check on pointer mddev Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 51/58] irqchip/loongson-liointc: Fix potential dead lock Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 52/58] irqchip/irq-bcm7038-l1: Guard uses of cpu_logical_map Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 53/58] irqchip/gic-v4.1: Use GFP_ATOMIC flag in allocate_vpe_l1_table() Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 54/58] nvme-tcp: fix controller reset hang during traffic Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 55/58] nvme-rdma: " Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 56/58] nvme-multipath: fix logic for non-optimized paths Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 57/58] nvme-multipath: do not fall back to __nvme_find_path() " Sasha Levin
2020-08-08 23:37 ` [PATCH AUTOSEL 5.7 58/58] block: don't do revalidate zones on invalid devices Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200808233724.3618168-15-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=jejb@linux.ibm.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mcgrof@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=nstange@suse.de \
    --cc=osandov@fb.com \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).