All of lore.kernel.org
 help / color / mirror / Atom feed
* [dm-devel] [RFC PATCH] dm-rq: fix double free of blk tag set in dev remove after table load fails
@ 2021-04-29 21:37 Benjamin Block
  2021-05-01 21:16 ` Benjamin Block
  0 siblings, 1 reply; 2+ messages in thread
From: Benjamin Block @ 2021-04-29 21:37 UTC (permalink / raw)
  To: Alasdair Kergon, Mike Snitzer
  Cc: Steffen Maier, Benjamin Block, dm-devel, Martin Wilck

When loading a device-mapper table for a request-based mapped device,
and the allocation/initialization of the blk-mq tag set for the device
fails, a following device remove will cause a double free.

E.g. (dmesg):
  device-mapper: core: Cannot initialize queue for request-based dm-mq mapped device
  device-mapper: ioctl: unable to set up device queue for new table.
  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0305e098835de000 TEID: 0305e098835de803
  Fault in home space mode while using kernel ASCE.
  AS:000000025efe0007 R3:0000000000000024
  Oops: 0038 ilc:3 [#1] SMP
  Modules linked in: ... lots of modules ...
  Supported: Yes, External
  CPU: 0 PID: 7348 Comm: multipathd Kdump: loaded Tainted: G        W      X    5.3.18-53-default #1 SLE15-SP3
  Hardware name: IBM 8561 T01 7I2 (LPAR)
  Krnl PSW : 0704e00180000000 000000025e368eca (kfree+0x42/0x330)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
  Krnl GPRS: 000000000000004a 000000025efe5230 c1773200d779968d 0000000000000000
             000000025e520270 000000025e8d1b40 0000000000000003 00000007aae10000
             000000025e5202a2 0000000000000001 c1773200d779968d 0305e098835de640
             00000007a8170000 000003ff80138650 000000025e5202a2 000003e00396faa8
  Krnl Code: 000000025e368eb8: c4180041e100       lgrl    %r1,25eba50b8
             000000025e368ebe: ecba06b93a55       risbg   %r11,%r10,6,185,58
            #000000025e368ec4: e3b010000008       ag      %r11,0(%r1)
            >000000025e368eca: e310b0080004       lg      %r1,8(%r11)
             000000025e368ed0: a7110001           tmll    %r1,1
             000000025e368ed4: a7740129           brc     7,25e369126
             000000025e368ed8: e320b0080004       lg      %r2,8(%r11)
             000000025e368ede: b904001b           lgr     %r1,%r11
  Call Trace:
   [<000000025e368eca>] kfree+0x42/0x330
   [<000000025e5202a2>] blk_mq_free_tag_set+0x72/0xb8
   [<000003ff801316a8>] dm_mq_cleanup_mapped_device+0x38/0x50 [dm_mod]
   [<000003ff80120082>] free_dev+0x52/0xd0 [dm_mod]
   [<000003ff801233f0>] __dm_destroy+0x150/0x1d0 [dm_mod]
   [<000003ff8012bb9a>] dev_remove+0x162/0x1c0 [dm_mod]
   [<000003ff8012a988>] ctl_ioctl+0x198/0x478 [dm_mod]
   [<000003ff8012ac8a>] dm_ctl_ioctl+0x22/0x38 [dm_mod]
   [<000000025e3b11ee>] ksys_ioctl+0xbe/0xe0
   [<000000025e3b127a>] __s390x_sys_ioctl+0x2a/0x40
   [<000000025e8c15ac>] system_call+0xd8/0x2c8
  Last Breaking-Event-Address:
   [<000000025e52029c>] blk_mq_free_tag_set+0x6c/0xb8
  Kernel panic - not syncing: Fatal exception: panic_on_oops

When allocation/initialization of the blk-mq tag set fails in
`dm_mq_init_request_queue()`, it is uninitialized/freed, but the pointer
is not reset to NULL; so when `dev_remove()` later gets into
`dm_mq_cleanup_mapped_device()` it sees the pointer and tries to
uninitialized and free it again.

Fix this by also setting the pointer to NULL in
`dm_mq_init_request_queue()` after error-handling.

Cc: <stable@vger.kernel.org> # 4.6+
Fixes: 1c357a1e86a4 ("dm: allocate blk_mq_tag_set rather than embed in mapped_device")
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
---
 drivers/md/dm-rq.c | 1 +
 1 file changed, 1 insertion(+)

Hey,

I got this report from internal testing for a distribution kernel (you
see the version information in the dmesg output), but I checked the code
for 5.12, and it looks like nothing has change really, and the same
crash would happen there as well under the same circumstances.

I'm not sure why exactly the tag set allocation/initialization failed..
like you see in the dmesg output, the stack of `table_load()` already
unwound and the IOCTL returned, so I can't check the error-code or the
tag set state. But the crash seems obvious enough to me.

I wrote a small inject to test this with 5.12, because I didn't know how
to trigger this otherwise. It just jumps to `out_tag_set` in
`dm_mq_init_request_queue()` after the tag set is allocated, but
before `blk_mq_init_allocated_queue()` could run.

Here the kernel with inject and patch applied:

  ...
  [   52.291458] sd 0:0:0:1075265560: alua: port group 00 state A preferred supports tolusnA
  [   52.291815] sd 0:0:0:1075265560: alua: port group 00 state A preferred supports tolusnA
  [  158.242153] device-mapper: core: Cannot initialize queue for request-based dm mapped device
  [  158.242233] device-mapper: ioctl: unable to set up device queue for new table.
  [  158.761525] sd 0:0:0:1075134488: alua: port group 00 state A preferred supports tolusnA
  [  158.761839] sd 0:0:0:1075134488: alua: port group 00 state A preferred supports tolusnA
  ...

>From multipath (the userspace tool) perspective, it looks like this:

  Apr 29 23:13:03 | ds8k31_err_40184014_npiv: addmap [0 20971520 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 2 1 8:0 1 8:64 1]
  Apr 29 23:13:03 | libdevmapper: ioctl/libdm-iface.c(1923): device-mapper: reload ioctl on ds8k31_err_40184014_npiv  failed: Cannot allocate memory
  Apr 29 23:13:03 | dm_addmap: libdm task=0 error: Success
  Apr 29 23:13:03 | ds8k31_err_40184014_npiv: ignoring map
  Apr 29 23:13:03 | ds8k31_err_40184015_npiv: addmap [0 20971520 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 2 1 8:16 1 8:80 1]
  create: ds8k31_err_40184015_npiv (36005076309ffd4300000000000001815) undef IBM,2107900
  size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=undef
  `-+- policy='service-time 0' prio=50 status=undef
    |- 0:0:0:1075134488 sdb 8:16  undef ready running
    `- 1:0:1:1075134488 sdf 8:80  undef ready running
  ...

Recreating the exact same crash as in the report without the patch (but
with inject) is actually not all that easy on s390x; the double free
doesn't necessarily end up touching unmapped memory, and the crashes I
got where all over the place.. so its actually quite lucky I got this
clear report.

I don't really know the device-mapper code by heart, so thats why I
marked it as RFC.


 - Benjamin

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 13b4385f4d5a..4583c5d6885f 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -569,6 +569,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
 	blk_mq_free_tag_set(md->tag_set);
 out_kfree_tag_set:
 	kfree(md->tag_set);
+	md->tag_set = NULL;
 
 	return err;
 }
-- 
2.30.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [dm-devel] [RFC PATCH] dm-rq: fix double free of blk tag set in dev remove after table load fails
  2021-04-29 21:37 [dm-devel] [RFC PATCH] dm-rq: fix double free of blk tag set in dev remove after table load fails Benjamin Block
@ 2021-05-01 21:16 ` Benjamin Block
  0 siblings, 0 replies; 2+ messages in thread
From: Benjamin Block @ 2021-05-01 21:16 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Benjamin Block, dm-devel, Steffen Maier, Alasdair Kergon, Martin Wilck

On Thu, Apr 29, 2021 at 11:37:00PM +0200, Benjamin Block wrote:
> When loading a device-mapper table for a request-based mapped device,
> and the allocation/initialization of the blk-mq tag set for the device
> fails, a following device remove will cause a double free.
>
> [snip]
>
> Fix this by also setting the pointer to NULL in
> `dm_mq_init_request_queue()` after error-handling.
> 

Thanks for taking the fix so fast Mike :)

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-05-01 21:52 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-29 21:37 [dm-devel] [RFC PATCH] dm-rq: fix double free of blk tag set in dev remove after table load fails Benjamin Block
2021-05-01 21:16 ` Benjamin Block

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.