All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm)
@ 2015-04-14 17:15 Azat Khuzhin
  2015-04-15  2:44 ` Guoqing Jiang
  2015-04-23  6:05 ` Bisected, with rfc/patch - was " NeilBrown
  0 siblings, 2 replies; 21+ messages in thread
From: Azat Khuzhin @ 2015-04-14 17:15 UTC (permalink / raw)
  To: Kernel.org-Linux-RAID, neilb

$ git describe
v4.0-2620-gb79013b

During setting up partitions with mdadm, mdadm hung, after attaching to mdadm with strace I got next:

# pgrep mdadm | xargs strace -fp
Process 27389 attached - interrupt to quit
unlink("/dev/.tmp.md.27389:9:127")      = 0
mknod("/tmp/.tmp.md.27389:9:127", S_IFBLK|0600, makedev(9, 127)) = 0
open("/tmp/.tmp.md.27389:9:127", O_RDWR|O_EXCL|O_DIRECT) <-- *hung*

After, I looked into dmesg, and found this:
[ 9627.630018] ------------[ cut here ]------------
[ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
[ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
[ 9627.630033] Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables nfsd nfs lockd grace sunrpc ipmi_devintf netconsole configfs loop hid_generic usbhid hid x86_pkg_temp_thermal coretemp ghash_clmulni_intel aesni_intel ioatdma ehci_pci aes_x86_64 iTCO_wdt iTCO_ve
[ 9627.630074] CPU: 18 PID: 3330 Comm: mdadm Not tainted 4.0.0bl-azat-v6+ #1
[ 9627.630076] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
[ 9627.630077]  0000000000000000 ffffffff814e3fcc ffffffff813e590f ffff885f9bcd3808
[ 9627.630079]  ffffffff8104575c ffff885f96acb000 ffff885fa4b3e3c0 ffff885fa2fec780
[ 9627.630081]  ffff885fa4bc4000 0000000000000000 ffffffff810457d5 ffffffff814e5d78
[ 9627.630083] Call Trace:
[ 9627.630091]  [<ffffffff813e590f>] ? dump_stack+0x40/0x50
[ 9627.630096]  [<ffffffff8104575c>] ? warn_slowpath_common+0x7c/0xb0
[ 9627.630098]  [<ffffffff810457d5>] ? warn_slowpath_fmt+0x45/0x50
[ 9627.630100]  [<ffffffff81185092>] ? kernfs_path+0x42/0x50
[ 9627.630102]  [<ffffffff811883da>] ? sysfs_warn_dup+0x5a/0x70
[ 9627.630104]  [<ffffffff8118846e>] ? sysfs_create_dir_ns+0x7e/0x90
[ 9627.630108]  [<ffffffff811d94ab>] ? kobject_add_internal+0x9b/0x2f0
[ 9627.630109]  [<ffffffff811d9af6>] ? kobject_add+0x66/0xb0
[ 9627.630114]  [<ffffffff812bb2e3>] ? device_add+0x263/0x620
[ 9627.630116]  [<ffffffff812bb8a8>] ? device_create_groups_vargs+0xe8/0x100
[ 9627.630118]  [<ffffffff812bb8d3>] ? device_create_vargs+0x13/0x20
[ 9627.630124]  [<ffffffff810ed128>] ? bdi_register+0x68/0x150
[ 9627.630129]  [<ffffffff811c535d>] ? add_disk+0x14d/0x4a0
[ 9627.630132]  [<ffffffff811c585f>] ? alloc_disk_node+0xaf/0x100
[ 9627.630137]  [<ffffffffa0252269>] ? md_alloc+0x1e9/0x350 [md_mod]
[ 9627.630141]  [<ffffffffa02523db>] ? md_probe+0xb/0x20 [md_mod]
[ 9627.630143]  [<ffffffff812c0654>] ? kobj_lookup+0x104/0x170
[ 9627.630147]  [<ffffffffa02523d0>] ? md_alloc+0x350/0x350 [md_mod]
[ 9627.630149]  [<ffffffff811c4da8>] ? get_gendisk+0x28/0xf0
[ 9627.630153]  [<ffffffff8115fb74>] ? __blkdev_get+0x114/0x3c0
[ 9627.630156]  [<ffffffff8115e590>] ? bdev_direct_access+0xa0/0xa0
[ 9627.630158]  [<ffffffff8115e5a0>] ? bdev_test+0x10/0x10
[ 9627.630160]  [<ffffffff8115fe58>] ? blkdev_get+0x38/0x310
[ 9627.630162]  [<ffffffff81160170>] ? blkdev_get_by_dev+0x40/0x40
[ 9627.630167]  [<ffffffff8112b3d3>] ? do_dentry_open.isra.16+0x153/0x320
[ 9627.630170]  [<ffffffff811380f3>] ? do_last.isra.51+0x323/0xd50
[ 9627.630172]  [<ffffffff8111f5b3>] ? kmem_cache_alloc+0x123/0x130
[ 9627.630174]  [<ffffffff8113a97f>] ? path_openat+0x7f/0x610
[ 9627.630177]  [<ffffffff810f7480>] ? tlb_flush_mmu_free+0x30/0x50
[ 9627.630180]  [<ffffffff810fe800>] ? unmap_region+0xb0/0xf0
[ 9627.630182]  [<ffffffff8113bb3b>] ? do_filp_open+0x2b/0x90
[ 9627.630187]  [<ffffffff811472ec>] ? __alloc_fd+0x7c/0x120
[ 9627.630189]  [<ffffffff8112c531>] ? do_sys_open+0x121/0x210
[ 9627.630193]  [<ffffffff813ea097>] ? system_call_fastpath+0x12/0x6a
[ 9627.630195] ---[ end trace b7a3e9c6f05c2666 ]---
[ 9627.630196] ------------[ cut here ]------------
[ 9627.630198] WARNING: CPU: 18 PID: 3330 at lib/kobject.c:240 kobject_add_internal+0x274/0x2f0()
[ 9627.630200] kobject_add_internal failed for 9:127 with -EEXIST, don't try to register things with the same name in the same directory.
[ 9627.630201] Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables nfsd nfs lockd grace sunrpc ipmi_devintf netconsole configfs loop hid_generic usbhid hid x86_pkg_temp_thermal coretemp ghash_clmulni_intel aesni_intel ioatdma ehci_pci aes_x86_64 iTCO_wdt iTCO_ve
[ 9627.630223] CPU: 18 PID: 3330 Comm: mdadm Tainted: G        W       4.0.0bl-azat-v6+ #1
[ 9627.630224] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
[ 9627.630225]  0000000000000000 ffffffff814f2c88 ffffffff813e590f ffff885f9bcd3858
[ 9627.630227]  ffffffff8104575c ffff885fa4bc4010 00000000ffffffef ffff885fa473f420
[ 9627.630229]  ffff885fa4bc4000 0000000000000000 ffffffff810457d5 ffffffff814f2e48
[ 9627.630230] Call Trace:
[ 9627.630233]  [<ffffffff813e590f>] ? dump_stack+0x40/0x50
[ 9627.630235]  [<ffffffff8104575c>] ? warn_slowpath_common+0x7c/0xb0
[ 9627.630236]  [<ffffffff810457d5>] ? warn_slowpath_fmt+0x45/0x50
[ 9627.630238]  [<ffffffff8118846e>] ? sysfs_create_dir_ns+0x7e/0x90
[ 9627.630240]  [<ffffffff811d9684>] ? kobject_add_internal+0x274/0x2f0
[ 9627.630242]  [<ffffffff811d9af6>] ? kobject_add+0x66/0xb0
[ 9627.630244]  [<ffffffff812bb2e3>] ? device_add+0x263/0x620
[ 9627.630245]  [<ffffffff812bb8a8>] ? device_create_groups_vargs+0xe8/0x100
[ 9627.630247]  [<ffffffff812bb8d3>] ? device_create_vargs+0x13/0x20
[ 9627.630250]  [<ffffffff810ed128>] ? bdi_register+0x68/0x150
[ 9627.630252]  [<ffffffff811c535d>] ? add_disk+0x14d/0x4a0
[ 9627.630255]  [<ffffffff811c585f>] ? alloc_disk_node+0xaf/0x100
[ 9627.630258]  [<ffffffffa0252269>] ? md_alloc+0x1e9/0x350 [md_mod]
[ 9627.630261]  [<ffffffffa02523db>] ? md_probe+0xb/0x20 [md_mod]
[ 9627.630262]  [<ffffffff812c0654>] ? kobj_lookup+0x104/0x170
[ 9627.630266]  [<ffffffffa02523d0>] ? md_alloc+0x350/0x350 [md_mod]
[ 9627.630268]  [<ffffffff811c4da8>] ? get_gendisk+0x28/0xf0
[ 9627.630270]  [<ffffffff8115fb74>] ? __blkdev_get+0x114/0x3c0
[ 9627.630272]  [<ffffffff8115e590>] ? bdev_direct_access+0xa0/0xa0
[ 9627.630274]  [<ffffffff8115e5a0>] ? bdev_test+0x10/0x10
[ 9627.630276]  [<ffffffff8115fe58>] ? blkdev_get+0x38/0x310
[ 9627.630278]  [<ffffffff81160170>] ? blkdev_get_by_dev+0x40/0x40
[ 9627.630280]  [<ffffffff8112b3d3>] ? do_dentry_open.isra.16+0x153/0x320
[ 9627.630282]  [<ffffffff811380f3>] ? do_last.isra.51+0x323/0xd50
[ 9627.630283]  [<ffffffff8111f5b3>] ? kmem_cache_alloc+0x123/0x130
[ 9627.630285]  [<ffffffff8113a97f>] ? path_openat+0x7f/0x610
[ 9627.630287]  [<ffffffff810f7480>] ? tlb_flush_mmu_free+0x30/0x50
[ 9627.630289]  [<ffffffff810fe800>] ? unmap_region+0xb0/0xf0
[ 9627.630291]  [<ffffffff8113bb3b>] ? do_filp_open+0x2b/0x90
[ 9627.630293]  [<ffffffff811472ec>] ? __alloc_fd+0x7c/0x120
[ 9627.630295]  [<ffffffff8112c531>] ? do_sys_open+0x121/0x210
[ 9627.630297]  [<ffffffff813ea097>] ? system_call_fastpath+0x12/0x6a
[ 9627.630298] ---[ end trace b7a3e9c6f05c2667 ]---
[ 9627.630395] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[ 9627.630430] IP: [<ffffffff8118869a>] sysfs_do_create_link_sd.isra.2+0x2a/0xb0
[ 9627.630524] PGD 5fa2d03067 PUD 5f9d679067 PMD 0 
[ 9627.630550] Oops: 0000 [#1] SMP 
[ 9627.630624] Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables nfsd nfs lockd grace sunrpc ipmi_devintf netconsole configfs loop hid_generic usbhid hid x86_pkg_temp_thermal coretemp ghash_clmulni_intel aesni_intel ioatdma ehci_pci aes_x86_64 iTCO_wdt iTCO_ve
[ 9627.631073] CPU: 18 PID: 3330 Comm: mdadm Tainted: G        W       4.0.0bl-azat-v6+ #1
[ 9627.631090] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
[ 9627.631109] task: ffff885fa4659f00 ti: ffff885f9bcd0000 task.ti: ffff885f9bcd0000
[ 9627.631124] RIP: 0010:[<ffffffff8118869a>]  [<ffffffff8118869a>] sysfs_do_create_link_sd.isra.2+0x2a/0xb0
[ 9627.631162] RSP: 0018:ffff885f9bcd3a78  EFLAGS: 00010246
[ 9627.631189] RAX: 000000000000e6e6 RBX: 0000000000000040 RCX: 00000000000000e6
[ 9627.631219] RDX: ffffffff814e19d0 RSI: 0000000000000040 RDI: ffffffff81740d88
[ 9627.631249] RBP: ffffffff814e19d0 R08: 0000000000017d60 R09: ffff88607fcd7d60
[ 9627.631278] R10: ffff882fbf802400 R11: ffffea017e830e00 R12: 0000000000000001
[ 9627.631308] R13: ffff885fa5a61ca8 R14: ffff885fa4bc3c70 R15: ffff885fa4bc3c00
[ 9627.631338] FS:  00007f439e991700(0000) GS:ffff88607fcc0000(0000) knlGS:0000000000000000
[ 9627.631383] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9627.631411] CR2: 0000000000000040 CR3: 0000005f9fe22000 CR4: 00000000001407e0
[ 9627.631440] Stack:
[ 9627.631460]  ffff885fa4bc3c00 ffff885f9bf41428 ffff885fa4bc3c0c ffff885fa4bc3c80
[ 9627.631515]  ffff885fa4bc3c70 ffffffff811c53f3 ffff885fa4bc3c00 ffffffff00000018
[ 9627.631569]  0090007f9bcd3b08 ffff885fa4bc3c00 0000000000000000 0000000000000001
[ 9627.631680] Call Trace:
[ 9627.631703]  [<ffffffff811c53f3>] ? add_disk+0x1e3/0x4a0
[ 9627.631733]  [<ffffffffa0252269>] ? md_alloc+0x1e9/0x350 [md_mod]
[ 9627.631763]  [<ffffffffa02523db>] ? md_probe+0xb/0x20 [md_mod]
[ 9627.631791]  [<ffffffff812c0654>] ? kobj_lookup+0x104/0x170
[ 9627.631820]  [<ffffffffa02523d0>] ? md_alloc+0x350/0x350 [md_mod]
[ 9627.631849]  [<ffffffff811c4da8>] ? get_gendisk+0x28/0xf0
[ 9627.631877]  [<ffffffff8115fb74>] ? __blkdev_get+0x114/0x3c0
[ 9627.631905]  [<ffffffff8115e590>] ? bdev_direct_access+0xa0/0xa0
[ 9627.631933]  [<ffffffff8115e5a0>] ? bdev_test+0x10/0x10
[ 9627.631961]  [<ffffffff8115fe58>] ? blkdev_get+0x38/0x310
[ 9627.631988]  [<ffffffff81160170>] ? blkdev_get_by_dev+0x40/0x40
[ 9627.632017]  [<ffffffff8112b3d3>] ? do_dentry_open.isra.16+0x153/0x320
[ 9627.632046]  [<ffffffff811380f3>] ? do_last.isra.51+0x323/0xd50
[ 9627.632075]  [<ffffffff8111f5b3>] ? kmem_cache_alloc+0x123/0x130
[ 9627.632103]  [<ffffffff8113a97f>] ? path_openat+0x7f/0x610
[ 9627.632131]  [<ffffffff810f7480>] ? tlb_flush_mmu_free+0x30/0x50
[ 9627.632159]  [<ffffffff810fe800>] ? unmap_region+0xb0/0xf0
[ 9627.632186]  [<ffffffff8113bb3b>] ? do_filp_open+0x2b/0x90
[ 9627.632215]  [<ffffffff811472ec>] ? __alloc_fd+0x7c/0x120
[ 9627.632242]  [<ffffffff8112c531>] ? do_sys_open+0x121/0x210
[ 9627.632270]  [<ffffffff813ea097>] ? system_call_fastpath+0x12/0x6a
[ 9627.632298] Code: 00 48 85 d2 74 73 48 85 ff 74 6e 41 56 41 55 49 89 fd 41 54 55 48 c7 c7 88 0d 74 81 53 48 89 f3 41 89 cc 48 89 d5 e8 76 15 26 00 <48> 8b 1b 48 85 db 74 08 48 89 df e8 f6 c9 ff ff 80 05 d7 86 5b 
[ 9627.632557] RIP  [<ffffffff8118869a>] sysfs_do_create_link_sd.isra.2+0x2a/0xb0
[ 9627.632604]  RSP <ffff885f9bcd3a78>
[ 9627.632627] CR2: 0000000000000040
[ 9627.633014] ---[ end trace b7a3e9c6f05c2668 ]---

Any assumptions?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm)
  2015-04-14 17:15 BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm) Azat Khuzhin
@ 2015-04-15  2:44 ` Guoqing Jiang
  2015-04-15  8:47   ` Azat Khuzhin
  2015-04-23  6:05 ` Bisected, with rfc/patch - was " NeilBrown
  1 sibling, 1 reply; 21+ messages in thread
From: Guoqing Jiang @ 2015-04-15  2:44 UTC (permalink / raw)
  To: Azat Khuzhin; +Cc: Kernel.org-Linux-RAID, neilb

Azat Khuzhin wrote:
> $ git describe
> v4.0-2620-gb79013b
>
> During setting up partitions with mdadm, mdadm hung, after attaching to mdadm with strace I got next:
>
> # pgrep mdadm | xargs strace -fp
> Process 27389 attached - interrupt to quit
> unlink("/dev/.tmp.md.27389:9:127")      = 0
> mknod("/tmp/.tmp.md.27389:9:127", S_IFBLK|0600, makedev(9, 127)) = 0
> open("/tmp/.tmp.md.27389:9:127", O_RDWR|O_EXCL|O_DIRECT) <-- *hung*
>
> After, I looked into dmesg, and found this:
> [ 9627.630018] ------------[ cut here ]------------
> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
>   
sysfs complains about duplicate filename, maybe you tried to create the
array with same node.
> [ 9627.630033] Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables nfsd nfs lockd grace sunrpc ipmi_devintf netconsole configfs loop hid_generic usbhid hid x86_pkg_temp_thermal coretemp ghash_clmulni_intel aesni_intel ioatdma ehci_pci aes_x86_64 iTCO_wdt iTCO_ve
> [ 9627.630074] CPU: 18 PID: 3330 Comm: mdadm Not tainted 4.0.0bl-azat-v6+ #1
> [ 9627.630076] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
> [ 9627.630077]  0000000000000000 ffffffff814e3fcc ffffffff813e590f ffff885f9bcd3808
> [ 9627.630079]  ffffffff8104575c ffff885f96acb000 ffff885fa4b3e3c0 ffff885fa2fec780
> [ 9627.630081]  ffff885fa4bc4000 0000000000000000 ffffffff810457d5 ffffffff814e5d78
> [ 9627.630083] Call Trace:
> [ 9627.630091]  [<ffffffff813e590f>] ? dump_stack+0x40/0x50
> [ 9627.630096]  [<ffffffff8104575c>] ? warn_slowpath_common+0x7c/0xb0
> [ 9627.630098]  [<ffffffff810457d5>] ? warn_slowpath_fmt+0x45/0x50
> [ 9627.630100]  [<ffffffff81185092>] ? kernfs_path+0x42/0x50
> [ 9627.630102]  [<ffffffff811883da>] ? sysfs_warn_dup+0x5a/0x70
> [ 9627.630104]  [<ffffffff8118846e>] ? sysfs_create_dir_ns+0x7e/0x90
> [ 9627.630108]  [<ffffffff811d94ab>] ? kobject_add_internal+0x9b/0x2f0
> [ 9627.630109]  [<ffffffff811d9af6>] ? kobject_add+0x66/0xb0
> [ 9627.630114]  [<ffffffff812bb2e3>] ? device_add+0x263/0x620
> [ 9627.630116]  [<ffffffff812bb8a8>] ? device_create_groups_vargs+0xe8/0x100
> [ 9627.630118]  [<ffffffff812bb8d3>] ? device_create_vargs+0x13/0x20
> [ 9627.630124]  [<ffffffff810ed128>] ? bdi_register+0x68/0x150
> [ 9627.630129]  [<ffffffff811c535d>] ? add_disk+0x14d/0x4a0
> [ 9627.630132]  [<ffffffff811c585f>] ? alloc_disk_node+0xaf/0x100
> [ 9627.630137]  [<ffffffffa0252269>] ? md_alloc+0x1e9/0x350 [md_mod]
> [ 9627.630141]  [<ffffffffa02523db>] ? md_probe+0xb/0x20 [md_mod]
> [ 9627.630143]  [<ffffffff812c0654>] ? kobj_lookup+0x104/0x170
> [ 9627.630147]  [<ffffffffa02523d0>] ? md_alloc+0x350/0x350 [md_mod]
> [ 9627.630149]  [<ffffffff811c4da8>] ? get_gendisk+0x28/0xf0
> [ 9627.630153]  [<ffffffff8115fb74>] ? __blkdev_get+0x114/0x3c0
> [ 9627.630156]  [<ffffffff8115e590>] ? bdev_direct_access+0xa0/0xa0
> [ 9627.630158]  [<ffffffff8115e5a0>] ? bdev_test+0x10/0x10
> [ 9627.630160]  [<ffffffff8115fe58>] ? blkdev_get+0x38/0x310
> [ 9627.630162]  [<ffffffff81160170>] ? blkdev_get_by_dev+0x40/0x40
> [ 9627.630167]  [<ffffffff8112b3d3>] ? do_dentry_open.isra.16+0x153/0x320
> [ 9627.630170]  [<ffffffff811380f3>] ? do_last.isra.51+0x323/0xd50
> [ 9627.630172]  [<ffffffff8111f5b3>] ? kmem_cache_alloc+0x123/0x130
> [ 9627.630174]  [<ffffffff8113a97f>] ? path_openat+0x7f/0x610
> [ 9627.630177]  [<ffffffff810f7480>] ? tlb_flush_mmu_free+0x30/0x50
> [ 9627.630180]  [<ffffffff810fe800>] ? unmap_region+0xb0/0xf0
> [ 9627.630182]  [<ffffffff8113bb3b>] ? do_filp_open+0x2b/0x90
> [ 9627.630187]  [<ffffffff811472ec>] ? __alloc_fd+0x7c/0x120
> [ 9627.630189]  [<ffffffff8112c531>] ? do_sys_open+0x121/0x210
> [ 9627.630193]  [<ffffffff813ea097>] ? system_call_fastpath+0x12/0x6a
> [ 9627.630195] ---[ end trace b7a3e9c6f05c2666 ]---
> [ 9627.630196] ------------[ cut here ]------------
> [ 9627.630198] WARNING: CPU: 18 PID: 3330 at lib/kobject.c:240 kobject_add_internal+0x274/0x2f0()
> [ 9627.630200] kobject_add_internal failed for 9:127 with -EEXIST, don't try to register things with the same name in the same directory.
>   
Ditto, seems the same issue.

Thanks,
Guoqing


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm)
  2015-04-15  2:44 ` Guoqing Jiang
@ 2015-04-15  8:47   ` Azat Khuzhin
  0 siblings, 0 replies; 21+ messages in thread
From: Azat Khuzhin @ 2015-04-15  8:47 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Kernel.org-Linux-RAID, neilb

On Wed, Apr 15, 2015 at 10:44:39AM +0800, Guoqing Jiang wrote:
> Azat Khuzhin wrote:
> > $ git describe
> > v4.0-2620-gb79013b
> >
> > During setting up partitions with mdadm, mdadm hung, after attaching to mdadm with strace I got next:
> >
> > # pgrep mdadm | xargs strace -fp
> > Process 27389 attached - interrupt to quit
> > unlink("/dev/.tmp.md.27389:9:127")      = 0
> > mknod("/tmp/.tmp.md.27389:9:127", S_IFBLK|0600, makedev(9, 127)) = 0
> > open("/tmp/.tmp.md.27389:9:127", O_RDWR|O_EXCL|O_DIRECT) <-- *hung*
> >
> > After, I looked into dmesg, and found this:
> > [ 9627.630018] ------------[ cut here ]------------
> > [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> > [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
> >   
> sysfs complains about duplicate filename, maybe you tried to create the
> array with same node.

Yes, my steps was close to this:
$ mdadm --create /dev/md/foo --raid-devices=2 /dev/sda /dev/sdb
$ mdadm --stop --scan
$ mdadm --create /dev/md/foo --raid-devices=2 /dev/sda /dev/sdb

But it is not always reproduces.

Anyway the most important is not WARNING but BUG (also after that BUG
you can't umount all file systems normally -- i.e. on reboot).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Bisected, with rfc/patch - was Re: BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm)
  2015-04-14 17:15 BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm) Azat Khuzhin
  2015-04-15  2:44 ` Guoqing Jiang
@ 2015-04-23  6:05 ` NeilBrown
  2015-04-23  7:37   ` Christoph Hellwig
  2015-04-27  4:12   ` [PATCH -stable] block: destroy bdi before blockdev is unregistered NeilBrown
  1 sibling, 2 replies; 21+ messages in thread
From: NeilBrown @ 2015-04-23  6:05 UTC (permalink / raw)
  To: Azat Khuzhin, Christoph Hellwig
  Cc: Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 15917 bytes --]


Hi Christoph et-al,

As Azat reports (see mail below with stack traces etc), it is now fairly easy
to trigger a WARNING followed by a BUG when you stop an 'md' array.

Bisection shows that this was introduced by:

commit c4db59d31e39ea067c32163ac961e9c80198fd37
Author: Christoph Hellwig <hch@lst.de>
    fs: don't reassign dirty inodes to default_backing_dev_info

When an md array is stopped, a udev event causes udev to run blkid which
opens the md device.  Due to the "interesting" semantics of md devices, this
creates a new device, at least temporarily.  So we have device removal
immediately followed by creation of the same device.
And particularly: bdi removal immediately before bdi creation with same name.

Prior to the above commit, the bdi entry would be removed from sysfs when
bdi_unregister is called by del_gendisk() (bdi_unregister called
device_unregister(bdi->dev)).
Importantly del_gendisk() calls this *before* calling blk_unregister_region().
As soon as the latter is called, a new md device can be created simply by
opening the device node.

After the identified commit, the device_unregister call is delayed until
bdi_destroy().  I'm not sure how long this delay is, but it happens *after*
the blk_unregister_region() call.

This means there is a window during which the block device has been
unregistered, but the bdi still appears in sysfs.
If the md device is opened during this window (by blkid from udev), the
WARNING shown below results and the new bdi does not get registered properly.

The following RFC patch moves the device_unregister() call (and related
bdi_debug_unregister()) back into bdi_unregister().

This by itself is not sufficient as bdi_writeback_workfn() uses bdi->dev, and
that can run later (which was half the point of the patch I believe).
bdi_writeback_workfn() *only* uses bdi->dev to get a name for the worker
process, so I changed that to only use the bdi->dev name if bdi->dev was not
NULL.

Finally, to ensure no races, I flush the dwork if it is pending between
clearing bdi->dev and unregistering it.  This ensure bdi_writeback_workfn()
doesn't try to find the name of a device which is just being freed.

I left the unregister code in bdi_destroy() as I'm not certain that
bdi_unregister() is always called (though I suspect it is) and the code cannot
hurt as it only runs if bdi->dev is not NULL.  I'll remove that if that
would be more correct.

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 32a8bbd7a9ad..93f872ec434b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1093,9 +1093,13 @@ void bdi_writeback_workfn(struct work_struct *work)
 	struct bdi_writeback *wb = container_of(to_delayed_work(work),
 						struct bdi_writeback, dwork);
 	struct backing_dev_info *bdi = wb->bdi;
+	struct device *dev = bdi->dev;
 	long pages_written;
 
-	set_worker_desc("flush-%s", dev_name(bdi->dev));
+	if (dev)
+		set_worker_desc("flush-%s", dev_name(dev));
+	else
+		set_worker_desc("flush-final");
 	current->flags |= PF_SWAPWRITE;
 
 	if (likely(!current_is_workqueue_rescuer() ||
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6dc4580df2af..110af4534905 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -369,9 +369,17 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
  */
 void bdi_unregister(struct backing_dev_info *bdi)
 {
-	if (WARN_ON_ONCE(!bdi->dev))
+	struct device *dev = bdi->dev;
+	if (WARN_ON_ONCE(!dev))
 		return;
 
+	bdi->dev = NULL;
+	if (delayed_work_pending(&bdi->wb.dwork))
+		/* The worker can access bdi->dev */
+		flush_work(&bdi->wb.dwork.work);
+	bdi_debug_unregister(bdi);
+	device_unregister(dev);
+
 	bdi_set_min_ratio(bdi, 0);
 }
 EXPORT_SYMBOL(bdi_unregister);


As mentioned the WARNING is followed shortly by a BUG.
This is because add_disk() doesn't check if bdi_register_dev() failed, and
proceeds to use bdi->dev.  This is probably best fixed by the following patch:

diff --git a/block/genhd.c b/block/genhd.c
index 0a536dc05f3b..e351fc521053 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -612,6 +612,10 @@ void add_disk(struct gendisk *disk)
 	/* Register BDI before referencing it from bdev */
 	bdi = &disk->queue->backing_dev_info;
 	bdi_register_dev(bdi, disk_devt(disk));
+	if (!bdi->dev) {
+		WARN_ON(1);
+		return;
+	}
 
 	blk_register_region(disk_devt(disk), disk->minors, NULL,
 			    exact_match, exact_lock, disk);


which Jens or Tejun might like to comment on.

If I can get an 'ack' or a proposal for a different approach I will happy
create a properly formatted patch for submission to -stable and to Linus.

And Azat:  thanks for reporting!! sorry it took over a week to get to this.

NeilBrown


On Tue, 14 Apr 2015 20:15:37 +0300 Azat Khuzhin <a3at.mail@gmail.com> wrote:

> $ git describe
> v4.0-2620-gb79013b
> 
> During setting up partitions with mdadm, mdadm hung, after attaching to mdadm with strace I got next:
> 
> # pgrep mdadm | xargs strace -fp
> Process 27389 attached - interrupt to quit
> unlink("/dev/.tmp.md.27389:9:127")      = 0
> mknod("/tmp/.tmp.md.27389:9:127", S_IFBLK|0600, makedev(9, 127)) = 0
> open("/tmp/.tmp.md.27389:9:127", O_RDWR|O_EXCL|O_DIRECT) <-- *hung*
> 
> After, I looked into dmesg, and found this:
> [ 9627.630018] ------------[ cut here ]------------
> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
> [ 9627.630033] Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables nfsd nfs lockd grace sunrpc ipmi_devintf netconsole configfs loop hid_generic usbhid hid x86_pkg_temp_thermal coretemp ghash_clmulni_intel aesni_intel ioatdma ehci_pci aes_x86_64 iTCO_wdt iTCO_ve
> [ 9627.630074] CPU: 18 PID: 3330 Comm: mdadm Not tainted 4.0.0bl-azat-v6+ #1
> [ 9627.630076] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
> [ 9627.630077]  0000000000000000 ffffffff814e3fcc ffffffff813e590f ffff885f9bcd3808
> [ 9627.630079]  ffffffff8104575c ffff885f96acb000 ffff885fa4b3e3c0 ffff885fa2fec780
> [ 9627.630081]  ffff885fa4bc4000 0000000000000000 ffffffff810457d5 ffffffff814e5d78
> [ 9627.630083] Call Trace:
> [ 9627.630091]  [<ffffffff813e590f>] ? dump_stack+0x40/0x50
> [ 9627.630096]  [<ffffffff8104575c>] ? warn_slowpath_common+0x7c/0xb0
> [ 9627.630098]  [<ffffffff810457d5>] ? warn_slowpath_fmt+0x45/0x50
> [ 9627.630100]  [<ffffffff81185092>] ? kernfs_path+0x42/0x50
> [ 9627.630102]  [<ffffffff811883da>] ? sysfs_warn_dup+0x5a/0x70
> [ 9627.630104]  [<ffffffff8118846e>] ? sysfs_create_dir_ns+0x7e/0x90
> [ 9627.630108]  [<ffffffff811d94ab>] ? kobject_add_internal+0x9b/0x2f0
> [ 9627.630109]  [<ffffffff811d9af6>] ? kobject_add+0x66/0xb0
> [ 9627.630114]  [<ffffffff812bb2e3>] ? device_add+0x263/0x620
> [ 9627.630116]  [<ffffffff812bb8a8>] ? device_create_groups_vargs+0xe8/0x100
> [ 9627.630118]  [<ffffffff812bb8d3>] ? device_create_vargs+0x13/0x20
> [ 9627.630124]  [<ffffffff810ed128>] ? bdi_register+0x68/0x150
> [ 9627.630129]  [<ffffffff811c535d>] ? add_disk+0x14d/0x4a0
> [ 9627.630132]  [<ffffffff811c585f>] ? alloc_disk_node+0xaf/0x100
> [ 9627.630137]  [<ffffffffa0252269>] ? md_alloc+0x1e9/0x350 [md_mod]
> [ 9627.630141]  [<ffffffffa02523db>] ? md_probe+0xb/0x20 [md_mod]
> [ 9627.630143]  [<ffffffff812c0654>] ? kobj_lookup+0x104/0x170
> [ 9627.630147]  [<ffffffffa02523d0>] ? md_alloc+0x350/0x350 [md_mod]
> [ 9627.630149]  [<ffffffff811c4da8>] ? get_gendisk+0x28/0xf0
> [ 9627.630153]  [<ffffffff8115fb74>] ? __blkdev_get+0x114/0x3c0
> [ 9627.630156]  [<ffffffff8115e590>] ? bdev_direct_access+0xa0/0xa0
> [ 9627.630158]  [<ffffffff8115e5a0>] ? bdev_test+0x10/0x10
> [ 9627.630160]  [<ffffffff8115fe58>] ? blkdev_get+0x38/0x310
> [ 9627.630162]  [<ffffffff81160170>] ? blkdev_get_by_dev+0x40/0x40
> [ 9627.630167]  [<ffffffff8112b3d3>] ? do_dentry_open.isra.16+0x153/0x320
> [ 9627.630170]  [<ffffffff811380f3>] ? do_last.isra.51+0x323/0xd50
> [ 9627.630172]  [<ffffffff8111f5b3>] ? kmem_cache_alloc+0x123/0x130
> [ 9627.630174]  [<ffffffff8113a97f>] ? path_openat+0x7f/0x610
> [ 9627.630177]  [<ffffffff810f7480>] ? tlb_flush_mmu_free+0x30/0x50
> [ 9627.630180]  [<ffffffff810fe800>] ? unmap_region+0xb0/0xf0
> [ 9627.630182]  [<ffffffff8113bb3b>] ? do_filp_open+0x2b/0x90
> [ 9627.630187]  [<ffffffff811472ec>] ? __alloc_fd+0x7c/0x120
> [ 9627.630189]  [<ffffffff8112c531>] ? do_sys_open+0x121/0x210
> [ 9627.630193]  [<ffffffff813ea097>] ? system_call_fastpath+0x12/0x6a
> [ 9627.630195] ---[ end trace b7a3e9c6f05c2666 ]---
> [ 9627.630196] ------------[ cut here ]------------
> [ 9627.630198] WARNING: CPU: 18 PID: 3330 at lib/kobject.c:240 kobject_add_internal+0x274/0x2f0()
> [ 9627.630200] kobject_add_internal failed for 9:127 with -EEXIST, don't try to register things with the same name in the same directory.
> [ 9627.630201] Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables nfsd nfs lockd grace sunrpc ipmi_devintf netconsole configfs loop hid_generic usbhid hid x86_pkg_temp_thermal coretemp ghash_clmulni_intel aesni_intel ioatdma ehci_pci aes_x86_64 iTCO_wdt iTCO_ve
> [ 9627.630223] CPU: 18 PID: 3330 Comm: mdadm Tainted: G        W       4.0.0bl-azat-v6+ #1
> [ 9627.630224] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
> [ 9627.630225]  0000000000000000 ffffffff814f2c88 ffffffff813e590f ffff885f9bcd3858
> [ 9627.630227]  ffffffff8104575c ffff885fa4bc4010 00000000ffffffef ffff885fa473f420
> [ 9627.630229]  ffff885fa4bc4000 0000000000000000 ffffffff810457d5 ffffffff814f2e48
> [ 9627.630230] Call Trace:
> [ 9627.630233]  [<ffffffff813e590f>] ? dump_stack+0x40/0x50
> [ 9627.630235]  [<ffffffff8104575c>] ? warn_slowpath_common+0x7c/0xb0
> [ 9627.630236]  [<ffffffff810457d5>] ? warn_slowpath_fmt+0x45/0x50
> [ 9627.630238]  [<ffffffff8118846e>] ? sysfs_create_dir_ns+0x7e/0x90
> [ 9627.630240]  [<ffffffff811d9684>] ? kobject_add_internal+0x274/0x2f0
> [ 9627.630242]  [<ffffffff811d9af6>] ? kobject_add+0x66/0xb0
> [ 9627.630244]  [<ffffffff812bb2e3>] ? device_add+0x263/0x620
> [ 9627.630245]  [<ffffffff812bb8a8>] ? device_create_groups_vargs+0xe8/0x100
> [ 9627.630247]  [<ffffffff812bb8d3>] ? device_create_vargs+0x13/0x20
> [ 9627.630250]  [<ffffffff810ed128>] ? bdi_register+0x68/0x150
> [ 9627.630252]  [<ffffffff811c535d>] ? add_disk+0x14d/0x4a0
> [ 9627.630255]  [<ffffffff811c585f>] ? alloc_disk_node+0xaf/0x100
> [ 9627.630258]  [<ffffffffa0252269>] ? md_alloc+0x1e9/0x350 [md_mod]
> [ 9627.630261]  [<ffffffffa02523db>] ? md_probe+0xb/0x20 [md_mod]
> [ 9627.630262]  [<ffffffff812c0654>] ? kobj_lookup+0x104/0x170
> [ 9627.630266]  [<ffffffffa02523d0>] ? md_alloc+0x350/0x350 [md_mod]
> [ 9627.630268]  [<ffffffff811c4da8>] ? get_gendisk+0x28/0xf0
> [ 9627.630270]  [<ffffffff8115fb74>] ? __blkdev_get+0x114/0x3c0
> [ 9627.630272]  [<ffffffff8115e590>] ? bdev_direct_access+0xa0/0xa0
> [ 9627.630274]  [<ffffffff8115e5a0>] ? bdev_test+0x10/0x10
> [ 9627.630276]  [<ffffffff8115fe58>] ? blkdev_get+0x38/0x310
> [ 9627.630278]  [<ffffffff81160170>] ? blkdev_get_by_dev+0x40/0x40
> [ 9627.630280]  [<ffffffff8112b3d3>] ? do_dentry_open.isra.16+0x153/0x320
> [ 9627.630282]  [<ffffffff811380f3>] ? do_last.isra.51+0x323/0xd50
> [ 9627.630283]  [<ffffffff8111f5b3>] ? kmem_cache_alloc+0x123/0x130
> [ 9627.630285]  [<ffffffff8113a97f>] ? path_openat+0x7f/0x610
> [ 9627.630287]  [<ffffffff810f7480>] ? tlb_flush_mmu_free+0x30/0x50
> [ 9627.630289]  [<ffffffff810fe800>] ? unmap_region+0xb0/0xf0
> [ 9627.630291]  [<ffffffff8113bb3b>] ? do_filp_open+0x2b/0x90
> [ 9627.630293]  [<ffffffff811472ec>] ? __alloc_fd+0x7c/0x120
> [ 9627.630295]  [<ffffffff8112c531>] ? do_sys_open+0x121/0x210
> [ 9627.630297]  [<ffffffff813ea097>] ? system_call_fastpath+0x12/0x6a
> [ 9627.630298] ---[ end trace b7a3e9c6f05c2667 ]---
> [ 9627.630395] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> [ 9627.630430] IP: [<ffffffff8118869a>] sysfs_do_create_link_sd.isra.2+0x2a/0xb0
> [ 9627.630524] PGD 5fa2d03067 PUD 5f9d679067 PMD 0 
> [ 9627.630550] Oops: 0000 [#1] SMP 
> [ 9627.630624] Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables nfsd nfs lockd grace sunrpc ipmi_devintf netconsole configfs loop hid_generic usbhid hid x86_pkg_temp_thermal coretemp ghash_clmulni_intel aesni_intel ioatdma ehci_pci aes_x86_64 iTCO_wdt iTCO_ve
> [ 9627.631073] CPU: 18 PID: 3330 Comm: mdadm Tainted: G        W       4.0.0bl-azat-v6+ #1
> [ 9627.631090] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
> [ 9627.631109] task: ffff885fa4659f00 ti: ffff885f9bcd0000 task.ti: ffff885f9bcd0000
> [ 9627.631124] RIP: 0010:[<ffffffff8118869a>]  [<ffffffff8118869a>] sysfs_do_create_link_sd.isra.2+0x2a/0xb0
> [ 9627.631162] RSP: 0018:ffff885f9bcd3a78  EFLAGS: 00010246
> [ 9627.631189] RAX: 000000000000e6e6 RBX: 0000000000000040 RCX: 00000000000000e6
> [ 9627.631219] RDX: ffffffff814e19d0 RSI: 0000000000000040 RDI: ffffffff81740d88
> [ 9627.631249] RBP: ffffffff814e19d0 R08: 0000000000017d60 R09: ffff88607fcd7d60
> [ 9627.631278] R10: ffff882fbf802400 R11: ffffea017e830e00 R12: 0000000000000001
> [ 9627.631308] R13: ffff885fa5a61ca8 R14: ffff885fa4bc3c70 R15: ffff885fa4bc3c00
> [ 9627.631338] FS:  00007f439e991700(0000) GS:ffff88607fcc0000(0000) knlGS:0000000000000000
> [ 9627.631383] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9627.631411] CR2: 0000000000000040 CR3: 0000005f9fe22000 CR4: 00000000001407e0
> [ 9627.631440] Stack:
> [ 9627.631460]  ffff885fa4bc3c00 ffff885f9bf41428 ffff885fa4bc3c0c ffff885fa4bc3c80
> [ 9627.631515]  ffff885fa4bc3c70 ffffffff811c53f3 ffff885fa4bc3c00 ffffffff00000018
> [ 9627.631569]  0090007f9bcd3b08 ffff885fa4bc3c00 0000000000000000 0000000000000001
> [ 9627.631680] Call Trace:
> [ 9627.631703]  [<ffffffff811c53f3>] ? add_disk+0x1e3/0x4a0
> [ 9627.631733]  [<ffffffffa0252269>] ? md_alloc+0x1e9/0x350 [md_mod]
> [ 9627.631763]  [<ffffffffa02523db>] ? md_probe+0xb/0x20 [md_mod]
> [ 9627.631791]  [<ffffffff812c0654>] ? kobj_lookup+0x104/0x170
> [ 9627.631820]  [<ffffffffa02523d0>] ? md_alloc+0x350/0x350 [md_mod]
> [ 9627.631849]  [<ffffffff811c4da8>] ? get_gendisk+0x28/0xf0
> [ 9627.631877]  [<ffffffff8115fb74>] ? __blkdev_get+0x114/0x3c0
> [ 9627.631905]  [<ffffffff8115e590>] ? bdev_direct_access+0xa0/0xa0
> [ 9627.631933]  [<ffffffff8115e5a0>] ? bdev_test+0x10/0x10
> [ 9627.631961]  [<ffffffff8115fe58>] ? blkdev_get+0x38/0x310
> [ 9627.631988]  [<ffffffff81160170>] ? blkdev_get_by_dev+0x40/0x40
> [ 9627.632017]  [<ffffffff8112b3d3>] ? do_dentry_open.isra.16+0x153/0x320
> [ 9627.632046]  [<ffffffff811380f3>] ? do_last.isra.51+0x323/0xd50
> [ 9627.632075]  [<ffffffff8111f5b3>] ? kmem_cache_alloc+0x123/0x130
> [ 9627.632103]  [<ffffffff8113a97f>] ? path_openat+0x7f/0x610
> [ 9627.632131]  [<ffffffff810f7480>] ? tlb_flush_mmu_free+0x30/0x50
> [ 9627.632159]  [<ffffffff810fe800>] ? unmap_region+0xb0/0xf0
> [ 9627.632186]  [<ffffffff8113bb3b>] ? do_filp_open+0x2b/0x90
> [ 9627.632215]  [<ffffffff811472ec>] ? __alloc_fd+0x7c/0x120
> [ 9627.632242]  [<ffffffff8112c531>] ? do_sys_open+0x121/0x210
> [ 9627.632270]  [<ffffffff813ea097>] ? system_call_fastpath+0x12/0x6a
> [ 9627.632298] Code: 00 48 85 d2 74 73 48 85 ff 74 6e 41 56 41 55 49 89 fd 41 54 55 48 c7 c7 88 0d 74 81 53 48 89 f3 41 89 cc 48 89 d5 e8 76 15 26 00 <48> 8b 1b 48 85 db 74 08 48 89 df e8 f6 c9 ff ff 80 05 d7 86 5b 
> [ 9627.632557] RIP  [<ffffffff8118869a>] sysfs_do_create_link_sd.isra.2+0x2a/0xb0
> [ 9627.632604]  RSP <ffff885f9bcd3a78>
> [ 9627.632627] CR2: 0000000000000040
> [ 9627.633014] ---[ end trace b7a3e9c6f05c2668 ]---
> 
> Any assumptions?


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Bisected, with rfc/patch - was Re: BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm)
  2015-04-23  6:05 ` Bisected, with rfc/patch - was " NeilBrown
@ 2015-04-23  7:37   ` Christoph Hellwig
  2015-04-23  8:03     ` NeilBrown
  2015-04-27  4:12   ` [PATCH -stable] block: destroy bdi before blockdev is unregistered NeilBrown
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2015-04-23  7:37 UTC (permalink / raw)
  To: NeilBrown
  Cc: Azat Khuzhin, Christoph Hellwig, Kernel.org-Linux-RAID,
	Guoqing Jiang, Tejun Heo, Jan Kara, Jens Axboe

Plase fix your device name lifetimes.

See the DM commit

    63a4f0 ("dm: fix add_disk() NULL pointer due to race with free_dev()")

for a template.

Unregistering the device too early means we'll have half constructed
bdis hanging around, which caused all kinds of problems for filesystems
and the writeback code.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bisected, with rfc/patch - was Re: BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm)
  2015-04-23  7:37   ` Christoph Hellwig
@ 2015-04-23  8:03     ` NeilBrown
  2015-04-23 16:10       ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2015-04-23  8:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Azat Khuzhin, Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo,
	Jan Kara, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 798 bytes --]

On Thu, 23 Apr 2015 09:37:24 +0200 Christoph Hellwig <hch@lst.de> wrote:

> Plase fix your device name lifetimes.

Any chance you could be more explicit?

The commit you identified doesn't seem to help much - md and dm are quite
different in this area.

It seems that it is no longer safe to call 'add_disk' between calling
'del_gendisk' and bdi_destroy being called.  How can I find out if I am in
that window, or wait for bdi_destroy to be called?

Thanks,
NeilBrown


> 
> See the DM commit
> 
>     63a4f0 ("dm: fix add_disk() NULL pointer due to race with free_dev()")
> 
> for a template.
> 
> Unregistering the device too early means we'll have half constructed
> bdis hanging around, which caused all kinds of problems for filesystems
> and the writeback code.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bisected, with rfc/patch - was Re: BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm)
  2015-04-23  8:03     ` NeilBrown
@ 2015-04-23 16:10       ` Christoph Hellwig
  2015-04-24  2:09         ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2015-04-23 16:10 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Azat Khuzhin, Kernel.org-Linux-RAID,
	Guoqing Jiang, Tejun Heo, Jan Kara, Jens Axboe, dm-devel

On Thu, Apr 23, 2015 at 06:03:14PM +1000, NeilBrown wrote:
> On Thu, 23 Apr 2015 09:37:24 +0200 Christoph Hellwig <hch@lst.de> wrote:
> 
> > Plase fix your device name lifetimes.
> 
> Any chance you could be more explicit?
>
> The commit you identified doesn't seem to help much - md and dm are quite
> different in this area.
> 
> It seems that it is no longer safe to call 'add_disk' between calling
> 'del_gendisk' and bdi_destroy being called.  How can I find out if I am in
> that window, or wait for bdi_destroy to be called?

The bdi is only around if the device is open, either through a device
node, or through a blkdev_get from a file system.  If you get duplicate
names that means you're trying to allocate a new gendisk while the old
one is still around.

In theory you're fine once the device gets ->release called.

Except that we can hold sysfs reference to the qeue, eww.  So for now
try to follow the dm model, but I'll need to add a callback to the
queue called once the request_queue actually is released for this.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Bisected, with rfc/patch - was Re: BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm)
  2015-04-23 16:10       ` Christoph Hellwig
@ 2015-04-24  2:09         ` NeilBrown
  2015-04-24  8:27           ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2015-04-24  2:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Azat Khuzhin, Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo,
	Jan Kara, Jens Axboe, dm-devel

[-- Attachment #1: Type: text/plain, Size: 3201 bytes --]

On Thu, 23 Apr 2015 18:10:51 +0200 Christoph Hellwig <hch@lst.de> wrote:

> On Thu, Apr 23, 2015 at 06:03:14PM +1000, NeilBrown wrote:
> > On Thu, 23 Apr 2015 09:37:24 +0200 Christoph Hellwig <hch@lst.de> wrote:
> > 
> > > Plase fix your device name lifetimes.
> > 
> > Any chance you could be more explicit?
> >
> > The commit you identified doesn't seem to help much - md and dm are quite
> > different in this area.
> > 
> > It seems that it is no longer safe to call 'add_disk' between calling
> > 'del_gendisk' and bdi_destroy being called.  How can I find out if I am in
> > that window, or wait for bdi_destroy to be called?
> 
> The bdi is only around if the device is open, either through a device
> node, or through a blkdev_get from a file system.  If you get duplicate
> names that means you're trying to allocate a new gendisk while the old
> one is still around.
> 
> In theory you're fine once the device gets ->release called.

In practice .... the put_disk() shortly after the ->release call in
__blkdev_put() is what ultimately releases the name of the bdi - at least in
the cases where I get a crash.

> 
> Except that we can hold sysfs reference to the qeue, eww.  So for now
> try to follow the dm model, but I'll need to add a callback to the
> queue called once the request_queue actually is released for this.

I'm pretty sure that the md code is already as close to the "dm model" as it
meaningfully can be.

If I move bdi_destroy out of blk_release_queue (which really think is too
later) and place it in blk_cleanup_queue (which seems a credible place for
it), and then move the blk_cleanup_queue call in md_free up before the
del_gendisk() call (which is probably the right thing to do anyway, though dm
has the same order that md currently has) then I don't get any crashes and
I'm almost convince it is correct...

Thoughts?

Thanks,
NeilBrown


diff --git a/block/blk-core.c b/block/blk-core.c
index 794c3e7f01cf..66406474f0c4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
+	bdi_destroy(&q->backing_dev_info);
+
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index faaf36ade7eb..2b8fd302f677 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_trace_shutdown(q);
 
-	bdi_destroy(&q->backing_dev_info);
-
 	ida_simple_remove(&blk_queue_ida, q->id);
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d4f31e195e26..593a02476c78 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko)
 	if (mddev->sysfs_state)
 		sysfs_put(mddev->sysfs_state);
 
+	if (mddev->queue)
+		blk_cleanup_queue(mddev->queue);
 	if (mddev->gendisk) {
 		del_gendisk(mddev->gendisk);
 		put_disk(mddev->gendisk);
 	}
-	if (mddev->queue)
-		blk_cleanup_queue(mddev->queue);
 
 	kfree(mddev);
 }

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Bisected, with rfc/patch - was Re: BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm)
  2015-04-24  2:09         ` NeilBrown
@ 2015-04-24  8:27           ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2015-04-24  8:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Azat Khuzhin, Kernel.org-Linux-RAID,
	Guoqing Jiang, Tejun Heo, Jan Kara, Jens Axboe, dm-devel

On Fri, Apr 24, 2015 at 12:09:32PM +1000, NeilBrown wrote:
> I'm pretty sure that the md code is already as close to the "dm model" as it
> meaningfully can be.
> 
> If I move bdi_destroy out of blk_release_queue (which really think is too
> later) and place it in blk_cleanup_queue (which seems a credible place for
> it), and then move the blk_cleanup_queue call in md_free up before the
> del_gendisk() call (which is probably the right thing to do anyway, though dm
> has the same order that md currently has) then I don't get any crashes and
> I'm almost convince it is correct...

This looks reasonable to me, thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-23  6:05 ` Bisected, with rfc/patch - was " NeilBrown
  2015-04-23  7:37   ` Christoph Hellwig
@ 2015-04-27  4:12   ` NeilBrown
  2015-04-27 13:03     ` Christoph Hellwig
                       ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: NeilBrown @ 2015-04-27  4:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Azat Khuzhin, Christoph Hellwig, Kernel.org-Linux-RAID,
	Guoqing Jiang, Tejun Heo, Jan Kara, lkml

[-- Attachment #1: Type: text/plain, Size: 3587 bytes --]


Because of the peculiar way that md devices are created (automatically
when the device node is opened), a new device can be created and
registered immediately after the
	blk_unregister_region(disk_devt(disk), disk->minors);
call in del_gendisk().

Therefore it is important that all visible artifacts of the previous
device are removed before this call.  In particular, the 'bdi'.

Since:
commit c4db59d31e39ea067c32163ac961e9c80198fd37
Author: Christoph Hellwig <hch@lst.de>
    fs: don't reassign dirty inodes to default_backing_dev_info

moved the
   device_unregister(bdi->dev);
call from bdi_unregister() to bdi_destroy() it has been quite easy to
lose a race and have a new (e.g.) "md127" be created after the
blk_unregister_region() call and before bdi_destroy() is ultimately
called by the final 'put_disk', which must come after del_gendisk().

The new device finds that the bdi name is already registered in sysfs
and complains

> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'

We can fix this by moving the bdi_destroy() call out of
blk_release_queue() (which can happen very late when a refcount
reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
device driver calls it.

Then it is only necessary for md to call blk_cleanup_queue() before
del_gendisk().  As loop.c devices are also created on demand by
opening the device node, we make the same change there.

Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org (v4.0)
Signed-off-by: NeilBrown <neilb@suse.de>

--
Hi Jens,
 if you could check this and forward on to Linus I'd really appreciate it.

Thanks,
NeilBrown


diff --git a/block/blk-core.c b/block/blk-core.c
index fd154b94447a..7871603f0a29 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
+	bdi_destroy(&q->backing_dev_info);
+
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index faaf36ade7eb..2b8fd302f677 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_trace_shutdown(q);
 
-	bdi_destroy(&q->backing_dev_info);
-
 	ida_simple_remove(&blk_queue_ida, q->id);
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ae3fcb4199e9..d7173cb1ea76 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1620,8 +1620,8 @@ out:
 
 static void loop_remove(struct loop_device *lo)
 {
-	del_gendisk(lo->lo_disk);
 	blk_cleanup_queue(lo->lo_queue);
+	del_gendisk(lo->lo_disk);
 	blk_mq_free_tag_set(&lo->tag_set);
 	put_disk(lo->lo_disk);
 	kfree(lo);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d4f31e195e26..593a02476c78 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko)
 	if (mddev->sysfs_state)
 		sysfs_put(mddev->sysfs_state);
 
+	if (mddev->queue)
+		blk_cleanup_queue(mddev->queue);
 	if (mddev->gendisk) {
 		del_gendisk(mddev->gendisk);
 		put_disk(mddev->gendisk);
 	}
-	if (mddev->queue)
-		blk_cleanup_queue(mddev->queue);
 
 	kfree(mddev);
 }

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-27  4:12   ` [PATCH -stable] block: destroy bdi before blockdev is unregistered NeilBrown
@ 2015-04-27 13:03     ` Christoph Hellwig
  2015-04-27 16:27     ` Jens Axboe
  2015-04-28 16:41     ` Mike Snitzer
  2 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2015-04-27 13:03 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, Azat Khuzhin, Christoph Hellwig,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-27  4:12   ` [PATCH -stable] block: destroy bdi before blockdev is unregistered NeilBrown
  2015-04-27 13:03     ` Christoph Hellwig
@ 2015-04-27 16:27     ` Jens Axboe
  2015-04-28 16:41     ` Mike Snitzer
  2 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2015-04-27 16:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Azat Khuzhin, Christoph Hellwig, Kernel.org-Linux-RAID,
	Guoqing Jiang, Tejun Heo, Jan Kara, lkml

On 04/26/2015 10:12 PM, NeilBrown wrote:
>
> Because of the peculiar way that md devices are created (automatically
> when the device node is opened), a new device can be created and
> registered immediately after the
> 	blk_unregister_region(disk_devt(disk), disk->minors);
> call in del_gendisk().
>
> Therefore it is important that all visible artifacts of the previous
> device are removed before this call.  In particular, the 'bdi'.
>
> Since:
> commit c4db59d31e39ea067c32163ac961e9c80198fd37
> Author: Christoph Hellwig <hch@lst.de>
>      fs: don't reassign dirty inodes to default_backing_dev_info
>
> moved the
>     device_unregister(bdi->dev);
> call from bdi_unregister() to bdi_destroy() it has been quite easy to
> lose a race and have a new (e.g.) "md127" be created after the
> blk_unregister_region() call and before bdi_destroy() is ultimately
> called by the final 'put_disk', which must come after del_gendisk().
>
> The new device finds that the bdi name is already registered in sysfs
> and complains
>
>> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
>> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
>
> We can fix this by moving the bdi_destroy() call out of
> blk_release_queue() (which can happen very late when a refcount
> reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
> device driver calls it.
>
> Then it is only necessary for md to call blk_cleanup_queue() before
> del_gendisk().  As loop.c devices are also created on demand by
> opening the device node, we make the same change there.
>
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: stable@vger.kernel.org (v4.0)
> Signed-off-by: NeilBrown <neilb@suse.de>
>
> --
> Hi Jens,
>   if you could check this and forward on to Linus I'd really appreciate it.

Yup, I added it. BTW, that line needs 3 '-', otherwise git am will pick 
up the comments below :-)

Thanks Neil.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-27  4:12   ` [PATCH -stable] block: destroy bdi before blockdev is unregistered NeilBrown
  2015-04-27 13:03     ` Christoph Hellwig
  2015-04-27 16:27     ` Jens Axboe
@ 2015-04-28 16:41     ` Mike Snitzer
  2015-04-28 21:25       ` NeilBrown
  2 siblings, 1 reply; 21+ messages in thread
From: Mike Snitzer @ 2015-04-28 16:41 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, Azat Khuzhin, Christoph Hellwig,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

On Mon, Apr 27, 2015 at 12:12 AM, NeilBrown <neilb@suse.de> wrote:
>
> Because of the peculiar way that md devices are created (automatically
> when the device node is opened), a new device can be created and
> registered immediately after the
>         blk_unregister_region(disk_devt(disk), disk->minors);
> call in del_gendisk().
>
> Therefore it is important that all visible artifacts of the previous
> device are removed before this call.  In particular, the 'bdi'.
>
> Since:
> commit c4db59d31e39ea067c32163ac961e9c80198fd37
> Author: Christoph Hellwig <hch@lst.de>
>     fs: don't reassign dirty inodes to default_backing_dev_info
>
> moved the
>    device_unregister(bdi->dev);
> call from bdi_unregister() to bdi_destroy() it has been quite easy to
> lose a race and have a new (e.g.) "md127" be created after the
> blk_unregister_region() call and before bdi_destroy() is ultimately
> called by the final 'put_disk', which must come after del_gendisk().
>
> The new device finds that the bdi name is already registered in sysfs
> and complains
>
>> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
>> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
>
> We can fix this by moving the bdi_destroy() call out of
> blk_release_queue() (which can happen very late when a refcount
> reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
> device driver calls it.
>
> Then it is only necessary for md to call blk_cleanup_queue() before
> del_gendisk().  As loop.c devices are also created on demand by
> opening the device node, we make the same change there.
>
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: stable@vger.kernel.org (v4.0)
> Signed-off-by: NeilBrown <neilb@suse.de>
>
> --
> Hi Jens,
>  if you could check this and forward on to Linus I'd really appreciate it.
>
> Thanks,
> NeilBrown
>
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index fd154b94447a..7871603f0a29 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q)
>                 q->queue_lock = &q->__queue_lock;
>         spin_unlock_irq(lock);
>
> +       bdi_destroy(&q->backing_dev_info);
> +
>         /* @q is and will stay empty, shutdown and put */
>         blk_put_queue(q);
>  }
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index faaf36ade7eb..2b8fd302f677 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj)
>
>         blk_trace_shutdown(q);
>
> -       bdi_destroy(&q->backing_dev_info);
> -
>         ida_simple_remove(&blk_queue_ida, q->id);
>         call_rcu(&q->rcu_head, blk_free_queue_rcu);
>  }
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index ae3fcb4199e9..d7173cb1ea76 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1620,8 +1620,8 @@ out:
>
>  static void loop_remove(struct loop_device *lo)
>  {
> -       del_gendisk(lo->lo_disk);
>         blk_cleanup_queue(lo->lo_queue);
> +       del_gendisk(lo->lo_disk);
>         blk_mq_free_tag_set(&lo->tag_set);
>         put_disk(lo->lo_disk);
>         kfree(lo);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index d4f31e195e26..593a02476c78 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko)
>         if (mddev->sysfs_state)
>                 sysfs_put(mddev->sysfs_state);
>
> +       if (mddev->queue)
> +               blk_cleanup_queue(mddev->queue);
>         if (mddev->gendisk) {
>                 del_gendisk(mddev->gendisk);
>                 put_disk(mddev->gendisk);
>         }
> -       if (mddev->queue)
> -               blk_cleanup_queue(mddev->queue);
>
>         kfree(mddev);
>  }

I've taken this patch into consideration relative to DM, please see:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip&id=5dac86fb79697e93297edb6a316b429236d00137

Point is in my private snitzer/wip branch DM now calls
blk_cleanup_queue() before del_gendisk().

With your patch:
1) blk_cleanup_queue -> bdi_destroy -> bdi->dev = NULL;
2) del_gendisk -> bdi_unregister -> WARN_ON_ONCE(!bdi->dev)

So, in testing with DM I'm hitting bdi_unregister's WARN_ON(), are you
seeing this WARN_ON?

[  385.134474] WARNING: CPU: 3 PID: 11231 at mm/backing-dev.c:372
bdi_unregister+0x36/0x40()
[  385.143593] Modules linked in: dm_service_time dm_multipath
target_core_iblock tcm_loop target_core_mod sg iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi core
temp kvm_intel kvm ixgbe igb crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel mdio aesni_intel ptp pps_core
glue_helper ipmi_si i7core_edac iTCO_wdt lrw
dca iTCO_vendor_support edac_core i2c_i801 pcspkr gf128mul
ipmi_msghandler acpi_power_meter shpchp lpc_ich ablk_helper cryptd
mfd_core acpi_cpufreq xfs libcrc32c sr_mo
d cdrom mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit
drm_kms_helper ata_generic ttm pata_acpi sd_mod ata_piix drm
iomemory_vsl(POE) libata megaraid_sas i2c_c
ore skd dm_mirror dm_region_hash dm_log dm_mod
[  385.213736] CPU: 3 PID: 11231 Comm: dmsetup Tainted: P          IOE
  4.1.0-rc1.snitm+ #55
[  385.222958] Hardware name: FUJITSU
PRIMERGY RX300 S6             /D2619, BIOS 6.00 Rev. 1.10.2619.N1
     05/24/2011
[  385.237602]  0000000000000000 000000006cf7a5f2 ffff88031bfcfb68
ffffffff8167b7e6
[  385.245897]  0000000000000000 0000000000000000 ffff88031bfcfba8
ffffffff8107bd5a
[  385.254193]  0000000000000000 ffff88032c4b6c00 0000000000000000
0000000000000000
[  385.262488] Call Trace:
[  385.265218]  [<ffffffff8167b7e6>] dump_stack+0x45/0x57
[  385.270955]  [<ffffffff8107bd5a>] warn_slowpath_common+0x8a/0xc0
[  385.277660]  [<ffffffff8107be8a>] warn_slowpath_null+0x1a/0x20
[  385.284171]  [<ffffffff8119e216>] bdi_unregister+0x36/0x40
[  385.290295]  [<ffffffff812fb8c1>] del_gendisk+0x131/0x2b0
[  385.296326]  [<ffffffffa0000eba>] cleanup_mapped_device+0xda/0x130 [dm_mod]
[  385.304101]  [<ffffffffa000283b>] __dm_destroy+0x19b/0x260 [dm_mod]
[  385.311099]  [<ffffffffa00040c3>] dm_destroy+0x13/0x20 [dm_mod]
[  385.317709]  [<ffffffffa0009d0e>] dev_remove+0x11e/0x180 [dm_mod]
[  385.324516]  [<ffffffffa0009bf0>] ? dev_suspend+0x250/0x250 [dm_mod]
[  385.331614]  [<ffffffffa000a3e5>] ctl_ioctl+0x255/0x500 [dm_mod]
[  385.338319]  [<ffffffff8128c8d8>] ? SYSC_semtimedop+0x298/0xea0
[  385.344931]  [<ffffffffa000a6a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[  385.351733]  [<ffffffff8120d038>] do_vfs_ioctl+0x2f8/0x4f0
[  385.357857]  [<ffffffff811207e4>] ? __audit_syscall_entry+0xb4/0x110
[  385.364950]  [<ffffffff8102367c>] ? do_audit_syscall_entry+0x6c/0x70
[  385.372041]  [<ffffffff8120d2b1>] SyS_ioctl+0x81/0xa0
[  385.377679]  [<ffffffff81025046>] ? syscall_trace_leave+0xc6/0x120
[  385.384578]  [<ffffffff81682f2e>] system_call_fastpath+0x12/0x71
[  385.391282] ---[ end trace af60d8ac7157d319 ]---

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-28 16:41     ` Mike Snitzer
@ 2015-04-28 21:25       ` NeilBrown
  2015-04-29 13:35         ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2015-04-28 21:25 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Azat Khuzhin, Christoph Hellwig,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

[-- Attachment #1: Type: text/plain, Size: 8089 bytes --]

On Tue, 28 Apr 2015 12:41:18 -0400 Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Apr 27, 2015 at 12:12 AM, NeilBrown <neilb@suse.de> wrote:
> >
> > Because of the peculiar way that md devices are created (automatically
> > when the device node is opened), a new device can be created and
> > registered immediately after the
> >         blk_unregister_region(disk_devt(disk), disk->minors);
> > call in del_gendisk().
> >
> > Therefore it is important that all visible artifacts of the previous
> > device are removed before this call.  In particular, the 'bdi'.
> >
> > Since:
> > commit c4db59d31e39ea067c32163ac961e9c80198fd37
> > Author: Christoph Hellwig <hch@lst.de>
> >     fs: don't reassign dirty inodes to default_backing_dev_info
> >
> > moved the
> >    device_unregister(bdi->dev);
> > call from bdi_unregister() to bdi_destroy() it has been quite easy to
> > lose a race and have a new (e.g.) "md127" be created after the
> > blk_unregister_region() call and before bdi_destroy() is ultimately
> > called by the final 'put_disk', which must come after del_gendisk().
> >
> > The new device finds that the bdi name is already registered in sysfs
> > and complains
> >
> >> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> >> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
> >
> > We can fix this by moving the bdi_destroy() call out of
> > blk_release_queue() (which can happen very late when a refcount
> > reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
> > device driver calls it.
> >
> > Then it is only necessary for md to call blk_cleanup_queue() before
> > del_gendisk().  As loop.c devices are also created on demand by
> > opening the device node, we make the same change there.
> >
> > Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> > Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: stable@vger.kernel.org (v4.0)
> > Signed-off-by: NeilBrown <neilb@suse.de>
> >
> > --
> > Hi Jens,
> >  if you could check this and forward on to Linus I'd really appreciate it.
> >
> > Thanks,
> > NeilBrown
> >
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index fd154b94447a..7871603f0a29 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q)
> >                 q->queue_lock = &q->__queue_lock;
> >         spin_unlock_irq(lock);
> >
> > +       bdi_destroy(&q->backing_dev_info);
> > +
> >         /* @q is and will stay empty, shutdown and put */
> >         blk_put_queue(q);
> >  }
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index faaf36ade7eb..2b8fd302f677 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj)
> >
> >         blk_trace_shutdown(q);
> >
> > -       bdi_destroy(&q->backing_dev_info);
> > -
> >         ida_simple_remove(&blk_queue_ida, q->id);
> >         call_rcu(&q->rcu_head, blk_free_queue_rcu);
> >  }
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index ae3fcb4199e9..d7173cb1ea76 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -1620,8 +1620,8 @@ out:
> >
> >  static void loop_remove(struct loop_device *lo)
> >  {
> > -       del_gendisk(lo->lo_disk);
> >         blk_cleanup_queue(lo->lo_queue);
> > +       del_gendisk(lo->lo_disk);
> >         blk_mq_free_tag_set(&lo->tag_set);
> >         put_disk(lo->lo_disk);
> >         kfree(lo);
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index d4f31e195e26..593a02476c78 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko)
> >         if (mddev->sysfs_state)
> >                 sysfs_put(mddev->sysfs_state);
> >
> > +       if (mddev->queue)
> > +               blk_cleanup_queue(mddev->queue);
> >         if (mddev->gendisk) {
> >                 del_gendisk(mddev->gendisk);
> >                 put_disk(mddev->gendisk);
> >         }
> > -       if (mddev->queue)
> > -               blk_cleanup_queue(mddev->queue);
> >
> >         kfree(mddev);
> >  }
> 
> I've taken this patch into consideration relative to DM, please see:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip&id=5dac86fb79697e93297edb6a316b429236d00137
> 
> Point is in my private snitzer/wip branch DM now calls
> blk_cleanup_queue() before del_gendisk().
> 
> With your patch:
> 1) blk_cleanup_queue -> bdi_destroy -> bdi->dev = NULL;
> 2) del_gendisk -> bdi_unregister -> WARN_ON_ONCE(!bdi->dev)
> 
> So, in testing with DM I'm hitting bdi_unregister's WARN_ON(), are you
> seeing this WARN_ON?

Hmmm.  Yes I am.  I wonder how I missed that.  Thanks!

As bdi_set_min_ratio doesn't touch bdi->dev, there seems to be no need for
the test, or the warning.

I wonder if it would make sense to move the bdi_set_min_ratio() call to
bdi_destroy, and discard bdi_unregister??
There is a comment which suggests bdi_unregister might be of use later, but
it might be best to have a clean slate in which to add whatever might be
needed??

NeilBrown

diff --git a/block/genhd.c b/block/genhd.c
index e351fc521053..1d4435478e8a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -657,7 +657,6 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index aff923ae8c4b..d87d8eced064 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -116,7 +116,6 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
-void bdi_unregister(struct backing_dev_info *bdi);
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 			enum wb_reason reason);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 880dd7437172..c178d13d6f4c 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -250,7 +250,6 @@ DEFINE_EVENT(writeback_class, name, \
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
 DEFINE_WRITEBACK_EVENT(writeback_wake_background);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
-DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
 
 DECLARE_EVENT_CLASS(wbc_class,
 	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6dc4580df2af..000e7b3b9896 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -359,23 +359,6 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 	flush_delayed_work(&bdi->wb.dwork);
 }
 
-/*
- * Called when the device behind @bdi has been removed or ejected.
- *
- * We can't really do much here except for reducing the dirty ratio at
- * the moment.  In the future we should be able to set a flag so that
- * the filesystem can handle errors at mark_inode_dirty time instead
- * of only at writeback time.
- */
-void bdi_unregister(struct backing_dev_info *bdi)
-{
-	if (WARN_ON_ONCE(!bdi->dev))
-		return;
-
-	bdi_set_min_ratio(bdi, 0);
-}
-EXPORT_SYMBOL(bdi_unregister);
-
 static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
 {
 	memset(wb, 0, sizeof(*wb));
@@ -443,6 +426,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
 	int i;
 
 	bdi_wb_shutdown(bdi);
+	bdi_set_min_ratio(bdi, 0);
 
 	WARN_ON(!list_empty(&bdi->work_list));
 	WARN_ON(delayed_work_pending(&bdi->wb.dwork));

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-28 21:25       ` NeilBrown
@ 2015-04-29 13:35         ` Christoph Hellwig
  2015-04-29 16:02           ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2015-04-29 13:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: Mike Snitzer, Jens Axboe, Azat Khuzhin, Christoph Hellwig,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development, Peter Zijlstra

On Wed, Apr 29, 2015 at 07:25:30AM +1000, NeilBrown wrote:
> As bdi_set_min_ratio doesn't touch bdi->dev, there seems to be no need for
> the test, or the warning.
> 
> I wonder if it would make sense to move the bdi_set_min_ratio() call to
> bdi_destroy, and discard bdi_unregister??
> There is a comment which suggests bdi_unregister might be of use later, but
> it might be best to have a clean slate in which to add whatever might be
> needed??

This seems fine to me from the block dev point of view.  I don't really
understand the bdi_min_ratio logic, but Peter might have a better idea.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-29 13:35         ` Christoph Hellwig
@ 2015-04-29 16:02           ` Peter Zijlstra
  2015-04-30  0:06             ` NeilBrown
  2015-04-30  0:32             ` [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy() NeilBrown
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2015-04-29 16:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: NeilBrown, Mike Snitzer, Jens Axboe, Azat Khuzhin,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

On Wed, Apr 29, 2015 at 03:35:12PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 29, 2015 at 07:25:30AM +1000, NeilBrown wrote:
> > As bdi_set_min_ratio doesn't touch bdi->dev, there seems to be no need for
> > the test, or the warning.
> > 
> > I wonder if it would make sense to move the bdi_set_min_ratio() call to
> > bdi_destroy, and discard bdi_unregister??
> > There is a comment which suggests bdi_unregister might be of use later, but
> > it might be best to have a clean slate in which to add whatever might be
> > needed??
> 
> This seems fine to me from the block dev point of view.  I don't really
> understand the bdi_min_ratio logic, but Peter might have a better idea.

Ah, that was a bit of digging, I've not looked at that in ages :-)

So if you look at bdi_dirty_limit()'s comment:

 * The bdi's share of dirty limit will be adapting to its throughput and
 * bounded by the bdi->min_ratio and/or bdi->max_ratio parameters, if set.

So the min_ratio is a minimum guaranteed fraction of the total
throughput.

Now the problem before commit ccb6108f5b0b ("mm/backing-dev.c: reset bdi
min_ratio in bdi_unregister()") was that since bdi_set_min_ratio()
keeps a global sum of bdi->min_ratio, you need to subtract from said
global sum when taking the BDI away. Otherwise we loose/leak a fraction
of the total throughput available (to the other BDIs).

Which is what that bdi_set_min_ratio(bdi, 0) in unregister does. It
resets the min_ratio for the bdi being taken out and frees up the min
allocated bandwidth for the others.

So I think moving that do destroy would be fine; assuming the delay
between unregister and destroy is typically 'short'. Because without
that you can 'leak' this min ratio for extended periods which means the
bandwidth is unavailable for other BDIs.

Does that make sense?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-29 16:02           ` Peter Zijlstra
@ 2015-04-30  0:06             ` NeilBrown
  2015-04-30  0:32             ` [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy() NeilBrown
  1 sibling, 0 replies; 21+ messages in thread
From: NeilBrown @ 2015-04-30  0:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Mike Snitzer, Jens Axboe, Azat Khuzhin,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

[-- Attachment #1: Type: text/plain, Size: 2458 bytes --]

On Wed, 29 Apr 2015 18:02:58 +0200 Peter Zijlstra <peterz@infradead.org>
wrote:

> On Wed, Apr 29, 2015 at 03:35:12PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 29, 2015 at 07:25:30AM +1000, NeilBrown wrote:
> > > As bdi_set_min_ratio doesn't touch bdi->dev, there seems to be no need for
> > > the test, or the warning.
> > > 
> > > I wonder if it would make sense to move the bdi_set_min_ratio() call to
> > > bdi_destroy, and discard bdi_unregister??
> > > There is a comment which suggests bdi_unregister might be of use later, but
> > > it might be best to have a clean slate in which to add whatever might be
> > > needed??
> > 
> > This seems fine to me from the block dev point of view.  I don't really
> > understand the bdi_min_ratio logic, but Peter might have a better idea.
> 
> Ah, that was a bit of digging, I've not looked at that in ages :-)
> 
> So if you look at bdi_dirty_limit()'s comment:
> 
>  * The bdi's share of dirty limit will be adapting to its throughput and
>  * bounded by the bdi->min_ratio and/or bdi->max_ratio parameters, if set.
> 
> So the min_ratio is a minimum guaranteed fraction of the total
> throughput.
> 
> Now the problem before commit ccb6108f5b0b ("mm/backing-dev.c: reset bdi
> min_ratio in bdi_unregister()") was that since bdi_set_min_ratio()
> keeps a global sum of bdi->min_ratio, you need to subtract from said
> global sum when taking the BDI away. Otherwise we loose/leak a fraction
> of the total throughput available (to the other BDIs).
> 
> Which is what that bdi_set_min_ratio(bdi, 0) in unregister does. It
> resets the min_ratio for the bdi being taken out and frees up the min
> allocated bandwidth for the others.
> 
> So I think moving that do destroy would be fine; assuming the delay
> between unregister and destroy is typically 'short'. Because without
> that you can 'leak' this min ratio for extended periods which means the
> bandwidth is unavailable for other BDIs.
> 
> Does that make sense?

Your assessment is almost exactly what I had come up with, so it definitely
makes sense :-)
'destroy' does come very shortly after 'unregister' (and immediately before
'blk_put_queue' which actually frees the struct).  However the driving force
for this patch was a desire to move blk_cleanup_queue(), which calls
'destroy', earlier.  So the net result is that bdi_set_min_ratio will be
called slightly sooner.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy()
  2015-04-29 16:02           ` Peter Zijlstra
  2015-04-30  0:06             ` NeilBrown
@ 2015-04-30  0:32             ` NeilBrown
  2015-04-30  8:35               ` Peter Zijlstra
  2015-05-06 16:11               ` [dm-devel] " Dan Williams
  1 sibling, 2 replies; 21+ messages in thread
From: NeilBrown @ 2015-04-30  0:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Peter Zijlstra, Christoph Hellwig, Mike Snitzer, Azat Khuzhin,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

[-- Attachment #1: Type: text/plain, Size: 4448 bytes --]


bdi_unregister() now contains very little functionality.

It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
real consequence as bdi->dev isn't needed by anything else in the function,
and it triggers if
   blk_cleanup_queue() -> bdi_destroy()
is called before bdi_unregister, which a subsequent patch will make happen.
So this isn't wanted.

It also calls bdi_set_min_ratio().  This needs to be called after
writes through the bdi have all been flushed, and before the bdi is destroyed.
Calling it early is better than calling it late as it frees up a global
resource.

Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
perfectly fits these requirements.

So bdi_unregister can be discarded with the important content moved to
bdi_destroy, as can the
  writeback_bdi_unregister
event which is already not used.

This is tagged for 'stable' as it is a pre-requisite for a subsequent
patch which moves calls to blk_cleanup_queue() before calls to
del_gendisk().  The commit identified as 'Fixes' removed a lot of
other functionality from bdi_unregister(), and made a change which
necessitated moving the blk_cleanup_queue() calls.

Reported-by: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org (v4.0)
Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
Signed-off-by: NeilBrown <neilb@suse.de>

---

Hi again Jens,
 would you be able to queue this patch *before* the other one:
   block: destroy bdi before blockdev is unregistered.

 If it has to come after I'll need to re-write the text a bit.
 If you could give me the commit hash to reference I'll do that.

diff --git a/block/genhd.c b/block/genhd.c
index e351fc521053..1d4435478e8a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -657,7 +657,6 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index aff923ae8c4b..d87d8eced064 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -116,7 +116,6 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
-void bdi_unregister(struct backing_dev_info *bdi);
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 			enum wb_reason reason);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 880dd7437172..c178d13d6f4c 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -250,7 +250,6 @@ DEFINE_EVENT(writeback_class, name, \
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
 DEFINE_WRITEBACK_EVENT(writeback_wake_background);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
-DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
 
 DECLARE_EVENT_CLASS(wbc_class,
 	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6dc4580df2af..000e7b3b9896 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -359,23 +359,6 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 	flush_delayed_work(&bdi->wb.dwork);
 }
 
-/*
- * Called when the device behind @bdi has been removed or ejected.
- *
- * We can't really do much here except for reducing the dirty ratio at
- * the moment.  In the future we should be able to set a flag so that
- * the filesystem can handle errors at mark_inode_dirty time instead
- * of only at writeback time.
- */
-void bdi_unregister(struct backing_dev_info *bdi)
-{
-	if (WARN_ON_ONCE(!bdi->dev))
-		return;
-
-	bdi_set_min_ratio(bdi, 0);
-}
-EXPORT_SYMBOL(bdi_unregister);
-
 static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
 {
 	memset(wb, 0, sizeof(*wb));
@@ -443,6 +426,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
 	int i;
 
 	bdi_wb_shutdown(bdi);
+	bdi_set_min_ratio(bdi, 0);
 
 	WARN_ON(!list_empty(&bdi->work_list));
 	WARN_ON(delayed_work_pending(&bdi->wb.dwork));

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy()
  2015-04-30  0:32             ` [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy() NeilBrown
@ 2015-04-30  8:35               ` Peter Zijlstra
  2015-05-06 16:11               ` [dm-devel] " Dan Williams
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2015-04-30  8:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, Christoph Hellwig, Mike Snitzer, Azat Khuzhin,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

On Thu, Apr 30, 2015 at 10:32:33AM +1000, NeilBrown wrote:
> 
> bdi_unregister() now contains very little functionality.
> 
> It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
> real consequence as bdi->dev isn't needed by anything else in the function,
> and it triggers if
>    blk_cleanup_queue() -> bdi_destroy()
> is called before bdi_unregister, which a subsequent patch will make happen.
> So this isn't wanted.
> 
> It also calls bdi_set_min_ratio().  This needs to be called after
> writes through the bdi have all been flushed, and before the bdi is destroyed.
> Calling it early is better than calling it late as it frees up a global
> resource.
> 
> Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
> perfectly fits these requirements.
> 
> So bdi_unregister can be discarded with the important content moved to
> bdi_destroy, as can the
>   writeback_bdi_unregister
> event which is already not used.
> 
> This is tagged for 'stable' as it is a pre-requisite for a subsequent
> patch which moves calls to blk_cleanup_queue() before calls to
> del_gendisk().  The commit identified as 'Fixes' removed a lot of
> other functionality from bdi_unregister(), and made a change which
> necessitated moving the blk_cleanup_queue() calls.
> 
> Reported-by: Mike Snitzer <snitzer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: stable@vger.kernel.org (v4.0)
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37

Fixes: c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info")

> Signed-off-by: NeilBrown <neilb@suse.de>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dm-devel] [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy()
  2015-04-30  0:32             ` [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy() NeilBrown
  2015-04-30  8:35               ` Peter Zijlstra
@ 2015-05-06 16:11               ` Dan Williams
  2015-05-08  5:09                 ` [PATCH v2] " NeilBrown
  1 sibling, 1 reply; 21+ messages in thread
From: Dan Williams @ 2015-05-06 16:11 UTC (permalink / raw)
  To: device-mapper development
  Cc: Jens Axboe, Jan Kara, Mike Snitzer, Azat Khuzhin, Peter Zijlstra,
	lkml, Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo,
	Christoph Hellwig, nicholas.w.moulin

On Wed, Apr 29, 2015 at 5:32 PM, NeilBrown <neilb@suse.de> wrote:
>
> bdi_unregister() now contains very little functionality.
>
> It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
> real consequence as bdi->dev isn't needed by anything else in the function,
> and it triggers if
>    blk_cleanup_queue() -> bdi_destroy()
> is called before bdi_unregister, which a subsequent patch will make happen.
> So this isn't wanted.
>
> It also calls bdi_set_min_ratio().  This needs to be called after
> writes through the bdi have all been flushed, and before the bdi is destroyed.
> Calling it early is better than calling it late as it frees up a global
> resource.
>
> Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
> perfectly fits these requirements.
>
> So bdi_unregister can be discarded with the important content moved to
> bdi_destroy, as can the
>   writeback_bdi_unregister
> event which is already not used.
>
> This is tagged for 'stable' as it is a pre-requisite for a subsequent
> patch which moves calls to blk_cleanup_queue() before calls to
> del_gendisk().  The commit identified as 'Fixes' removed a lot of
> other functionality from bdi_unregister(), and made a change which
> necessitated moving the blk_cleanup_queue() calls.
>
> Reported-by: Mike Snitzer <snitzer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: stable@vger.kernel.org (v4.0)
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> Signed-off-by: NeilBrown <neilb@suse.de>
>
> ---
>
> Hi again Jens,
>  would you be able to queue this patch *before* the other one:
>    block: destroy bdi before blockdev is unregistered.
>
>  If it has to come after I'll need to re-write the text a bit.
>  If you could give me the commit hash to reference I'll do that.

Seems it is after:

http://git.kernel.dk/?p=linux-block.git;a=commit;h=6cd18e71

Also, we gave both patches a try internally after seeing the duplicate
sysfs warning.  You can add:

Acked-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>

...on the re-send.

Thanks Neil!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] block: discard bdi_unregister() in favour of bdi_destroy()
  2015-05-06 16:11               ` [dm-devel] " Dan Williams
@ 2015-05-08  5:09                 ` NeilBrown
  0 siblings, 0 replies; 21+ messages in thread
From: NeilBrown @ 2015-05-08  5:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Dan Williams, device-mapper development, Jan Kara, Mike Snitzer,
	Azat Khuzhin, Peter Zijlstra, lkml, Kernel.org-Linux-RAID,
	Guoqing Jiang, Tejun Heo, Christoph Hellwig, nicholas.w.moulin

[-- Attachment #1: Type: text/plain, Size: 4274 bytes --]



bdi_unregister() now contains very little functionality.

It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
real consequence as bdi->dev isn't needed by anything else in the function,
and it triggers if
   blk_cleanup_queue() -> bdi_destroy()
is called before bdi_unregister, which happens since
  Commit: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.")

So this isn't wanted.

It also calls bdi_set_min_ratio().  This needs to be called after
writes through the bdi have all been flushed, and before the bdi is destroyed.
Calling it early is better than calling it late as it frees up a global
resource.

Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
perfectly fits these requirements.

So bdi_unregister() can be discarded with the important content moved to
bdi_destroy(), as can the
  writeback_bdi_unregister
event which is already not used.

Reported-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org (v4.0)
Fixes: c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info")
Fixes: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

---

hi Jens,
 this is a revised version of the comment - no code change - make it suitable to add
to your linux-block tree.

Thanks,
NeilBrown


diff --git a/block/genhd.c b/block/genhd.c
index e351fc521053..1d4435478e8a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -657,7 +657,6 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index aff923ae8c4b..d87d8eced064 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -116,7 +116,6 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
-void bdi_unregister(struct backing_dev_info *bdi);
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 			enum wb_reason reason);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 880dd7437172..c178d13d6f4c 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -250,7 +250,6 @@ DEFINE_EVENT(writeback_class, name, \
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
 DEFINE_WRITEBACK_EVENT(writeback_wake_background);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
-DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
 
 DECLARE_EVENT_CLASS(wbc_class,
 	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6dc4580df2af..000e7b3b9896 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -359,23 +359,6 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 	flush_delayed_work(&bdi->wb.dwork);
 }
 
-/*
- * Called when the device behind @bdi has been removed or ejected.
- *
- * We can't really do much here except for reducing the dirty ratio at
- * the moment.  In the future we should be able to set a flag so that
- * the filesystem can handle errors at mark_inode_dirty time instead
- * of only at writeback time.
- */
-void bdi_unregister(struct backing_dev_info *bdi)
-{
-	if (WARN_ON_ONCE(!bdi->dev))
-		return;
-
-	bdi_set_min_ratio(bdi, 0);
-}
-EXPORT_SYMBOL(bdi_unregister);
-
 static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
 {
 	memset(wb, 0, sizeof(*wb));
@@ -443,6 +426,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
 	int i;
 
 	bdi_wb_shutdown(bdi);
+	bdi_set_min_ratio(bdi, 0);
 
 	WARN_ON(!list_empty(&bdi->work_list));
 	WARN_ON(delayed_work_pending(&bdi->wb.dwork));

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-05-08  5:09 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-14 17:15 BUG: unable to handle kernel NULL pointer dereference at sysfs_do_create_link_sd (after mdadm) Azat Khuzhin
2015-04-15  2:44 ` Guoqing Jiang
2015-04-15  8:47   ` Azat Khuzhin
2015-04-23  6:05 ` Bisected, with rfc/patch - was " NeilBrown
2015-04-23  7:37   ` Christoph Hellwig
2015-04-23  8:03     ` NeilBrown
2015-04-23 16:10       ` Christoph Hellwig
2015-04-24  2:09         ` NeilBrown
2015-04-24  8:27           ` Christoph Hellwig
2015-04-27  4:12   ` [PATCH -stable] block: destroy bdi before blockdev is unregistered NeilBrown
2015-04-27 13:03     ` Christoph Hellwig
2015-04-27 16:27     ` Jens Axboe
2015-04-28 16:41     ` Mike Snitzer
2015-04-28 21:25       ` NeilBrown
2015-04-29 13:35         ` Christoph Hellwig
2015-04-29 16:02           ` Peter Zijlstra
2015-04-30  0:06             ` NeilBrown
2015-04-30  0:32             ` [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy() NeilBrown
2015-04-30  8:35               ` Peter Zijlstra
2015-05-06 16:11               ` [dm-devel] " Dan Williams
2015-05-08  5:09                 ` [PATCH v2] " NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.