All of lore.kernel.org
 help / color / mirror / Atom feed
* kernel oops on bcachefs umount, 6.7 kernel
@ 2024-02-01 21:52 Tony Asleson
  2024-02-15 16:55 ` Brian Foster
  0 siblings, 1 reply; 9+ messages in thread
From: Tony Asleson @ 2024-02-01 21:52 UTC (permalink / raw)
  To: linux-bcachefs

Fedora 39 with 6.7 kernel with bcachefs support

Steps
1. I created the bcachefs (using locally compiled git repo
f15633cce1b79e708e9debc21c7b8772df7c7a29)
# bcachefs format --replicas=2 /dev/sd[bcdej]
2. mounted FS and wrote some files to it
3. I removed one of the virtual disk drives from the VM, (/dev/sdj)
while the VM was running
4. I wrote some more data to bcachefs FS
5. unmounted the bcachefs FS

Got the following oops

....
[90806.970165] bcachefs (sdj): error writing journal entry 429: I/O
[90808.112301] bcachefs (sdj): error writing journal entry 430: I/O
[90808.122558] kworker/1:1H: attempt to access beyond end of device
               sdj: rw=137217, sector=18584, nr_sectors = 16 limit=0
[90808.122565] bcachefs (sdj): error writing journal entry 430: I/O
[90809.200254] bcachefs (sdj): error writing journal entry 431: I/O
[90810.287776] bcachefs (sdj): error writing journal entry 432: I/O
[90810.298269] kworker/1:1H: attempt to access beyond end of device
               sdj: rw=137217, sector=18600, nr_sectors = 16 limit=0
[90810.298274] bcachefs (sdj): error writing journal entry 432: I/O
[90811.439700] bcachefs (sdj): error writing journal entry 433: I/O
[90811.450090] kworker/1:1H: attempt to access beyond end of device
               sdj: rw=137217, sector=18616, nr_sectors = 16 limit=0
[90811.450094] bcachefs (sdj): error writing journal entry 433: I/O
[90812.528102] bcachefs (sdj): error writing journal entry 434: I/O
[90812.538847] kworker/1:1H: attempt to access beyond end of device
               sdj: rw=137217, sector=18632, nr_sectors = 16 limit=0
[90812.538854] bcachefs (sdj): error writing journal entry 434: I/O
[90813.616108] bcachefs (sdj): error writing journal entry 435: I/O
[90813.627032] kworker/1:1H: attempt to access beyond end of device
               sdj: rw=137217, sector=18648, nr_sectors = 16 limit=0
[90813.627043] bcachefs (sdj): error writing journal entry 435: I/O
[90814.768088] bcachefs (sdj): error writing journal entry 436: I/O
[90816.112053] bcachefs (sdj): error writing journal entry 437: I/O
[90816.114579] kworker/1:1H: attempt to access beyond end of device
               sdj: rw=137217, sector=18664, nr_sectors = 16 limit=0
[90816.114584] bcachefs (sdj): error writing journal entry 437: I/O
[90817.264105] bcachefs (sdj): error writing journal entry 438: I/O
[90817.276368] kworker/2:0H: attempt to access beyond end of device
               sdj: rw=137217, sector=18680, nr_sectors = 16 limit=0
[90817.276373] bcachefs (sdj): error writing journal entry 438: I/O
[90849.997509] bcachefs (sdj): error writing journal entry 439: I/O
[90850.010057] kworker/1:1H: attempt to access beyond end of device
               sdj: rw=137217, sector=18696, nr_sectors = 8 limit=0
[90850.010062] bcachefs (sdj): error writing journal entry 439: I/O
[90850.012743] umount: attempt to access beyond end of device
               sdj: rw=6144, sector=8, nr_sectors = 8 limit=0
[90850.012747] bcachefs (sdj): superblock read error: I/O
[90850.013664] ------------[ cut here ]------------
[90850.013666] kernfs: can not remove 'bcachefs', no directory
[90850.013671] WARNING: CPU: 4 PID: 6892 at fs/kernfs/dir.c:1662
kernfs_remove_by_name_ns+0xba/0xc0
[90850.013678] Modules linked in: binfmt_misc bcachefs lz4hc_compress
lz4_compress bcache nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill
ip_set nf_tables nfnetlink vfat fat snd_pcm snd_timer intel_rapl_msr
snd intel_rapl_common soundcore hv_netvsc hv_utils pcspkr hv_balloon
joydev fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel
polyval_clmulni serio_raw hv_storvsc polyval_generic scsi_transport_fc
hyperv_drm hyperv_keyboard hid_hyperv ghash_clmulni_intel sha512_ssse3
hv_vmbus sha256_ssse3 sha1_ssse3
[90850.013725] CPU: 4 PID: 6892 Comm: umount Not tainted 6.7.0 #4
[90850.013727] Hardware name: Microsoft Corporation Virtual
Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 04/06/2022
[90850.013728] RIP: 0010:kernfs_remove_by_name_ns+0xba/0xc0
[90850.013731] Code: bc 00 48 89 ef e8 26 a4 c5 ff 5b b8 fe ff ff ff
5d 41 5c 41 5d e9 b1 7c bc 00 0f 0b eb a6 48 c7 c7 20 cf 8e ac e8 66
3e bd ff <0f> 0b eb dc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90
[90850.013733] RSP: 0018:ffffbd6c4cbb7de0 EFLAGS: 00010286
[90850.013734] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
[90850.013736] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
[90850.013737] RBP: ffff9b2d65fc0000 R08: 0000000000000000 R09: ffffbd6c4cbb7c68
[90850.013738] R10: 0000000000000003 R11: ffffffffad3443e8 R12: ffffffffc0c910fb
[90850.013739] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[90850.013740] FS:  00007fe8fbdb6800(0000) GS:ffff9b3427b00000(0000)
knlGS:0000000000000000
[90850.013742] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[90850.013743] CR2: 00007ffd7b335f98 CR3: 00000000d6512004 CR4: 0000000000370ef0
[90850.013746] Call Trace:
[90850.013748]  <TASK>
[90850.013749]  ? kernfs_remove_by_name_ns+0xba/0xc0
[90850.013751]  ? __warn+0x7d/0x130
[90850.013757]  ? kernfs_remove_by_name_ns+0xba/0xc0
[90850.013759]  ? report_bug+0x18d/0x1c0
[90850.013763]  ? srso_alias_return_thunk+0x5/0xfbef5
[90850.013765]  ? console_unlock+0x74/0x120
[90850.013769]  ? handle_bug+0x3c/0x80
[90850.013772]  ? exc_invalid_op+0x13/0x60
[90850.013774]  ? asm_exc_invalid_op+0x16/0x20
[90850.013780]  ? kernfs_remove_by_name_ns+0xba/0xc0
[90850.013783]  __bch2_fs_stop+0xc6/0x280 [bcachefs]
[90850.013819]  generic_shutdown_super+0x7c/0x110
[90850.013824]  bch2_kill_sb+0x12/0x20 [bcachefs]
[90850.013857]  deactivate_locked_super+0x2f/0xb0
[90850.013860]  cleanup_mnt+0xba/0x150
[90850.013863]  task_work_run+0x59/0x90
[90850.013867]  exit_to_user_mode_prepare+0x1e3/0x1f0
[90850.013871]  syscall_exit_to_user_mode+0x17/0x40
[90850.013874]  do_syscall_64+0x6c/0xe0
[90850.013876]  ? srso_alias_return_thunk+0x5/0xfbef5
[90850.013878]  ? syscall_exit_to_user_mode+0x27/0x40
[90850.013880]  ? srso_alias_return_thunk+0x5/0xfbef5
[90850.013882]  ? do_syscall_64+0x6c/0xe0
[90850.013883]  ? srso_alias_return_thunk+0x5/0xfbef5
[90850.013885]  ? exc_page_fault+0x7b/0x180
[90850.013887]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[90850.013890] RIP: 0033:0x7fe8fbfd652b
[90850.013899] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3
0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 d1 18 0c 00
f7 d8
[90850.013900] RSP: 002b:00007ffd7b337748 EFLAGS: 00000246 ORIG_RAX:
00000000000000a6
[90850.013902] RAX: 0000000000000000 RBX: 000055d8e3eab690 RCX: 00007fe8fbfd652b
[90850.013903] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055d8e3eb0010
[90850.013904] RBP: 00007ffd7b337820 R08: 000055d8e3ea6010 R09: 0000000000000007
[90850.013905] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d8e3eab790
[90850.013906] R13: 0000000000000000 R14: 000055d8e3eb0010 R15: 000055d8e3eaffd0
[90850.013910]  </TASK>
[90850.013911] ---[ end trace 0000000000000000 ]---

Please CC as I'm not subscribed to this mailing list
-Tony


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: kernel oops on bcachefs umount, 6.7 kernel
  2024-02-01 21:52 kernel oops on bcachefs umount, 6.7 kernel Tony Asleson
@ 2024-02-15 16:55 ` Brian Foster
  2024-02-16  0:24   ` Kent Overstreet
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2024-02-15 16:55 UTC (permalink / raw)
  To: Tony Asleson; +Cc: linux-bcachefs, linux-block

cc linux-block

On Thu, Feb 01, 2024 at 03:52:56PM -0600, Tony Asleson wrote:
> Fedora 39 with 6.7 kernel with bcachefs support
> 
> Steps
> 1. I created the bcachefs (using locally compiled git repo
> f15633cce1b79e708e9debc21c7b8772df7c7a29)
> # bcachefs format --replicas=2 /dev/sd[bcdej]
> 2. mounted FS and wrote some files to it
> 3. I removed one of the virtual disk drives from the VM, (/dev/sdj)
> while the VM was running
> 4. I wrote some more data to bcachefs FS
> 5. unmounted the bcachefs FS
> 
> Got the following oops
> 
> ....
> [90806.970165] bcachefs (sdj): error writing journal entry 429: I/O
...
> [90850.012743] umount: attempt to access beyond end of device
>                sdj: rw=6144, sector=8, nr_sectors = 8 limit=0
> [90850.012747] bcachefs (sdj): superblock read error: I/O
> [90850.013664] ------------[ cut here ]------------
> [90850.013666] kernfs: can not remove 'bcachefs', no directory
> [90850.013671] WARNING: CPU: 4 PID: 6892 at fs/kernfs/dir.c:1662
> kernfs_remove_by_name_ns+0xba/0xc0

Hi Tony,

Firstly note that this is a warning, not necessarily an oops or crash.
That aside, I am able to reproduce this pretty easily running a similar
test as above on one of my local VMs.

It looks like the cause of this is that bcachefs creates a
/sys/block/<dev>/bcachefs symlink on each block device associated with
the mount. We attempt to remove the link at unmount time, but the async
device removal has caused the sysfs parent dir to disappear and so the
removal codepath warns about a NULL parent dir/node.

It looks like the warning could be avoided in bcachefs by checking for
whether the parent dir/node still exists at cleanup time, but I'm not
familiar enough with kobj management to say whether that's the
right/best solution. It also looks a little odd to me to see a
/sys/block/<dev>/bcachefs dir when I've not seen any other fs or driver
do such a thing in the block sysfs dir(s).

Any thoughts on this from the block subsystem folks? Is it reasonable to
leave this link around and just fix the removal check, or is another
behavior preferred? Thanks.

Brian


> [90850.013678] Modules linked in: binfmt_misc bcachefs lz4hc_compress
> lz4_compress bcache nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill
> ip_set nf_tables nfnetlink vfat fat snd_pcm snd_timer intel_rapl_msr
> snd intel_rapl_common soundcore hv_netvsc hv_utils pcspkr hv_balloon
> joydev fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel
> polyval_clmulni serio_raw hv_storvsc polyval_generic scsi_transport_fc
> hyperv_drm hyperv_keyboard hid_hyperv ghash_clmulni_intel sha512_ssse3
> hv_vmbus sha256_ssse3 sha1_ssse3
> [90850.013725] CPU: 4 PID: 6892 Comm: umount Not tainted 6.7.0 #4
> [90850.013727] Hardware name: Microsoft Corporation Virtual
> Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 04/06/2022
> [90850.013728] RIP: 0010:kernfs_remove_by_name_ns+0xba/0xc0
> [90850.013731] Code: bc 00 48 89 ef e8 26 a4 c5 ff 5b b8 fe ff ff ff
> 5d 41 5c 41 5d e9 b1 7c bc 00 0f 0b eb a6 48 c7 c7 20 cf 8e ac e8 66
> 3e bd ff <0f> 0b eb dc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> 90 90
> [90850.013733] RSP: 0018:ffffbd6c4cbb7de0 EFLAGS: 00010286
> [90850.013734] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
> [90850.013736] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
> [90850.013737] RBP: ffff9b2d65fc0000 R08: 0000000000000000 R09: ffffbd6c4cbb7c68
> [90850.013738] R10: 0000000000000003 R11: ffffffffad3443e8 R12: ffffffffc0c910fb
> [90850.013739] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [90850.013740] FS:  00007fe8fbdb6800(0000) GS:ffff9b3427b00000(0000)
> knlGS:0000000000000000
> [90850.013742] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [90850.013743] CR2: 00007ffd7b335f98 CR3: 00000000d6512004 CR4: 0000000000370ef0
> [90850.013746] Call Trace:
> [90850.013748]  <TASK>
> [90850.013749]  ? kernfs_remove_by_name_ns+0xba/0xc0
> [90850.013751]  ? __warn+0x7d/0x130
> [90850.013757]  ? kernfs_remove_by_name_ns+0xba/0xc0
> [90850.013759]  ? report_bug+0x18d/0x1c0
> [90850.013763]  ? srso_alias_return_thunk+0x5/0xfbef5
> [90850.013765]  ? console_unlock+0x74/0x120
> [90850.013769]  ? handle_bug+0x3c/0x80
> [90850.013772]  ? exc_invalid_op+0x13/0x60
> [90850.013774]  ? asm_exc_invalid_op+0x16/0x20
> [90850.013780]  ? kernfs_remove_by_name_ns+0xba/0xc0
> [90850.013783]  __bch2_fs_stop+0xc6/0x280 [bcachefs]
> [90850.013819]  generic_shutdown_super+0x7c/0x110
> [90850.013824]  bch2_kill_sb+0x12/0x20 [bcachefs]
> [90850.013857]  deactivate_locked_super+0x2f/0xb0
> [90850.013860]  cleanup_mnt+0xba/0x150
> [90850.013863]  task_work_run+0x59/0x90
> [90850.013867]  exit_to_user_mode_prepare+0x1e3/0x1f0
> [90850.013871]  syscall_exit_to_user_mode+0x17/0x40
> [90850.013874]  do_syscall_64+0x6c/0xe0
> [90850.013876]  ? srso_alias_return_thunk+0x5/0xfbef5
> [90850.013878]  ? syscall_exit_to_user_mode+0x27/0x40
> [90850.013880]  ? srso_alias_return_thunk+0x5/0xfbef5
> [90850.013882]  ? do_syscall_64+0x6c/0xe0
> [90850.013883]  ? srso_alias_return_thunk+0x5/0xfbef5
> [90850.013885]  ? exc_page_fault+0x7b/0x180
> [90850.013887]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [90850.013890] RIP: 0033:0x7fe8fbfd652b
> [90850.013899] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3
> 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 d1 18 0c 00
> f7 d8
> [90850.013900] RSP: 002b:00007ffd7b337748 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000a6
> [90850.013902] RAX: 0000000000000000 RBX: 000055d8e3eab690 RCX: 00007fe8fbfd652b
> [90850.013903] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055d8e3eb0010
> [90850.013904] RBP: 00007ffd7b337820 R08: 000055d8e3ea6010 R09: 0000000000000007
> [90850.013905] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d8e3eab790
> [90850.013906] R13: 0000000000000000 R14: 000055d8e3eb0010 R15: 000055d8e3eaffd0
> [90850.013910]  </TASK>
> [90850.013911] ---[ end trace 0000000000000000 ]---
> 
> Please CC as I'm not subscribed to this mailing list
> -Tony
> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: kernel oops on bcachefs umount, 6.7 kernel
  2024-02-15 16:55 ` Brian Foster
@ 2024-02-16  0:24   ` Kent Overstreet
  2024-02-16  8:00     ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2024-02-16  0:24 UTC (permalink / raw)
  To: Brian Foster
  Cc: Tony Asleson, linux-bcachefs, linux-block, Greg Kroah-Hartman,
	Darrick J. Wong, Christoph Hellwig

On Thu, Feb 15, 2024 at 11:55:59AM -0500, Brian Foster wrote:
> cc linux-block
> 
> On Thu, Feb 01, 2024 at 03:52:56PM -0600, Tony Asleson wrote:
> > Fedora 39 with 6.7 kernel with bcachefs support
> > 
> > Steps
> > 1. I created the bcachefs (using locally compiled git repo
> > f15633cce1b79e708e9debc21c7b8772df7c7a29)
> > # bcachefs format --replicas=2 /dev/sd[bcdej]
> > 2. mounted FS and wrote some files to it
> > 3. I removed one of the virtual disk drives from the VM, (/dev/sdj)
> > while the VM was running
> > 4. I wrote some more data to bcachefs FS
> > 5. unmounted the bcachefs FS
> > 
> > Got the following oops
> > 
> > ....
> > [90806.970165] bcachefs (sdj): error writing journal entry 429: I/O
> ...
> > [90850.012743] umount: attempt to access beyond end of device
> >                sdj: rw=6144, sector=8, nr_sectors = 8 limit=0
> > [90850.012747] bcachefs (sdj): superblock read error: I/O
> > [90850.013664] ------------[ cut here ]------------
> > [90850.013666] kernfs: can not remove 'bcachefs', no directory
> > [90850.013671] WARNING: CPU: 4 PID: 6892 at fs/kernfs/dir.c:1662
> > kernfs_remove_by_name_ns+0xba/0xc0
> 
> Hi Tony,
> 
> Firstly note that this is a warning, not necessarily an oops or crash.
> That aside, I am able to reproduce this pretty easily running a similar
> test as above on one of my local VMs.
> 
> It looks like the cause of this is that bcachefs creates a
> /sys/block/<dev>/bcachefs symlink on each block device associated with
> the mount. We attempt to remove the link at unmount time, but the async
> device removal has caused the sysfs parent dir to disappear and so the
> removal codepath warns about a NULL parent dir/node.
> 
> It looks like the warning could be avoided in bcachefs by checking for
> whether the parent dir/node still exists at cleanup time, but I'm not
> familiar enough with kobj management to say whether that's the
> right/best solution. It also looks a little odd to me to see a
> /sys/block/<dev>/bcachefs dir when I've not seen any other fs or driver
> do such a thing in the block sysfs dir(s).
> 
> Any thoughts on this from the block subsystem folks? Is it reasonable to
> leave this link around and just fix the removal check, or is another
> behavior preferred? Thanks.

so there's an existing bd_holder mechanism that e.g. device mapper uses
for links between block devices. I think the "this block device is going
away" code knows how to clean those up.

We're not using that mechanism - and perhaps we should have been, I'd
need a time machine to ask myself why I did it that way 15 years back.

But rather than switching to that directly, this should probably be
getting handled for us by more standardized vfs/sysfs code.

block layer wise, all the bd_holder stuff has been getting improved
recently, that's one thing to look at; we might want the holder that's
passed to blkdev_get_by_path() to be directly reflected in sysfs.

And Greg and Darrick have also both been making noises about wanting to
unify some sysfs stuff too - maybe filesystems should be using the
blockdev holder thing that dm uses.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: kernel oops on bcachefs umount, 6.7 kernel
  2024-02-16  0:24   ` Kent Overstreet
@ 2024-02-16  8:00     ` Christoph Hellwig
  2024-02-16 12:40       ` Brian Foster
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-02-16  8:00 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Brian Foster, Tony Asleson, linux-bcachefs, linux-block,
	Greg Kroah-Hartman, Darrick J. Wong, Tejun Heo

On Thu, Feb 15, 2024 at 07:24:23PM -0500, Kent Overstreet wrote:
> > It looks like the warning could be avoided in bcachefs by checking for
> > whether the parent dir/node still exists at cleanup time, but I'm not
> > familiar enough with kobj management to say whether that's the
> > right/best solution. It also looks a little odd to me to see a
> > /sys/block/<dev>/bcachefs dir when I've not seen any other fs or driver
> > do such a thing in the block sysfs dir(s).
> > 
> > Any thoughts on this from the block subsystem folks? Is it reasonable to
> > leave this link around and just fix the removal check, or is another
> > behavior preferred? Thanks.

This is the general problem with random cross-subsystem sysfs reference,
and why they are best avoided.  The block layer tears down all the sysfs
objects at del_gendisk time as no one should start using the sysfs files
at that point, but a mounted file system or other opener will of course
keep the bdev itself alive.

I'm not sure why bcachefs is doing this, but no one really should be
using the block layer sysfs structures and pointers except for the block
layer itself.  

> so there's an existing bd_holder mechanism that e.g. device mapper uses
> for links between block devices. I think the "this block device is going
> away" code knows how to clean those up.
> 
> We're not using that mechanism - and perhaps we should have been, I'd
> need a time machine to ask myself why I did it that way 15 years back.

Well, at least Tejun had a very strong opinion that no one should be
abusing sysfs symlinks for linking up subsystems at all, see commit
49731baa41df404c2c3f44555869ab387363af43, which is also why this code
is marked deprecated and we've not added additional users.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: kernel oops on bcachefs umount, 6.7 kernel
  2024-02-16  8:00     ` Christoph Hellwig
@ 2024-02-16 12:40       ` Brian Foster
  2024-02-16 15:57         ` Kent Overstreet
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2024-02-16 12:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kent Overstreet, Tony Asleson, linux-bcachefs, linux-block,
	Greg Kroah-Hartman, Darrick J. Wong, Tejun Heo

On Fri, Feb 16, 2024 at 09:00:17AM +0100, Christoph Hellwig wrote:
> On Thu, Feb 15, 2024 at 07:24:23PM -0500, Kent Overstreet wrote:
> > > It looks like the warning could be avoided in bcachefs by checking for
> > > whether the parent dir/node still exists at cleanup time, but I'm not
> > > familiar enough with kobj management to say whether that's the
> > > right/best solution. It also looks a little odd to me to see a
> > > /sys/block/<dev>/bcachefs dir when I've not seen any other fs or driver
> > > do such a thing in the block sysfs dir(s).
> > > 
> > > Any thoughts on this from the block subsystem folks? Is it reasonable to
> > > leave this link around and just fix the removal check, or is another
> > > behavior preferred? Thanks.
> 
> This is the general problem with random cross-subsystem sysfs reference,
> and why they are best avoided.  The block layer tears down all the sysfs
> objects at del_gendisk time as no one should start using the sysfs files
> at that point, but a mounted file system or other opener will of course
> keep the bdev itself alive.
> 

Yeah, makes sense. The fact that the dir goes away despite having the
bdev open is partly what made this seem a little odd to me.

> I'm not sure why bcachefs is doing this, but no one really should be
> using the block layer sysfs structures and pointers except for the block
> layer itself.  
> 

From Kent's comments it sounds like it was just some loose carryover
from old bcache stuff. I had poked around a bit for anything similar and
it looked to me that current bcache doesn't do this either, but I could
have missed something.

> > so there's an existing bd_holder mechanism that e.g. device mapper uses
> > for links between block devices. I think the "this block device is going
> > away" code knows how to clean those up.
> > 
> > We're not using that mechanism - and perhaps we should have been, I'd
> > need a time machine to ask myself why I did it that way 15 years back.
> 
> Well, at least Tejun had a very strong opinion that no one should be
> abusing sysfs symlinks for linking up subsystems at all, see commit
> 49731baa41df404c2c3f44555869ab387363af43, which is also why this code
> is marked deprecated and we've not added additional users.
> 

Thanks. I'll send a patch to remove this once I'm back from vacation.

Brian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: kernel oops on bcachefs umount, 6.7 kernel
  2024-02-16 12:40       ` Brian Foster
@ 2024-02-16 15:57         ` Kent Overstreet
  2024-02-21 12:39           ` Brian Foster
  0 siblings, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2024-02-16 15:57 UTC (permalink / raw)
  To: Brian Foster
  Cc: Christoph Hellwig, Tony Asleson, linux-bcachefs, linux-block,
	Greg Kroah-Hartman, Darrick J. Wong, Tejun Heo

On Fri, Feb 16, 2024 at 07:40:34AM -0500, Brian Foster wrote:
> On Fri, Feb 16, 2024 at 09:00:17AM +0100, Christoph Hellwig wrote:
> > On Thu, Feb 15, 2024 at 07:24:23PM -0500, Kent Overstreet wrote:
> > > > It looks like the warning could be avoided in bcachefs by checking for
> > > > whether the parent dir/node still exists at cleanup time, but I'm not
> > > > familiar enough with kobj management to say whether that's the
> > > > right/best solution. It also looks a little odd to me to see a
> > > > /sys/block/<dev>/bcachefs dir when I've not seen any other fs or driver
> > > > do such a thing in the block sysfs dir(s).
> > > > 
> > > > Any thoughts on this from the block subsystem folks? Is it reasonable to
> > > > leave this link around and just fix the removal check, or is another
> > > > behavior preferred? Thanks.
> > 
> > This is the general problem with random cross-subsystem sysfs reference,
> > and why they are best avoided.  The block layer tears down all the sysfs
> > objects at del_gendisk time as no one should start using the sysfs files
> > at that point, but a mounted file system or other opener will of course
> > keep the bdev itself alive.
> > 
> 
> Yeah, makes sense. The fact that the dir goes away despite having the
> bdev open is partly what made this seem a little odd to me.
> 
> > I'm not sure why bcachefs is doing this, but no one really should be
> > using the block layer sysfs structures and pointers except for the block
> > layer itself.  
> > 
> 
> From Kent's comments it sounds like it was just some loose carryover
> from old bcache stuff. I had poked around a bit for anything similar and
> it looked to me that current bcache doesn't do this either, but I could
> have missed something.
> 
> > > so there's an existing bd_holder mechanism that e.g. device mapper uses
> > > for links between block devices. I think the "this block device is going
> > > away" code knows how to clean those up.
> > > 
> > > We're not using that mechanism - and perhaps we should have been, I'd
> > > need a time machine to ask myself why I did it that way 15 years back.
> > 
> > Well, at least Tejun had a very strong opinion that no one should be
> > abusing sysfs symlinks for linking up subsystems at all, see commit
> > 49731baa41df404c2c3f44555869ab387363af43, which is also why this code
> > is marked deprecated and we've not added additional users.
> > 
> 
> Thanks. I'll send a patch to remove this once I'm back from vacation.

No, we can't remove it - userspace needs to know this topology. When
we've got one sysfs node with a direct relationship to another sysfs
node, that needs to be reflected in syfs.

What I'm hoping for is that we can get filesystems in general to do this
right, not just bcachefs, so that they finally start showing up in the
tree lsblk reports.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: kernel oops on bcachefs umount, 6.7 kernel
  2024-02-16 15:57         ` Kent Overstreet
@ 2024-02-21 12:39           ` Brian Foster
  2024-02-22  0:07             ` Kent Overstreet
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2024-02-21 12:39 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, Tony Asleson, linux-bcachefs, linux-block,
	Greg Kroah-Hartman, Darrick J. Wong, Tejun Heo

On Fri, Feb 16, 2024 at 10:57:24AM -0500, Kent Overstreet wrote:
> On Fri, Feb 16, 2024 at 07:40:34AM -0500, Brian Foster wrote:
> > On Fri, Feb 16, 2024 at 09:00:17AM +0100, Christoph Hellwig wrote:
> > > On Thu, Feb 15, 2024 at 07:24:23PM -0500, Kent Overstreet wrote:
> > > > > It looks like the warning could be avoided in bcachefs by checking for
> > > > > whether the parent dir/node still exists at cleanup time, but I'm not
> > > > > familiar enough with kobj management to say whether that's the
> > > > > right/best solution. It also looks a little odd to me to see a
> > > > > /sys/block/<dev>/bcachefs dir when I've not seen any other fs or driver
> > > > > do such a thing in the block sysfs dir(s).
> > > > > 
> > > > > Any thoughts on this from the block subsystem folks? Is it reasonable to
> > > > > leave this link around and just fix the removal check, or is another
> > > > > behavior preferred? Thanks.
> > > 
> > > This is the general problem with random cross-subsystem sysfs reference,
> > > and why they are best avoided.  The block layer tears down all the sysfs
> > > objects at del_gendisk time as no one should start using the sysfs files
> > > at that point, but a mounted file system or other opener will of course
> > > keep the bdev itself alive.
> > > 
> > 
> > Yeah, makes sense. The fact that the dir goes away despite having the
> > bdev open is partly what made this seem a little odd to me.
> > 
> > > I'm not sure why bcachefs is doing this, but no one really should be
> > > using the block layer sysfs structures and pointers except for the block
> > > layer itself.  
> > > 
> > 
> > From Kent's comments it sounds like it was just some loose carryover
> > from old bcache stuff. I had poked around a bit for anything similar and
> > it looked to me that current bcache doesn't do this either, but I could
> > have missed something.
> > 
> > > > so there's an existing bd_holder mechanism that e.g. device mapper uses
> > > > for links between block devices. I think the "this block device is going
> > > > away" code knows how to clean those up.
> > > > 
> > > > We're not using that mechanism - and perhaps we should have been, I'd
> > > > need a time machine to ask myself why I did it that way 15 years back.
> > > 
> > > Well, at least Tejun had a very strong opinion that no one should be
> > > abusing sysfs symlinks for linking up subsystems at all, see commit
> > > 49731baa41df404c2c3f44555869ab387363af43, which is also why this code
> > > is marked deprecated and we've not added additional users.
> > > 
> > 
> > Thanks. I'll send a patch to remove this once I'm back from vacation.
> 
> No, we can't remove it - userspace needs to know this topology. When
> we've got one sysfs node with a direct relationship to another sysfs
> node, that needs to be reflected in syfs.
> 

Do you mean that some related userspace tool relies on this to function,
or generally disagreeing with the statement(s) above around links from
/sys/block/<dev>/?

Brian

> What I'm hoping for is that we can get filesystems in general to do this
> right, not just bcachefs, so that they finally start showing up in the
> tree lsblk reports.
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: kernel oops on bcachefs umount, 6.7 kernel
  2024-02-21 12:39           ` Brian Foster
@ 2024-02-22  0:07             ` Kent Overstreet
  2024-02-22 13:23               ` Brian Foster
  0 siblings, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2024-02-22  0:07 UTC (permalink / raw)
  To: Brian Foster
  Cc: Christoph Hellwig, Tony Asleson, linux-bcachefs, linux-block,
	Greg Kroah-Hartman, Darrick J. Wong, Tejun Heo

On Wed, Feb 21, 2024 at 07:39:49AM -0500, Brian Foster wrote:
> On Fri, Feb 16, 2024 at 10:57:24AM -0500, Kent Overstreet wrote:
> > On Fri, Feb 16, 2024 at 07:40:34AM -0500, Brian Foster wrote:
> > > On Fri, Feb 16, 2024 at 09:00:17AM +0100, Christoph Hellwig wrote:
> > > > On Thu, Feb 15, 2024 at 07:24:23PM -0500, Kent Overstreet wrote:
> > > > > > It looks like the warning could be avoided in bcachefs by checking for
> > > > > > whether the parent dir/node still exists at cleanup time, but I'm not
> > > > > > familiar enough with kobj management to say whether that's the
> > > > > > right/best solution. It also looks a little odd to me to see a
> > > > > > /sys/block/<dev>/bcachefs dir when I've not seen any other fs or driver
> > > > > > do such a thing in the block sysfs dir(s).
> > > > > > 
> > > > > > Any thoughts on this from the block subsystem folks? Is it reasonable to
> > > > > > leave this link around and just fix the removal check, or is another
> > > > > > behavior preferred? Thanks.
> > > > 
> > > > This is the general problem with random cross-subsystem sysfs reference,
> > > > and why they are best avoided.  The block layer tears down all the sysfs
> > > > objects at del_gendisk time as no one should start using the sysfs files
> > > > at that point, but a mounted file system or other opener will of course
> > > > keep the bdev itself alive.
> > > > 
> > > 
> > > Yeah, makes sense. The fact that the dir goes away despite having the
> > > bdev open is partly what made this seem a little odd to me.
> > > 
> > > > I'm not sure why bcachefs is doing this, but no one really should be
> > > > using the block layer sysfs structures and pointers except for the block
> > > > layer itself.  
> > > > 
> > > 
> > > From Kent's comments it sounds like it was just some loose carryover
> > > from old bcache stuff. I had poked around a bit for anything similar and
> > > it looked to me that current bcache doesn't do this either, but I could
> > > have missed something.
> > > 
> > > > > so there's an existing bd_holder mechanism that e.g. device mapper uses
> > > > > for links between block devices. I think the "this block device is going
> > > > > away" code knows how to clean those up.
> > > > > 
> > > > > We're not using that mechanism - and perhaps we should have been, I'd
> > > > > need a time machine to ask myself why I did it that way 15 years back.
> > > > 
> > > > Well, at least Tejun had a very strong opinion that no one should be
> > > > abusing sysfs symlinks for linking up subsystems at all, see commit
> > > > 49731baa41df404c2c3f44555869ab387363af43, which is also why this code
> > > > is marked deprecated and we've not added additional users.
> > > > 
> > > 
> > > Thanks. I'll send a patch to remove this once I'm back from vacation.
> > 
> > No, we can't remove it - userspace needs to know this topology. When
> > we've got one sysfs node with a direct relationship to another sysfs
> > node, that needs to be reflected in syfs.
> > 
> 
> Do you mean that some related userspace tool relies on this to function,
> or generally disagreeing with the statement(s) above around links from
> /sys/block/<dev>/?

I would have to do some digging to give a definitive answer to that
question.

bcache definitely needed those links (IIRC in udev rules?) - bcachefs
may not, but we haven't even started on integrating bcachefs with all
the userspace disk management tooling out there.

And this is stuff that userspace does in general need; if we're getting
by without it for other filesystems right now it's probably by doing
something horrid like parsing /proc/mounts; if we could get filesystems
using the same bd_holder stuff that other block layer stuff uses, that
would be _amazing_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: kernel oops on bcachefs umount, 6.7 kernel
  2024-02-22  0:07             ` Kent Overstreet
@ 2024-02-22 13:23               ` Brian Foster
  0 siblings, 0 replies; 9+ messages in thread
From: Brian Foster @ 2024-02-22 13:23 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, Tony Asleson, linux-bcachefs, linux-block,
	Greg Kroah-Hartman, Darrick J. Wong, Tejun Heo

On Wed, Feb 21, 2024 at 07:07:37PM -0500, Kent Overstreet wrote:
> On Wed, Feb 21, 2024 at 07:39:49AM -0500, Brian Foster wrote:
> > On Fri, Feb 16, 2024 at 10:57:24AM -0500, Kent Overstreet wrote:
> > > On Fri, Feb 16, 2024 at 07:40:34AM -0500, Brian Foster wrote:
> > > > On Fri, Feb 16, 2024 at 09:00:17AM +0100, Christoph Hellwig wrote:
> > > > > On Thu, Feb 15, 2024 at 07:24:23PM -0500, Kent Overstreet wrote:
> > > > > > > It looks like the warning could be avoided in bcachefs by checking for
> > > > > > > whether the parent dir/node still exists at cleanup time, but I'm not
> > > > > > > familiar enough with kobj management to say whether that's the
> > > > > > > right/best solution. It also looks a little odd to me to see a
> > > > > > > /sys/block/<dev>/bcachefs dir when I've not seen any other fs or driver
> > > > > > > do such a thing in the block sysfs dir(s).
> > > > > > > 
> > > > > > > Any thoughts on this from the block subsystem folks? Is it reasonable to
> > > > > > > leave this link around and just fix the removal check, or is another
> > > > > > > behavior preferred? Thanks.
> > > > > 
> > > > > This is the general problem with random cross-subsystem sysfs reference,
> > > > > and why they are best avoided.  The block layer tears down all the sysfs
> > > > > objects at del_gendisk time as no one should start using the sysfs files
> > > > > at that point, but a mounted file system or other opener will of course
> > > > > keep the bdev itself alive.
> > > > > 
> > > > 
> > > > Yeah, makes sense. The fact that the dir goes away despite having the
> > > > bdev open is partly what made this seem a little odd to me.
> > > > 
> > > > > I'm not sure why bcachefs is doing this, but no one really should be
> > > > > using the block layer sysfs structures and pointers except for the block
> > > > > layer itself.  
> > > > > 
> > > > 
> > > > From Kent's comments it sounds like it was just some loose carryover
> > > > from old bcache stuff. I had poked around a bit for anything similar and
> > > > it looked to me that current bcache doesn't do this either, but I could
> > > > have missed something.
> > > > 
> > > > > > so there's an existing bd_holder mechanism that e.g. device mapper uses
> > > > > > for links between block devices. I think the "this block device is going
> > > > > > away" code knows how to clean those up.
> > > > > > 
> > > > > > We're not using that mechanism - and perhaps we should have been, I'd
> > > > > > need a time machine to ask myself why I did it that way 15 years back.
> > > > > 
> > > > > Well, at least Tejun had a very strong opinion that no one should be
> > > > > abusing sysfs symlinks for linking up subsystems at all, see commit
> > > > > 49731baa41df404c2c3f44555869ab387363af43, which is also why this code
> > > > > is marked deprecated and we've not added additional users.
> > > > > 
> > > > 
> > > > Thanks. I'll send a patch to remove this once I'm back from vacation.
> > > 
> > > No, we can't remove it - userspace needs to know this topology. When
> > > we've got one sysfs node with a direct relationship to another sysfs
> > > node, that needs to be reflected in syfs.
> > > 
> > 
> > Do you mean that some related userspace tool relies on this to function,
> > or generally disagreeing with the statement(s) above around links from
> > /sys/block/<dev>/?
> 
> I would have to do some digging to give a definitive answer to that
> question.
> 
> bcache definitely needed those links (IIRC in udev rules?) - bcachefs
> may not, but we haven't even started on integrating bcachefs with all
> the userspace disk management tooling out there.
> 

I don't know bcache that well, but I didn't see anything obviously
putting links in the bdev dir when I last looked. It did look like it
created some links between its own sysfs dirs, but maybe I
misinterpreted the code.

I don't think that any other fs with its own /sys/fs/ presence (i.e.,
XFS, ext4, btrfs) does this sort of thing, so I'm a little skeptical of
the idea that userspace currently needs it (not necessarily that it
couldn't use it for the better in the future).

> And this is stuff that userspace does in general need; if we're getting
> by without it for other filesystems right now it's probably by doing
> something horrid like parsing /proc/mounts; if we could get filesystems
> using the same bd_holder stuff that other block layer stuff uses, that
> would be _amazing_
> 

Ok, but that sounds contradictory to what the block layer folks want.

I dunno, it seems to me that the notions of better general coordination
between /sys/block and /sys/fs and that of bcachefs' current behavior
being wrong are not mutually exclusive things. But I'll just leave it
alone until there's some more clarity here..

Brian


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-02-22 13:21 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-01 21:52 kernel oops on bcachefs umount, 6.7 kernel Tony Asleson
2024-02-15 16:55 ` Brian Foster
2024-02-16  0:24   ` Kent Overstreet
2024-02-16  8:00     ` Christoph Hellwig
2024-02-16 12:40       ` Brian Foster
2024-02-16 15:57         ` Kent Overstreet
2024-02-21 12:39           ` Brian Foster
2024-02-22  0:07             ` Kent Overstreet
2024-02-22 13:23               ` Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.