All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG xfs_buf while running tests/xfs/435 (next-20220715)
@ 2022-07-18  6:31 ` Sachin Sant
  0 siblings, 0 replies; 6+ messages in thread
From: Sachin Sant @ 2022-07-18  6:31 UTC (permalink / raw)
  To: dchinner, linux-xfs; +Cc: linuxppc-dev, linux-next, riteshh

While running xfstests (specifically xfs/435) on a IBM Power server booted with
5.19.0-rc6-next-20220715 following warnings are seen:


[  110.954136] XFS (sdb2): Unmounting Filesystem
[  110.968860] XFS (sdb1): Unmounting Filesystem
[  111.115807] =============================================================================
[  111.115817] BUG xfs_buf (Tainted: G            E     ): Objects remaining in xfs_buf on __kmem_cache_shutdown()
[  111.115824] -----------------------------------------------------------------------------
[  111.115824] 
[  111.115828] Slab 0x0000000074cdc09a objects=170 used=1 fp=0x000000005f24a5e1 flags=0x13ffff800000200(slab|node=1|zone=0|lastcpupid=0x7ffff)
[  111.115840] CPU: 26 PID: 4704 Comm: modprobe Tainted: G            E      5.19.0-rc6-next-20220715 #3
[  111.115849] Call Trace:
[  111.115852] [c00000002985b9a0] [c000000000830bec] dump_stack_lvl+0x70/0xa4 (unreliable)
[  111.115867] [c00000002985b9e0] [c0000000004ef6f8] slab_err+0xd8/0xf0
[  111.115877] [c00000002985bad0] [c0000000004f6cbc] __kmem_cache_shutdown+0x1fc/0x560
[  111.115884] [c00000002985bbf0] [c0000000004534c8] kmem_cache_destroy+0xa8/0x1f0
[  111.115893] [c00000002985bc80] [c00800000ccf30e4] xfs_buf_terminate+0x2c/0x48 [xfs]
[  111.115977] [c00000002985bca0] [c00800000cd6f55c] exit_xfs_fs+0x90/0x20b34 [xfs]
[  111.116045] [c00000002985bcd0] [c00000000023b7e0] sys_delete_module+0x1e0/0x3c0
[  111.116053] [c00000002985bdb0] [c00000000003302c] system_call_exception+0x17c/0x350
[  111.116062] [c00000002985be10] [c00000000000c53c] system_call_common+0xec/0x270
[  111.116070] --- interrupt: c00 at 0x7fff8c158b88
[  111.116075] NIP:  00007fff8c158b88 LR: 000000013adb0398 CTR: 0000000000000000
[  111.116080] REGS: c00000002985be80 TRAP: 0c00   Tainted: G            E       (5.19.0-rc6-next-20220715)
[  111.116086] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24008282  XER: 00000000
[  111.116103] IRQMASK: 0 
[  111.116103] GPR00: 0000000000000081 00007ffffe17dff0 00007fff8c227300 000001003f2f0c18 
[  111.116103] GPR04: 0000000000000800 000000000000000a 1999999999999999 0000000000000000 
[  111.116103] GPR08: 00007fff8c1b7830 0000000000000000 0000000000000000 0000000000000000 
[  111.116103] GPR12: 0000000000000000 00007fff8c72ca50 000000013adba650 000000013adba648 
[  111.116103] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
[  111.116103] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
[  111.116103] GPR24: 0000000000000000 00007ffffe17f948 000001003f2f0c18 0000000000000000 
[  111.116103] GPR28: 0000000000000000 000001003f2f0bb0 000001003f2f0c18 000001003f2f0bb0 
[  111.116162] NIP [00007fff8c158b88] 0x7fff8c158b88
[  111.116166] LR [000000013adb0398] 0x13adb0398
[  111.116170] --- interrupt: c00
[  111.116173] Disabling lock debugging due to kernel taint
[  111.116184] Object 0x000000007e079655 @offset=18816
[  111.116189] =============================================================================
[  111.116193] BUG xfs_buf (Tainted: G    B       E     ): Objects remaining in xfs_buf on __kmem_cache_shutdown()
[  111.116198] -----------------------------------------------------------------------------
[  111.116198] 
[  111.116202] Slab 0x000000008186f78a objects=170 used=12 fp=0x000000008233ac7d flags=0x13ffff800000200(slab|node=1|zone=0|lastcpupid=0x7ffff)
[  111.116210] CPU: 26 PID: 4704 Comm: modprobe Tainted: G    B       E      5.19.0-rc6-next-20220715 #3
[  111.116216] Call Trace:
[  111.116218] [c00000002985b9a0] [c000000000830bec] dump_stack_lvl+0x70/0xa4 (unreliable)
[  111.116227] [c00000002985b9e0] [c0000000004ef6f8] slab_err+0xd8/0xf0
[  111.116234] [c00000002985bad0] [c0000000004f6cbc] __kmem_cache_shutdown+0x1fc/0x560
[  111.116241] [c00000002985bbf0] [c0000000004534c8] kmem_cache_destroy+0xa8/0x1f0
[  111.116248] [c00000002985bc80] [c00800000ccf30e4] xfs_buf_terminate+0x2c/0x48 [xfs]
[  111.116312] [c00000002985bca0] [c00800000cd6f55c] exit_xfs_fs+0x90/0x20b34 [xfs]
[  111.116379] [c00000002985bcd0] [c00000000023b7e0] sys_delete_module+0x1e0/0x3c0
[  111.116386] [c00000002985bdb0] [c00000000003302c] system_call_exception+0x17c/0x350
[  111.116392] [c00000002985be10] [c00000000000c53c] system_call_common+0xec/0x270
[  111.116400] --- interrupt: c00 at 0x7fff8c158b88
[  111.116404] NIP:  00007fff8c158b88 LR: 000000013adb0398 CTR: 0000000000000000
[  111.116409] REGS: c00000002985be80 TRAP: 0c00   Tainted: G    B       E       (5.19.0-rc6-next-20220715)
[  111.116414] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24008282  XER: 00000000
[  111.116430] IRQMASK: 0 
[  111.116430] GPR00: 0000000000000081 00007ffffe17dff0 00007fff8c227300 000001003f2f0c18 
[  111.116430] GPR04: 0000000000000800 000000000000000a 1999999999999999 0000000000000000 
[  111.116430] GPR08: 00007fff8c1b7830 0000000000000000 0000000000000000 0000000000000000 
[  111.116430] GPR12: 0000000000000000 00007fff8c72ca50 000000013adba650 000000013adba648 
[  111.116430] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
[  111.116430] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
[  111.116430] GPR24: 0000000000000000 00007ffffe17f948 000001003f2f0c18 0000000000000000 
[  111.116430] GPR28: 0000000000000000 000001003f2f0bb0 000001003f2f0c18 000001003f2f0bb0 
[  111.116488] NIP [00007fff8c158b88] 0x7fff8c158b88
[  111.116492] LR [000000013adb0398] 0x13adb0398
[  111.116496] --- interrupt: c00
[  111.116504] Object 0x000000002b93c535 @offset=5376
[  111.116508] Object 0x000000009be4058b @offset=16896
[  111.116511] Object 0x00000000c1d5c895 @offset=24960
[  111.116515] Object 0x0000000097fb6f84 @offset=30336
[  111.116518] Object 0x00000000213fb535 @offset=43008
[  111.116521] Object 0x0000000045473fa3 @offset=43392
[  111.116525] Object 0x000000006462ef89 @offset=44160
[  111.116528] Object 0x000000000c85ce0b @offset=44544
[  111.116531] Object 0x0000000059166af4 @offset=45312
[  111.116535] Object 0x00000000e7b40b45 @offset=46848
[  111.116538] Object 0x00000000bc6ce716 @offset=54528
[  111.116541] Object 0x000000005f7be1fa @offset=64512
[  111.116546] ------------[ cut here ]------------
[  111.116549] kmem_cache_destroy xfs_buf: Slab cache still has objects when called from xfs_buf_terminate+0x2c/0x48 [xfs]
[  111.116622] WARNING: CPU: 26 PID: 4704 at mm/slab_common.c:507 kmem_cache_destroy+0x1d8/0x1f0
[  111.116634] Modules linked in: xfs(E-) dm_mod(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) bonding(E) tls(E) nf_tables(E) rfkill(E) libcrc32c(E) nfnetlink(E) sunrpc(E) pseries_rng(E) vmx_crypto(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) sd_mod(E) cdrom(E) sg(E) lpfc(E) nvmet_fc(E) nvmet(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) nvme_fc(E) nvme(E) nvme_fabrics(E) nvme_core(E) t10_pi(E) scsi_transport_fc(E) crc64_rocksoft(E) crc64(E) tg3(E) ipmi_devintf(E) ipmi_msghandler(E) fuse(E) [last unloaded: xfs(E)]
[  111.116705] CPU: 26 PID: 4704 Comm: modprobe Tainted: G    B       E      5.19.0-rc6-next-20220715 #3
[  111.116711] NIP:  c0000000004535f8 LR: c0000000004535f4 CTR: 00000000ffffffde
[  111.116716] REGS: c00000002985b950 TRAP: 0700   Tainted: G    B       E       (5.19.0-rc6-next-20220715)
[  111.116722] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48008824  XER: 20040005
[  111.116736] CFAR: c000000000154534 IRQMASK: 0 
[  111.116736] GPR00: c0000000004535f4 c00000002985bbf0 c000000002a8f600 000000000000006b 
[  111.116736] GPR04: 00000000ffff7fff c00000002985b9b0 c00000002985b9a8 0000000000000027 
[  111.116736] GPR08: c000000db9a07f98 0000000000000001 0000000000000027 c000000002947148 
[  111.116736] GPR12: 0000000000008000 c00000001ec20700 000000013adba650 000000013adba648 
[  111.116736] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
[  111.116736] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
[  111.116736] GPR24: 00007fff8c1b7830 0000000000000000 1999999999999999 000000000000000a 
[  111.116736] GPR28: c000000002980228 c000000002980230 c00800000ccf30e4 c000000006f9f600 
[  111.116796] NIP [c0000000004535f8] kmem_cache_destroy+0x1d8/0x1f0
[  111.116802] LR [c0000000004535f4] kmem_cache_destroy+0x1d4/0x1f0
[  111.116808] Call Trace:
[  111.116810] [c00000002985bbf0] [c0000000004535f4] kmem_cache_destroy+0x1d4/0x1f0 (unreliable)
[  111.116818] [c00000002985bc80] [c00800000ccf30e4] xfs_buf_terminate+0x2c/0x48 [xfs]
[  111.116882] [c00000002985bca0] [c00800000cd6f55c] exit_xfs_fs+0x90/0x20b34 [xfs]
[  111.116949] [c00000002985bcd0] [c00000000023b7e0] sys_delete_module+0x1e0/0x3c0
[  111.116955] [c00000002985bdb0] [c00000000003302c] system_call_exception+0x17c/0x350
[  111.116962] [c00000002985be10] [c00000000000c53c] system_call_common+0xec/0x270
[  111.116970] --- interrupt: c00 at 0x7fff8c158b88
[  111.116974] NIP:  00007fff8c158b88 LR: 000000013adb0398 CTR: 0000000000000000
[  111.116979] REGS: c00000002985be80 TRAP: 0c00   Tainted: G    B       E       (5.19.0-rc6-next-20220715)
[  111.116984] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24008282  XER: 00000000
[  111.116999] IRQMASK: 0 
[  111.116999] GPR00: 0000000000000081 00007ffffe17dff0 00007fff8c227300 000001003f2f0c18 
[  111.116999] GPR04: 0000000000000800 000000000000000a 1999999999999999 0000000000000000 
[  111.116999] GPR08: 00007fff8c1b7830 0000000000000000 0000000000000000 0000000000000000 
[  111.116999] GPR12: 0000000000000000 00007fff8c72ca50 000000013adba650 000000013adba648 
[  111.116999] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
[  111.116999] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
[  111.116999] GPR24: 0000000000000000 00007ffffe17f948 000001003f2f0c18 0000000000000000 
[  111.116999] GPR28: 0000000000000000 000001003f2f0bb0 000001003f2f0c18 000001003f2f0bb0 
[  111.117057] NIP [00007fff8c158b88] 0x7fff8c158b88
[  111.117061] LR [000000013adb0398] 0x13adb0398
[  111.117065] --- interrupt: c00
[  111.117068] Instruction dump:
[  111.117071] 7fe3fb78 480a463d 60000000 4bfffeb8 3c82fe49 3c62fe76 e8bf0060 7fc6f378 
[  111.117082] 38849110 38630410 4bd00edd 60000000 <0fe00000> fb410060 fb610068 60000000 
[  111.117093] ---[ end trace 0000000000000000 ]---
[  111.974215] SGI XFS with ACLs, security attributes, scrub, quota, no debug enabled
[  111.983722] XFS (sdb1): Mounting V5 Filesystem
[  112.008668] XFS (sdb1): Ending clean mount
[  112.010988] xfs filesystem being mounted at /mnt/test supports timestamps until 2038 (0x7fffffff)

Git bisect leads me to following patch:

commit 298f34224506
      xfs: lockless buffer lookup

Reverting the patch allows the test to run correctly without any warning.

- Sachin


^ permalink raw reply	[flat|nested] 6+ messages in thread

* BUG xfs_buf while running tests/xfs/435 (next-20220715)
@ 2022-07-18  6:31 ` Sachin Sant
  0 siblings, 0 replies; 6+ messages in thread
From: Sachin Sant @ 2022-07-18  6:31 UTC (permalink / raw)
  To: dchinner, linux-xfs; +Cc: linux-next, linuxppc-dev, riteshh

While running xfstests (specifically xfs/435) on a IBM Power server booted with
5.19.0-rc6-next-20220715 following warnings are seen:


[  110.954136] XFS (sdb2): Unmounting Filesystem
[  110.968860] XFS (sdb1): Unmounting Filesystem
[  111.115807] =============================================================================
[  111.115817] BUG xfs_buf (Tainted: G            E     ): Objects remaining in xfs_buf on __kmem_cache_shutdown()
[  111.115824] -----------------------------------------------------------------------------
[  111.115824] 
[  111.115828] Slab 0x0000000074cdc09a objects=170 used=1 fp=0x000000005f24a5e1 flags=0x13ffff800000200(slab|node=1|zone=0|lastcpupid=0x7ffff)
[  111.115840] CPU: 26 PID: 4704 Comm: modprobe Tainted: G            E      5.19.0-rc6-next-20220715 #3
[  111.115849] Call Trace:
[  111.115852] [c00000002985b9a0] [c000000000830bec] dump_stack_lvl+0x70/0xa4 (unreliable)
[  111.115867] [c00000002985b9e0] [c0000000004ef6f8] slab_err+0xd8/0xf0
[  111.115877] [c00000002985bad0] [c0000000004f6cbc] __kmem_cache_shutdown+0x1fc/0x560
[  111.115884] [c00000002985bbf0] [c0000000004534c8] kmem_cache_destroy+0xa8/0x1f0
[  111.115893] [c00000002985bc80] [c00800000ccf30e4] xfs_buf_terminate+0x2c/0x48 [xfs]
[  111.115977] [c00000002985bca0] [c00800000cd6f55c] exit_xfs_fs+0x90/0x20b34 [xfs]
[  111.116045] [c00000002985bcd0] [c00000000023b7e0] sys_delete_module+0x1e0/0x3c0
[  111.116053] [c00000002985bdb0] [c00000000003302c] system_call_exception+0x17c/0x350
[  111.116062] [c00000002985be10] [c00000000000c53c] system_call_common+0xec/0x270
[  111.116070] --- interrupt: c00 at 0x7fff8c158b88
[  111.116075] NIP:  00007fff8c158b88 LR: 000000013adb0398 CTR: 0000000000000000
[  111.116080] REGS: c00000002985be80 TRAP: 0c00   Tainted: G            E       (5.19.0-rc6-next-20220715)
[  111.116086] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24008282  XER: 00000000
[  111.116103] IRQMASK: 0 
[  111.116103] GPR00: 0000000000000081 00007ffffe17dff0 00007fff8c227300 000001003f2f0c18 
[  111.116103] GPR04: 0000000000000800 000000000000000a 1999999999999999 0000000000000000 
[  111.116103] GPR08: 00007fff8c1b7830 0000000000000000 0000000000000000 0000000000000000 
[  111.116103] GPR12: 0000000000000000 00007fff8c72ca50 000000013adba650 000000013adba648 
[  111.116103] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
[  111.116103] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
[  111.116103] GPR24: 0000000000000000 00007ffffe17f948 000001003f2f0c18 0000000000000000 
[  111.116103] GPR28: 0000000000000000 000001003f2f0bb0 000001003f2f0c18 000001003f2f0bb0 
[  111.116162] NIP [00007fff8c158b88] 0x7fff8c158b88
[  111.116166] LR [000000013adb0398] 0x13adb0398
[  111.116170] --- interrupt: c00
[  111.116173] Disabling lock debugging due to kernel taint
[  111.116184] Object 0x000000007e079655 @offset=18816
[  111.116189] =============================================================================
[  111.116193] BUG xfs_buf (Tainted: G    B       E     ): Objects remaining in xfs_buf on __kmem_cache_shutdown()
[  111.116198] -----------------------------------------------------------------------------
[  111.116198] 
[  111.116202] Slab 0x000000008186f78a objects=170 used=12 fp=0x000000008233ac7d flags=0x13ffff800000200(slab|node=1|zone=0|lastcpupid=0x7ffff)
[  111.116210] CPU: 26 PID: 4704 Comm: modprobe Tainted: G    B       E      5.19.0-rc6-next-20220715 #3
[  111.116216] Call Trace:
[  111.116218] [c00000002985b9a0] [c000000000830bec] dump_stack_lvl+0x70/0xa4 (unreliable)
[  111.116227] [c00000002985b9e0] [c0000000004ef6f8] slab_err+0xd8/0xf0
[  111.116234] [c00000002985bad0] [c0000000004f6cbc] __kmem_cache_shutdown+0x1fc/0x560
[  111.116241] [c00000002985bbf0] [c0000000004534c8] kmem_cache_destroy+0xa8/0x1f0
[  111.116248] [c00000002985bc80] [c00800000ccf30e4] xfs_buf_terminate+0x2c/0x48 [xfs]
[  111.116312] [c00000002985bca0] [c00800000cd6f55c] exit_xfs_fs+0x90/0x20b34 [xfs]
[  111.116379] [c00000002985bcd0] [c00000000023b7e0] sys_delete_module+0x1e0/0x3c0
[  111.116386] [c00000002985bdb0] [c00000000003302c] system_call_exception+0x17c/0x350
[  111.116392] [c00000002985be10] [c00000000000c53c] system_call_common+0xec/0x270
[  111.116400] --- interrupt: c00 at 0x7fff8c158b88
[  111.116404] NIP:  00007fff8c158b88 LR: 000000013adb0398 CTR: 0000000000000000
[  111.116409] REGS: c00000002985be80 TRAP: 0c00   Tainted: G    B       E       (5.19.0-rc6-next-20220715)
[  111.116414] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24008282  XER: 00000000
[  111.116430] IRQMASK: 0 
[  111.116430] GPR00: 0000000000000081 00007ffffe17dff0 00007fff8c227300 000001003f2f0c18 
[  111.116430] GPR04: 0000000000000800 000000000000000a 1999999999999999 0000000000000000 
[  111.116430] GPR08: 00007fff8c1b7830 0000000000000000 0000000000000000 0000000000000000 
[  111.116430] GPR12: 0000000000000000 00007fff8c72ca50 000000013adba650 000000013adba648 
[  111.116430] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
[  111.116430] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
[  111.116430] GPR24: 0000000000000000 00007ffffe17f948 000001003f2f0c18 0000000000000000 
[  111.116430] GPR28: 0000000000000000 000001003f2f0bb0 000001003f2f0c18 000001003f2f0bb0 
[  111.116488] NIP [00007fff8c158b88] 0x7fff8c158b88
[  111.116492] LR [000000013adb0398] 0x13adb0398
[  111.116496] --- interrupt: c00
[  111.116504] Object 0x000000002b93c535 @offset=5376
[  111.116508] Object 0x000000009be4058b @offset=16896
[  111.116511] Object 0x00000000c1d5c895 @offset=24960
[  111.116515] Object 0x0000000097fb6f84 @offset=30336
[  111.116518] Object 0x00000000213fb535 @offset=43008
[  111.116521] Object 0x0000000045473fa3 @offset=43392
[  111.116525] Object 0x000000006462ef89 @offset=44160
[  111.116528] Object 0x000000000c85ce0b @offset=44544
[  111.116531] Object 0x0000000059166af4 @offset=45312
[  111.116535] Object 0x00000000e7b40b45 @offset=46848
[  111.116538] Object 0x00000000bc6ce716 @offset=54528
[  111.116541] Object 0x000000005f7be1fa @offset=64512
[  111.116546] ------------[ cut here ]------------
[  111.116549] kmem_cache_destroy xfs_buf: Slab cache still has objects when called from xfs_buf_terminate+0x2c/0x48 [xfs]
[  111.116622] WARNING: CPU: 26 PID: 4704 at mm/slab_common.c:507 kmem_cache_destroy+0x1d8/0x1f0
[  111.116634] Modules linked in: xfs(E-) dm_mod(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) bonding(E) tls(E) nf_tables(E) rfkill(E) libcrc32c(E) nfnetlink(E) sunrpc(E) pseries_rng(E) vmx_crypto(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) sd_mod(E) cdrom(E) sg(E) lpfc(E) nvmet_fc(E) nvmet(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) nvme_fc(E) nvme(E) nvme_fabrics(E) nvme_core(E) t10_pi(E) scsi_transport_fc(E) crc64_rocksoft(E) crc64(E) tg3(E) ipmi_devintf(E) ipmi_msghandler(E) fuse(E) [last unloaded: xfs(E)]
[  111.116705] CPU: 26 PID: 4704 Comm: modprobe Tainted: G    B       E      5.19.0-rc6-next-20220715 #3
[  111.116711] NIP:  c0000000004535f8 LR: c0000000004535f4 CTR: 00000000ffffffde
[  111.116716] REGS: c00000002985b950 TRAP: 0700   Tainted: G    B       E       (5.19.0-rc6-next-20220715)
[  111.116722] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48008824  XER: 20040005
[  111.116736] CFAR: c000000000154534 IRQMASK: 0 
[  111.116736] GPR00: c0000000004535f4 c00000002985bbf0 c000000002a8f600 000000000000006b 
[  111.116736] GPR04: 00000000ffff7fff c00000002985b9b0 c00000002985b9a8 0000000000000027 
[  111.116736] GPR08: c000000db9a07f98 0000000000000001 0000000000000027 c000000002947148 
[  111.116736] GPR12: 0000000000008000 c00000001ec20700 000000013adba650 000000013adba648 
[  111.116736] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
[  111.116736] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
[  111.116736] GPR24: 00007fff8c1b7830 0000000000000000 1999999999999999 000000000000000a 
[  111.116736] GPR28: c000000002980228 c000000002980230 c00800000ccf30e4 c000000006f9f600 
[  111.116796] NIP [c0000000004535f8] kmem_cache_destroy+0x1d8/0x1f0
[  111.116802] LR [c0000000004535f4] kmem_cache_destroy+0x1d4/0x1f0
[  111.116808] Call Trace:
[  111.116810] [c00000002985bbf0] [c0000000004535f4] kmem_cache_destroy+0x1d4/0x1f0 (unreliable)
[  111.116818] [c00000002985bc80] [c00800000ccf30e4] xfs_buf_terminate+0x2c/0x48 [xfs]
[  111.116882] [c00000002985bca0] [c00800000cd6f55c] exit_xfs_fs+0x90/0x20b34 [xfs]
[  111.116949] [c00000002985bcd0] [c00000000023b7e0] sys_delete_module+0x1e0/0x3c0
[  111.116955] [c00000002985bdb0] [c00000000003302c] system_call_exception+0x17c/0x350
[  111.116962] [c00000002985be10] [c00000000000c53c] system_call_common+0xec/0x270
[  111.116970] --- interrupt: c00 at 0x7fff8c158b88
[  111.116974] NIP:  00007fff8c158b88 LR: 000000013adb0398 CTR: 0000000000000000
[  111.116979] REGS: c00000002985be80 TRAP: 0c00   Tainted: G    B       E       (5.19.0-rc6-next-20220715)
[  111.116984] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24008282  XER: 00000000
[  111.116999] IRQMASK: 0 
[  111.116999] GPR00: 0000000000000081 00007ffffe17dff0 00007fff8c227300 000001003f2f0c18 
[  111.116999] GPR04: 0000000000000800 000000000000000a 1999999999999999 0000000000000000 
[  111.116999] GPR08: 00007fff8c1b7830 0000000000000000 0000000000000000 0000000000000000 
[  111.116999] GPR12: 0000000000000000 00007fff8c72ca50 000000013adba650 000000013adba648 
[  111.116999] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
[  111.116999] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
[  111.116999] GPR24: 0000000000000000 00007ffffe17f948 000001003f2f0c18 0000000000000000 
[  111.116999] GPR28: 0000000000000000 000001003f2f0bb0 000001003f2f0c18 000001003f2f0bb0 
[  111.117057] NIP [00007fff8c158b88] 0x7fff8c158b88
[  111.117061] LR [000000013adb0398] 0x13adb0398
[  111.117065] --- interrupt: c00
[  111.117068] Instruction dump:
[  111.117071] 7fe3fb78 480a463d 60000000 4bfffeb8 3c82fe49 3c62fe76 e8bf0060 7fc6f378 
[  111.117082] 38849110 38630410 4bd00edd 60000000 <0fe00000> fb410060 fb610068 60000000 
[  111.117093] ---[ end trace 0000000000000000 ]---
[  111.974215] SGI XFS with ACLs, security attributes, scrub, quota, no debug enabled
[  111.983722] XFS (sdb1): Mounting V5 Filesystem
[  112.008668] XFS (sdb1): Ending clean mount
[  112.010988] xfs filesystem being mounted at /mnt/test supports timestamps until 2038 (0x7fffffff)

Git bisect leads me to following patch:

commit 298f34224506
      xfs: lockless buffer lookup

Reverting the patch allows the test to run correctly without any warning.

- Sachin


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG xfs_buf while running tests/xfs/435 (next-20220715)
  2022-07-18  6:31 ` Sachin Sant
@ 2022-07-18  8:01   ` Dave Chinner
  -1 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2022-07-18  8:01 UTC (permalink / raw)
  To: Sachin Sant; +Cc: dchinner, linux-xfs, linuxppc-dev, linux-next, riteshh

On Mon, Jul 18, 2022 at 12:01:53PM +0530, Sachin Sant wrote:
> While running xfstests (specifically xfs/435) on a IBM Power server booted with
> 5.19.0-rc6-next-20220715 following warnings are seen:
> 
> 
> [  110.954136] XFS (sdb2): Unmounting Filesystem
> [  110.968860] XFS (sdb1): Unmounting Filesystem
> [  111.115807] =============================================================================
> [  111.115817] BUG xfs_buf (Tainted: G            E     ): Objects remaining in xfs_buf on __kmem_cache_shutdown()
> [  111.115824] -----------------------------------------------------------------------------
> [  111.115824] 
> [  111.115828] Slab 0x0000000074cdc09a objects=170 used=1 fp=0x000000005f24a5e1 flags=0x13ffff800000200(slab|node=1|zone=0|lastcpupid=0x7ffff)
> [  111.115840] CPU: 26 PID: 4704 Comm: modprobe Tainted: G            E      5.19.0-rc6-next-20220715 #3
> [  111.115849] Call Trace:
> [  111.115852] [c00000002985b9a0] [c000000000830bec] dump_stack_lvl+0x70/0xa4 (unreliable)
> [  111.115867] [c00000002985b9e0] [c0000000004ef6f8] slab_err+0xd8/0xf0
> [  111.115877] [c00000002985bad0] [c0000000004f6cbc] __kmem_cache_shutdown+0x1fc/0x560
> [  111.115884] [c00000002985bbf0] [c0000000004534c8] kmem_cache_destroy+0xa8/0x1f0
> [  111.115893] [c00000002985bc80] [c00800000ccf30e4] xfs_buf_terminate+0x2c/0x48 [xfs]
> [  111.115977] [c00000002985bca0] [c00800000cd6f55c] exit_xfs_fs+0x90/0x20b34 [xfs]
> [  111.116045] [c00000002985bcd0] [c00000000023b7e0] sys_delete_module+0x1e0/0x3c0
> [  111.116053] [c00000002985bdb0] [c00000000003302c] system_call_exception+0x17c/0x350
> [  111.116062] [c00000002985be10] [c00000000000c53c] system_call_common+0xec/0x270
> [  111.116070] --- interrupt: c00 at 0x7fff8c158b88
> [  111.116075] NIP:  00007fff8c158b88 LR: 000000013adb0398 CTR: 0000000000000000
> [  111.116080] REGS: c00000002985be80 TRAP: 0c00   Tainted: G            E       (5.19.0-rc6-next-20220715)
> [  111.116086] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24008282  XER: 00000000
> [  111.116103] IRQMASK: 0 
> [  111.116103] GPR00: 0000000000000081 00007ffffe17dff0 00007fff8c227300 000001003f2f0c18 
> [  111.116103] GPR04: 0000000000000800 000000000000000a 1999999999999999 0000000000000000 
> [  111.116103] GPR08: 00007fff8c1b7830 0000000000000000 0000000000000000 0000000000000000 
> [  111.116103] GPR12: 0000000000000000 00007fff8c72ca50 000000013adba650 000000013adba648 
> [  111.116103] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
> [  111.116103] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
> [  111.116103] GPR24: 0000000000000000 00007ffffe17f948 000001003f2f0c18 0000000000000000 
> [  111.116103] GPR28: 0000000000000000 000001003f2f0bb0 000001003f2f0c18 000001003f2f0bb0 
> [  111.116162] NIP [00007fff8c158b88] 0x7fff8c158b88
> [  111.116166] LR [000000013adb0398] 0x13adb0398
> [  111.116170] --- interrupt: c00
> [  111.116173] Disabling lock debugging due to kernel taint
> [  111.116184] Object 0x000000007e079655 @offset=18816
> [  111.116189] =============================================================================
> [  111.116193] BUG xfs_buf (Tainted: G    B       E     ): Objects remaining in xfs_buf on __kmem_cache_shutdown()
> [  111.116198] -----------------------------------------------------------------------------
> [  111.116198] 
> [  111.116202] Slab 0x000000008186f78a objects=170 used=12 fp=0x000000008233ac7d flags=0x13ffff800000200(slab|node=1|zone=0|lastcpupid=0x7ffff)
> [  111.116210] CPU: 26 PID: 4704 Comm: modprobe Tainted: G    B       E      5.19.0-rc6-next-20220715 #3
> [  111.116216] Call Trace:
> [  111.116218] [c00000002985b9a0] [c000000000830bec] dump_stack_lvl+0x70/0xa4 (unreliable)
> [  111.116227] [c00000002985b9e0] [c0000000004ef6f8] slab_err+0xd8/0xf0
> [  111.116234] [c00000002985bad0] [c0000000004f6cbc] __kmem_cache_shutdown+0x1fc/0x560
> [  111.116241] [c00000002985bbf0] [c0000000004534c8] kmem_cache_destroy+0xa8/0x1f0
> [  111.116248] [c00000002985bc80] [c00800000ccf30e4] xfs_buf_terminate+0x2c/0x48 [xfs]
> [  111.116312] [c00000002985bca0] [c00800000cd6f55c] exit_xfs_fs+0x90/0x20b34 [xfs]
> [  111.116379] [c00000002985bcd0] [c00000000023b7e0] sys_delete_module+0x1e0/0x3c0
> [  111.116386] [c00000002985bdb0] [c00000000003302c] system_call_exception+0x17c/0x350
> [  111.116392] [c00000002985be10] [c00000000000c53c] system_call_common+0xec/0x270
> [  111.116400] --- interrupt: c00 at 0x7fff8c158b88
> [  111.116404] NIP:  00007fff8c158b88 LR: 000000013adb0398 CTR: 0000000000000000
> [  111.116409] REGS: c00000002985be80 TRAP: 0c00   Tainted: G    B       E       (5.19.0-rc6-next-20220715)
> [  111.116414] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24008282  XER: 00000000
> [  111.116430] IRQMASK: 0 
> [  111.116430] GPR00: 0000000000000081 00007ffffe17dff0 00007fff8c227300 000001003f2f0c18 
> [  111.116430] GPR04: 0000000000000800 000000000000000a 1999999999999999 0000000000000000 
> [  111.116430] GPR08: 00007fff8c1b7830 0000000000000000 0000000000000000 0000000000000000 
> [  111.116430] GPR12: 0000000000000000 00007fff8c72ca50 000000013adba650 000000013adba648 
> [  111.116430] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
> [  111.116430] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
> [  111.116430] GPR24: 0000000000000000 00007ffffe17f948 000001003f2f0c18 0000000000000000 
> [  111.116430] GPR28: 0000000000000000 000001003f2f0bb0 000001003f2f0c18 000001003f2f0bb0 
> [  111.116488] NIP [00007fff8c158b88] 0x7fff8c158b88
> [  111.116492] LR [000000013adb0398] 0x13adb0398
> [  111.116496] --- interrupt: c00
> [  111.116504] Object 0x000000002b93c535 @offset=5376
> [  111.116508] Object 0x000000009be4058b @offset=16896
> [  111.116511] Object 0x00000000c1d5c895 @offset=24960
> [  111.116515] Object 0x0000000097fb6f84 @offset=30336
> [  111.116518] Object 0x00000000213fb535 @offset=43008
> [  111.116521] Object 0x0000000045473fa3 @offset=43392
> [  111.116525] Object 0x000000006462ef89 @offset=44160
> [  111.116528] Object 0x000000000c85ce0b @offset=44544
> [  111.116531] Object 0x0000000059166af4 @offset=45312
> [  111.116535] Object 0x00000000e7b40b45 @offset=46848
> [  111.116538] Object 0x00000000bc6ce716 @offset=54528
> [  111.116541] Object 0x000000005f7be1fa @offset=64512
> [  111.116546] ------------[ cut here ]------------

Yup, Darrick reported this once and couldn't reproduce it. We know
it's a result of result of converting the xfs_buffer cache to
rcu-protected lockless lookups and for some reason and the rcu
callbacks that free these objects seem not to have been processed
before the module is removed. We have an rcu_barrier() in
xfs_destroy_caches() to avoid this ......

Wait. What is xfs_buf_terminate()? I don't recall that function....

Yeah, there's the bug.

exit_xfs_fs(void)
{
....
        xfs_buf_terminate();
        xfs_mru_cache_uninit();
        xfs_destroy_workqueues();
        xfs_destroy_caches();
....

xfs_buf_terminate() calls kmem_cache_destroy() before the
rcu_barrier() call in xfs_destroy_caches().....

Try the (slightly smoke tested only) patch below.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: xfs_buf cache destroy isn't RCU safe

From: Dave Chinner <dchinner@redhat.com>

Darrick and Sachin Sant reported that xfs/435 and xfs/436 would
report an non-empty xfs_buf slab on module remove. This isn't easily
to reproduce, but is clearly a side effect of converting the buffer
caceh to RUC freeing and lockless lookups. Sachin bisected and
Darrick hit it when testing the patchset directly.

Turns out that the xfs_buf slab is not destroyed when all the other
XFS slab caches are destroyed. Instead, it's got it's own little
wrapper function that gets called separately, and so it doesn't have
an rcu_barrier() call in it that is needed to drain all the rcu
callbacks before the slab is destroyed.

Fix it by removing the xfs_buf_init/terminate wrappers that just
allocate and destroy the xfs_buf slab, and move them to the same
place that all the other slab caches are set up and destroyed.

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Fixes: 298f34224506 ("xfs: lockless buffer lookup")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c   | 25 +------------------------
 fs/xfs/xfs_buf.h   |  6 ++----
 fs/xfs/xfs_super.c | 22 +++++++++++++---------
 3 files changed, 16 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 4affba7c6669..f8bdc4698492 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -21,7 +21,7 @@
 #include "xfs_error.h"
 #include "xfs_ag.h"
 
-static struct kmem_cache *xfs_buf_cache;
+struct kmem_cache *xfs_buf_cache;
 
 /*
  * Locking orders
@@ -2300,29 +2300,6 @@ xfs_buf_delwri_pushbuf(
 	return error;
 }
 
-int __init
-xfs_buf_init(void)
-{
-	xfs_buf_cache = kmem_cache_create("xfs_buf", sizeof(struct xfs_buf), 0,
-					 SLAB_HWCACHE_ALIGN |
-					 SLAB_RECLAIM_ACCOUNT |
-					 SLAB_MEM_SPREAD,
-					 NULL);
-	if (!xfs_buf_cache)
-		goto out;
-
-	return 0;
-
- out:
-	return -ENOMEM;
-}
-
-void
-xfs_buf_terminate(void)
-{
-	kmem_cache_destroy(xfs_buf_cache);
-}
-
 void xfs_buf_set_ref(struct xfs_buf *bp, int lru_ref)
 {
 	/*
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index f65ef4d34ff7..04ff86a8eeef 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -15,6 +15,8 @@
 #include <linux/uio.h>
 #include <linux/list_lru.h>
 
+extern struct kmem_cache *xfs_buf_cache;
+
 /*
  *	Base types
  */
@@ -307,10 +309,6 @@ extern int xfs_buf_delwri_submit(struct list_head *);
 extern int xfs_buf_delwri_submit_nowait(struct list_head *);
 extern int xfs_buf_delwri_pushbuf(struct xfs_buf *, struct list_head *);
 
-/* Buffer Daemon Setup Routines */
-extern int xfs_buf_init(void);
-extern void xfs_buf_terminate(void);
-
 static inline xfs_daddr_t xfs_buf_daddr(struct xfs_buf *bp)
 {
 	return bp->b_maps[0].bm_bn;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 4edee1d3784a..3d27ba1295c9 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1967,11 +1967,19 @@ xfs_init_caches(void)
 {
 	int		error;
 
+	xfs_buf_cache = kmem_cache_create("xfs_buf", sizeof(struct xfs_buf), 0,
+					 SLAB_HWCACHE_ALIGN |
+					 SLAB_RECLAIM_ACCOUNT |
+					 SLAB_MEM_SPREAD,
+					 NULL);
+	if (!xfs_buf_cache)
+		goto out;
+
 	xfs_log_ticket_cache = kmem_cache_create("xfs_log_ticket",
 						sizeof(struct xlog_ticket),
 						0, 0, NULL);
 	if (!xfs_log_ticket_cache)
-		goto out;
+		goto out_destroy_buf_cache;
 
 	error = xfs_btree_init_cur_caches();
 	if (error)
@@ -2145,6 +2153,8 @@ xfs_init_caches(void)
 	xfs_btree_destroy_cur_caches();
  out_destroy_log_ticket_cache:
 	kmem_cache_destroy(xfs_log_ticket_cache);
+ out_destroy_buf_cache:
+	kmem_cache_destroy(xfs_buf_cache);
  out:
 	return -ENOMEM;
 }
@@ -2178,6 +2188,7 @@ xfs_destroy_caches(void)
 	xfs_defer_destroy_item_caches();
 	xfs_btree_destroy_cur_caches();
 	kmem_cache_destroy(xfs_log_ticket_cache);
+	kmem_cache_destroy(xfs_buf_cache);
 }
 
 STATIC int __init
@@ -2283,13 +2294,9 @@ init_xfs_fs(void)
 	if (error)
 		goto out_destroy_wq;
 
-	error = xfs_buf_init();
-	if (error)
-		goto out_mru_cache_uninit;
-
 	error = xfs_init_procfs();
 	if (error)
-		goto out_buf_terminate;
+		goto out_mru_cache_uninit;
 
 	error = xfs_sysctl_register();
 	if (error)
@@ -2346,8 +2353,6 @@ init_xfs_fs(void)
 	xfs_sysctl_unregister();
  out_cleanup_procfs:
 	xfs_cleanup_procfs();
- out_buf_terminate:
-	xfs_buf_terminate();
  out_mru_cache_uninit:
 	xfs_mru_cache_uninit();
  out_destroy_wq:
@@ -2373,7 +2378,6 @@ exit_xfs_fs(void)
 	kset_unregister(xfs_kset);
 	xfs_sysctl_unregister();
 	xfs_cleanup_procfs();
-	xfs_buf_terminate();
 	xfs_mru_cache_uninit();
 	xfs_destroy_workqueues();
 	xfs_destroy_caches();

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: BUG xfs_buf while running tests/xfs/435 (next-20220715)
@ 2022-07-18  8:01   ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2022-07-18  8:01 UTC (permalink / raw)
  To: Sachin Sant; +Cc: linux-xfs, riteshh, linux-next, linuxppc-dev, dchinner

On Mon, Jul 18, 2022 at 12:01:53PM +0530, Sachin Sant wrote:
> While running xfstests (specifically xfs/435) on a IBM Power server booted with
> 5.19.0-rc6-next-20220715 following warnings are seen:
> 
> 
> [  110.954136] XFS (sdb2): Unmounting Filesystem
> [  110.968860] XFS (sdb1): Unmounting Filesystem
> [  111.115807] =============================================================================
> [  111.115817] BUG xfs_buf (Tainted: G            E     ): Objects remaining in xfs_buf on __kmem_cache_shutdown()
> [  111.115824] -----------------------------------------------------------------------------
> [  111.115824] 
> [  111.115828] Slab 0x0000000074cdc09a objects=170 used=1 fp=0x000000005f24a5e1 flags=0x13ffff800000200(slab|node=1|zone=0|lastcpupid=0x7ffff)
> [  111.115840] CPU: 26 PID: 4704 Comm: modprobe Tainted: G            E      5.19.0-rc6-next-20220715 #3
> [  111.115849] Call Trace:
> [  111.115852] [c00000002985b9a0] [c000000000830bec] dump_stack_lvl+0x70/0xa4 (unreliable)
> [  111.115867] [c00000002985b9e0] [c0000000004ef6f8] slab_err+0xd8/0xf0
> [  111.115877] [c00000002985bad0] [c0000000004f6cbc] __kmem_cache_shutdown+0x1fc/0x560
> [  111.115884] [c00000002985bbf0] [c0000000004534c8] kmem_cache_destroy+0xa8/0x1f0
> [  111.115893] [c00000002985bc80] [c00800000ccf30e4] xfs_buf_terminate+0x2c/0x48 [xfs]
> [  111.115977] [c00000002985bca0] [c00800000cd6f55c] exit_xfs_fs+0x90/0x20b34 [xfs]
> [  111.116045] [c00000002985bcd0] [c00000000023b7e0] sys_delete_module+0x1e0/0x3c0
> [  111.116053] [c00000002985bdb0] [c00000000003302c] system_call_exception+0x17c/0x350
> [  111.116062] [c00000002985be10] [c00000000000c53c] system_call_common+0xec/0x270
> [  111.116070] --- interrupt: c00 at 0x7fff8c158b88
> [  111.116075] NIP:  00007fff8c158b88 LR: 000000013adb0398 CTR: 0000000000000000
> [  111.116080] REGS: c00000002985be80 TRAP: 0c00   Tainted: G            E       (5.19.0-rc6-next-20220715)
> [  111.116086] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24008282  XER: 00000000
> [  111.116103] IRQMASK: 0 
> [  111.116103] GPR00: 0000000000000081 00007ffffe17dff0 00007fff8c227300 000001003f2f0c18 
> [  111.116103] GPR04: 0000000000000800 000000000000000a 1999999999999999 0000000000000000 
> [  111.116103] GPR08: 00007fff8c1b7830 0000000000000000 0000000000000000 0000000000000000 
> [  111.116103] GPR12: 0000000000000000 00007fff8c72ca50 000000013adba650 000000013adba648 
> [  111.116103] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
> [  111.116103] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
> [  111.116103] GPR24: 0000000000000000 00007ffffe17f948 000001003f2f0c18 0000000000000000 
> [  111.116103] GPR28: 0000000000000000 000001003f2f0bb0 000001003f2f0c18 000001003f2f0bb0 
> [  111.116162] NIP [00007fff8c158b88] 0x7fff8c158b88
> [  111.116166] LR [000000013adb0398] 0x13adb0398
> [  111.116170] --- interrupt: c00
> [  111.116173] Disabling lock debugging due to kernel taint
> [  111.116184] Object 0x000000007e079655 @offset=18816
> [  111.116189] =============================================================================
> [  111.116193] BUG xfs_buf (Tainted: G    B       E     ): Objects remaining in xfs_buf on __kmem_cache_shutdown()
> [  111.116198] -----------------------------------------------------------------------------
> [  111.116198] 
> [  111.116202] Slab 0x000000008186f78a objects=170 used=12 fp=0x000000008233ac7d flags=0x13ffff800000200(slab|node=1|zone=0|lastcpupid=0x7ffff)
> [  111.116210] CPU: 26 PID: 4704 Comm: modprobe Tainted: G    B       E      5.19.0-rc6-next-20220715 #3
> [  111.116216] Call Trace:
> [  111.116218] [c00000002985b9a0] [c000000000830bec] dump_stack_lvl+0x70/0xa4 (unreliable)
> [  111.116227] [c00000002985b9e0] [c0000000004ef6f8] slab_err+0xd8/0xf0
> [  111.116234] [c00000002985bad0] [c0000000004f6cbc] __kmem_cache_shutdown+0x1fc/0x560
> [  111.116241] [c00000002985bbf0] [c0000000004534c8] kmem_cache_destroy+0xa8/0x1f0
> [  111.116248] [c00000002985bc80] [c00800000ccf30e4] xfs_buf_terminate+0x2c/0x48 [xfs]
> [  111.116312] [c00000002985bca0] [c00800000cd6f55c] exit_xfs_fs+0x90/0x20b34 [xfs]
> [  111.116379] [c00000002985bcd0] [c00000000023b7e0] sys_delete_module+0x1e0/0x3c0
> [  111.116386] [c00000002985bdb0] [c00000000003302c] system_call_exception+0x17c/0x350
> [  111.116392] [c00000002985be10] [c00000000000c53c] system_call_common+0xec/0x270
> [  111.116400] --- interrupt: c00 at 0x7fff8c158b88
> [  111.116404] NIP:  00007fff8c158b88 LR: 000000013adb0398 CTR: 0000000000000000
> [  111.116409] REGS: c00000002985be80 TRAP: 0c00   Tainted: G    B       E       (5.19.0-rc6-next-20220715)
> [  111.116414] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24008282  XER: 00000000
> [  111.116430] IRQMASK: 0 
> [  111.116430] GPR00: 0000000000000081 00007ffffe17dff0 00007fff8c227300 000001003f2f0c18 
> [  111.116430] GPR04: 0000000000000800 000000000000000a 1999999999999999 0000000000000000 
> [  111.116430] GPR08: 00007fff8c1b7830 0000000000000000 0000000000000000 0000000000000000 
> [  111.116430] GPR12: 0000000000000000 00007fff8c72ca50 000000013adba650 000000013adba648 
> [  111.116430] GPR16: 0000000000000000 0000000000000001 0000000000000000 000000013adba428 
> [  111.116430] GPR20: 000000013ade0068 0000000000000000 00007ffffe17f948 000001003f2f02a0 
> [  111.116430] GPR24: 0000000000000000 00007ffffe17f948 000001003f2f0c18 0000000000000000 
> [  111.116430] GPR28: 0000000000000000 000001003f2f0bb0 000001003f2f0c18 000001003f2f0bb0 
> [  111.116488] NIP [00007fff8c158b88] 0x7fff8c158b88
> [  111.116492] LR [000000013adb0398] 0x13adb0398
> [  111.116496] --- interrupt: c00
> [  111.116504] Object 0x000000002b93c535 @offset=5376
> [  111.116508] Object 0x000000009be4058b @offset=16896
> [  111.116511] Object 0x00000000c1d5c895 @offset=24960
> [  111.116515] Object 0x0000000097fb6f84 @offset=30336
> [  111.116518] Object 0x00000000213fb535 @offset=43008
> [  111.116521] Object 0x0000000045473fa3 @offset=43392
> [  111.116525] Object 0x000000006462ef89 @offset=44160
> [  111.116528] Object 0x000000000c85ce0b @offset=44544
> [  111.116531] Object 0x0000000059166af4 @offset=45312
> [  111.116535] Object 0x00000000e7b40b45 @offset=46848
> [  111.116538] Object 0x00000000bc6ce716 @offset=54528
> [  111.116541] Object 0x000000005f7be1fa @offset=64512
> [  111.116546] ------------[ cut here ]------------

Yup, Darrick reported this once and couldn't reproduce it. We know
it's a result of result of converting the xfs_buffer cache to
rcu-protected lockless lookups and for some reason and the rcu
callbacks that free these objects seem not to have been processed
before the module is removed. We have an rcu_barrier() in
xfs_destroy_caches() to avoid this ......

Wait. What is xfs_buf_terminate()? I don't recall that function....

Yeah, there's the bug.

exit_xfs_fs(void)
{
....
        xfs_buf_terminate();
        xfs_mru_cache_uninit();
        xfs_destroy_workqueues();
        xfs_destroy_caches();
....

xfs_buf_terminate() calls kmem_cache_destroy() before the
rcu_barrier() call in xfs_destroy_caches().....

Try the (slightly smoke tested only) patch below.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: xfs_buf cache destroy isn't RCU safe

From: Dave Chinner <dchinner@redhat.com>

Darrick and Sachin Sant reported that xfs/435 and xfs/436 would
report an non-empty xfs_buf slab on module remove. This isn't easily
to reproduce, but is clearly a side effect of converting the buffer
caceh to RUC freeing and lockless lookups. Sachin bisected and
Darrick hit it when testing the patchset directly.

Turns out that the xfs_buf slab is not destroyed when all the other
XFS slab caches are destroyed. Instead, it's got it's own little
wrapper function that gets called separately, and so it doesn't have
an rcu_barrier() call in it that is needed to drain all the rcu
callbacks before the slab is destroyed.

Fix it by removing the xfs_buf_init/terminate wrappers that just
allocate and destroy the xfs_buf slab, and move them to the same
place that all the other slab caches are set up and destroyed.

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Fixes: 298f34224506 ("xfs: lockless buffer lookup")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c   | 25 +------------------------
 fs/xfs/xfs_buf.h   |  6 ++----
 fs/xfs/xfs_super.c | 22 +++++++++++++---------
 3 files changed, 16 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 4affba7c6669..f8bdc4698492 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -21,7 +21,7 @@
 #include "xfs_error.h"
 #include "xfs_ag.h"
 
-static struct kmem_cache *xfs_buf_cache;
+struct kmem_cache *xfs_buf_cache;
 
 /*
  * Locking orders
@@ -2300,29 +2300,6 @@ xfs_buf_delwri_pushbuf(
 	return error;
 }
 
-int __init
-xfs_buf_init(void)
-{
-	xfs_buf_cache = kmem_cache_create("xfs_buf", sizeof(struct xfs_buf), 0,
-					 SLAB_HWCACHE_ALIGN |
-					 SLAB_RECLAIM_ACCOUNT |
-					 SLAB_MEM_SPREAD,
-					 NULL);
-	if (!xfs_buf_cache)
-		goto out;
-
-	return 0;
-
- out:
-	return -ENOMEM;
-}
-
-void
-xfs_buf_terminate(void)
-{
-	kmem_cache_destroy(xfs_buf_cache);
-}
-
 void xfs_buf_set_ref(struct xfs_buf *bp, int lru_ref)
 {
 	/*
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index f65ef4d34ff7..04ff86a8eeef 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -15,6 +15,8 @@
 #include <linux/uio.h>
 #include <linux/list_lru.h>
 
+extern struct kmem_cache *xfs_buf_cache;
+
 /*
  *	Base types
  */
@@ -307,10 +309,6 @@ extern int xfs_buf_delwri_submit(struct list_head *);
 extern int xfs_buf_delwri_submit_nowait(struct list_head *);
 extern int xfs_buf_delwri_pushbuf(struct xfs_buf *, struct list_head *);
 
-/* Buffer Daemon Setup Routines */
-extern int xfs_buf_init(void);
-extern void xfs_buf_terminate(void);
-
 static inline xfs_daddr_t xfs_buf_daddr(struct xfs_buf *bp)
 {
 	return bp->b_maps[0].bm_bn;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 4edee1d3784a..3d27ba1295c9 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1967,11 +1967,19 @@ xfs_init_caches(void)
 {
 	int		error;
 
+	xfs_buf_cache = kmem_cache_create("xfs_buf", sizeof(struct xfs_buf), 0,
+					 SLAB_HWCACHE_ALIGN |
+					 SLAB_RECLAIM_ACCOUNT |
+					 SLAB_MEM_SPREAD,
+					 NULL);
+	if (!xfs_buf_cache)
+		goto out;
+
 	xfs_log_ticket_cache = kmem_cache_create("xfs_log_ticket",
 						sizeof(struct xlog_ticket),
 						0, 0, NULL);
 	if (!xfs_log_ticket_cache)
-		goto out;
+		goto out_destroy_buf_cache;
 
 	error = xfs_btree_init_cur_caches();
 	if (error)
@@ -2145,6 +2153,8 @@ xfs_init_caches(void)
 	xfs_btree_destroy_cur_caches();
  out_destroy_log_ticket_cache:
 	kmem_cache_destroy(xfs_log_ticket_cache);
+ out_destroy_buf_cache:
+	kmem_cache_destroy(xfs_buf_cache);
  out:
 	return -ENOMEM;
 }
@@ -2178,6 +2188,7 @@ xfs_destroy_caches(void)
 	xfs_defer_destroy_item_caches();
 	xfs_btree_destroy_cur_caches();
 	kmem_cache_destroy(xfs_log_ticket_cache);
+	kmem_cache_destroy(xfs_buf_cache);
 }
 
 STATIC int __init
@@ -2283,13 +2294,9 @@ init_xfs_fs(void)
 	if (error)
 		goto out_destroy_wq;
 
-	error = xfs_buf_init();
-	if (error)
-		goto out_mru_cache_uninit;
-
 	error = xfs_init_procfs();
 	if (error)
-		goto out_buf_terminate;
+		goto out_mru_cache_uninit;
 
 	error = xfs_sysctl_register();
 	if (error)
@@ -2346,8 +2353,6 @@ init_xfs_fs(void)
 	xfs_sysctl_unregister();
  out_cleanup_procfs:
 	xfs_cleanup_procfs();
- out_buf_terminate:
-	xfs_buf_terminate();
  out_mru_cache_uninit:
 	xfs_mru_cache_uninit();
  out_destroy_wq:
@@ -2373,7 +2378,6 @@ exit_xfs_fs(void)
 	kset_unregister(xfs_kset);
 	xfs_sysctl_unregister();
 	xfs_cleanup_procfs();
-	xfs_buf_terminate();
 	xfs_mru_cache_uninit();
 	xfs_destroy_workqueues();
 	xfs_destroy_caches();

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: BUG xfs_buf while running tests/xfs/435 (next-20220715)
  2022-07-18  8:01   ` Dave Chinner
@ 2022-07-18 12:40     ` Sachin Sant
  -1 siblings, 0 replies; 6+ messages in thread
From: Sachin Sant @ 2022-07-18 12:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: dchinner, linux-xfs, linuxppc-dev, linux-next, riteshh


> Fix it by removing the xfs_buf_init/terminate wrappers that just
> allocate and destroy the xfs_buf slab, and move them to the same
> place that all the other slab caches are set up and destroyed.
> 
> Reported-by: Sachin Sant <sachinp@linux.ibm.com>
> Fixes: 298f34224506 ("xfs: lockless buffer lookup")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Thanks. The patch fixes the reported problem for me.

Tested-by: Sachin Sant <sachinp@linux.ibm.com>

- Sachin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG xfs_buf while running tests/xfs/435 (next-20220715)
@ 2022-07-18 12:40     ` Sachin Sant
  0 siblings, 0 replies; 6+ messages in thread
From: Sachin Sant @ 2022-07-18 12:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, riteshh, linux-next, linuxppc-dev, dchinner


> Fix it by removing the xfs_buf_init/terminate wrappers that just
> allocate and destroy the xfs_buf slab, and move them to the same
> place that all the other slab caches are set up and destroyed.
> 
> Reported-by: Sachin Sant <sachinp@linux.ibm.com>
> Fixes: 298f34224506 ("xfs: lockless buffer lookup")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Thanks. The patch fixes the reported problem for me.

Tested-by: Sachin Sant <sachinp@linux.ibm.com>

- Sachin

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-07-18 12:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-18  6:31 BUG xfs_buf while running tests/xfs/435 (next-20220715) Sachin Sant
2022-07-18  6:31 ` Sachin Sant
2022-07-18  8:01 ` Dave Chinner
2022-07-18  8:01   ` Dave Chinner
2022-07-18 12:40   ` Sachin Sant
2022-07-18 12:40     ` Sachin Sant

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.