linux-mm.kvack.org archive mirror
* Re: Crash with PREEMPT_RT on aarch64 machine
       [not found] <20221103115444.m2rjglbkubydidts@quack3>
@ 2022-11-04  8:06 ` Hillf Danton
  2022-11-07 12:41   ` Jan Kara
  0 siblings, 1 reply; 3+ messages in thread
From: Hillf Danton @ 2022-11-04  8:06 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Sebastian Andrzej Siewior,
	linux-mm, Mel Gorman

On 3 Nov 2022 12:54:44 +0100 Jan Kara <jack@suse.cz>
> Hello,
> 
> I was tracking down the following crash with 6.0 kernel with
> patch-6.0.5-rt14.patch applied:
> 
> [ T6611] ------------[ cut here ]------------
> [ T6611] kernel BUG at fs/inode.c:625!
> [ T6611] Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
> [ T6611] Modules linked in: xfs(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) arm_spe_pmu(E) mlx5_core(E) sunrpc(E) mlxfw(E) pci_hyperv_intf(E) nls_iso8859_1(E) acpi_ipmi(E) nls_cp437(E) ipmi_ssif(E) vfat(E) ipmi_devintf(E) tls(E) igb(E) psample(E) button(E) arm_cmn(E) arm_dmc620_pmu(E) ipmi_msghandler(E) fat(E) cppc_cpufreq(E) arm_dsu_pmu(E) fuse(E) ip_tables(E) x_tables(E) ast(E) i2c_algo_bit(E) drm_vram_helper(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) ghash_ce(E) gf128mul(E) nvme(E) drm_kms_helper(E) sha2_ce(E) syscopyarea(E) sha256_arm64(E) sysfillrect(E) xhci_pci(E) sha1_ce(E) sysimgblt(E) nvme_core(E) xhci_pci_renesas(E) fb_sys_fops(E) nvme_common(E) drm_ttm_helper(E) sbsa_gwdt(E) t10_pi(E) ttm(E) xhci_hcd(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) usbcore(E) crc64(E) drm(E) usb_common(E) i2c_designware_platform(E) i2c_designware_core(E) btrfs(E) blake2b_generic(E) libcrc32c(E) xor(E) xor_neon(E)
> [ T6611]  raid6_pq(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E)
> [ T6611] CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1 4a18df02c109f1e703cf2ff86b77cf9cd9d5a188
> [ T6611] Hardware name: GIGABYTE R272-P30-JG/MP32-AR0-JG, BIOS F16f (SCP: 1.06.20210615) 07/01/2021
> [ T6611] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ T6611] pc : clear_inode+0xa0/0xc0
> [ T6611] lr : clear_inode+0x38/0xc0
> [ T6611] sp : ffff80000f4f3cd0
> [ T6611] x29: ffff80000f4f3cd0 x28: ffff07ff92142000 x27: 0000000000000000
> [ T6611] x26: ffff08012aef6058 x25: 0000000000000002 x24: ffffb657395e8000
> [ T6611] x23: ffffb65739072008 x22: ffffb656e0bed0a8 x21: ffff08012aef6190
> [ T6611] x20: ffff08012aef61f8 x19: ffff08012aef6058 x18: 0000000000000014
> [ T6611] x17: 00000000f0d86255 x16: ffffb65737dfdb00 x15: 0100000004000000
> [ T6611] x14: 644d000008090000 x13: 644d000008090000 x12: ffff80000f4f3b20
> [ T6611] x11: 0000000000000002 x10: ffff083f5ffbe1c0 x9 : ffffb657388284a4
> [ T6611] x8 : fffffffffffffffe x7 : ffff80000f4f3b20 x6 : ffff80000f4f3b20
> [ T6611] x5 : ffff08012aef6210 x4 : ffff08012aef6210 x3 : 0000000000000000
> [ T6611] x2 : ffff08012aef62d8 x1 : ffff07ff8fbbf690 x0 : ffff08012aef61a0
> [ T6611] Call trace:
> [ T6611]  clear_inode+0xa0/0xc0
> [ T6611]  evict+0x160/0x180
> [ T6611]  iput+0x154/0x240
> [ T6611]  do_unlinkat+0x184/0x300
> [ T6611]  __arm64_sys_unlinkat+0x48/0xc0
> [ T6611]  el0_svc_common.constprop.4+0xe4/0x2c0
> [ T6611]  do_el0_svc+0xac/0x100
> [ T6611]  el0_svc+0x78/0x200
> [ T6611]  el0t_64_sync_handler+0x9c/0xc0
> [ T6611]  el0t_64_sync+0x19c/0x1a0
> [ T6611] Code: d4210000 d503201f d4210000 d503201f (d4210000) 
> [ T6611] ---[ end trace 0000000000000000 ]---
> 
> The machine is aarch64 architecture, kernel config is attached. I have seen
> the crashes also with 5.14-rt kernel so it is not a new thing. The crash is
> triggered relatively reliably (on two different aarch64 machines) by our
> performance testing framework when running dbench benchmark against an XFS
> filesystem.
> 
> Now originally I thought this is some problem with XFS or writeback code
> but after debugging this for some time I don't think that anymore.
> clear_inode() complains about inode->i_wb_list being non-empty. In fact
> looking at the list_head, I can see it is corrupted. In all the occurrences
> of the problem ->prev points back to the list_head itself but ->next points
> to some list_head that used to be part of the sb->s_inodes_wb list (or
> actually that list spliced in wait_sb_inodes() because I've seen a pointer to
> the stack as ->next pointer as well).
> 
> This is not just some memory ordering issue with the check in
> clear_inode(). If I add sb->s_inode_wblist_lock locking around the check in
> clear_inode(), the problem still reproduces.
> 
> If I enable CONFIG_DEBUG_LIST or if I convert sb->s_inode_wblist_lock to
> raw_spinlock_t, the problem disappears.
> 
> Finally, I'd note that the list is modified from three places which makes
> audit relatively simple. sb_mark_inode_writeback(),
> sb_clear_inode_writeback(), and wait_sb_inodes(). All these places hold
> sb->s_inode_wblist_lock when modifying the list. So at this point I'm at a
> loss as to what could be causing this. As unlikely as it seems, I've started
> wondering whether it is not some subtle issue with RT spinlocks on aarch64
> possibly in combination with interrupts (because sb_clear_inode_writeback()
> may be called from an interrupt).
> 
> Any ideas?

Feel free to collect debug info ONLY in your spare cycles, given your
relatively reliable reproducer.

Just food for thought.

Hillf

+++ b/fs/fs-writeback.c
@@ -1256,6 +1256,7 @@ void sb_mark_inode_writeback(struct inod
 	if (list_empty(&inode->i_wb_list)) {
 		spin_lock_irqsave(&sb->s_inode_wblist_lock, flags);
 		if (list_empty(&inode->i_wb_list)) {
+			ihold(inode);
 			list_add_tail(&inode->i_wb_list, &sb->s_inodes_wb);
 			trace_sb_mark_inode_writeback(inode);
 		}
@@ -1272,12 +1273,19 @@ void sb_clear_inode_writeback(struct ino
 	unsigned long flags;
 
 	if (!list_empty(&inode->i_wb_list)) {
+		int put = 0;
 		spin_lock_irqsave(&sb->s_inode_wblist_lock, flags);
 		if (!list_empty(&inode->i_wb_list)) {
+			put = 1;
 			list_del_init(&inode->i_wb_list);
 			trace_sb_clear_inode_writeback(inode);
 		}
 		spin_unlock_irqrestore(&sb->s_inode_wblist_lock, flags);
+		if (put) {
+			ihold(inode);
+			iput(inode);
+			iput(inode);
+		}
 	}
 }
 


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Crash with PREEMPT_RT on aarch64 machine
  2022-11-04  8:06 ` Crash with PREEMPT_RT on aarch64 machine Hillf Danton
@ 2022-11-07 12:41   ` Jan Kara
  2022-11-07 14:43     ` Hillf Danton
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Kara @ 2022-11-07 12:41 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Jan Kara, LKML, Thomas Gleixner, Steven Rostedt,
	Sebastian Andrzej Siewior, linux-mm, Mel Gorman

On Fri 04-11-22 16:06:37, Hillf Danton wrote:
> On 3 Nov 2022 12:54:44 +0100 Jan Kara <jack@suse.cz>
> > Hello,
> > 
> > I was tracking down the following crash with 6.0 kernel with
> > patch-6.0.5-rt14.patch applied:
> > 
> > [ T6611] ------------[ cut here ]------------
> > [ T6611] kernel BUG at fs/inode.c:625!
> > [ T6611] Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
> > [ T6611] Modules linked in: xfs(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) arm_spe_pmu(E) mlx5_core(E) sunrpc(E) mlxfw(E) pci_hyperv_intf(E) nls_iso8859_1(E) acpi_ipmi(E) nls_cp437(E) ipmi_ssif(E) vfat(E) ipmi_devintf(E) tls(E) igb(E) psample(E) button(E) arm_cmn(E) arm_dmc620_pmu(E) ipmi_msghandler(E) fat(E) cppc_cpufreq(E) arm_dsu_pmu(E) fuse(E) ip_tables(E) x_tables(E) ast(E) i2c_algo_bit(E) drm_vram_helper(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) ghash_ce(E) gf128mul(E) nvme(E) drm_kms_helper(E) sha2_ce(E) syscopyarea(E) sha256_arm64(E) sysfillrect(E) xhci_pci(E) sha1_ce(E) sysimgblt(E) nvme_core(E) xhci_pci_renesas(E) fb_sys_fops(E) nvme_common(E) drm_ttm_helper(E) sbsa_gwdt(E) t10_pi(E) ttm(E) xhci_hcd(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) usbcore(E) crc64(E) drm(E) usb_common(E) i2c_designware_platform(E) i2c_designware_core(E) btrfs(E) blake2b_generic(E) libcrc32c(E) xor(E) xor_neon(E)
> > [ T6611]  raid6_pq(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E)
> > [ T6611] CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1 4a18df02c109f1e703cf2ff86b77cf9cd9d5a188
> > [ T6611] Hardware name: GIGABYTE R272-P30-JG/MP32-AR0-JG, BIOS F16f (SCP: 1.06.20210615) 07/01/2021
> > [ T6611] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ T6611] pc : clear_inode+0xa0/0xc0
> > [ T6611] lr : clear_inode+0x38/0xc0
> > [ T6611] sp : ffff80000f4f3cd0
> > [ T6611] x29: ffff80000f4f3cd0 x28: ffff07ff92142000 x27: 0000000000000000
> > [ T6611] x26: ffff08012aef6058 x25: 0000000000000002 x24: ffffb657395e8000
> > [ T6611] x23: ffffb65739072008 x22: ffffb656e0bed0a8 x21: ffff08012aef6190
> > [ T6611] x20: ffff08012aef61f8 x19: ffff08012aef6058 x18: 0000000000000014
> > [ T6611] x17: 00000000f0d86255 x16: ffffb65737dfdb00 x15: 0100000004000000
> > [ T6611] x14: 644d000008090000 x13: 644d000008090000 x12: ffff80000f4f3b20
> > [ T6611] x11: 0000000000000002 x10: ffff083f5ffbe1c0 x9 : ffffb657388284a4
> > [ T6611] x8 : fffffffffffffffe x7 : ffff80000f4f3b20 x6 : ffff80000f4f3b20
> > [ T6611] x5 : ffff08012aef6210 x4 : ffff08012aef6210 x3 : 0000000000000000
> > [ T6611] x2 : ffff08012aef62d8 x1 : ffff07ff8fbbf690 x0 : ffff08012aef61a0
> > [ T6611] Call trace:
> > [ T6611]  clear_inode+0xa0/0xc0
> > [ T6611]  evict+0x160/0x180
> > [ T6611]  iput+0x154/0x240
> > [ T6611]  do_unlinkat+0x184/0x300
> > [ T6611]  __arm64_sys_unlinkat+0x48/0xc0
> > [ T6611]  el0_svc_common.constprop.4+0xe4/0x2c0
> > [ T6611]  do_el0_svc+0xac/0x100
> > [ T6611]  el0_svc+0x78/0x200
> > [ T6611]  el0t_64_sync_handler+0x9c/0xc0
> > [ T6611]  el0t_64_sync+0x19c/0x1a0
> > [ T6611] Code: d4210000 d503201f d4210000 d503201f (d4210000) 
> > [ T6611] ---[ end trace 0000000000000000 ]---
> > 
> > The machine is aarch64 architecture, kernel config is attached. I have seen
> > the crashes also with 5.14-rt kernel so it is not a new thing. The crash is
> > triggered relatively reliably (on two different aarch64 machines) by our
> > performance testing framework when running dbench benchmark against an XFS
> > filesystem.
> > 
> > Now originally I thought this is some problem with XFS or writeback code
> > but after debugging this for some time I don't think that anymore.
> > clear_inode() complains about inode->i_wb_list being non-empty. In fact
> > looking at the list_head, I can see it is corrupted. In all the occurrences
> > of the problem ->prev points back to the list_head itself but ->next points
> > to some list_head that used to be part of the sb->s_inodes_wb list (or
> > actually that list spliced in wait_sb_inodes() because I've seen a pointer to
> > the stack as ->next pointer as well).
> > 
> > This is not just some memory ordering issue with the check in
> > clear_inode(). If I add sb->s_inode_wblist_lock locking around the check in
> > clear_inode(), the problem still reproduces.
> > 
> > If I enable CONFIG_DEBUG_LIST or if I convert sb->s_inode_wblist_lock to
> > raw_spinlock_t, the problem disappears.
> > 
> > Finally, I'd note that the list is modified from three places which makes
> > audit relatively simple. sb_mark_inode_writeback(),
> > sb_clear_inode_writeback(), and wait_sb_inodes(). All these places hold
> > sb->s_inode_wblist_lock when modifying the list. So at this point I'm at a
> > loss as to what could be causing this. As unlikely as it seems, I've started
> > wondering whether it is not some subtle issue with RT spinlocks on aarch64
> > possibly in combination with interrupts (because sb_clear_inode_writeback()
> > may be called from an interrupt).
> > 
> > Any ideas?
> 
> Feel free to collect debug info ONLY in your spare cycles, given your
> relatively reliable reproducer.

So in fact I made sure (by debug counters) that sb_mark_inode_writeback()
and sb_clear_inode_writeback() get called the same number of times before
evict() gets called. So your debug patch would change nothing AFAICT...

								Honza


> +++ b/fs/fs-writeback.c
> @@ -1256,6 +1256,7 @@ void sb_mark_inode_writeback(struct inod
>  	if (list_empty(&inode->i_wb_list)) {
>  		spin_lock_irqsave(&sb->s_inode_wblist_lock, flags);
>  		if (list_empty(&inode->i_wb_list)) {
> +			ihold(inode);
>  			list_add_tail(&inode->i_wb_list, &sb->s_inodes_wb);
>  			trace_sb_mark_inode_writeback(inode);
>  		}
> @@ -1272,12 +1273,19 @@ void sb_clear_inode_writeback(struct ino
>  	unsigned long flags;
>  
>  	if (!list_empty(&inode->i_wb_list)) {
> +		int put = 0;
>  		spin_lock_irqsave(&sb->s_inode_wblist_lock, flags);
>  		if (!list_empty(&inode->i_wb_list)) {
> +			put = 1;
>  			list_del_init(&inode->i_wb_list);
>  			trace_sb_clear_inode_writeback(inode);
>  		}
>  		spin_unlock_irqrestore(&sb->s_inode_wblist_lock, flags);
> +		if (put) {
> +			ihold(inode);
> +			iput(inode);
> +			iput(inode);
> +		}
>  	}
>  }
>  
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: Crash with PREEMPT_RT on aarch64 machine
  2022-11-07 12:41   ` Jan Kara
@ 2022-11-07 14:43     ` Hillf Danton
  0 siblings, 0 replies; 3+ messages in thread
From: Hillf Danton @ 2022-11-07 14:43 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Sebastian Andrzej Siewior,
	linux-mm, Mel Gorman

On 7 Nov 2022 13:41:49 +0100 Jan Kara <jack@suse.cz>
> So in fact I made sure (by debug counters) that sb_mark_inode_writeback()
> and sb_clear_inode_writeback() get called the same number of times before
> evict() gets called. So your debug patch would change nothing AFAICT...

The same number you observed is not the same thing as the difference the
added ihold + iput can make: the extra ihold pins the inode, so if the crash
still reproduces with the patch applied, it has nothing to do with the two
functions mentioned. Right?


