From: Jan Kara <jack@suse.cz>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Mel Gorman <mgorman@suse.de>
Subject: Crash with PREEMPT_RT on aarch64 machine
Date: Thu, 3 Nov 2022 12:54:44 +0100
Message-ID: <20221103115444.m2rjglbkubydidts@quack3>

[-- Attachment #1: Type: text/plain, Size: 4745 bytes --]

Hello,

I was tracking down the following crash with a 6.0 kernel with
patch-6.0.5-rt14.patch applied:

[ T6611] ------------[ cut here ]------------
[ T6611] kernel BUG at fs/inode.c:625!
[ T6611] Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
[ T6611] Modules linked in: xfs(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) arm_spe_pmu(E) mlx5_core(E) sunrpc(E) mlxfw(E) pci_hyperv_intf(E) nls_iso8859_1(E) acpi_ipmi(E) nls_cp437(E) ipmi_ssif(E) vfat(E) ipmi_devintf(E) tls(E) igb(E) psample(E) button(E) arm_cmn(E) arm_dmc620_pmu(E) ipmi_msghandler(E) fat(E) cppc_cpufreq(E) arm_dsu_pmu(E) fuse(E) ip_tables(E) x_tables(E) ast(E) i2c_algo_bit(E) drm_vram_helper(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) ghash_ce(E) gf128mul(E) nvme(E) drm_kms_helper(E) sha2_ce(E) syscopyarea(E) sha256_arm64(E) sysfillrect(E) xhci_pci(E) sha1_ce(E) sysimgblt(E) nvme_core(E) xhci_pci_renesas(E) fb_sys_fops(E) nvme_common(E) drm_ttm_helper(E) sbsa_gwdt(E) t10_pi(E) ttm(E) xhci_hcd(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) usbcore(E) crc64(E) drm(E) usb_common(E) i2c_designware_platform(E) i2c_designware_core(E) btrfs(E) blake2b_generic(E) libcrc32c(E) xor(E) xor_neon(E)
[ T6611]  raid6_pq(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E)
[ T6611] CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1 4a18df02c109f1e703cf2ff86b77cf9cd9d5a188
[ T6611] Hardware name: GIGABYTE R272-P30-JG/MP32-AR0-JG, BIOS F16f (SCP: 1.06.20210615) 07/01/2021
[ T6611] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ T6611] pc : clear_inode+0xa0/0xc0
[ T6611] lr : clear_inode+0x38/0xc0
[ T6611] sp : ffff80000f4f3cd0
[ T6611] x29: ffff80000f4f3cd0 x28: ffff07ff92142000 x27: 0000000000000000
[ T6611] x26: ffff08012aef6058 x25: 0000000000000002 x24: ffffb657395e8000
[ T6611] x23: ffffb65739072008 x22: ffffb656e0bed0a8 x21: ffff08012aef6190
[ T6611] x20: ffff08012aef61f8 x19: ffff08012aef6058 x18: 0000000000000014
[ T6611] x17: 00000000f0d86255 x16: ffffb65737dfdb00 x15: 0100000004000000
[ T6611] x14: 644d000008090000 x13: 644d000008090000 x12: ffff80000f4f3b20
[ T6611] x11: 0000000000000002 x10: ffff083f5ffbe1c0 x9 : ffffb657388284a4
[ T6611] x8 : fffffffffffffffe x7 : ffff80000f4f3b20 x6 : ffff80000f4f3b20
[ T6611] x5 : ffff08012aef6210 x4 : ffff08012aef6210 x3 : 0000000000000000
[ T6611] x2 : ffff08012aef62d8 x1 : ffff07ff8fbbf690 x0 : ffff08012aef61a0
[ T6611] Call trace:
[ T6611]  clear_inode+0xa0/0xc0
[ T6611]  evict+0x160/0x180
[ T6611]  iput+0x154/0x240
[ T6611]  do_unlinkat+0x184/0x300
[ T6611]  __arm64_sys_unlinkat+0x48/0xc0
[ T6611]  el0_svc_common.constprop.4+0xe4/0x2c0
[ T6611]  do_el0_svc+0xac/0x100
[ T6611]  el0_svc+0x78/0x200
[ T6611]  el0t_64_sync_handler+0x9c/0xc0
[ T6611]  el0t_64_sync+0x19c/0x1a0
[ T6611] Code: d4210000 d503201f d4210000 d503201f (d4210000) 
[ T6611] ---[ end trace 0000000000000000 ]---

The machine is aarch64 and the kernel config is attached. I have also seen
the crashes with a 5.14-rt kernel, so this is not a new thing. The crash is
triggered relatively reliably (on two different aarch64 machines) by our
performance testing framework when running the dbench benchmark against an
XFS filesystem.

Now originally I thought this was some problem with XFS or the writeback
code, but after debugging this for some time I don't think that anymore.
clear_inode() complains about inode->i_wb_list being non-empty. In fact,
looking at the list_head, I can see it is corrupted. In all the occurrences
of the problem ->prev points back to the list_head itself but ->next points
to some list_head that used to be part of the sb->s_inodes_wb list (or
actually of that list as spliced in wait_sb_inodes(), because I've seen a
pointer to the stack as the ->next pointer as well).
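
For illustration, the corrupted state described above can be modelled in
userspace C. This is only a sketch under stated assumptions: struct list_head
here is a local stand-in for the kernel type, and init_list_head(),
list_is_empty(), classify() and corrupt_like_report() are hypothetical helper
names, not kernel functions. The emptiness test looks only at ->next, as the
kernel's list_empty() does, so a node whose ->prev was reset but whose ->next
is stale still reads as non-empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next;
	struct list_head *prev;
};

/* An empty (unlinked) node points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Emptiness test that inspects only ->next. */
static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

enum node_state {
	NODE_EMPTY,           /* next == self, prev == self */
	NODE_LINKED,          /* neither pointer refers to the node itself */
	NODE_PREV_ONLY_RESET, /* prev == self, next elsewhere (state in the report) */
	NODE_NEXT_ONLY_RESET, /* next == self, prev elsewhere */
};

/* Classify a node purely by comparing its two pointers with itself. */
static enum node_state classify(const struct list_head *h)
{
	bool next_self = (h->next == h);
	bool prev_self = (h->prev == h);

	if (next_self && prev_self)
		return NODE_EMPTY;
	if (prev_self)
		return NODE_PREV_ONLY_RESET;
	if (next_self)
		return NODE_NEXT_ONLY_RESET;
	return NODE_LINKED;
}

/* Hypothetical helper: force a node into the state seen in the crash dumps. */
static void corrupt_like_report(struct list_head *node, struct list_head *stale)
{
	assert(node != stale);
	node->prev = node;  /* looks as if the node had been reset */
	node->next = stale; /* but still points at an old neighbour */
}
```

A node put into that state with corrupt_like_report() classifies as
NODE_PREV_ONLY_RESET and fails list_is_empty(), which matches a non-empty
complaint on a node that looks half unlinked.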

This is not just some memory ordering issue with the check in
clear_inode(). If I add sb->s_inode_wblist_lock locking around the check in
clear_inode(), the problem still reproduces.

If I enable CONFIG_DEBUG_LIST or if I convert sb->s_inode_wblist_lock to
raw_spinlock_t, the problem disappears.

Finally, I'd note that the list is modified from three places, which makes
the audit relatively simple: sb_mark_inode_writeback(),
sb_clear_inode_writeback(), and wait_sb_inodes(). All these places hold
sb->s_inode_wblist_lock when modifying the list. So at this point I'm at a
loss as to what could be causing this. As unlikely as it seems to me, I've
started wondering whether it is not some subtle issue with RT spinlocks on
aarch64, possibly in combination with interrupts (because
sb_clear_inode_writeback() may be called from an interrupt).
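
The locking discipline described above can be sketched as a userspace model.
Everything here is an assumption made for illustration: the toy_* names are
hypothetical, the lock is a simple C11 atomic_flag spinlock rather than an RT
spinlock, and the three functions only loosely mirror the roles of
sb_mark_inode_writeback(), sb_clear_inode_writeback() and wait_sb_inodes().
It is not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	struct list_head *last = head->prev;

	item->next = head;
	item->prev = last;
	last->next = item;
	head->prev = item;
}

/* Unlink the entry and leave it pointing at itself. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Move all entries of 'from' onto 'to'; 'to' must be empty in this sketch. */
static void list_splice_init(struct list_head *from, struct list_head *to)
{
	assert(list_is_empty(to));
	if (list_is_empty(from))
		return;
	to->next = from->next;
	to->prev = from->prev;
	to->next->prev = to;
	to->prev->next = to;
	init_list_head(from);
}

/* Toy spinlock standing in for sb->s_inode_wblist_lock. */
struct toy_lock { atomic_flag held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct toy_sb { struct toy_lock wblist_lock; struct list_head inodes_wb; };
struct toy_inode { struct list_head wb_list; };

static void toy_sb_init(struct toy_sb *sb)
{
	atomic_flag_clear(&sb->wblist_lock.held);
	init_list_head(&sb->inodes_wb);
}

/* Site 1: put the inode on the writeback list if it is not on it yet. */
static void toy_mark_inode_writeback(struct toy_sb *sb, struct toy_inode *inode)
{
	toy_lock_acquire(&sb->wblist_lock);
	if (list_is_empty(&inode->wb_list))
		list_add_tail(&inode->wb_list, &sb->inodes_wb);
	toy_lock_release(&sb->wblist_lock);
}

/* Site 2: take the inode off the writeback list if it is on it. */
static void toy_clear_inode_writeback(struct toy_sb *sb, struct toy_inode *inode)
{
	toy_lock_acquire(&sb->wblist_lock);
	if (!list_is_empty(&inode->wb_list))
		list_del_init(&inode->wb_list);
	toy_lock_release(&sb->wblist_lock);
}

/*
 * Site 3: splice the list onto a head that lives on the stack, then move
 * each entry back. Unlike this sketch, a real implementation would not
 * keep the lock held for the whole walk. Returns the number of entries seen.
 */
static int toy_wait_sb_inodes(struct toy_sb *sb)
{
	struct list_head sync_list;
	int visited = 0;

	init_list_head(&sync_list);
	toy_lock_acquire(&sb->wblist_lock);
	list_splice_init(&sb->inodes_wb, &sync_list);
	while (!list_is_empty(&sync_list)) {
		struct list_head *first = sync_list.next;

		list_del_init(first);
		list_add_tail(first, &sb->inodes_wb);
		visited++;
	}
	toy_lock_release(&sb->wblist_lock);
	return visited;
}
```

While toy_wait_sb_inodes() runs, entries are temporarily linked to sync_list,
which is a stack variable. That is one way a list_head on this list could
end up holding a pointer into the stack, as observed in some of the dumps.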

Any ideas?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: .config.gz --]
[-- Type: application/x-gzip, Size: 42126 bytes --]
