From: Thilo Fromm <t-lo@linux.microsoft.com>
To: Jan Kara <jack@suse.cz>, Ye Bin <yebin10@huawei.com>
Cc: jack@suse.com, tytso@mit.edu, linux-ext4@vger.kernel.org,
	regressions@lists.linux.dev,
	Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Subject: Re: [syzbot] possible deadlock in jbd2_journal_lock_updates
Date: Fri, 14 Oct 2022 08:42:57 +0200
Message-ID: <2ede5fce-7077-6e64-93a9-a7d993bc498f@linux.microsoft.com>
In-Reply-To: <20221010142410.GA1689@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

Hello Honza, hello Ye,

I just want to make sure this does not get lost: as mentioned earlier,
reverting 51ae846cff5 yields a kernel build that does not have this issue.

>>>>>>>>> So this seems like a real issue. Essentially, the problem is that
>>>>>>>>> ext4_bmap() acquires inode->i_rwsem while its caller
>>>>>>>>> jbd2_journal_flush() is holding journal->j_checkpoint_mutex. This
>>>>>>>>> looks like a real deadlock possibility.
>>>>>>>>
>>> [...]
>>>>>>>> The issue can be triggered on Flatcar release 3227.2.2 / kernel version
>>>>>>>> 5.15.63 (we ship LTS kernels) but not on release 3227.2.1 / kernel 5.15.58.
>>>>>>>> 51ae846cff5 was introduced to 5.15 in 5.15.61.
>>>>>>>
>>>>>>> Well, so far your stacktraces do not really show anything pointing to that
>>>>>>> particular commit. So we need to understand that hang some more.
>>>>>>
>>> [...]
>>>>> So our stacktraces were mangled because historically our kernel build used
>>>>> INSTALL_MOD_STRIP=--strip-unneeded, we've now switched it back to --strip-debug
>>>>> which is the default. We're still using CONFIG_UNWINDER_ORC=y.
>>>>>
>>>>> Here's the hung task output after the change to stripping:
>>>>
>>>> Yeah, the stacktraces now look like what I'd expect. Thanks for fixing that!
>>>> Sadly they don't point to the culprit of the problem. They show jbd2/sda9-8
>>>> is waiting for someone to drop its transaction handle. Other processes are
>>>> waiting for jbd2/sda9-8 to commit a transaction. And then a few processes
>>>> are waiting for locks held by these waiting processes. But I don't see
>>>> anywhere the process holding the transaction handle. Can you please
>>>> reproduce the problem once more and when the system hangs run:
>>>>
>>>> echo w >/proc/sysrq-trigger
>>>>
>>>> Unlike the softlockup detector, this will dump all blocked tasks, so hopefully
>>>> we'll see the offending task there. Thanks!
>>
>>> [ 3451.530765] sysrq: Show Blocked State
>>> [ 3451.534632] task:jbd2/sda9-8     state:D stack:    0 pid:  704 ppid:    2 flags:0x00004000
>>> [ 3451.543107] Call Trace:
>>> [ 3451.545671]  <TASK>
>>> [ 3451.547888]  __schedule+0x2eb/0x8d0
>>> [ 3451.551491]  schedule+0x5b/0xd0
>>> [ 3451.554749]  jbd2_journal_commit_transaction+0x301/0x18e0 [jbd2]
>>> [ 3451.560881]  ? wait_woken+0x70/0x70
>>> [ 3451.564485]  ? lock_timer_base+0x61/0x80
>>> [ 3451.568524]  kjournald2+0xab/0x270 [jbd2]
>>> [ 3451.572657]  ? wait_woken+0x70/0x70
>>> [ 3451.576258]  ? load_superblock.part.0+0xb0/0xb0 [jbd2]
>>> [ 3451.581526]  kthread+0x124/0x150
>>> [ 3451.584874]  ? set_kthread_struct+0x50/0x50
>>> [ 3451.589177]  ret_from_fork+0x1f/0x30
>>> [ 3451.592887]  </TASK>
>>
>> So again jbd2 waiting for the transaction handle to be dropped. The task
>> having the handle open is:
>>
>>> [ 3473.580964] task:containerd      state:D stack:    0 pid:92591 ppid: 70946 flags:0x00004000
>>> [ 3473.589432] Call Trace:
>>> [ 3473.591997]  <TASK>
>>> [ 3473.594209]  ? ext4_mark_iloc_dirty+0x56a/0xaf0 [ext4]
>>> [ 3473.599518]  ? __schedule+0x2eb/0x8d0
>>> [ 3473.603301]  ? _raw_spin_lock_irqsave+0x36/0x50
>>> [ 3473.607947]  ? __ext4_journal_start_sb+0xf8/0x110 [ext4]
>>> [ 3473.613393]  ? __wait_on_bit_lock+0x40/0xb0
>>> [ 3473.617689]  ? out_of_line_wait_on_bit_lock+0x92/0xb0
>>> [ 3473.622854]  ? var_wake_function+0x30/0x30
>>> [ 3473.627062]  ? ext4_xattr_block_set+0x865/0xf00 [ext4]
>>> [ 3473.632346]  ? ext4_xattr_set_handle+0x48e/0x630 [ext4]
>>> [ 3473.637718]  ? ext4_initxattrs+0x43/0x60 [ext4]
>>> [ 3473.642389]  ? security_inode_init_security+0xab/0x140
>>> [ 3473.647640]  ? ext4_init_acl+0x170/0x170 [ext4]
>>> [ 3473.652315]  ? __ext4_new_inode+0x11f7/0x1710 [ext4]
>>> [ 3473.657430]  ? ext4_create+0x115/0x1d0 [ext4]
>>> [ 3473.661935]  ? path_openat+0xf48/0x1280
>>> [ 3473.665888]  ? do_filp_open+0xa9/0x150
>>> [ 3473.669751]  ? vfs_statx+0x74/0x130
>>> [ 3473.673359]  ? __check_object_size+0x146/0x160
>>> [ 3473.677917]  ? do_sys_openat2+0x9b/0x160
>>> [ 3473.681953]  ? __x64_sys_openat+0x54/0xa0
>>> [ 3473.686076]  ? do_syscall_64+0x38/0x90
>>> [ 3473.689942]  ? entry_SYSCALL_64_after_hwframe+0x61/0xcb
>>> [ 3473.695281]  </TASK>
>>
>> Which seems to be waiting on something in ext4_xattr_block_set(). This
>> "something" is not quite clear because the stacktrace looks a bit
>> unreliable at the top - either it is a buffer lock or we are waiting for
>> xattr block reference usecount to decrease (which would kind of make sense
>> because there were changes to ext4 xattr block handling in the time window
>> where the lockup started happening).
>>
>> Can you try to feed the stacktrace through addr2line utility (it will need
>> objects & debug symbols for the kernel)? Maybe it will show something
>> useful...
> 
> Sure, I think this worked fine. It's the buffer lock but right before it we're
> opening a journal transaction. Symbolized it looks like this:
> 
>    ext4_mark_iloc_dirty (include/linux/buffer_head.h:308 fs/ext4/inode.c:5712) ext4
>    __schedule (kernel/sched/core.c:4994 kernel/sched/core.c:6341)
>    _raw_spin_lock_irqsave (arch/x86/include/asm/paravirt.h:585 arch/x86/include/asm/qspinlock.h:51 include/asm-generic/qspinlock.h:85 include/linux/spinlock.h:199 include/linux/spinlock_api_smp.h:119 kernel/locking/spinlock.c:162)
>    __ext4_journal_start_sb (fs/ext4/ext4_jbd2.c:105) ext4
>    __wait_on_bit_lock (arch/x86/include/asm/bitops.h:214 include/asm-generic/bitops/instrumented-non-atomic.h:135 kernel/sched/wait_bit.c:89)
>    out_of_line_wait_on_bit_lock (kernel/sched/wait_bit.c:118)
>    var_wake_function (kernel/sched/wait_bit.c:22)
>    ext4_xattr_block_set (include/linux/buffer_head.h:391 fs/ext4/xattr.c:2019) ext4
>    ext4_xattr_set_handle (fs/ext4/xattr.c:2395) ext4
>    ext4_initxattrs (fs/ext4/xattr_security.c:48) ext4
>    security_inode_init_security (security/security.c:1114)
>    ext4_init_acl (fs/ext4/xattr_security.c:38) ext4
>    __ext4_new_inode (fs/ext4/ialloc.c:1325) ext4
>    ext4_create (fs/ext4/namei.c:2796) ext4
>    path_openat (fs/namei.c:3334 fs/namei.c:3404 fs/namei.c:3612)
>    do_filp_open (fs/namei.c:3642)
>    vfs_statx (include/linux/namei.h:57 fs/stat.c:221)
>    __check_object_size (mm/usercopy.c:240 mm/usercopy.c:286 mm/usercopy.c:256)
>    do_sys_openat2 (fs/open.c:1214)
>    __x64_sys_openat (fs/open.c:1241)
>    do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
>    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:118)
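
In the raw sysrq output quoted above, every frame of the containerd task
carries a "?" prefix, which is how the kernel marks return addresses the
unwinder could not verify; the jbd2/sda9-8 trace has a mix of both. As a
minimal sketch for triaging such dumps, assuming the dmesg line format
shown in this thread, the following Python splits a trace into reliable
and unreliable frames. The name parse_frames and the regular expression
are illustrative only and not part of any kernel tooling.

```python
import re

# Matches frame lines in the format seen in this thread, for example:
#   [ 3451.554749]  jbd2_journal_commit_transaction+0x301/0x18e0 [jbd2]
#   [ 3473.594209]  ? ext4_mark_iloc_dirty+0x56a/0xaf0 [ext4]
# Assumption: timestamps are enabled and symbols print as func+off/size.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return one dict per stack frame found in a dmesg-style trace.

    Lines that are not frames (task headers, "Call Trace:", <TASK>)
    are skipped. Mail quote prefixes such as ">>> " are tolerated
    because the pattern is searched for, not anchored.
    """
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue
        frames.append({
            "func": m.group("func"),
            "offset": int(m.group("off"), 16),
            "size": int(m.group("size"), 16),
            "module": m.group("module"),  # None for built-in code
            "reliable": m.group("unreliable") is None,
        })
    return frames


if __name__ == "__main__":
    import sys

    for frame in parse_frames(sys.stdin.read()):
        mark = " " if frame["reliable"] else "?"
        print(f"{mark} {frame['func']}+{frame['offset']:#x}")
```

Feeding it a saved dump on standard input prints one line per frame, with
"?" in the first column for frames that should be treated with caution.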

Is the symbolised stack trace Jeremi sent helpful to get to the bottom 
of this issue? Can we do anything else to help?

Best regards,
Thilo
