linux-ext4.vger.kernel.org archive mirror
* looking for assistance with jbd2 (and other processes) hung trying to write to disk
@ 2020-11-09 21:11 Chris Friesen
  2020-11-10 11:42 ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Friesen @ 2020-11-09 21:11 UTC (permalink / raw)
  To: linux-ext4

Hi,

I'm running a 3.10.0-1127.rt56.1093 CentOS kernel.  I realize you don't 
support this particular kernel but I'm hoping for some general pointers.

I've got a system with four "housekeeping" CPUs, with rcu_nocbs and 
nohz_full used to reduce system overhead on the "application" CPUs, with 
four CPUs set as "isolcpus" to try and isolate them even further.  I 
have a crashdump vmcore file from a softdog expiry when the process that 
pets the softdog hung trying to write to /dev/log after the unix socket 
backlog had been reached.

I can see a "jbd2/nvme2n1p4-" process that appears to be hung for over 9 
minutes waiting to commit a transaction.  /dev/nvme2n1p4 corresponds to 
the root filesystem.  The "ps" and "bt" output from the crashdump are 
below.  syslog-ng is also blocked waiting on filesystem access, and 
there are other tasks also blocked on disk, including a few jbd2 tasks 
that are associated with device mapper.

Can anyone give some suggestions on how to track down what's causing the 
delay here?  I suspect there's a race condition somewhere similar to 
what happened with https://access.redhat.com/solutions/3226391, although 
that one was specific to device-mapper and the root filesystem here is 
directly on the nvme device.

Any suggestions would be appreciated.

Additional info below:

crash> ps -m 930
[0 00:09:11.694] [UN]  PID: 930    TASK: ffffa14b5f9032c0  CPU: 1   COMMAND: "jbd2/nvme2n1p4-"

crash> bt 930
PID: 930    TASK: ffffa14b5f9032c0  CPU: 1   COMMAND: "jbd2/nvme2n1p4-"
  #0 [ffffa14b5ff0ba20] __schedule at ffffffffafe1b959
  #1 [ffffa14b5ff0bab0] schedule at ffffffffafe1be80
  #2 [ffffa14b5ff0bac8] schedule_timeout at ffffffffafe19d4c
  #3 [ffffa14b5ff0bb70] io_schedule_timeout at ffffffffafe1ab6d
  #4 [ffffa14b5ff0bba0] io_schedule at ffffffffafe1ac08
  #5 [ffffa14b5ff0bbb0] bit_wait_io at ffffffffafe1a561
  #6 [ffffa14b5ff0bbc8] __wait_on_bit at ffffffffafe1a087
  #7 [ffffa14b5ff0bc08] out_of_line_wait_on_bit at ffffffffafe1a1f1
  #8 [ffffa14b5ff0bc80] __wait_on_buffer at ffffffffaf85068a
  #9 [ffffa14b5ff0bc90] jbd2_journal_commit_transaction at ffffffffc0e543fc [jbd2]
#10 [ffffa14b5ff0be48] kjournald2 at ffffffffc0e5a6ad [jbd2]
#11 [ffffa14b5ff0bec8] kthread at ffffffffaf6ad781
#12 [ffffa14b5ff0bf50] ret_from_fork_nospec_begin at ffffffffafe1fe5d

Possibly of interest, syslog-ng is also blocked waiting on filesystem 
access:

crash> bt 1912
PID: 1912   TASK: ffffa14b62dc2610  CPU: 1   COMMAND: "syslog-ng"
  #0 [ffffa14b635b7980] __schedule at ffffffffafe1b959
  #1 [ffffa14b635b7a10] schedule at ffffffffafe1be80
  #2 [ffffa14b635b7a28] schedule_timeout at ffffffffafe19d4c
  #3 [ffffa14b635b7ad0] io_schedule_timeout at ffffffffafe1ab6d
  #4 [ffffa14b635b7b00] io_schedule at ffffffffafe1ac08
  #5 [ffffa14b635b7b10] bit_wait_io at ffffffffafe1a561
  #6 [ffffa14b635b7b28] __wait_on_bit at ffffffffafe1a087
  #7 [ffffa14b635b7b68] out_of_line_wait_on_bit at ffffffffafe1a1f1
  #8 [ffffa14b635b7be0] do_get_write_access at ffffffffc0e51e94 [jbd2]
  #9 [ffffa14b635b7c80] jbd2_journal_get_write_access at ffffffffc0e521b7 [jbd2]
#10 [ffffa14b635b7ca0] __ext4_journal_get_write_access at ffffffffc0eb8e31 [ext4]
#11 [ffffa14b635b7cd0] ext4_reserve_inode_write at ffffffffc0e87fa0 [ext4]
#12 [ffffa14b635b7d00] ext4_mark_inode_dirty at ffffffffc0e8801e [ext4]
#13 [ffffa14b635b7d58] ext4_dirty_inode at ffffffffc0e8bc40 [ext4]
#14 [ffffa14b635b7d78] __mark_inode_dirty at ffffffffaf84855d
#15 [ffffa14b635b7da8] ext4_setattr at ffffffffc0e8b558 [ext4]
#16 [ffffa14b635b7e18] notify_change at ffffffffaf8363fc
#17 [ffffa14b635b7e60] chown_common at ffffffffaf8128ac
#18 [ffffa14b635b7f08] sys_fchown at ffffffffaf813fb7
#19 [ffffa14b635b7f50] tracesys at ffffffffafe202a8 (via system_call)

One of the hung jbd2 tasks associated with device mapper:

crash> bt 1489
PID: 1489   TASK: ffffa14b641f0000  CPU: 1   COMMAND: "jbd2/dm-0-8"
  #0 [ffffa14b5fab7a20] __schedule at ffffffffafe1b959
  #1 [ffffa14b5fab7ab0] schedule at ffffffffafe1be80
  #2 [ffffa14b5fab7ac8] schedule_timeout at ffffffffafe19d4c
  #3 [ffffa14b5fab7b70] io_schedule_timeout at ffffffffafe1ab6d
  #4 [ffffa14b5fab7ba0] io_schedule at ffffffffafe1ac08
  #5 [ffffa14b5fab7bb0] bit_wait_io at ffffffffafe1a561
  #6 [ffffa14b5fab7bc8] __wait_on_bit at ffffffffafe1a087
  #7 [ffffa14b5fab7c08] out_of_line_wait_on_bit at ffffffffafe1a1f1
  #8 [ffffa14b5fab7c80] __wait_on_buffer at ffffffffaf85068a
  #9 [ffffa14b5fab7c90] jbd2_journal_commit_transaction at ffffffffc0e543fc [jbd2]
#10 [ffffa14b5fab7e48] kjournald2 at ffffffffc0e5a6ad [jbd2]
#11 [ffffa14b5fab7ec8] kthread at ffffffffaf6ad781
#12 [ffffa14b5fab7f50] ret_from_fork_nospec_begin at ffffffffafe1fe5d

Thanks,

Chris


* Re: looking for assistance with jbd2 (and other processes) hung trying to write to disk
  2020-11-09 21:11 looking for assistance with jbd2 (and other processes) hung trying to write to disk Chris Friesen
@ 2020-11-10 11:42 ` Jan Kara
  2020-11-10 15:57   ` Chris Friesen
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2020-11-10 11:42 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-ext4

Hello!

On Mon 09-11-20 15:11:58, Chris Friesen wrote:
> I'm running a 3.10.0-1127.rt56.1093 CentOS kernel.  I realize you don't
> support this particular kernel but I'm hoping for some general pointers.
> 
> I've got a system with four "housekeeping" CPUs, with rcu_nocbs and
> nohz_full used to reduce system overhead on the "application" CPUs, with
> four CPUs set as "isolcpus" to try and isolate them even further.  I have a
> crashdump vmcore file from a softdog expiry when the process that pets the
> softdog hung trying to write to /dev/log after the unix socket backlog had
> been reached.
> 
> I can see a "jbd2/nvme2n1p4-" process that appears to be hung for over 9
> minutes waiting to commit a transaction.  /dev/nvme2n1p4 corresponds to the
> root filesystem.  The "ps" and "bt" output from the crashdump are below.
> syslog-ng is also blocked waiting on filesystem access, and there are other
> tasks also blocked on disk, including a few jbd2 tasks that are associated
> with device mapper.

Yeah, 9 minutes seems far too long.

> Can anyone give some suggestions on how to track down what's causing the
> delay here?  I suspect there's a race condition somewhere similar to what
> happened with https://access.redhat.com/solutions/3226391, although that one
> was specific to device-mapper and the root filesystem here is directly on
> the nvme device.

Sadly I don't have access to RH portal to be able to check what that hang
was about...

> crash> ps -m 930
> [0 00:09:11.694] [UN]  PID: 930    TASK: ffffa14b5f9032c0  CPU: 1 COMMAND:
> "jbd2/nvme2n1p4-"
> 

Are the tasks below the only ones hanging in D state (UN state in crash)?
Because I can see processes are waiting for the locked buffer but it is
unclear who is holding the buffer lock...
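
BTW, if you want to survey that quickly, crash's "foreach" command accepts
a task state, so something like the line below should dump the backtrace of
every UN task in one go (syntax from memory, so double-check it against
"help foreach" in your crash version):

crash> foreach UN bt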

> crash> bt 930
> PID: 930    TASK: ffffa14b5f9032c0  CPU: 1   COMMAND: "jbd2/nvme2n1p4-"
>  #0 [ffffa14b5ff0ba20] __schedule at ffffffffafe1b959
>  #1 [ffffa14b5ff0bab0] schedule at ffffffffafe1be80
>  #2 [ffffa14b5ff0bac8] schedule_timeout at ffffffffafe19d4c
>  #3 [ffffa14b5ff0bb70] io_schedule_timeout at ffffffffafe1ab6d
>  #4 [ffffa14b5ff0bba0] io_schedule at ffffffffafe1ac08
>  #5 [ffffa14b5ff0bbb0] bit_wait_io at ffffffffafe1a561
>  #6 [ffffa14b5ff0bbc8] __wait_on_bit at ffffffffafe1a087
>  #7 [ffffa14b5ff0bc08] out_of_line_wait_on_bit at ffffffffafe1a1f1
>  #8 [ffffa14b5ff0bc80] __wait_on_buffer at ffffffffaf85068a
>  #9 [ffffa14b5ff0bc90] jbd2_journal_commit_transaction at ffffffffc0e543fc
> [jbd2]
> #10 [ffffa14b5ff0be48] kjournald2 at ffffffffc0e5a6ad [jbd2]
> #11 [ffffa14b5ff0bec8] kthread at ffffffffaf6ad781
> #12 [ffffa14b5ff0bf50] ret_from_fork_nospec_begin at ffffffffafe1fe5d
> 
> Possibly of interest, syslog-ng is also blocked waiting on filesystem
> access:
> 
> crash> bt 1912
> PID: 1912   TASK: ffffa14b62dc2610  CPU: 1   COMMAND: "syslog-ng"
>  #0 [ffffa14b635b7980] __schedule at ffffffffafe1b959
>  #1 [ffffa14b635b7a10] schedule at ffffffffafe1be80
>  #2 [ffffa14b635b7a28] schedule_timeout at ffffffffafe19d4c
>  #3 [ffffa14b635b7ad0] io_schedule_timeout at ffffffffafe1ab6d
>  #4 [ffffa14b635b7b00] io_schedule at ffffffffafe1ac08
>  #5 [ffffa14b635b7b10] bit_wait_io at ffffffffafe1a561
>  #6 [ffffa14b635b7b28] __wait_on_bit at ffffffffafe1a087
>  #7 [ffffa14b635b7b68] out_of_line_wait_on_bit at ffffffffafe1a1f1
>  #8 [ffffa14b635b7be0] do_get_write_access at ffffffffc0e51e94 [jbd2]
>  #9 [ffffa14b635b7c80] jbd2_journal_get_write_access at ffffffffc0e521b7
> [jbd2]
> #10 [ffffa14b635b7ca0] __ext4_journal_get_write_access at ffffffffc0eb8e31
> [ext4]
> #11 [ffffa14b635b7cd0] ext4_reserve_inode_write at ffffffffc0e87fa0 [ext4]
> #12 [ffffa14b635b7d00] ext4_mark_inode_dirty at ffffffffc0e8801e [ext4]
> #13 [ffffa14b635b7d58] ext4_dirty_inode at ffffffffc0e8bc40 [ext4]
> #14 [ffffa14b635b7d78] __mark_inode_dirty at ffffffffaf84855d
> #15 [ffffa14b635b7da8] ext4_setattr at ffffffffc0e8b558 [ext4]
> #16 [ffffa14b635b7e18] notify_change at ffffffffaf8363fc
> #17 [ffffa14b635b7e60] chown_common at ffffffffaf8128ac
> #18 [ffffa14b635b7f08] sys_fchown at ffffffffaf813fb7
> #19 [ffffa14b635b7f50] tracesys at ffffffffafe202a8 (via system_call)
> 
> One of the hung jbd2 tasks associated with device mapper:
> 
> crash> bt 1489
> PID: 1489   TASK: ffffa14b641f0000  CPU: 1   COMMAND: "jbd2/dm-0-8"
>  #0 [ffffa14b5fab7a20] __schedule at ffffffffafe1b959
>  #1 [ffffa14b5fab7ab0] schedule at ffffffffafe1be80
>  #2 [ffffa14b5fab7ac8] schedule_timeout at ffffffffafe19d4c
>  #3 [ffffa14b5fab7b70] io_schedule_timeout at ffffffffafe1ab6d
>  #4 [ffffa14b5fab7ba0] io_schedule at ffffffffafe1ac08
>  #5 [ffffa14b5fab7bb0] bit_wait_io at ffffffffafe1a561
>  #6 [ffffa14b5fab7bc8] __wait_on_bit at ffffffffafe1a087
>  #7 [ffffa14b5fab7c08] out_of_line_wait_on_bit at ffffffffafe1a1f1
>  #8 [ffffa14b5fab7c80] __wait_on_buffer at ffffffffaf85068a
>  #9 [ffffa14b5fab7c90] jbd2_journal_commit_transaction at ffffffffc0e543fc
> [jbd2]
> #10 [ffffa14b5fab7e48] kjournald2 at ffffffffc0e5a6ad [jbd2]
> #11 [ffffa14b5fab7ec8] kthread at ffffffffaf6ad781
> #12 [ffffa14b5fab7f50] ret_from_fork_nospec_begin at ffffffffafe1fe5d
> 
> Thanks,
> 
> Chris
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: looking for assistance with jbd2 (and other processes) hung trying to write to disk
  2020-11-10 11:42 ` Jan Kara
@ 2020-11-10 15:57   ` Chris Friesen
  2020-11-10 19:46     ` Theodore Y. Ts'o
  2020-11-11 15:57     ` Jan Kara
  0 siblings, 2 replies; 7+ messages in thread
From: Chris Friesen @ 2020-11-10 15:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On 11/10/2020 5:42 AM, Jan Kara wrote:
> On Mon 09-11-20 15:11:58, Chris Friesen wrote:

>> Can anyone give some suggestions on how to track down what's causing the
>> delay here?  I suspect there's a race condition somewhere similar to what
>> happened with https://access.redhat.com/solutions/3226391, although that one
>> was specific to device-mapper and the root filesystem here is directly on
>> the nvme device.
> 
> Sadly I don't have access to RH portal to be able to check what that hang
> was about...

They had exactly the same stack trace (different addresses) with 
"jbd2/dm-16-8" trying to commit a journal transaction.  In their case it 
was apparently due to two problems, "the RAID1 code leaking the r1bio", 
and "dm-raid not handling a needed retry scenario".  They fixed it by 
backporting upstream commits.  The kernel we're running should have 
those fixes, and in our case we're operating directly on an nvme device.

>> crash> ps -m 930
>> [0 00:09:11.694] [UN]  PID: 930    TASK: ffffa14b5f9032c0  CPU: 1 COMMAND:
>> "jbd2/nvme2n1p4-"
>>
> 
> Are the tasks below the only ones hanging in D state (UN state in crash)?
> Because I can see processes are waiting for the locked buffer but it is
> unclear who is holding the buffer lock...

No, there are quite a few of them.  I've included them below.  I agree, 
it's not clear who's holding the lock.  Is there a way to find that out?

Just to be sure, I'm looking for whoever has the BH_Lock bit set on the 
buffer_head "b_state" field, right?  I don't see any ownership field the 
way we have for mutexes.  Is there some way to find out who would have 
locked the buffer?
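
I assume inspecting an individual buffer head would look something like the
following in crash (the address is just a placeholder, not one pulled from
this dump), with BH_Lock being bit 2 of b_state as far as I can tell from
enum bh_state_bits:

crash> struct buffer_head.b_state <buffer_head address>

That only tells me whether the buffer is locked, though, not who locked it.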

Do you think it would help at all to enable CONFIG_JBD2_DEBUG?

Processes in "UN" state in crashdump:

crash> ps|grep UN
       1      0   1  ffffa14b687d8000  UN   0.0  193616   6620  systemd
     930      2   1  ffffa14b5f9032c0  UN   0.0       0      0  [jbd2/nvme2n1p4-]
    1489      2   1  ffffa14b641f0000  UN   0.0       0      0  [jbd2/dm-0-8]
    1494      2   1  ffffa14b641f2610  UN   0.0       0      0  [jbd2/dm-11-8]
    1523      2   1  ffffa14b64182610  UN   0.0       0      0  [jbd2/dm-1-8]
    1912      1   1  ffffa14b62dc2610  UN   0.0  117868  17568  syslog-ng
   86293      1   1  ffffa14ae4650cb0  UN   0.1 4618100 116664  containerd
   86314      1   1  ffffa14ae2639960  UN   0.1 4618100 116664  containerd
   88019      1   1  ffffa14ae26ad8d0  UN   0.2  651196 210260  safe_timer
   90539      1   1  ffffa13caca3bf70  UN   0.0   25868   2140  fsmond
   94006  93595   1  ffffa14ae31fe580  UN   0.1 13843140 113604  etcd
   95061  93508   1  ffffa14a913e8cb0  UN   0.1  721888 114652  log
   96367      1   1  ffffa14af53f9960  UN   0.0  119404  19084  python
   121292      1   1  ffffa14ae18932c0  UN   0.1 4618100 116664  containerd
   122042      1   1  ffffa14a950a6580  UN   0.0  111680   9496  containerd-shim
   126119  122328  23  ffffa14b3d76a610  UN   0.0       0      0  com.xcg
   126171  122328  47  ffffa14a91571960  UN   0.0       0      0  com.xcg
   126173  122328  23  ffffa14a91573f70  UN   0.0       0      0  com.xcg
   126177  122328  23  ffffa14a91888000  UN   0.0       0      0  com.xcg
   128049  124763  47  ffffa14a964e6580  UN   0.1 1817292  80388  confd
   136938  136924   1  ffffa14b5bb7d8d0  UN   0.0  146256  25672  coredns
   136972  136924   1  ffffa14a9aae2610  UN   0.0  146256  25672  coredns
   136978  136924   1  ffffa14ae2238000  UN   0.0  146256  25672  coredns
   143026  142739   1  ffffa14b035e0000  UN   0.0       0      0  cainjector
   166456  165537  44  ffffa14af3cb8000  UN   0.0  325468  10736  nronmd.xcg
   166712  165537  44  ffffa149a2fecc20  UN   0.0  200116   3728  vpms.xcg
   166725  165537  44  ffffa14962fb6580  UN   0.1 2108336  58176  vrlcb.xcg
   166860  165537  45  ffffa14afd22bf70  UN   0.0  848320  12180  gcci.xcg
   166882  165537  45  ffffa14aff3c58d0  UN   0.0  693256  11624  ndc.xcg
   167556  165537  44  ffffa14929a6cc20  UN   0.0  119604   2612  gcdm.xcg
   170732  122328  23  ffffa1492987bf70  UN   0.0  616660   4348  com.xcg
   170741  122328  46  ffffa1492987cc20  UN   0.0       0      0  com.xcg
   170745  122328  23  ffffa1492987e580  UN   0.0       0      0  com.xcg
   170750  122328  23  ffffa14924d4f230  UN   0.0       0      0  com.xcg
   170774  122328  23  ffffa14924d4bf70  UN   0.0       0      0  com.xcg
   189870  187717  46  ffffa14873591960  UN   0.1  881516  83840  filebeat
   332649  136924   1  ffffa147efd49960  UN   0.0  146256  25672  coredns
   1036113  3779184  23  ffffa13c9317bf70  UN   0.9 6703644 878848  pool-3-thread-1
   1793349      2   1  ffffa14ae2402610  UN   0.0       0      0  [kworker/1:0]
   1850718  166101   0  ffffa14807448cb0  UN   0.0   18724   6068  exe
   1850727  1850722   0  ffffa147e18dd8d0  UN   0.0   18724   6068  exe
   1850733  120305   1  ffffa147e18da610  UN   0.0  135924   6512  runc
   1850792  128006  46  ffffa14ae1948cb0  UN   0.0   21716   1280  logrotate
   1850914  1850911   1  ffffa147086dbf70  UN   0.0   18724   6068  exe
   1851274  127909  46  ffffa14703661960  UN   0.0   53344   3232  redis-server
   1851474  1850787   1  ffffa1470026cc20  UN   0.0  115704   1244  ceph
   1853422  1853340  44  ffffa146dfdc1960  UN   0.0   12396   2312  sh
   1854005      1   1  ffffa146d7d8f230  UN   0.0  116872    812  mkdir
   1854955  2847282   1  ffffa146c5d18cb0  UN   0.0   18724   6068  exe
   1856515  166108   1  ffffa146aa071960  UN   0.0   18724   6068  exe
   1856602  84624   1  ffffa146aa073f70  UN   0.0  184416   1988  crond
   1859661  1859658   1  ffffa14672090000  UN   0.0  116872    812  mkdir
   2232051  165443   7  ffffa147e1ac0000  UN   0.0       0      0  eal-intr-thread


Thanks,
Chris



* Re: looking for assistance with jbd2 (and other processes) hung trying to write to disk
  2020-11-10 15:57   ` Chris Friesen
@ 2020-11-10 19:46     ` Theodore Y. Ts'o
  2020-11-10 20:14       ` Chris Friesen
  2020-11-11 15:57     ` Jan Kara
  1 sibling, 1 reply; 7+ messages in thread
From: Theodore Y. Ts'o @ 2020-11-10 19:46 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Jan Kara, linux-ext4

On Tue, Nov 10, 2020 at 09:57:39AM -0600, Chris Friesen wrote:
> No, there are quite a few of them.  I've included them below.  I agree, it's
> not clear who's holding the lock.  Is there a way to find that out?
> 
> Just to be sure, I'm looking for whoever has the BH_Lock bit set on the
> buffer_head "b_state" field, right?  I don't see any ownership field the way
> we have for mutexes.  Is there some way to find out who would have locked
> the buffer?

It's quite possible that the buffer was locked as part of doing I/O,
and we are just waiting for the I/O to complete.  An example of this
is in journal_submit_commit_record(), where we lock the buffer using
lock_buffer(), and then call submit_bh() to submit the buffer for I/O.
When the I/O is completed, the buffer head will be unlocked, and we
can check the buffer_uptodate flag to see if the I/O completed
successfully.  (See journal_wait_on_commit_record() for an example of
this.)
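
Roughly, the pattern looks like this (paraphrased from memory rather than
copied from the 3.10 source, so treat it as a sketch):

        /* submit side -- roughly what journal_submit_commit_record() does */
        lock_buffer(bh);                        /* stays locked until the I/O completes */
        clear_buffer_dirty(bh);
        set_buffer_uptodate(bh);
        bh->b_end_io = journal_end_buffer_io_sync;  /* completion handler unlocks the buffer */
        ret = submit_bh(WRITE_SYNC, bh);

        /* wait side -- roughly what journal_wait_on_commit_record() does */
        wait_on_buffer(bh);                     /* sleeps until the end_io handler unlocks it */
        if (unlikely(!buffer_uptodate(bh)))
                ret = -EIO;                     /* the write failed */

So a buffer that stays locked for nine minutes usually means the I/O that
was submitted against it never completed.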

So the first thing I'd suggest doing is looking at the console output
or dmesg output from the crashdump to see if there are any clues in
terms of kernel messages from the device driver before things locked
up.  This could be as simple as the device falling off the bus, in
which case there might be some kernel error messages from the block
layer or device driver that would give some insight.
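
The kernel ring buffer from the dump is also available inside crash itself
via the "log" command, and crash can pipe command output through the shell,
so something like the following is a quick way to scan it (the grep pattern
is just an illustration):

crash> log | tail -n 200
crash> log | grep -iE "nvme|blk|timed out|i/o error"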

Good luck,

					- Ted


* Re: looking for assistance with jbd2 (and other processes) hung trying to write to disk
  2020-11-10 19:46     ` Theodore Y. Ts'o
@ 2020-11-10 20:14       ` Chris Friesen
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Friesen @ 2020-11-10 20:14 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: Jan Kara, linux-ext4

On 11/10/2020 1:46 PM, Theodore Y. Ts'o wrote:
> 
> On Tue, Nov 10, 2020 at 09:57:39AM -0600, Chris Friesen wrote:

>> Just to be sure, I'm looking for whoever has the BH_Lock bit set on the
>> buffer_head "b_state" field, right?  I don't see any ownership field the way
>> we have for mutexes.  Is there some way to find out who would have locked
>> the buffer?
> 
> It's quite possible that the buffer was locked as part of doing I/O,
> and we are just waiting for the I/O to complete.  An example of this
> is in journal_submit_commit_record(), where we lock the buffer using
> lock_buffer(), and then call submit_bh() to submit the buffer for I/O.
> When the I/O is completed, the buffer head will be unlocked, and we
> can check the buffer_uptodate flag to see if the I/O completed
> successfully.  (See journal_wait_on_commit_record() for an example of
> this.)

Running "ps -m 'jbd2'" in the crashdump shows jbd2/nvme2n1p4- in the 
uninterruptible state, with a "last run" timestamp of over 9 minutes 
before the crash.  Same for a number of jbd2/dm* tasks.  This seems like 
a very long time to wait for I/O to complete, which is why I'm assuming 
something's gone off the rails.

> So the first thing I'd suggest doing is looking at the console output
> or dmesg output from the crashdump to see if there are any clues in
> terms of kernel messages from the device driver before things locked
> up.  This could be as simple as the device falling off the bus, in
> which case there might be some kernel error messages from the block
> layer or device driver that would give some insight.

The timeline looks like this (CPUs 0, 1, 24, 25 are the housekeeping CPUs):

The only device-related issue I see is this, just a bit over 9 minutes 
before the eventual panic.  Prior to this there are no crashdump dmesg 
entries for a couple of hours.
[119982.636995] WARNING: CPU: 1 PID: 21 at net/sched/sch_generic.c:360 dev_watchdog+0x268/0x280
[119982.636997] NETDEV WATCHDOG: mh0 (iavf): transmit queue 3 timed out

Then I see rcu_sched self-detecting stalls:
[120024.146369] INFO: rcu_sched self-detected stall on CPU { 25} (t=60000 jiffies g=10078853 c=10078852 q=250)
[120203.725976] INFO: rcu_sched self-detected stall on CPU { 25} (t=240003 jiffies g=10078853 c=10078852 q=361)
[120383.305584] INFO: rcu_sched self-detected stall on CPU { 25} (t=420006 jiffies g=10078853 c=10078852 q=401)

The actual panic is here:
[120536.886219] Kernel panic - not syncing: Software Watchdog Timer expired
[120536.886221] CPU: 1 PID: 21 Comm: ktimersoftd/1 Kdump: loaded Tainted: G        W  O   ------------ T 3.10.0-1127.rt56.1093.el7.tis.2.x86_64 #1


Chris


* Re: looking for assistance with jbd2 (and other processes) hung trying to write to disk
  2020-11-10 15:57   ` Chris Friesen
  2020-11-10 19:46     ` Theodore Y. Ts'o
@ 2020-11-11 15:57     ` Jan Kara
  2020-11-11 16:24       ` Chris Friesen
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Kara @ 2020-11-11 15:57 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Jan Kara, linux-ext4

On Tue 10-11-20 09:57:39, Chris Friesen wrote:
> On 11/10/2020 5:42 AM, Jan Kara wrote:
> > On Mon 09-11-20 15:11:58, Chris Friesen wrote:
> 
> > > Can anyone give some suggestions on how to track down what's causing the
> > > delay here?  I suspect there's a race condition somewhere similar to what
> > > happened with https://access.redhat.com/solutions/3226391, although that one
> > > was specific to device-mapper and the root filesystem here is directly on
> > > the nvme device.
> > 
> > Sadly I don't have access to RH portal to be able to check what that hang
> > was about...
> 
> They had exactly the same stack trace (different addresses) with
> "jbd2/dm-16-8" trying to commit a journal transaction.  In their case it was
> apparently due to two problems, "the RAID1 code leaking the r1bio", and
> "dm-raid not handling a needed retry scenario".  They fixed it by
> backporting upstream commits.  The kernel we're running should have those
> fixes, and in our case we're operating directly on an nvme device.

I see. Thanks for the explanation.

> > > crash> ps -m 930
> > > [0 00:09:11.694] [UN]  PID: 930    TASK: ffffa14b5f9032c0  CPU: 1 COMMAND:
> > > "jbd2/nvme2n1p4-"
> > > 
> > 
> > Are the tasks below the only ones hanging in D state (UN state in crash)?
> > Because I can see processes are waiting for the locked buffer but it is
> > unclear who is holding the buffer lock...
> 
> No, there are quite a few of them.  I've included them below.  I agree, it's
> not clear who's holding the lock.  Is there a way to find that out?
> 
> Just to be sure, I'm looking for whoever has the BH_Lock bit set on the
> buffer_head "b_state" field, right?  I don't see any ownership field the way
> we have for mutexes.  Is there some way to find out who would have locked
> the buffer?

Buffer lock is a bitlock so there's no owner field. If you can reproduce
the problem at will and can use debug kernels, then it's easiest to add
code to lock_buffer() (and fields to struct buffer_head) to track lock
owner and then see who locked the buffer. Without this, the only way is to
check stack traces of all UN processes and see whether some stacktrace
looks suspicious like it could hold the buffer locked (e.g. recursing into
memory allocation and reclaim while holding buffer locked or something like
that)...
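
A minimal sketch of what I mean (the b_lock_owner / b_lock_time names are
made up here, and this is a debug-only hack against
include/linux/buffer_head.h, not anything upstreamable):

struct buffer_head {
        /* ... existing fields unchanged ... */
        struct task_struct *b_lock_owner;       /* debug: last lock_buffer() caller */
        unsigned long b_lock_time;              /* debug: jiffies when the lock was taken */
};

static inline void lock_buffer(struct buffer_head *bh)
{
        might_sleep();
        if (!trylock_buffer(bh))
                __lock_buffer(bh);
        bh->b_lock_owner = current;             /* never cleared, so only meaningful while locked */
        bh->b_lock_time = jiffies;
}

(Callers that use trylock_buffer() directly would need the same treatment.)
Then in crash you can simply read b_lock_owner off the buffer_head the
commit thread is sleeping on.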

As Ted wrote the buffer is indeed usually locked because the IO is running
and that would be the expected situation with the jbd2 stacktrace you
posted. So it could also be the IO got stuck somewhere in the block layer
or NVME (frankly, AFAIR NVME was pretty rudimentary with 3.10). To see
whether that's the case, you need to find 'bio' pointing to the buffer_head
(through bi_private field), possibly also struct request for that bio and see
what state they are in... Again, if you can run debug kernels, you can
write code to simplify this search for you...
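
Without debug code, a rough way to do that search by hand in crash (exact
option names may differ between crash versions):

crash> struct -o bio
crash> search -k <buffer_head address>
crash> struct bio <hit address minus the bi_private offset>

The first command shows the offset of bi_private within struct bio; a hit
from the search that sits at that offset inside a plausible-looking bio
(sane bi_sector, bi_rw, bi_flags) is a good candidate, and from there you
can chase the request / queue state the same way.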

> Do you think it would help at all to enable CONFIG_JBD2_DEBUG?

I don't think so.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: looking for assistance with jbd2 (and other processes) hung trying to write to disk
  2020-11-11 15:57     ` Jan Kara
@ 2020-11-11 16:24       ` Chris Friesen
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Friesen @ 2020-11-11 16:24 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On 11/11/2020 9:57 AM, Jan Kara wrote:
> On Tue 10-11-20 09:57:39, Chris Friesen wrote:
>> Just to be sure, I'm looking for whoever has the BH_Lock bit set on the
>> buffer_head "b_state" field, right?  I don't see any ownership field the way
>> we have for mutexes.  Is there some way to find out who would have locked
>> the buffer?
> 
> Buffer lock is a bitlock so there's no owner field. If you can reproduce
> the problem at will and can use debug kernels, then it's easiest to add
> code to lock_buffer() (and fields to struct buffer_head) to track lock
> owner and then see who locked the buffer. Without this, the only way is to
> check stack traces of all UN processes and see whether some stacktrace
> looks suspicious like it could hold the buffer locked (e.g. recursing into
> memory allocation and reclaim while holding buffer locked or something like
> that)...

That's what I thought. :)   Debug kernels are doable, but unfortunately 
we can't (yet) reproduce the problem at will.  Naturally it's only shown 
up in a couple of customer sites so far and not in any test labs.

> As Ted wrote the buffer is indeed usually locked because the IO is running
> and that would be the expected situation with the jbd2 stacktrace you
> posted. So it could also be the IO got stuck somewhere in the block layer
> or NVME (frankly, AFAIR NVME was pretty rudimentary with 3.10). To see
> whether that's the case, you need to find 'bio' pointing to the buffer_head
> (through bi_private field), possibly also struct request for that bio and see
> what state they are in... Again, if you can run debug kernels, you can
> write code to simplify this search for you...

Thanks, that's helpful.

Chris


