Kernel Newbies archive on lore.kernel.org
* High IO load cause kswapd hang and unresponsive system
@ 2018-11-15 16:55 narutonguyen2018
  2018-11-15 16:55 ` Naruto Nguyen
  2018-11-16  5:10 ` mulyadi.santosa
  0 siblings, 2 replies; 8+ messages in thread
From: narutonguyen2018 @ 2018-11-15 16:55 UTC (permalink / raw)
  To: kernelnewbies

Hi everyone,

Recently, after upgrading my kernel from 3.12.61 to 4.4.74, I find
that my system is sometimes very slow and unresponsive.

When the system is under heavy IO load, the node starts to respond
very slowly, the IO wait time stays very high, and the node becomes
unresponsive. I dumped all blocked tasks at that time and found
something interesting: under heavy I/O load, kswapd hangs in
jbd2_log_wait_commit.

Sep 07 12:25:35 myPC kernel: [11039.126514] kswapd0         D
0000000000000000     0    26      2 0x00000000
Sep 07 12:25:35 myPC kernel: [11039.126517]  ffff88003da27b78
ffff88000ce08280 ffff88003cdf0680 ffff88003da28000
Sep 07 12:25:35 myPC kernel: [11039.126519]  ffff88001a15c888
ffff88001a15c824 ffff88003da27bb8 ffff88001a15c8a0
Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
ffffffff815e4c85 ffff88001a15c800 0000000000013183
Sep 07 12:25:35 myPC kernel: [11039.126522] Call Trace:
Sep 07 12:25:35 myPC kernel: [11039.126539]  [<ffffffff815e4c85>]
schedule+0x35/0x80
Sep 07 12:25:35 myPC kernel: [11039.126561]  [<ffffffffa02faada>]
jbd2_log_wait_commit+0x8a/0xf0 [jbd2]
Sep 07 12:25:35 myPC kernel: [11039.126621]  [<ffffffffa03b28a4>]
ext4_evict_inode+0x2d4/0x4b0 [ext4]
Sep 07 12:25:35 myPC kernel: [11039.126630]  [<ffffffff81215a7a>]
evict+0xba/0x190
Sep 07 12:25:35 myPC kernel: [11039.126635]  [<ffffffff81215b82>]
dispose_list+0x32/0x50
Sep 07 12:25:35 myPC kernel: [11039.126639]  [<ffffffff81216e52>]
prune_icache_sb+0x42/0x50
Sep 07 12:25:35 myPC kernel: [11039.126644]  [<ffffffff811ff8f6>]
super_cache_scan+0x136/0x180
Sep 07 12:25:35 myPC kernel: [11039.126653]  [<ffffffff81196a1b>]
shrink_slab.part.41+0x20b/0x3f0
Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
ffffffff815e4c85 ffff88001a15c800 0000000000013183
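For reference, blocked-task dumps like the trace above can be gathered on demand; a minimal sketch (assumes a Linux box with procfs, and root for the sysrq step):

```shell
# List tasks currently in uninterruptible sleep (state D), the state
# kswapd0 is stuck in above. Read-only and safe to run as any user.
ps -eo state,pid,comm --no-headers | awk '$1 == "D" { print $2, $3 }'

# To make the kernel log a stack trace for every blocked task (this is
# how dumps like the one above are produced), as root:
# echo w > /proc/sysrq-trigger      # traces then appear in dmesg
```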

After that, more and more processes enter state D (uninterruptible
sleep); nfsd hangs waiting for the inode deletion to complete in
__wait_on_freeing_inode.

Sep 07 12:25:35 myPC kernel: [11039.390218] nfsd            D
0000000000000000     0 30419      2 0x00000080
Sep 07 12:25:35 myPC kernel: [11039.390221]  ffff88000da47a98
ffff8800277ec380 ffff8800374ec580 ffff88000da48000
Sep 07 12:25:35 myPC kernel: [11039.390223]  ffff88000da47ad8
ffffc9000011edc8 0000000000000000 ffff88000e92e2f8
Sep 07 12:25:35 myPC kernel: [11039.390225]  ffff88000da47ab0
ffffffff815e4c85 ffffffff81c094c0 ffff88003ff9d8b8
Sep 07 12:25:35 myPC kernel: [11039.390225] Call Trace:
Sep 07 12:25:35 myPC kernel: [11039.390231]  [<ffffffff815e4c85>]
schedule+0x35/0x80
Sep 07 12:25:35 myPC kernel: [11039.390236]  [<ffffffff812157fc>]
__wait_on_freeing_inode+0x9c/0xc0
Sep 07 12:25:35 myPC kernel: [11039.390241]  [<ffffffff81215d83>]
find_inode_fast+0x73/0xa0
Sep 07 12:25:35 myPC kernel: [11039.390245]  [<ffffffff81216052>]
iget_locked+0x72/0x190
Sep 07 12:25:35 myPC kernel: [11039.390284]  [<ffffffffa03aeb8c>]
ext4_iget+0x2c/0xb10 [ext4]
Sep 07 12:25:35 myPC kernel: [11039.390314]  [<ffffffffa03c1055>]
ext4_nfs_get_inode+0x45/0x80 [ext4]
Sep 07 12:25:35 myPC kernel: [11039.390320]  [<ffffffff812211c4>]
generic_fh_to_dentry+0x24/0x30
Sep 07 12:25:35 myPC kernel: [11039.390326]  [<ffffffff81277ede>]
exportfs_decode_fh+0x5e/0x350
Sep 07 12:25:35 myPC kernel: [11039.390347]  [<ffffffffa047749a>]
fh_verify+0x2da/0x5a0 [nfsd]
Sep 07 12:25:35 myPC kernel: [11039.390362]  [<ffffffffa0480933>]
nfsd3_proc_getattr+0x63/0xd0 [nfsd]
Sep 07 12:25:35 myPC kernel: [11039.390374]  [<ffffffffa0473e23>]
nfsd_dispatch+0xc3/0x260 [nfsd]
Sep 07 12:25:35 myPC kernel: [11039.390411]  [<ffffffffa0322108>]
svc_process_common+0x418/0x6a0 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.390439]  [<ffffffffa032248d>]
svc_process+0xfd/0x1b0 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.390454]  [<ffffffffa047389a>]
nfsd+0xea/0x160 [nfsd]

That causes the NFS clients to hang waiting for nfsd to continue.

Sep 07 12:25:35 myPC kernel: [11039.207987] Call Trace:
Sep 07 12:25:35 myPC kernel: [11039.207994]  [<ffffffff815e4c85>]
schedule+0x35/0x80
Sep 07 12:25:35 myPC kernel: [11039.208008]  [<ffffffffa031acce>]
rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.208055]  [<ffffffff815e5098>]
__wait_on_bit+0x58/0x90
Sep 07 12:25:35 myPC kernel: [11039.208059]  [<ffffffff815e513e>]
out_of_line_wait_on_bit+0x6e/0x80
Sep 07 12:25:35 myPC kernel: [11039.208082]  [<ffffffffa031bd26>]
__rpc_execute+0x146/0x440 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.208104]  [<ffffffffa0313049>]
rpc_run_task+0x69/0x80 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.208126]  [<ffffffffa031309f>]
rpc_call_sync+0x3f/0xa0 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.204818]  [<ffffffff81200d69>]
vfs_fstatat+0x49/0x90
Sep 07 12:25:35 myPC kernel: [11039.204821]  [<ffffffff812011ea>]
SYSC_newstat+0x1a/0x40
Sep 07 12:25:35 myPC kernel: [11039.204826]  [<ffffffff815e876e>]
entry_SYSCALL_64_fastpath+0x12/0x6d
Sep 07 12:25:35 myPC kernel: [11039.207972] DWARF2 unwinder stuck at
entry_SYSCALL_64_fastpath+0x12/0x6d
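One client-side mitigation (not something from this trace, just common NFS practice, and a trade-off) is mounting with soft/timeout options so a stuck RPC eventually returns EIO instead of leaving stat() callers in D state; the server name and paths below are placeholders:

```shell
# "soft" makes a stuck RPC fail after roughly timeo*retrans instead of
# retrying forever; under a hang like the one above this risks I/O
# errors, so it trades correctness for responsiveness.
opts="soft,timeo=100,retrans=3"
echo mount -t nfs -o "$opts" server:/export /mnt/nfs   # print, don't run
```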

My guess is that under heavy load the inode cannot be evicted by
kswapd, so nfsd just keeps waiting and the whole system ends up
unresponsive. I found a related issue:
https://www.systutorials.com/linux-kernels/105429/nfsv4-1-prevent-a-3-way-deadlock-between-layoutreturn-open-and-state-recovery-linux-3-12-17/
(it's not the same, as that one is NFSv4, but it shows a similar
symptom).

My system has very limited resources, only 1 core and 1GB RAM, and
the issue is very easy to reproduce on a slow disk. However, I did
not see the same issue on the older 3.12.61 kernel. Could you please
let me know whether my analysis makes sense: does the
jbd2_log_wait_commit hang cause the __wait_on_freeing_inode wait? Is
this a known kernel issue, and if so, which kernel change causes it?
And is there any workaround, such as preventing kswapd from kicking
in (I have a 5GB swap partition) or otherwise keeping the system
responsive (short of adding more RAM :) )?
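On the workaround question: I can't say it will help here, but on 1GB boxes the usual first knobs are the VM writeback/swap tunables; a hedged sketch (the values are illustrative only, not recommendations from this thread):

```shell
# Inspect the current reclaim/writeback tunables (read-only, safe).
cat /proc/sys/vm/swappiness
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio

# Possible mitigation (requires root): start background writeback
# earlier so dirty pages do not pile up behind the jbd2 commit.
# sysctl -w vm.dirty_background_ratio=5
# sysctl -w vm.dirty_ratio=10
```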

Thanks a lot
Brs,
Naruto



* High IO load cause kswapd hang and unresponsive system
  2018-11-15 16:55 High IO load cause kswapd hang and unresponsive system narutonguyen2018
  2018-11-15 16:55 ` Naruto Nguyen
@ 2018-11-16  5:10 ` mulyadi.santosa
  2018-11-16  5:10   ` Mulyadi Santosa
  2018-11-17  9:36   ` narutonguyen2018
  1 sibling, 2 replies; 8+ messages in thread
From: mulyadi.santosa @ 2018-11-16  5:10 UTC (permalink / raw)
  To: kernelnewbies

On Thu, Nov 15, 2018 at 11:57 PM Naruto Nguyen <narutonguyen2018@gmail.com>
wrote:

> Hi everyone,
>
> Recently when I upgrade my kernel from 3.12.61 to kernel 4.4.74, I
>

Hi.

You should upgrade to 4.19.2, 4.18.2, or 4.4.163 (if you prefer to
stay on the 4.4.x series) and see whether it still happens.
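A quick way to confirm which release is actually running before and after the upgrade:

```shell
# Print the running kernel release (e.g. 4.4.74) and compare it
# against the stable series you upgraded to.
uname -r
```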

find my system sometimes is very slow and unresponsive
>
> When my system is under a heavy IO load, I see the node start to
> respond very slow, the IO wait time keeps very high, and the node
> starts unresponsive. I try to dump all block task at that time and
> found some interesting things, look like under a heavy load I/O, the
> kswap start to hang in jbd2_log_wait_commit.
>
> Sep 07 12:25:35 myPC kernel: [11039.126514] kswapd0         D
> 0000000000000000     0    26      2 0x00000000
> Sep 07 12:25:35 myPC kernel: [11039.126517]  ffff88003da27b78
> ffff88000ce08280 ffff88003cdf0680 ffff88003da28000
> Sep 07 12:25:35 myPC kernel: [11039.126519]  ffff88001a15c888
> ffff88001a15c824 ffff88003da27bb8 ffff88001a15c8a0
> Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
> ffffffff815e4c85 ffff88001a15c800 0000000000013183
> Sep 07 12:25:35 myPC kernel: [11039.126522] Call Trace:
> Sep 07 12:25:35 myPC kernel: [11039.126539]  [<ffffffff815e4c85>]
> schedule+0x35/0x80
> Sep 07 12:25:35 myPC kernel: [11039.126561]  [<ffffffffa02faada>]
> jbd2_log_wait_commit+0x8a/0xf0 [jbd2]
> Sep 07 12:25:35 myPC kernel: [11039.126621]  [<ffffffffa03b28a4>]
> ext4_evict_inode+0x2d4/0x4b0 [ext4]
> Sep 07 12:25:35 myPC kernel: [11039.126630]  [<ffffffff81215a7a>]
> evict+0xba/0x190
> Sep 07 12:25:35 myPC kernel: [11039.126635]  [<ffffffff81215b82>]
> dispose_list+0x32/0x50
> Sep 07 12:25:35 myPC kernel: [11039.126639]  [<ffffffff81216e52>]
> prune_icache_sb+0x42/0x50
> Sep 07 12:25:35 myPC kernel: [11039.126644]  [<ffffffff811ff8f6>]
> super_cache_scan+0x136/0x180
> Sep 07 12:25:35 myPC kernel: [11039.126653]  [<ffffffff81196a1b>]
> shrink_slab.part.41+0x20b/0x3f0
> Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
> ffffffff815e4c85 ffff88001a15c800 0000000000013183
>
> And after that more and more processes are hitting STAT D, nfsd is
> hang to wait or the inode deletion to complete in
> __wait_on_freeing_inode.
>
> Sep 07 12:25:35 myPC kernel: [11039.390218] nfsd            D
> 0000000000000000     0 30419      2 0x00000080
> Sep 07 12:25:35 myPC kernel: [11039.390221]  ffff88000da47a98
> ffff8800277ec380 ffff8800374ec580 ffff88000da48000
> Sep 07 12:25:35 myPC kernel: [11039.390223]  ffff88000da47ad8
> ffffc9000011edc8 0000000000000000 ffff88000e92e2f8
> Sep 07 12:25:35 myPC kernel: [11039.390225]  ffff88000da47ab0
> ffffffff815e4c85 ffffffff81c094c0 ffff88003ff9d8b8
> Sep 07 12:25:35 myPC kernel: [11039.390225] Call Trace:
> Sep 07 12:25:35 myPC kernel: [11039.390231]  [<ffffffff815e4c85>]
> schedule+0x35/0x80
> Sep 07 12:25:35 myPC kernel: [11039.390236]  [<ffffffff812157fc>]
> __wait_on_freeing_inode+0x9c/0xc0
> Sep 07 12:25:35 myPC kernel: [11039.390241]  [<ffffffff81215d83>]
> find_inode_fast+0x73/0xa0
> Sep 07 12:25:35 myPC kernel: [11039.390245]  [<ffffffff81216052>]
> iget_locked+0x72/0x190
> Sep 07 12:25:35 myPC kernel: [11039.390284]  [<ffffffffa03aeb8c>]
> ext4_iget+0x2c/0xb10 [ext4]
> Sep 07 12:25:35 myPC kernel: [11039.390314]  [<ffffffffa03c1055>]
> ext4_nfs_get_inode+0x45/0x80 [ext4]
> Sep 07 12:25:35 myPC kernel: [11039.390320]  [<ffffffff812211c4>]
> generic_fh_to_dentry+0x24/0x30
> Sep 07 12:25:35 myPC kernel: [11039.390326]  [<ffffffff81277ede>]
> exportfs_decode_fh+0x5e/0x350
> Sep 07 12:25:35 myPC kernel: [11039.390347]  [<ffffffffa047749a>]
> fh_verify+0x2da/0x5a0 [nfsd]
> Sep 07 12:25:35 myPC kernel: [11039.390362]  [<ffffffffa0480933>]
> nfsd3_proc_getattr+0x63/0xd0 [nfsd]
> Sep 07 12:25:35 myPC kernel: [11039.390374]  [<ffffffffa0473e23>]
> nfsd_dispatch+0xc3/0x260 [nfsd]
> Sep 07 12:25:35 myPC kernel: [11039.390411]  [<ffffffffa0322108>]
> svc_process_common+0x418/0x6a0 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.390439]  [<ffffffffa032248d>]
> svc_process+0xfd/0x1b0 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.390454]  [<ffffffffa047389a>]
> nfsd+0xea/0x160 [nfsd]
>
> That causes the other nfs client is hang to wait nfsd to continue.
>
> Sep 07 12:25:35 myPC kernel: [11039.207987] Call Trace:
> Sep 07 12:25:35 myPC kernel: [11039.207994]  [<ffffffff815e4c85>]
> schedule+0x35/0x80
> Sep 07 12:25:35 myPC kernel: [11039.208008]  [<ffffffffa031acce>]
> rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.208055]  [<ffffffff815e5098>]
> __wait_on_bit+0x58/0x90
> Sep 07 12:25:35 myPC kernel: [11039.208059]  [<ffffffff815e513e>]
> out_of_line_wait_on_bit+0x6e/0x80
> Sep 07 12:25:35 myPC kernel: [11039.208082]  [<ffffffffa031bd26>]
> __rpc_execute+0x146/0x440 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.208104]  [<ffffffffa0313049>]
> rpc_run_task+0x69/0x80 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.208126]  [<ffffffffa031309f>]
> rpc_call_sync+0x3f/0xa0 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.204818]  [<ffffffff81200d69>]
> vfs_fstatat+0x49/0x90
> Sep 07 12:25:35 myPC kernel: [11039.204821]  [<ffffffff812011ea>]
> SYSC_newstat+0x1a/0x40
> Sep 07 12:25:35 myPC kernel: [11039.204826]  [<ffffffff815e876e>]
> entry_SYSCALL_64_fastpath+0x12/0x6d
> Sep 07 12:25:35 myPC kernel: [11039.207972] DWARF2 unwinder stuck at
> entry_SYSCALL_64_fastpath+0x12/0x6d
>
> I guess under heavy load, somehow the inode cannot be evicted in
> kswap, so the nfsd just keeps waiting and cause the whole system ends
> up in unresponsive state. I found some related issues
>
> https://www.systutorials.com/linux-kernels/105429/nfsv4-1-prevent-a-3-way-deadlock-between-layoutreturn-open-and-state-recovery-linux-3-12-17/
> (it's not the same as it's nfsv4 but just refer for the same symptom.)
>
> My system has very limited resource, only 1 core and  1GB RAM and the
> issue is very easy to happen in a slow disk. However, I did not see
> the same issue in the older kernel 3.12.61. Could you please let me
> know if my analysis makes sense, if jbd2_log_wait_commit hang causes
> __wait_on_freeing_inode wait ? Is it a known issue in kernel? If any
> kernel change causes the issue?. And if we have any WA to avoid that
> like preventing kswapd kicks in (I have a 5G swap partition) or to
> make my system is not unresponsive like that (of course it is not to
> add more RAM :) ?
>
> Thanks a lot
> Brs,
> Naruto
>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org
> https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>


-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com


* Re: High IO load cause kswapd hang and unresponsive system
  2018-11-16  5:10 ` mulyadi.santosa
@ 2018-11-16  5:10   ` Mulyadi Santosa
  2018-11-17  9:36   ` narutonguyen2018
  1 sibling, 0 replies; 8+ messages in thread
From: Mulyadi Santosa @ 2018-11-16  5:10 UTC (permalink / raw)
  To: narutonguyen2018; +Cc: kernel derkeiler, kernelnewbies

[-- Attachment #1.1: Type: text/plain, Size: 6931 bytes --]

On Thu, Nov 15, 2018 at 11:57 PM Naruto Nguyen <narutonguyen2018@gmail.com>
wrote:

> Hi everyone,
>
> Recently when I upgrade my kernel from 3.12.61 to kernel 4.4.74, I
>

Hi.

You need to upgrade, either using 4.19.2, 4.18.2, or 4.4.163 (if you prefer
to stay with 4.4.x) and see if it still happens

find my system sometimes is very slow and unresponsive
>
> When my system is under a heavy IO load, I see the node start to
> respond very slow, the IO wait time keeps very high, and the node
> starts unresponsive. I try to dump all block task at that time and
> found some interesting things, look like under a heavy load I/O, the
> kswap start to hang in jbd2_log_wait_commit.
>
> Sep 07 12:25:35 myPC kernel: [11039.126514] kswapd0         D
> 0000000000000000     0    26      2 0x00000000
> Sep 07 12:25:35 myPC kernel: [11039.126517]  ffff88003da27b78
> ffff88000ce08280 ffff88003cdf0680 ffff88003da28000
> Sep 07 12:25:35 myPC kernel: [11039.126519]  ffff88001a15c888
> ffff88001a15c824 ffff88003da27bb8 ffff88001a15c8a0
> Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
> ffffffff815e4c85 ffff88001a15c800 0000000000013183
> Sep 07 12:25:35 myPC kernel: [11039.126522] Call Trace:
> Sep 07 12:25:35 myPC kernel: [11039.126539]  [<ffffffff815e4c85>]
> schedule+0x35/0x80
> Sep 07 12:25:35 myPC kernel: [11039.126561]  [<ffffffffa02faada>]
> jbd2_log_wait_commit+0x8a/0xf0 [jbd2]
> Sep 07 12:25:35 myPC kernel: [11039.126621]  [<ffffffffa03b28a4>]
> ext4_evict_inode+0x2d4/0x4b0 [ext4]
> Sep 07 12:25:35 myPC kernel: [11039.126630]  [<ffffffff81215a7a>]
> evict+0xba/0x190
> Sep 07 12:25:35 myPC kernel: [11039.126635]  [<ffffffff81215b82>]
> dispose_list+0x32/0x50
> Sep 07 12:25:35 myPC kernel: [11039.126639]  [<ffffffff81216e52>]
> prune_icache_sb+0x42/0x50
> Sep 07 12:25:35 myPC kernel: [11039.126644]  [<ffffffff811ff8f6>]
> super_cache_scan+0x136/0x180
> Sep 07 12:25:35 myPC kernel: [11039.126653]  [<ffffffff81196a1b>]
> shrink_slab.part.41+0x20b/0x3f0
> Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
> ffffffff815e4c85 ffff88001a15c800 0000000000013183
>
> And after that more and more processes are hitting STAT D, nfsd is
> hang to wait or the inode deletion to complete in
> __wait_on_freeing_inode.
>
> Sep 07 12:25:35 myPC kernel: [11039.390218] nfsd            D
> 0000000000000000     0 30419      2 0x00000080
> Sep 07 12:25:35 myPC kernel: [11039.390221]  ffff88000da47a98
> ffff8800277ec380 ffff8800374ec580 ffff88000da48000
> Sep 07 12:25:35 myPC kernel: [11039.390223]  ffff88000da47ad8
> ffffc9000011edc8 0000000000000000 ffff88000e92e2f8
> Sep 07 12:25:35 myPC kernel: [11039.390225]  ffff88000da47ab0
> ffffffff815e4c85 ffffffff81c094c0 ffff88003ff9d8b8
> Sep 07 12:25:35 myPC kernel: [11039.390225] Call Trace:
> Sep 07 12:25:35 myPC kernel: [11039.390231]  [<ffffffff815e4c85>]
> schedule+0x35/0x80
> Sep 07 12:25:35 myPC kernel: [11039.390236]  [<ffffffff812157fc>]
> __wait_on_freeing_inode+0x9c/0xc0
> Sep 07 12:25:35 myPC kernel: [11039.390241]  [<ffffffff81215d83>]
> find_inode_fast+0x73/0xa0
> Sep 07 12:25:35 myPC kernel: [11039.390245]  [<ffffffff81216052>]
> iget_locked+0x72/0x190
> Sep 07 12:25:35 myPC kernel: [11039.390284]  [<ffffffffa03aeb8c>]
> ext4_iget+0x2c/0xb10 [ext4]
> Sep 07 12:25:35 myPC kernel: [11039.390314]  [<ffffffffa03c1055>]
> ext4_nfs_get_inode+0x45/0x80 [ext4]
> Sep 07 12:25:35 myPC kernel: [11039.390320]  [<ffffffff812211c4>]
> generic_fh_to_dentry+0x24/0x30
> Sep 07 12:25:35 myPC kernel: [11039.390326]  [<ffffffff81277ede>]
> exportfs_decode_fh+0x5e/0x350
> Sep 07 12:25:35 myPC kernel: [11039.390347]  [<ffffffffa047749a>]
> fh_verify+0x2da/0x5a0 [nfsd]
> Sep 07 12:25:35 myPC kernel: [11039.390362]  [<ffffffffa0480933>]
> nfsd3_proc_getattr+0x63/0xd0 [nfsd]
> Sep 07 12:25:35 myPC kernel: [11039.390374]  [<ffffffffa0473e23>]
> nfsd_dispatch+0xc3/0x260 [nfsd]
> Sep 07 12:25:35 myPC kernel: [11039.390411]  [<ffffffffa0322108>]
> svc_process_common+0x418/0x6a0 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.390439]  [<ffffffffa032248d>]
> svc_process+0xfd/0x1b0 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.390454]  [<ffffffffa047389a>]
> nfsd+0xea/0x160 [nfsd]
>
> That causes the other nfs client is hang to wait nfsd to continue.
>
> Sep 07 12:25:35 myPC kernel: [11039.207987] Call Trace:
> Sep 07 12:25:35 myPC kernel: [11039.207994]  [<ffffffff815e4c85>]
> schedule+0x35/0x80
> Sep 07 12:25:35 myPC kernel: [11039.208008]  [<ffffffffa031acce>]
> rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.208055]  [<ffffffff815e5098>]
> __wait_on_bit+0x58/0x90
> Sep 07 12:25:35 myPC kernel: [11039.208059]  [<ffffffff815e513e>]
> out_of_line_wait_on_bit+0x6e/0x80
> Sep 07 12:25:35 myPC kernel: [11039.208082]  [<ffffffffa031bd26>]
> __rpc_execute+0x146/0x440 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.208104]  [<ffffffffa0313049>]
> rpc_run_task+0x69/0x80 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.208126]  [<ffffffffa031309f>]
> rpc_call_sync+0x3f/0xa0 [sunrpc]
> Sep 07 12:25:35 myPC kernel: [11039.204818]  [<ffffffff81200d69>]
> vfs_fstatat+0x49/0x90
> Sep 07 12:25:35 myPC kernel: [11039.204821]  [<ffffffff812011ea>]
> SYSC_newstat+0x1a/0x40
> Sep 07 12:25:35 myPC kernel: [11039.204826]  [<ffffffff815e876e>]
> entry_SYSCALL_64_fastpath+0x12/0x6d
> Sep 07 12:25:35 myPC kernel: [11039.207972] DWARF2 unwinder stuck at
> entry_SYSCALL_64_fastpath+0x12/0x6d
>
> I guess under heavy load, somehow the inode cannot be evicted in
> kswap, so the nfsd just keeps waiting and cause the whole system ends
> up in unresponsive state. I found some related issues
>
> https://www.systutorials.com/linux-kernels/105429/nfsv4-1-prevent-a-3-way-deadlock-between-layoutreturn-open-and-state-recovery-linux-3-12-17/
> (it’s not the same as it’s nfsv4 but just refer for the same symptom.)
>
> My system has very limited resource, only 1 core and  1GB RAM and the
> issue is very easy to happen in a slow disk. However, I did not see
> the same issue in the older kernel 3.12.61. Could you please let me
> know if my analysis makes sense, if jbd2_log_wait_commit hang causes
> __wait_on_freeing_inode wait ? Is it a known issue in kernel? If any
> kernel change causes the issue?. And if we have any WA to avoid that
> like preventing kswapd kicks in (I have a 5G swap partition) or to
> make my system is not unresponsive like that (of course it is not to
> add more RAM :) ?
>
> Thanks a lot
> Brs,
> Naruto
>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>


-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

[-- Attachment #1.2: Type: text/html, Size: 8477 bytes --]

<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Nov 15, 2018 at 11:57 PM Naruto Nguyen &lt;<a href="mailto:narutonguyen2018@gmail.com">narutonguyen2018@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi everyone,<br>
<br>
Recently when I upgrade my kernel from 3.12.61 to kernel 4.4.74, I<br></blockquote><div><br></div><div>Hi.</div><div><br></div><div>You need to upgrade, either using 4.19.2, 4.18.2, or 4.4.163 (if you prefer to stay with 4.4.x) and see if it still happens</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
find my system sometimes is very slow and unresponsive<br>
<br>
When my system is under a heavy IO load, I see the node start to<br>
respond very slow, the IO wait time keeps very high, and the node<br>
starts unresponsive. I try to dump all block task at that time and<br>
found some interesting things, look like under a heavy load I/O, the<br>
kswap start to hang in jbd2_log_wait_commit.<br>
<br>
Sep 07 12:25:35 myPC kernel: [11039.126514] kswapd0         D<br>
0000000000000000     0    26      2 0x00000000<br>
Sep 07 12:25:35 myPC kernel: [11039.126517]  ffff88003da27b78<br>
ffff88000ce08280 ffff88003cdf0680 ffff88003da28000<br>
Sep 07 12:25:35 myPC kernel: [11039.126519]  ffff88001a15c888<br>
ffff88001a15c824 ffff88003da27bb8 ffff88001a15c8a0<br>
Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90<br>
ffffffff815e4c85 ffff88001a15c800 0000000000013183<br>
Sep 07 12:25:35 myPC kernel: [11039.126522] Call Trace:<br>
Sep 07 12:25:35 myPC kernel: [11039.126539]  [&lt;ffffffff815e4c85&gt;]<br>
schedule+0x35/0x80<br>
Sep 07 12:25:35 myPC kernel: [11039.126561]  [&lt;ffffffffa02faada&gt;]<br>
jbd2_log_wait_commit+0x8a/0xf0 [jbd2]<br>
Sep 07 12:25:35 myPC kernel: [11039.126621]  [&lt;ffffffffa03b28a4&gt;]<br>
ext4_evict_inode+0x2d4/0x4b0 [ext4]<br>
Sep 07 12:25:35 myPC kernel: [11039.126630]  [&lt;ffffffff81215a7a&gt;]<br>
evict+0xba/0x190<br>
Sep 07 12:25:35 myPC kernel: [11039.126635]  [&lt;ffffffff81215b82&gt;]<br>
dispose_list+0x32/0x50<br>
Sep 07 12:25:35 myPC kernel: [11039.126639]  [&lt;ffffffff81216e52&gt;]<br>
prune_icache_sb+0x42/0x50<br>
Sep 07 12:25:35 myPC kernel: [11039.126644]  [&lt;ffffffff811ff8f6&gt;]<br>
super_cache_scan+0x136/0x180<br>
Sep 07 12:25:35 myPC kernel: [11039.126653]  [&lt;ffffffff81196a1b&gt;]<br>
shrink_slab.part.41+0x20b/0x3f0<br>
Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90<br>
ffffffff815e4c85 ffff88001a15c800 0000000000013183<br>
<br>
And after that more and more processes are hitting STAT D, nfsd is<br>
hang to wait or the inode deletion to complete in<br>
__wait_on_freeing_inode.<br>

Sep 07 12:25:35 myPC kernel: [11039.390218] nfsd            D
0000000000000000     0 30419      2 0x00000080
Sep 07 12:25:35 myPC kernel: [11039.390221]  ffff88000da47a98
ffff8800277ec380 ffff8800374ec580 ffff88000da48000
Sep 07 12:25:35 myPC kernel: [11039.390223]  ffff88000da47ad8
ffffc9000011edc8 0000000000000000 ffff88000e92e2f8
Sep 07 12:25:35 myPC kernel: [11039.390225]  ffff88000da47ab0
ffffffff815e4c85 ffffffff81c094c0 ffff88003ff9d8b8
Sep 07 12:25:35 myPC kernel: [11039.390225] Call Trace:
Sep 07 12:25:35 myPC kernel: [11039.390231]  [<ffffffff815e4c85>]
schedule+0x35/0x80
Sep 07 12:25:35 myPC kernel: [11039.390236]  [<ffffffff812157fc>]
__wait_on_freeing_inode+0x9c/0xc0
Sep 07 12:25:35 myPC kernel: [11039.390241]  [<ffffffff81215d83>]
find_inode_fast+0x73/0xa0
Sep 07 12:25:35 myPC kernel: [11039.390245]  [<ffffffff81216052>]
iget_locked+0x72/0x190
Sep 07 12:25:35 myPC kernel: [11039.390284]  [<ffffffffa03aeb8c>]
ext4_iget+0x2c/0xb10 [ext4]
Sep 07 12:25:35 myPC kernel: [11039.390314]  [<ffffffffa03c1055>]
ext4_nfs_get_inode+0x45/0x80 [ext4]
Sep 07 12:25:35 myPC kernel: [11039.390320]  [<ffffffff812211c4>]
generic_fh_to_dentry+0x24/0x30
Sep 07 12:25:35 myPC kernel: [11039.390326]  [<ffffffff81277ede>]
exportfs_decode_fh+0x5e/0x350
Sep 07 12:25:35 myPC kernel: [11039.390347]  [<ffffffffa047749a>]
fh_verify+0x2da/0x5a0 [nfsd]
Sep 07 12:25:35 myPC kernel: [11039.390362]  [<ffffffffa0480933>]
nfsd3_proc_getattr+0x63/0xd0 [nfsd]
Sep 07 12:25:35 myPC kernel: [11039.390374]  [<ffffffffa0473e23>]
nfsd_dispatch+0xc3/0x260 [nfsd]
Sep 07 12:25:35 myPC kernel: [11039.390411]  [<ffffffffa0322108>]
svc_process_common+0x418/0x6a0 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.390439]  [<ffffffffa032248d>]
svc_process+0xfd/0x1b0 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.390454]  [<ffffffffa047389a>]
nfsd+0xea/0x160 [nfsd]

In turn, the NFS clients hang waiting for nfsd to respond:

Sep 07 12:25:35 myPC kernel: [11039.207987] Call Trace:
Sep 07 12:25:35 myPC kernel: [11039.207994]  [<ffffffff815e4c85>]
schedule+0x35/0x80
Sep 07 12:25:35 myPC kernel: [11039.208008]  [<ffffffffa031acce>]
rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.208055]  [<ffffffff815e5098>]
__wait_on_bit+0x58/0x90
Sep 07 12:25:35 myPC kernel: [11039.208059]  [<ffffffff815e513e>]
out_of_line_wait_on_bit+0x6e/0x80
Sep 07 12:25:35 myPC kernel: [11039.208082]  [<ffffffffa031bd26>]
__rpc_execute+0x146/0x440 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.208104]  [<ffffffffa0313049>]
rpc_run_task+0x69/0x80 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.208126]  [<ffffffffa031309f>]
rpc_call_sync+0x3f/0xa0 [sunrpc]
Sep 07 12:25:35 myPC kernel: [11039.204818]  [<ffffffff81200d69>]
vfs_fstatat+0x49/0x90
Sep 07 12:25:35 myPC kernel: [11039.204821]  [<ffffffff812011ea>]
SYSC_newstat+0x1a/0x40
Sep 07 12:25:35 myPC kernel: [11039.204826]  [<ffffffff815e876e>]
entry_SYSCALL_64_fastpath+0x12/0x6d
Sep 07 12:25:35 myPC kernel: [11039.207972] DWARF2 unwinder stuck at
entry_SYSCALL_64_fastpath+0x12/0x6d

I guess that under heavy load the inode cannot be evicted by kswapd, so
nfsd just keeps waiting and the whole system ends up unresponsive. I found
a related issue:
https://www.systutorials.com/linux-kernels/105429/nfsv4-1-prevent-a-3-way-deadlock-between-layoutreturn-open-and-state-recovery-linux-3-12-17/
(it is not the same problem, since that one is NFSv4, but it shows the
same symptom.)

My system has very limited resources, only 1 core and 1 GB of RAM, and
the issue is very easy to reproduce on a slow disk. However, I did not
see the same issue on the older 3.12.61 kernel. Could you please let me
know whether my analysis makes sense, i.e. whether a hang in
jbd2_log_wait_commit can cause the wait in __wait_on_freeing_inode? Is
this a known kernel issue, and if so, which kernel change introduced it?
And is there any workaround, such as preventing kswapd from kicking in (I
have a 5 GB swap partition), to keep the system responsive, other than
adding more RAM :) ?

Thanks a lot
Brs,
Naruto

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

[-- Attachment #2: Type: text/plain, Size: 170 bytes --]

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

^ permalink raw reply	[flat|nested] 8+ messages in thread

* High IO load cause kswapd hang and unresponsive system
  2018-11-16  5:10 ` mulyadi.santosa
  2018-11-16  5:10   ` Mulyadi Santosa
@ 2018-11-17  9:36   ` narutonguyen2018
  2018-11-17  9:36     ` Naruto Nguyen
  2018-11-18 15:40     ` mulyadi.santosa
  1 sibling, 2 replies; 8+ messages in thread
From: narutonguyen2018 @ 2018-11-17  9:36 UTC (permalink / raw)
  To: kernelnewbies

Hi Mulyadi,

Thanks for your answer. I have updated to kernel 4.4.121, but the issue
still exists. I will try kernel 4.4.163 next. Could you explain why I did
not see the issue on the 3.12 kernel while it now happens on 4.4? Which
changes in 4.4 could trigger the issue, and does 4.4.163 contain a fix
for it?

Thanks,
Naruto

On Friday, 16 November 2018, Mulyadi Santosa <mulyadi.santosa@gmail.com>
wrote:

>
>
> On Thu, Nov 15, 2018 at 11:57 PM Naruto Nguyen <narutonguyen2018@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> Recently when I upgrade my kernel from 3.12.61 to kernel 4.4.74, I
>>
>
> Hi.
>
> You need to upgrade, either using 4.19.2, 4.18.2, or 4.4.163 (if you
> prefer to stay with 4.4.x) and see if it still happens
>
> find my system sometimes is very slow and unresponsive
>>
>> When my system is under a heavy IO load, I see the node start to
>> respond very slow, the IO wait time keeps very high, and the node
>> starts unresponsive. I try to dump all block task at that time and
>> found some interesting things, look like under a heavy load I/O, the
>> kswap start to hang in jbd2_log_wait_commit.
>>
>> Sep 07 12:25:35 myPC kernel: [11039.126514] kswapd0         D
>> 0000000000000000     0    26      2 0x00000000
>> Sep 07 12:25:35 myPC kernel: [11039.126517]  ffff88003da27b78
>> ffff88000ce08280 ffff88003cdf0680 ffff88003da28000
>> Sep 07 12:25:35 myPC kernel: [11039.126519]  ffff88001a15c888
>> ffff88001a15c824 ffff88003da27bb8 ffff88001a15c8a0
>> Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
>> ffffffff815e4c85 ffff88001a15c800 0000000000013183
>> Sep 07 12:25:35 myPC kernel: [11039.126522] Call Trace:
>> Sep 07 12:25:35 myPC kernel: [11039.126539]  [<ffffffff815e4c85>]
>> schedule+0x35/0x80
>> Sep 07 12:25:35 myPC kernel: [11039.126561]  [<ffffffffa02faada>]
>> jbd2_log_wait_commit+0x8a/0xf0 [jbd2]
>> Sep 07 12:25:35 myPC kernel: [11039.126621]  [<ffffffffa03b28a4>]
>> ext4_evict_inode+0x2d4/0x4b0 [ext4]
>> Sep 07 12:25:35 myPC kernel: [11039.126630]  [<ffffffff81215a7a>]
>> evict+0xba/0x190
>> Sep 07 12:25:35 myPC kernel: [11039.126635]  [<ffffffff81215b82>]
>> dispose_list+0x32/0x50
>> Sep 07 12:25:35 myPC kernel: [11039.126639]  [<ffffffff81216e52>]
>> prune_icache_sb+0x42/0x50
>> Sep 07 12:25:35 myPC kernel: [11039.126644]  [<ffffffff811ff8f6>]
>> super_cache_scan+0x136/0x180
>> Sep 07 12:25:35 myPC kernel: [11039.126653]  [<ffffffff81196a1b>]
>> shrink_slab.part.41+0x20b/0x3f0
>> Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
>> ffffffff815e4c85 ffff88001a15c800 0000000000013183
>>
>> And after that more and more processes are hitting STAT D, nfsd is
>> hang to wait or the inode deletion to complete in
>> __wait_on_freeing_inode.
>>
>> Sep 07 12:25:35 myPC kernel: [11039.390218] nfsd            D
>> 0000000000000000     0 30419      2 0x00000080
>> Sep 07 12:25:35 myPC kernel: [11039.390221]  ffff88000da47a98
>> ffff8800277ec380 ffff8800374ec580 ffff88000da48000
>> Sep 07 12:25:35 myPC kernel: [11039.390223]  ffff88000da47ad8
>> ffffc9000011edc8 0000000000000000 ffff88000e92e2f8
>> Sep 07 12:25:35 myPC kernel: [11039.390225]  ffff88000da47ab0
>> ffffffff815e4c85 ffffffff81c094c0 ffff88003ff9d8b8
>> Sep 07 12:25:35 myPC kernel: [11039.390225] Call Trace:
>> Sep 07 12:25:35 myPC kernel: [11039.390231]  [<ffffffff815e4c85>]
>> schedule+0x35/0x80
>> Sep 07 12:25:35 myPC kernel: [11039.390236]  [<ffffffff812157fc>]
>> __wait_on_freeing_inode+0x9c/0xc0
>> Sep 07 12:25:35 myPC kernel: [11039.390241]  [<ffffffff81215d83>]
>> find_inode_fast+0x73/0xa0
>> Sep 07 12:25:35 myPC kernel: [11039.390245]  [<ffffffff81216052>]
>> iget_locked+0x72/0x190
>> Sep 07 12:25:35 myPC kernel: [11039.390284]  [<ffffffffa03aeb8c>]
>> ext4_iget+0x2c/0xb10 [ext4]
>> Sep 07 12:25:35 myPC kernel: [11039.390314]  [<ffffffffa03c1055>]
>> ext4_nfs_get_inode+0x45/0x80 [ext4]
>> Sep 07 12:25:35 myPC kernel: [11039.390320]  [<ffffffff812211c4>]
>> generic_fh_to_dentry+0x24/0x30
>> Sep 07 12:25:35 myPC kernel: [11039.390326]  [<ffffffff81277ede>]
>> exportfs_decode_fh+0x5e/0x350
>> Sep 07 12:25:35 myPC kernel: [11039.390347]  [<ffffffffa047749a>]
>> fh_verify+0x2da/0x5a0 [nfsd]
>> Sep 07 12:25:35 myPC kernel: [11039.390362]  [<ffffffffa0480933>]
>> nfsd3_proc_getattr+0x63/0xd0 [nfsd]
>> Sep 07 12:25:35 myPC kernel: [11039.390374]  [<ffffffffa0473e23>]
>> nfsd_dispatch+0xc3/0x260 [nfsd]
>> Sep 07 12:25:35 myPC kernel: [11039.390411]  [<ffffffffa0322108>]
>> svc_process_common+0x418/0x6a0 [sunrpc]
>> Sep 07 12:25:35 myPC kernel: [11039.390439]  [<ffffffffa032248d>]
>> svc_process+0xfd/0x1b0 [sunrpc]
>> Sep 07 12:25:35 myPC kernel: [11039.390454]  [<ffffffffa047389a>]
>> nfsd+0xea/0x160 [nfsd]
>>
>> That causes the other nfs client is hang to wait nfsd to continue.
>>
>> Sep 07 12:25:35 myPC kernel: [11039.207987] Call Trace:
>> Sep 07 12:25:35 myPC kernel: [11039.207994]  [<ffffffff815e4c85>]
>> schedule+0x35/0x80
>> Sep 07 12:25:35 myPC kernel: [11039.208008]  [<ffffffffa031acce>]
>> rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>> Sep 07 12:25:35 myPC kernel: [11039.208055]  [<ffffffff815e5098>]
>> __wait_on_bit+0x58/0x90
>> Sep 07 12:25:35 myPC kernel: [11039.208059]  [<ffffffff815e513e>]
>> out_of_line_wait_on_bit+0x6e/0x80
>> Sep 07 12:25:35 myPC kernel: [11039.208082]  [<ffffffffa031bd26>]
>> __rpc_execute+0x146/0x440 [sunrpc]
>> Sep 07 12:25:35 myPC kernel: [11039.208104]  [<ffffffffa0313049>]
>> rpc_run_task+0x69/0x80 [sunrpc]
>> Sep 07 12:25:35 myPC kernel: [11039.208126]  [<ffffffffa031309f>]
>> rpc_call_sync+0x3f/0xa0 [sunrpc]
>> Sep 07 12:25:35 myPC kernel: [11039.204818]  [<ffffffff81200d69>]
>> vfs_fstatat+0x49/0x90
>> Sep 07 12:25:35 myPC kernel: [11039.204821]  [<ffffffff812011ea>]
>> SYSC_newstat+0x1a/0x40
>> Sep 07 12:25:35 myPC kernel: [11039.204826]  [<ffffffff815e876e>]
>> entry_SYSCALL_64_fastpath+0x12/0x6d
>> Sep 07 12:25:35 myPC kernel: [11039.207972] DWARF2 unwinder stuck at
>> entry_SYSCALL_64_fastpath+0x12/0x6d
>>
>> I guess under heavy load, somehow the inode cannot be evicted in
>> kswap, so the nfsd just keeps waiting and cause the whole system ends
>> up in unresponsive state. I found some related issues
>> https://www.systutorials.com/linux-kernels/105429/nfsv4-1-prevent-a-3-way-deadlock-between-layoutreturn-open-and-state-recovery-linux-3-12-17/
>> (it's not the same as it's nfsv4 but just refer for the same symptom.)
>>
>> My system has very limited resource, only 1 core and  1GB RAM and the
>> issue is very easy to happen in a slow disk. However, I did not see
>> the same issue in the older kernel 3.12.61. Could you please let me
>> know if my analysis makes sense, if jbd2_log_wait_commit hang causes
>> __wait_on_freeing_inode wait ? Is it a known issue in kernel? If any
>> kernel change causes the issue?. And if we have any WA to avoid that
>> like preventing kswapd kicks in (I have a 5G swap partition) or to
>> make my system is not unresponsive like that (of course it is not to
>> add more RAM :) ?
>>
>> Thanks a lot
>> Brs,
>> Naruto
>>
>> _______________________________________________
>> Kernelnewbies mailing list
>> Kernelnewbies@kernelnewbies.org
>> https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>
>
>
> --
> regards,
>
> Mulyadi Santosa
> Freelance Linux trainer and consultant
>
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
>

^ permalink raw reply	[flat|nested] 8+ messages in thread


* High IO load cause kswapd hang and unresponsive system
  2018-11-17  9:36   ` narutonguyen2018
  2018-11-17  9:36     ` Naruto Nguyen
@ 2018-11-18 15:40     ` mulyadi.santosa
  2018-11-18 15:40       ` Mulyadi Santosa
  1 sibling, 1 reply; 8+ messages in thread
From: mulyadi.santosa @ 2018-11-18 15:40 UTC (permalink / raw)
  To: kernelnewbies

On Sat, Nov 17, 2018 at 4:36 PM Naruto Nguyen <narutonguyen2018@gmail.com>
wrote:

> Hi Mulyadi,
>
> Thanks for your answer. I have updated to kernel 4.4.121 but the issue
> still exists. I will try to update to kernel 4.4.163. But if you could
> provide any reason that why i did not see the issue in the 3.12 kernel and
> now it happens in 4.4, any changes in 4.4 can trigger the issue? And if
> kernel 4.4.163 has any changes to fix the issue?
>
>
Hi..

Please don't top post.

As for the "why", that I don't know. There are too many changes between 3.x
and 4.x. That's why checking if the issue still happens with latest stable
version IS the first logical step .

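To narrow down the "which change" question, one option (not something suggested in the thread itself, and the clone path is an assumption) is to search the stable tree's history between the two releases for commits touching the subsystems that appear in the hung-task traces:

```shell
# Path to a linux-stable clone; adjust to your setup.
STABLE=${STABLE:-$HOME/src/linux-stable}
if [ -d "$STABLE/.git" ]; then
  # Candidate commits between the known-bad and the suggested version,
  # limited to the files seen in the traces (jbd2, ext4, inode cache, reclaim).
  git -C "$STABLE" log --oneline v4.4.74..v4.4.163 -- \
      fs/jbd2 fs/ext4 fs/inode.c mm/vmscan.c
else
  echo "clone linux-stable first:"
  echo "  git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git $STABLE"
fi
```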



> Thanks,
> Naruto
>
> On Friday, 16 November 2018, Mulyadi Santosa <mulyadi.santosa@gmail.com>
> wrote:
>
>>
>>
>> On Thu, Nov 15, 2018 at 11:57 PM Naruto Nguyen <
>> narutonguyen2018 at gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> Recently when I upgrade my kernel from 3.12.61 to kernel 4.4.74, I
>>>
>>
>> Hi.
>>
>> You need to upgrade, either using 4.19.2, 4.18.2, or 4.4.163 (if you
>> prefer to stay with 4.4.x) and see if it still happens
>>
>> find my system sometimes is very slow and unresponsive
>>>
>>> When my system is under a heavy IO load, I see the node start to
>>> respond very slow, the IO wait time keeps very high, and the node
>>> starts unresponsive. I try to dump all block task at that time and
>>> found some interesting things, look like under a heavy load I/O, the
>>> kswap start to hang in jbd2_log_wait_commit.
>>>
>>> Sep 07 12:25:35 myPC kernel: [11039.126514] kswapd0         D
>>> 0000000000000000     0    26      2 0x00000000
>>> Sep 07 12:25:35 myPC kernel: [11039.126517]  ffff88003da27b78
>>> ffff88000ce08280 ffff88003cdf0680 ffff88003da28000
>>> Sep 07 12:25:35 myPC kernel: [11039.126519]  ffff88001a15c888
>>> ffff88001a15c824 ffff88003da27bb8 ffff88001a15c8a0
>>> Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
>>> ffffffff815e4c85 ffff88001a15c800 0000000000013183
>>> Sep 07 12:25:35 myPC kernel: [11039.126522] Call Trace:
>>> Sep 07 12:25:35 myPC kernel: [11039.126539]  [<ffffffff815e4c85>]
>>> schedule+0x35/0x80
>>> Sep 07 12:25:35 myPC kernel: [11039.126561]  [<ffffffffa02faada>]
>>> jbd2_log_wait_commit+0x8a/0xf0 [jbd2]
>>> Sep 07 12:25:35 myPC kernel: [11039.126621]  [<ffffffffa03b28a4>]
>>> ext4_evict_inode+0x2d4/0x4b0 [ext4]
>>> Sep 07 12:25:35 myPC kernel: [11039.126630]  [<ffffffff81215a7a>]
>>> evict+0xba/0x190
>>> Sep 07 12:25:35 myPC kernel: [11039.126635]  [<ffffffff81215b82>]
>>> dispose_list+0x32/0x50
>>> Sep 07 12:25:35 myPC kernel: [11039.126639]  [<ffffffff81216e52>]
>>> prune_icache_sb+0x42/0x50
>>> Sep 07 12:25:35 myPC kernel: [11039.126644]  [<ffffffff811ff8f6>]
>>> super_cache_scan+0x136/0x180
>>> Sep 07 12:25:35 myPC kernel: [11039.126653]  [<ffffffff81196a1b>]
>>> shrink_slab.part.41+0x20b/0x3f0
>>> Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
>>> ffffffff815e4c85 ffff88001a15c800 0000000000013183
>>>
>>> And after that, more and more processes hit state D; nfsd hangs
>>> waiting for the inode deletion to complete in
>>> __wait_on_freeing_inode.
>>>
>>> Sep 07 12:25:35 myPC kernel: [11039.390218] nfsd            D
>>> 0000000000000000     0 30419      2 0x00000080
>>> Sep 07 12:25:35 myPC kernel: [11039.390221]  ffff88000da47a98
>>> ffff8800277ec380 ffff8800374ec580 ffff88000da48000
>>> Sep 07 12:25:35 myPC kernel: [11039.390223]  ffff88000da47ad8
>>> ffffc9000011edc8 0000000000000000 ffff88000e92e2f8
>>> Sep 07 12:25:35 myPC kernel: [11039.390225]  ffff88000da47ab0
>>> ffffffff815e4c85 ffffffff81c094c0 ffff88003ff9d8b8
>>> Sep 07 12:25:35 myPC kernel: [11039.390225] Call Trace:
>>> Sep 07 12:25:35 myPC kernel: [11039.390231]  [<ffffffff815e4c85>]
>>> schedule+0x35/0x80
>>> Sep 07 12:25:35 myPC kernel: [11039.390236]  [<ffffffff812157fc>]
>>> __wait_on_freeing_inode+0x9c/0xc0
>>> Sep 07 12:25:35 myPC kernel: [11039.390241]  [<ffffffff81215d83>]
>>> find_inode_fast+0x73/0xa0
>>> Sep 07 12:25:35 myPC kernel: [11039.390245]  [<ffffffff81216052>]
>>> iget_locked+0x72/0x190
>>> Sep 07 12:25:35 myPC kernel: [11039.390284]  [<ffffffffa03aeb8c>]
>>> ext4_iget+0x2c/0xb10 [ext4]
>>> Sep 07 12:25:35 myPC kernel: [11039.390314]  [<ffffffffa03c1055>]
>>> ext4_nfs_get_inode+0x45/0x80 [ext4]
>>> Sep 07 12:25:35 myPC kernel: [11039.390320]  [<ffffffff812211c4>]
>>> generic_fh_to_dentry+0x24/0x30
>>> Sep 07 12:25:35 myPC kernel: [11039.390326]  [<ffffffff81277ede>]
>>> exportfs_decode_fh+0x5e/0x350
>>> Sep 07 12:25:35 myPC kernel: [11039.390347]  [<ffffffffa047749a>]
>>> fh_verify+0x2da/0x5a0 [nfsd]
>>> Sep 07 12:25:35 myPC kernel: [11039.390362]  [<ffffffffa0480933>]
>>> nfsd3_proc_getattr+0x63/0xd0 [nfsd]
>>> Sep 07 12:25:35 myPC kernel: [11039.390374]  [<ffffffffa0473e23>]
>>> nfsd_dispatch+0xc3/0x260 [nfsd]
>>> Sep 07 12:25:35 myPC kernel: [11039.390411]  [<ffffffffa0322108>]
>>> svc_process_common+0x418/0x6a0 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.390439]  [<ffffffffa032248d>]
>>> svc_process+0xfd/0x1b0 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.390454]  [<ffffffffa047389a>]
>>> nfsd+0xea/0x160 [nfsd]
>>>
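[Editor's note: a pile-up of D-state (uninterruptible sleep) tasks like the one above can be spotted without SysRq. The sketch below is a hypothetical helper, not from the thread; it parses /proc/&lt;pid&gt;/stat on Linux.]

```python
import os

def d_state_tasks():
    """Return (pid, comm) pairs for tasks in uninterruptible sleep ('D'),
    parsed from /proc/<pid>/stat (Linux only)."""
    hung = []
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open(f'/proc/{pid}/stat') as f:
                # comm is parenthesised and may itself contain ')',
                # so split on the LAST closing parenthesis.
                head, tail = f.read().rsplit(')', 1)
        except OSError:
            continue  # task exited while we were scanning
        comm = head.split('(', 1)[1]
        state = tail.split()[0]  # first field after comm is the state
        if state == 'D':
            hung.append((int(pid), comm))
    return hung

if __name__ == '__main__':
    for pid, comm in d_state_tasks():
        print(f'{pid:>7} {comm}')
```

On a healthy box this usually prints nothing; during the hang described here it would list kswapd0, nfsd, and the other stuck tasks.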
>>> That causes the other NFS clients to hang waiting for nfsd to continue.
>>>
>>> Sep 07 12:25:35 myPC kernel: [11039.207987] Call Trace:
>>> Sep 07 12:25:35 myPC kernel: [11039.207994]  [<ffffffff815e4c85>]
>>> schedule+0x35/0x80
>>> Sep 07 12:25:35 myPC kernel: [11039.208008]  [<ffffffffa031acce>]
>>> rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.208055]  [<ffffffff815e5098>]
>>> __wait_on_bit+0x58/0x90
>>> Sep 07 12:25:35 myPC kernel: [11039.208059]  [<ffffffff815e513e>]
>>> out_of_line_wait_on_bit+0x6e/0x80
>>> Sep 07 12:25:35 myPC kernel: [11039.208082]  [<ffffffffa031bd26>]
>>> __rpc_execute+0x146/0x440 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.208104]  [<ffffffffa0313049>]
>>> rpc_run_task+0x69/0x80 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.208126]  [<ffffffffa031309f>]
>>> rpc_call_sync+0x3f/0xa0 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.204818]  [<ffffffff81200d69>]
>>> vfs_fstatat+0x49/0x90
>>> Sep 07 12:25:35 myPC kernel: [11039.204821]  [<ffffffff812011ea>]
>>> SYSC_newstat+0x1a/0x40
>>> Sep 07 12:25:35 myPC kernel: [11039.204826]  [<ffffffff815e876e>]
>>> entry_SYSCALL_64_fastpath+0x12/0x6d
>>> Sep 07 12:25:35 myPC kernel: [11039.207972] DWARF2 unwinder stuck at
>>> entry_SYSCALL_64_fastpath+0x12/0x6d
>>>
>>> I guess that under heavy load the inode somehow cannot be evicted by
>>> kswapd, so nfsd just keeps waiting and the whole system ends up in an
>>> unresponsive state. I found a related issue:
>>>
>>> https://www.systutorials.com/linux-kernels/105429/nfsv4-1-prevent-a-3-way-deadlock-between-layoutreturn-open-and-state-recovery-linux-3-12-17/
>>> (it's not the same issue, since that one is NFSv4, but it shows the same symptom.)
>>>
>>> My system has very limited resources, only 1 core and 1 GB of RAM,
>>> and the issue is very easy to reproduce on a slow disk. However, I
>>> did not see the same issue on the older kernel 3.12.61. Could you
>>> please let me know if my analysis makes sense: does the
>>> jbd2_log_wait_commit hang cause the __wait_on_freeing_inode wait? Is
>>> it a known issue in the kernel? Did any kernel change cause it? And
>>> do we have any workaround, such as preventing kswapd from kicking in
>>> (I have a 5G swap partition), or keeping my system from becoming
>>> unresponsive like that (other than adding more RAM :) ?
>>>
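[Editor's note on the workaround question: there is no switch that simply prevents kswapd from kicking in, but reclaim can be biased away from the inode-cache shrinker that leads into prune_icache_sb. The sysctl fragment below is illustrative only; the values are guesses for a 1 GB machine and not a verified fix for this hang.]

```ini
# /etc/sysctl.d/90-lowmem.conf -- illustrative values, tune and measure.

# Values below 100 make the kernel prefer reclaiming page cache over
# dentry/inode caches, so kswapd is less likely to walk the
# prune_icache_sb -> ext4_evict_inode -> jbd2_log_wait_commit path.
vm.vfs_cache_pressure = 50

# A larger free reserve makes reclaim start earlier, in smaller steps,
# instead of a large burst under I/O pressure.
vm.min_free_kbytes = 16384
```

Apply with `sysctl --system` (or reboot); whether it actually avoids the jbd2 wait on a given workload has to be measured.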
>>> Thanks a lot
>>> Brs,
>>> Naruto
>>>
>>> _______________________________________________
>>> Kernelnewbies mailing list
>>> Kernelnewbies@kernelnewbies.org
>>> https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>>
>>
>>
>> --
>> regards,
>>
>> Mulyadi Santosa
>> Freelance Linux trainer and consultant
>>
>> blog: the-hydra.blogspot.com
>> training: mulyaditraining.blogspot.com
>>
>

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

^ permalink raw reply	[flat|nested] 8+ messages in thread


end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-15 16:55 High IO load cause kswapd hang and unresponsive system narutonguyen2018
2018-11-15 16:55 ` Naruto Nguyen
2018-11-16  5:10 ` mulyadi.santosa
2018-11-16  5:10   ` Mulyadi Santosa
2018-11-17  9:36   ` narutonguyen2018
2018-11-17  9:36     ` Naruto Nguyen
2018-11-18 15:40     ` mulyadi.santosa
2018-11-18 15:40       ` Mulyadi Santosa

Kernel Newbies archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kernelnewbies/0 kernelnewbies/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kernelnewbies kernelnewbies/ https://lore.kernel.org/kernelnewbies \
		kernelnewbies@kernelnewbies.org kernelnewbies@archiver.kernel.org
	public-inbox-index kernelnewbies


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernelnewbies.kernelnewbies


AGPL code for this site: git clone https://public-inbox.org/ public-inbox