From mboxrd@z Thu Jan 1 00:00:00 1970
From: mulyadi.santosa@gmail.com (Mulyadi Santosa)
Date: Sun, 18 Nov 2018 22:40:23 +0700
Subject: High IO load causes kswapd hang and unresponsive system
In-Reply-To: 
References: 
Message-ID: 
To: kernelnewbies@lists.kernelnewbies.org
List-Id: kernelnewbies.lists.kernelnewbies.org

On Sat, Nov 17, 2018 at 4:36 PM Naruto Nguyen <narutonguyen2018@gmail.com> wrote:

> Hi Mulyadi,
>
> Thanks for your answer. I have updated to kernel 4.4.121 but the issue
> still exists. I will try to update to kernel 4.4.163. But could you give
> any reason why I did not see the issue on the 3.12 kernel while it now
> happens on 4.4? Which changes in 4.4 could trigger the issue, and does
> kernel 4.4.163 have any changes that fix it?
>

Hi.

Please don't top post.

As for the "why", that I don't know. There are too many changes between
3.x and 4.x. That is why checking whether the issue still happens with the
latest stable version is the first logical step.

> Thanks,
> Naruto
>
> On Friday, 16 November 2018, Mulyadi Santosa <mulyadi.santosa@gmail.com>
> wrote:
>
>> On Thu, Nov 15, 2018 at 11:57 PM Naruto Nguyen <
>> narutonguyen2018 at gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> Recently, when I upgraded my kernel from 3.12.61 to kernel 4.4.74, I
>>
>> Hi.
>>
>> You need to upgrade to either 4.19.2, 4.18.2, or 4.4.163 (if you
>> prefer to stay with 4.4.x) and see if it still happens.
>>
>>> found that my system is sometimes very slow and unresponsive.
>>>
>>> When my system is under heavy IO load, I see the node start to respond
>>> very slowly, the IO wait time stays very high, and the node becomes
>>> unresponsive. I tried to dump all blocked tasks at that time and found
>>> some interesting things: it looks like under heavy I/O load, kswapd
>>> starts to hang in jbd2_log_wait_commit.
>>>
>>> Sep 07 12:25:35 myPC kernel: [11039.126514] kswapd0         D
>>> 0000000000000000     0    26      2 0x00000000
>>> Sep 07 12:25:35 myPC kernel: [11039.126517]  ffff88003da27b78
>>> ffff88000ce08280 ffff88003cdf0680 ffff88003da28000
>>> Sep 07 12:25:35 myPC kernel: [11039.126519]  ffff88001a15c888
>>> ffff88001a15c824 ffff88003da27bb8 ffff88001a15c8a0
>>> Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
>>> ffffffff815e4c85 ffff88001a15c800 0000000000013183
>>> Sep 07 12:25:35 myPC kernel: [11039.126522] Call Trace:
>>> Sep 07 12:25:35 myPC kernel: [11039.126539]  []
>>> schedule+0x35/0x80
>>> Sep 07 12:25:35 myPC kernel: [11039.126561]  []
>>> jbd2_log_wait_commit+0x8a/0xf0 [jbd2]
>>> Sep 07 12:25:35 myPC kernel: [11039.126621]  []
>>> ext4_evict_inode+0x2d4/0x4b0 [ext4]
>>> Sep 07 12:25:35 myPC kernel: [11039.126630]  []
>>> evict+0xba/0x190
>>> Sep 07 12:25:35 myPC kernel: [11039.126635]  []
>>> dispose_list+0x32/0x50
>>> Sep 07 12:25:35 myPC kernel: [11039.126639]  []
>>> prune_icache_sb+0x42/0x50
>>> Sep 07 12:25:35 myPC kernel: [11039.126644]  []
>>> super_cache_scan+0x136/0x180
>>> Sep 07 12:25:35 myPC kernel: [11039.126653]  []
>>> shrink_slab.part.41+0x20b/0x3f0
>>> Sep 07 12:25:35 myPC kernel: [11039.126521]  ffff88003da27b90
>>> ffffffff815e4c85 ffff88001a15c800 0000000000013183
>>>
>>> After that, more and more processes end up in state D; nfsd hangs
>>> waiting for the inode deletion to complete in __wait_on_freeing_inode.
>>>
>>> Sep 07 12:25:35 myPC kernel: [11039.390218] nfsd            D
>>> 0000000000000000     0 30419      2 0x00000080
>>> Sep 07 12:25:35 myPC kernel: [11039.390221]  ffff88000da47a98
>>> ffff8800277ec380 ffff8800374ec580 ffff88000da48000
>>> Sep 07 12:25:35 myPC kernel: [11039.390223]  ffff88000da47ad8
>>> ffffc9000011edc8 0000000000000000 ffff88000e92e2f8
>>> Sep 07 12:25:35 myPC kernel: [11039.390225]  ffff88000da47ab0
>>> ffffffff815e4c85 ffffffff81c094c0 ffff88003ff9d8b8
>>> Sep 07 12:25:35 myPC kernel: [11039.390225] Call Trace:
>>> Sep 07 12:25:35 myPC kernel: [11039.390231]  []
>>> schedule+0x35/0x80
>>> Sep 07 12:25:35 myPC kernel: [11039.390236]  []
>>> __wait_on_freeing_inode+0x9c/0xc0
>>> Sep 07 12:25:35 myPC kernel: [11039.390241]  []
>>> find_inode_fast+0x73/0xa0
>>> Sep 07 12:25:35 myPC kernel: [11039.390245]  []
>>> iget_locked+0x72/0x190
>>> Sep 07 12:25:35 myPC kernel: [11039.390284]  []
>>> ext4_iget+0x2c/0xb10 [ext4]
>>> Sep 07 12:25:35 myPC kernel: [11039.390314]  []
>>> ext4_nfs_get_inode+0x45/0x80 [ext4]
>>> Sep 07 12:25:35 myPC kernel: [11039.390320]  []
>>> generic_fh_to_dentry+0x24/0x30
>>> Sep 07 12:25:35 myPC kernel: [11039.390326]  []
>>> exportfs_decode_fh+0x5e/0x350
>>> Sep 07 12:25:35 myPC kernel: [11039.390347]  []
>>> fh_verify+0x2da/0x5a0 [nfsd]
>>> Sep 07 12:25:35 myPC kernel: [11039.390362]  []
>>> nfsd3_proc_getattr+0x63/0xd0 [nfsd]
>>> Sep 07 12:25:35 myPC kernel: [11039.390374]  []
>>> nfsd_dispatch+0xc3/0x260 [nfsd]
>>> Sep 07 12:25:35 myPC kernel: [11039.390411]  []
>>> svc_process_common+0x418/0x6a0 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.390439]  []
>>> svc_process+0xfd/0x1b0 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.390454]  []
>>> nfsd+0xea/0x160 [nfsd]
>>>
>>> That in turn causes the NFS clients to hang waiting for nfsd to
>>> respond.
>>>
>>> Sep 07 12:25:35 myPC kernel: [11039.207987] Call Trace:
>>> Sep 07 12:25:35 myPC kernel: [11039.207994]  []
>>> schedule+0x35/0x80
>>> Sep 07 12:25:35 myPC kernel: [11039.208008]  []
>>> rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.208055]  []
>>> __wait_on_bit+0x58/0x90
>>> Sep 07 12:25:35 myPC kernel: [11039.208059]  []
>>> out_of_line_wait_on_bit+0x6e/0x80
>>> Sep 07 12:25:35 myPC kernel: [11039.208082]  []
>>> __rpc_execute+0x146/0x440 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.208104]  []
>>> rpc_run_task+0x69/0x80 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.208126]  []
>>> rpc_call_sync+0x3f/0xa0 [sunrpc]
>>> Sep 07 12:25:35 myPC kernel: [11039.204818]  []
>>> vfs_fstatat+0x49/0x90
>>> Sep 07 12:25:35 myPC kernel: [11039.204821]  []
>>> SYSC_newstat+0x1a/0x40
>>> Sep 07 12:25:35 myPC kernel: [11039.204826]  []
>>> entry_SYSCALL_64_fastpath+0x12/0x6d
>>> Sep 07 12:25:35 myPC kernel: [11039.207972] DWARF2 unwinder stuck at
>>> entry_SYSCALL_64_fastpath+0x12/0x6d
>>>
>>> My guess is that under heavy load the inode somehow cannot be evicted
>>> by kswapd, so nfsd just keeps waiting and the whole system ends up
>>> unresponsive. I found a related issue:
>>>
>>> https://www.systutorials.com/linux-kernels/105429/nfsv4-1-prevent-a-3-way-deadlock-between-layoutreturn-open-and-state-recovery-linux-3-12-17/
>>> (it is not the same problem, since that one is NFSv4, but I refer to
>>> it for the similar symptom.)
>>>
>>> My system has very limited resources, only 1 core and 1 GB of RAM, and
>>> the issue is very easy to reproduce on a slow disk. However, I did not
>>> see the same issue on the older kernel 3.12.61.
>>> Could you please let me know whether my analysis makes sense, i.e.
>>> whether the jbd2_log_wait_commit hang causes the wait in
>>> __wait_on_freeing_inode? Is this a known issue in the kernel, and if
>>> so, which kernel change causes it? And is there any workaround, such
>>> as preventing kswapd from kicking in (I have a 5 GB swap partition),
>>> or at least keeping the system from becoming this unresponsive (other
>>> than adding more RAM :) )?
>>>
>>> Thanks a lot
>>> Brs,
>>> Naruto
>>>
>>> _______________________________________________
>>> Kernelnewbies mailing list
>>> Kernelnewbies at kernelnewbies.org
>>> https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>>
>>
>> --
>> regards,
>>
>> Mulyadi Santosa
>> Freelance Linux trainer and consultant
>>
>> blog: the-hydra.blogspot.com
>> training: mulyaditraining.blogspot.com
>>

--
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com
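
A minimal sketch of the "first logical step" suggested in the reply above:
re-test on the latest stable kernel, for example the 4.4.163 release
mentioned in this thread. The repository URL points at the usual stable
tree; the config and build steps are generic assumptions, so adapt them to
your distribution's normal kernel build procedure.

  # fetch only the v4.4.163 tag from the stable tree
  git clone --branch v4.4.163 --depth 1 \
      https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
  cd linux
  cp "/boot/config-$(uname -r)" .config   # start from the running config
  make olddefconfig                       # fill new options with defaults
  make -j"$(nproc)"
  sudo make modules_install install       # then reboot into the new kernel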
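
For completeness, the blocked-task dumps quoted in the report are what the
kernel emits for tasks stuck in uninterruptible (D) sleep. A sketch of the
standard ways to capture such dumps follows; the SysRq trigger is a stock
kernel interface, and the hung-task knob requires a kernel built with
CONFIG_DETECT_HUNG_TASK.

  # enable SysRq and dump all D-state tasks into the kernel log
  echo 1 > /proc/sys/kernel/sysrq
  echo w > /proc/sysrq-trigger
  dmesg | tail -n 300          # the stack traces show up here

  # or let the hung-task detector report tasks blocked longer than 120 s
  echo 120 > /proc/sys/kernel/hung_task_timeout_secs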
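
On the workaround question: there is no sysctl that simply keeps kswapd from
running, but the VM knobs below are the ones commonly tried on small-memory
NFS servers to make reclaim and writeback less bursty. The values are
placeholders, not recommendations, and none of them addresses the underlying
jbd2 commit wait seen in the kswapd trace.

  # keep a larger free-page reserve so reclaim starts earlier and less abruptly
  sysctl -w vm.min_free_kbytes=16384
  # start background writeback sooner so less dirty data piles up
  sysctl -w vm.dirty_background_ratio=5
  sysctl -w vm.dirty_ratio=10
  # values above 100 make the kernel reclaim the inode/dentry caches (the
  # shrinker path in the kswapd trace) more aggressively; the default is 100
  sysctl -w vm.vfs_cache_pressure=200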