* steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10)

From: Nick Alcock @ 2021-01-03 14:27 UTC
To: NFS list

So I bought a game or two on Steam a fortnight ago to while away a
Christmas spent stuck indoors with no family, decorations, or presents
(stuck at my sister's, and she got a perfectly-timed case of covid-19).
It turned out this was a very bad idea!

I keep my Steam directory on an ext4 fs on the local fileserver and
access it via NFSv4.2 over 10GbE: it's actually faster than local
spinning-rust storage would be (five-spindle RAID-6 and bcache and
128GiB RAM on the server for caching probably have something to do with
that!) though likely slower than the NVMe the local machine actually
has: but it doesn't have enough local storage, so I keep all the Steam
stuff remotely.

(I keep my $HOME on the same fileserver, though that's on xfs: I'd use
xfs for Steam, too, but there are still a lot of games out there that
fail on filesystems with 64-bit inode numbers.)

Unfortunately, what I observe now when kicking up Steam is a spasm of
downloading I/O for the game I just bought and then a complete
client-side hang when it starts installing: just about all I can do is
hard-reboot. This of course stops me using Steam for anything, which
didn't make me very happy :P

The server hums onwards as though nothing went wrong, but the client is
in a bad way indeed: not only has NFS hung, but there are clearly some
important locks taken out, since even an ls on a non-NFS dir (with no
NFS dirs in the PATH) hangs, as does a while :; do ps -e; done loop.
I can rebuild with lockdep and do a lock dump if that would be useful.

(The kernel is not quite stock: it has DTrace built in, but the hangs
happen even if the kernel module has never been loaded, and I'm pretty
sure the hangs can't be associated with the not-very-substantial
built-in DTrace changes.
I can try rebuilding without that as well, but I'll have to rip chunks
out of my monitoring machinery to do that, so I've avoided doing it so
far.)

I finally got round to doing a bit of debugging today. NFS debugging
dumps covering the last fifteen seconds or so before the hang and a
couple of minutes afterwards (taken on the client, logged without
incident to the server's syslog, as proof that not *everything* is
stalled) are a few megabytes long so are at
<http://www.esperi.org.uk/~nix/temporary/nfs-steam-hang.log>. Everything
seems to go wrong after an especially large bunch of NFS commits.

Both client (silk, 192.168.16.21) and server (loom, 192.168.16.8) are
running 5.10.4 with the same NFS config, though the client has no NFS
server exports at the time and the server isn't acting as an NFS client
at all:

CONFIG_NFS_FS=y
# CONFIG_NFS_V2 is not set
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
# CONFIG_NFS_SWAP is not set
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_PNFS_FILE_LAYOUT=y
CONFIG_PNFS_BLOCK=y
CONFIG_PNFS_FLEXFILE_LAYOUT=m
CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
# CONFIG_NFS_V4_1_MIGRATION is not set
# CONFIG_ROOT_NFS is not set
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DEBUG=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
# CONFIG_NFS_V4_2_READ_PLUS is not set
CONFIG_NFSD=y
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
# CONFIG_NFSD_SCSILAYOUT is not set
# CONFIG_NFSD_FLEXFILELAYOUT is not set
CONFIG_NFSD_V4_2_INTER_SSC=y
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=y
CONFIG_SUNRPC_BACKCHANNEL=y
CONFIG_SUNRPC_DEBUG=y

The server (loom) has a large number of exports: many of the filesystems
being exported are moved around via bind mounts and the like.
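For anyone trying to reproduce this: client-side debugging dumps like the one linked above can be captured with rpcdebug from nfs-utils. This is a sketch, not necessarily how the log above was produced; it needs root plus the CONFIG_NFS_DEBUG/CONFIG_SUNRPC_DEBUG options shown in the config, and it is guarded so it does nothing on a machine without rpcdebug:

```shell
# Toggle verbose NFS-client and SUNRPC debugging around a reproduction
# attempt; the resulting messages land in the kernel log / syslog.
if [ "$(id -u)" -eq 0 ] && command -v rpcdebug >/dev/null 2>&1; then
    rpcdebug -m nfs -s all    # set all NFS client debug flags
    rpcdebug -m rpc -s all    # set all SUNRPC debug flags
    # ... reproduce the hang here ...
    rpcdebug -m nfs -c all    # clear the flags again afterwards
    rpcdebug -m rpc -c all
    STATE=ran
else
    STATE=skipped             # not root, or nfs-utils not installed
fi
echo "rpcdebug: $STATE"
```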
The steam datadir is in /pkg/non-free/steam, which is bind-mounted onto
/home/nix/.steam on the client: /home/nix itself is bind-mounted from
/home/.loom.srvr.nix/steam, and is NFS-exported from the server under
that pathname.

Relevant exports, from /proc/fs/nfs/exports:

/ 192.168.16.0/24,silk.wkstn.nix(ro,insecure,no_root_squash,sync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0a4a4563:00764033:8c827c0e:989cf534,sec=390003:390004:390005:1)
/home/.loom.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,sync,wdelay,no_subtree_check,uuid=0a4a4563:00764033:8c827c0e:989cf534,sec=1)
/pkg/non-free/steam 192.168.16.0/24,silk.wkstn.nix(rw,root_squash,sync,wdelay,uuid=11b94016:cb6e4b64:beb6c134:9ea3d6b3,sec=1)

Blocked tasks after the hang suggest that the client is hung waiting
indefinitely for the server to reply, and I'd guess the server doesn't
think it needs to. (I can do some server-side debugging dumps too if you
think the problem might be there.)

[  322.843962] sysrq: Show Blocked State
[  322.844058] task:tee state:D stack: 0 pid: 1176 ppid: 1173 flags:0x00000000
[  322.844064] Call Trace:
[  322.844074]  __schedule+0x3db/0x7ee
[  322.844079]  ? io_schedule+0x70/0x6d
[  322.844082]  schedule+0x68/0xdc
[  322.844085]  io_schedule+0x46/0x6d
[  322.844089]  bit_wait_io+0x11/0x52
[  322.844092]  __wait_on_bit+0x31/0x90
[  322.844096]  out_of_line_wait_on_bit+0x7e/0x80
[  322.844100]  ? var_wake_function+0x30/0x27
[  322.844104]  nfs_wait_on_request.part.0+0x31/0x33
[  322.844107]  nfs_page_group_lock_head+0x31/0x78
[  322.844110]  nfs_lock_and_join_requests+0x9d/0x26e
[  322.844114]  nfs_updatepage+0x1be/0x985
[  322.844118]  nfs_write_end+0x128/0x52a
[  322.844122]  ? iov_iter_copy_from_user_atomic+0xe8/0x3ce
[  322.844127]  generic_perform_write+0xed/0x17c
[  322.844131]  nfs_file_write+0x14a/0x29f
[  322.844135]  new_sync_write+0xfb/0x16b
[  322.844139]  vfs_write+0x1ee/0x29a
[  322.844142]  ? vfs_read+0x17b/0x198
[  322.844145]  ksys_write+0x61/0xd0
[  322.844148]  __x64_sys_write+0x1a/0x1c
[  322.844151]  do_syscall_64+0x32/0x45
[  322.844154]  entry_SYSCALL_64_after_hwframe+0x44/0x0
[  322.844158] RIP: 0033:0x7f4781f323a3
[  322.844160] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  322.844164] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: 00007f4781f323a3
[  322.844166] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: 0000000000000003
[  322.844168] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: 0000000000000001
[  322.844170] R10: 0000000000400ad2 R11: 0000000000000246 R12: 0000000000000092
[  322.844172] R13: 0000000002361580 R14: 0000000000000092 R15: 00007f4782201720
[  322.844202] task:CIPCServer::Thr state:D stack: 0 pid: 1801 ppid: 1540 flags:0x20020000
[  322.844207] Call Trace:
[  322.844212]  __schedule+0x3db/0x7ee
[  322.844219]  ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
[  322.844222]  schedule+0x68/0xdc
[  322.844227]  rpc_wait_bit_killable+0x11/0x52
[  322.844230]  __wait_on_bit+0x31/0x90
[  322.844235]  ? __rpc_atrun+0x20/0x1a
[  322.844238]  ? __rpc_atrun+0x20/0x1a
[  322.844242]  out_of_line_wait_on_bit+0x7e/0x80
[  322.844246]  ? var_wake_function+0x30/0x27
[  322.844250]  __rpc_execute+0xe5/0x485
[  322.844254]  ? sugov_get_util+0xf0/0xeb
[  322.844258]  rpc_execute+0xa5/0xaa
[  322.844263]  rpc_run_task+0x148/0x187
[  322.844267]  nfs4_do_call_sync+0x61/0x81
[  322.844272]  _nfs4_proc_getattr+0x10a/0x120
[  322.844278]  nfs4_proc_getattr+0x67/0xee
[  322.844283]  __nfs_revalidate_inode+0xb7/0x2df
[  322.844287]  nfs_access_get_cached+0x14d/0x1df
[  322.844291]  nfs_do_access+0x48/0x28e
[  322.844296]  ? step_into+0x2bc/0x6a1
[  322.844301]  nfs_permission+0xa2/0x1b8
[  322.844305]  inode_permission+0x96/0xca
[  322.844308]  ? unlazy_walk+0x4c/0x80
[  322.844312]  link_path_walk.part.0+0x2a5/0x320
[  322.844315]  ? path_init+0x2c2/0x3e5
[  322.844318]  path_lookupat+0x3f/0x1b1
[  322.844322]  filename_lookup+0x97/0x17f
[  322.844326]  ? kmem_cache_alloc+0x32/0x200
[  322.844331]  user_path_at_empty+0x59/0x8a
[  322.844333]  do_faccessat+0x70/0x227
[  322.844337]  __ia32_sys_access+0x1c/0x1e
[  322.844340]  __do_fast_syscall_32+0x5f/0x8e
[  322.844342]  do_fast_syscall_32+0x3d/0x80
[  322.844346]  entry_SYSCALL_compat_after_hwframe+0x45/0x0
[  322.844348] RIP: 0023:0xf7ee6549
[  322.844350] RSP: 002b:00000000eb763b28 EFLAGS: 00200292 ORIG_RAX: 0000000000000021
[  322.844353] RAX: ffffffffffffffda RBX: 000000005754d9e0 RCX: 0000000000000004
[  322.844355] RDX: 00000000f7ab3e1c RSI: 000000005754d9e0 RDI: 00000000eb763bc8
[  322.844357] RBP: 00000000f78a91c0 R08: 0000000000000000 R09: 0000000000000000
[  322.844359] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  322.844360] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  322.844364] task:CFileWriterThre state:D stack: 0 pid: 1810 ppid: 1540 flags:0x20024000
[  322.844367] Call Trace:
[  322.844371]  __schedule+0x3db/0x7ee
[  322.844375]  ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
[  322.844379]  schedule+0x68/0xdc
[  322.844382]  rpc_wait_bit_killable+0x11/0x52
[  322.844385]  __wait_on_bit+0x31/0x90
[  322.844389]  ? call_reserveresult+0xa0/0x9f
[  322.844391]  ? call_transmit_status+0x160/0x159
[  322.844395]  out_of_line_wait_on_bit+0x7e/0x80
[  322.844398]  ? var_wake_function+0x30/0x27
[  322.844401]  __rpc_execute+0xe5/0x485
[  322.844404]  ? nfs_check_verifier+0x5b/0xa4
[  322.844407]  rpc_execute+0xa5/0xaa
[  322.844410]  rpc_run_task+0x148/0x187
[  322.844414]  nfs4_do_call_sync+0x61/0x81
[  322.844417]  _nfs4_proc_getattr+0x10a/0x120
[  322.844422]  nfs4_proc_getattr+0x67/0xee
[  322.844426]  __nfs_revalidate_inode+0xb7/0x2df
[  322.844430]  nfs_getattr+0x159/0x374
[  322.844433]  vfs_getattr_nosec+0x96/0xa4
[  322.844435]  vfs_statx+0x7a/0xe5
[  322.844438]  vfs_fstatat+0x1f/0x21
[  322.844443]  __do_compat_sys_ia32_lstat64+0x2b/0x4e
[  322.844448]  __ia32_compat_sys_ia32_lstat64+0x14/0x16
[  322.844450]  __do_fast_syscall_32+0x5f/0x8e
[  322.844453]  do_fast_syscall_32+0x3d/0x80
[  322.844457]  entry_SYSCALL_compat_after_hwframe+0x45/0x0
[  322.844459] RIP: 0023:0xf7ee6549
[  322.844460] RSP: 002b:00000000eb561be8 EFLAGS: 00000292 ORIG_RAX: 00000000000000c4
[  322.844463] RAX: ffffffffffffffda RBX: 00000000c521bf60 RCX: 00000000eb561e70
[  322.844465] RDX: 00000000f7ab3e1c RSI: 00000000eb561e70 RDI: 00000000c521bf60
[  322.844467] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  322.844468] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  322.844470] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  322.844477] task:Steam - DirWatc state:D stack: 0 pid: 1924 ppid: 1540 flags:0x20024000
[  322.844480] Call Trace:
[  322.844484]  __schedule+0x3db/0x7ee
[  322.844488]  ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
[  322.844492]  schedule+0x68/0xdc
[  322.844495]  rpc_wait_bit_killable+0x11/0x52
[  322.844498]  __wait_on_bit+0x31/0x90
[  322.844501]  ? rpc_sleep_on_timeout+0x64/0x9c
[  322.844504]  ? rpc_check_timeout+0x1e0/0x1e0
[  322.844507]  ? call_connect_status+0x180/0x17a
[  322.844511]  out_of_line_wait_on_bit+0x7e/0x80
[  322.844513]  ? var_wake_function+0x30/0x27
[  322.844516]  __rpc_execute+0xe5/0x485
[  322.844520]  rpc_execute+0xa5/0xaa
[  322.844523]  rpc_run_task+0x148/0x187
[  322.844526]  nfs4_do_call_sync+0x61/0x81
[  322.844530]  _nfs4_proc_getattr+0x10a/0x120
[  322.844534]  nfs4_proc_getattr+0x67/0xee
[  322.844538]  __nfs_revalidate_inode+0xb7/0x2df
[  322.844542]  nfs_lookup_verify_inode+0x67/0x82
[  322.844545]  nfs_do_lookup_revalidate+0x19b/0x29c
[  322.844548]  nfs4_do_lookup_revalidate+0x5d/0xc5
[  322.844552]  nfs4_lookup_revalidate+0x6a/0x8d
[  322.844555]  lookup_fast+0xb8/0x122
[  322.844559]  path_openat+0xfd/0xfd8
[  322.844562]  ? unlazy_walk+0x4c/0x80
[  322.844565]  ? terminate_walk+0x62/0xed
[  322.844569]  ? putname+0x4b/0x53
[  322.844572]  do_filp_open+0x86/0x110
[  322.844576]  ? kmem_cache_alloc+0x32/0x200
[  322.844579]  ? __alloc_fd+0xb2/0x143
[  322.844583]  do_sys_openat2+0x8d/0x13a
[  322.844585]  ? do_faccessat+0x108/0x227
[  322.844588]  __ia32_compat_sys_openat+0x48/0x63
[  322.844591]  __do_fast_syscall_32+0x5f/0x8e
[  322.844594]  do_fast_syscall_32+0x3d/0x80
[  322.844597]  entry_SYSCALL_compat_after_hwframe+0x45/0x0
[  322.844600] RIP: 0023:0xf7ee6549
[  322.844601] RSP: 002b:000000008c8fbcb0 EFLAGS: 00200287 ORIG_RAX: 0000000000000127
[  322.844604] RAX: ffffffffffffffda RBX: 00000000ffffff9c RCX: 000000008c8fbffc
[  322.844606] RDX: 0000000000098800 RSI: 0000000000000000 RDI: 00000000f7ab3e1c
[  322.844608] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  322.844609] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  322.844611] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  322.844614] task:CGenericAsyncFi state:D stack: 0 pid: 1927 ppid: 1540 flags:0x20020000
[  322.844617] Call Trace:
[  322.844621]  __schedule+0x3db/0x7ee
[  322.844625]  schedule+0x68/0xdc
[  322.844628]  io_schedule+0x46/0x6d
[  322.844631]  wait_on_page_bit_common+0xf8/0x325
[  322.844634]  ? trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d
[  322.844638]  wait_on_page_bit+0x3f/0x41
[  322.844642]  wait_on_page_writeback.part.0+0x35/0x80
[  322.844645]  wait_on_page_writeback+0x27/0x29
[  322.844647]  __filemap_fdatawait_range+0x88/0xd9
[  322.844651]  filemap_write_and_wait_range+0x45/0x80
[  322.844655]  nfs_wb_all+0x30/0x13d
[  322.844658]  nfs4_file_flush+0x85/0xc0
[  322.844661]  filp_close+0x3e/0x78
[  322.844664]  __close_fd+0x23/0x30
[  322.844667]  __ia32_sys_close+0x22/0x50
[  322.844670]  __do_fast_syscall_32+0x5f/0x8e
[  322.844673]  do_fast_syscall_32+0x3d/0x80
[  322.844676]  entry_SYSCALL_compat_after_hwframe+0x45/0x0
[  322.844678] RIP: 0023:0xf7ee6549
[  322.844680] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 ORIG_RAX: 0000000000000006
[  322.844682] RAX: ffffffffffffffda RBX: 000000000000004d RCX: 0000000000000002
[  322.844684] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: 000000000000004d
[  322.844686] RBP: 000000008c5fe110 R08: 0000000000000000 R09: 0000000000000000
[  322.844688] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  322.844689] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  322.844699] task:ThreadPoolForeg state:D stack: 0 pid: 1768 ppid: 1735 flags:0x00004000
[  322.844702] Call Trace:
[  322.844705]  __schedule+0x3db/0x7ee
[  322.844709]  ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
[  322.844712]  schedule+0x68/0xdc
[  322.844716]  rpc_wait_bit_killable+0x11/0x52
[  322.844719]  __wait_on_bit+0x31/0x90
[  322.844722]  ? rpc_sleep_on_timeout+0x64/0x9c
[  322.844725]  ? rpc_check_timeout+0x1e0/0x1e0
[  322.844728]  ? call_connect_status+0x180/0x17a
[  322.844731]  out_of_line_wait_on_bit+0x7e/0x80
[  322.844734]  ? var_wake_function+0x30/0x27
[  322.844737]  __rpc_execute+0xe5/0x485
[  322.844740]  ? nfs_check_verifier+0x5b/0xa4
[  322.844743]  rpc_execute+0xa5/0xaa
[  322.844746]  rpc_run_task+0x148/0x187
[  322.844750]  nfs4_do_call_sync+0x61/0x81
[  322.844753]  _nfs4_proc_getattr+0x10a/0x120
[  322.844758]  nfs4_proc_getattr+0x67/0xee
[  322.844762]  __nfs_revalidate_inode+0xb7/0x2df
[  322.844766]  nfs_getattr+0x159/0x374
[  322.844769]  vfs_getattr_nosec+0x96/0xa4
[  322.844771]  vfs_statx+0x7a/0xe5
[  322.844774]  __do_sys_newstat+0x31/0x4c
[  322.844778]  __x64_sys_newstat+0x16/0x18
[  322.844781]  do_syscall_64+0x32/0x45
[  322.844783]  entry_SYSCALL_64_after_hwframe+0x44/0x0
[  322.844786] RIP: 0033:0x7fe70ebf19e6
[  322.844787] RSP: 002b:00007fe6c15b2068 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
[  322.844790] RAX: ffffffffffffffda RBX: 00000bd4193e1910 RCX: 00007fe70ebf19e6
[  322.844792] RDX: 00007fe6c15b2078 RSI: 00007fe6c15b2078 RDI: 00000bd4193e1910
[  322.844794] RBP: 00007fe6c15b2160 R08: 0000000000000001 R09: 00007ffd971ed090
[  322.844796] R10: 0000000000056764 R11: 0000000000000246 R12: 00007fe6c15b2078
[  322.844797] R13: 00000bd4192f6f60 R14: 00007fe6c15b2108 R15: 00007fe6c15b2120
[  322.844801] task:ThreadPoolForeg state:D stack: 0 pid: 1769 ppid: 1735 flags:0x00000000
[  322.844804] Call Trace:
[  322.844807]  __schedule+0x3db/0x7ee
[  322.844811]  ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
[  322.844814]  schedule+0x68/0xdc
[  322.844817]  rpc_wait_bit_killable+0x11/0x52
[  322.844820]  __wait_on_bit+0x31/0x90
[  322.844823]  ? __rpc_atrun+0x20/0x1a
[  322.844826]  ? __rpc_atrun+0x20/0x1a
[  322.844829]  out_of_line_wait_on_bit+0x7e/0x80
[  322.844831]  ? var_wake_function+0x30/0x27
[  322.844834]  __rpc_execute+0xe5/0x485
[  322.844837]  ? update_cfs_group+0x9a/0x9e
[  322.844840]  rpc_execute+0xa5/0xaa
[  322.844842]  rpc_run_task+0x148/0x187
[  322.844845]  nfs4_do_call_sync+0x61/0x81
[  322.844849]  _nfs4_proc_getattr+0x10a/0x120
[  322.844853]  nfs4_proc_getattr+0x67/0xee
[  322.844856]  __nfs_revalidate_inode+0xb7/0x2df
[  322.844859]  nfs_file_write+0xfc/0x29f
[  322.844862]  new_sync_write+0xfb/0x16b
[  322.844866]  vfs_write+0x1ee/0x29a
[  322.844868]  ? __fget_light+0x31/0x72
[  322.844871]  ksys_write+0x61/0xd0
[  322.844874]  __x64_sys_write+0x1a/0x1c
[  322.844877]  do_syscall_64+0x32/0x45
[  322.844880]  entry_SYSCALL_64_after_hwframe+0x44/0x0
[  322.844881] RIP: 0033:0x7fe71859ab7f
[  322.844883] RSP: 002b:00007fe6bd5b0d30 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[  322.844885] RAX: ffffffffffffffda RBX: 00007fe6bd5b0d68 RCX: 00007fe71859ab7f
[  322.844887] RDX: 0000000000000007 RSI: 00007fe6bd5b1090 RDI: 000000000000005b
[  322.844888] RBP: 00007fe6bd5b0de0 R08: 0000000000000000 R09: 00007ffd971ed090
[  322.844890] R10: 0000000000065fd2 R11: 0000000000000293 R12: 00000bd418d95020
[  322.844891] R13: 00007fe6bd5b1090 R14: 0000000000000007 R15: 0000000000000000
[  322.844900] task:ThreadPoolForeg state:D stack: 0 pid: 1807 ppid: 1773 flags:0x00000000
[  322.844902] Call Trace:
[  322.844904]  __schedule+0x3db/0x7ee
[  322.844907]  ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
[  322.844908]  schedule+0x68/0xdc
[  322.844911]  rpc_wait_bit_killable+0x11/0x52
[  322.844913]  __wait_on_bit+0x31/0x90
[  322.844915]  ? __rpc_atrun+0x20/0x1a
[  322.844916]  ? __rpc_atrun+0x20/0x1a
[  322.844918]  out_of_line_wait_on_bit+0x7e/0x80
[  322.844920]  ? var_wake_function+0x30/0x27
[  322.844922]  __rpc_execute+0xe5/0x485
[  322.844924]  rpc_execute+0xa5/0xaa
[  322.844926]  rpc_run_task+0x148/0x187
[  322.844928]  nfs4_do_call_sync+0x61/0x81
[  322.844930]  _nfs4_proc_getattr+0x10a/0x120
[  322.844933]  nfs4_proc_getattr+0x67/0xee
[  322.844936]  __nfs_revalidate_inode+0xb7/0x2df
[  322.844938]  nfs_access_get_cached+0x14d/0x1df
[  322.844940]  nfs_do_access+0x48/0x28e
[  322.844942]  ? nfs4_lookup_revalidate+0x38/0x8d
[  322.844944]  nfs_permission+0xa2/0x1b8
[  322.844946]  inode_permission+0x96/0xca
[  322.844948]  ? unlazy_walk+0x4c/0x80
[  322.844950]  link_path_walk.part.0+0x2a5/0x320
[  322.844952]  ? path_init+0x2c2/0x3e5
[  322.844954]  path_lookupat+0x3f/0x1b1
[  322.844956]  filename_lookup+0x97/0x17f
[  322.844959]  ? do_unlk+0x9a/0xca
[  322.844960]  ? kmem_cache_alloc+0x32/0x200
[  322.844963]  user_path_at_empty+0x59/0x8a
[  322.844965]  vfs_statx+0x64/0xe5
[  322.844967]  __do_sys_newstat+0x31/0x4c
[  322.844968]  ? do_fcntl+0x113/0x582
[  322.844971]  ? fput+0x13/0x15
[  322.844972]  ? __x64_sys_fcntl+0x81/0xa8
[  322.844974]  __x64_sys_newstat+0x16/0x18
[  322.844976]  do_syscall_64+0x32/0x45
[  322.844977]  entry_SYSCALL_64_after_hwframe+0x44/0x0
[  322.844979] RIP: 0033:0x7f0d1fd0c9e6
[  322.844980] RSP: 002b:00007f0d08e12e18 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
[  322.844981] RAX: ffffffffffffffda RBX: 00007f0d08e12edc RCX: 00007f0d1fd0c9e6
[  322.844983] RDX: 00007f0d08e12e28 RSI: 00007f0d08e12e28 RDI: 00001605c2ff1169
[  322.844984] RBP: 00007f0d08e12ec0 R08: 0000000000000001 R09: 00007fffaddd9090
[  322.844985] R10: 00001605c3133480 R11: 0000000000000246 R12: 0000000000000000
[  322.844986] R13: 0000000000000000 R14: 00001605c2ff0f00 R15: 0000000000000000
[  322.844988] task:ThreadPoolForeg state:D stack: 0 pid: 2031 ppid: 1773 flags:0x00004000
[  322.844990] Call Trace:
[  322.844992]  __schedule+0x3db/0x7ee
[  322.844995]  ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
[  322.844997]  schedule+0x68/0xdc
[  322.844999]  rpc_wait_bit_killable+0x11/0x52
[  322.845001]  __wait_on_bit+0x31/0x90
[  322.845003]  ? rpc_sleep_on_timeout+0x64/0x9c
[  322.845005]  ? rpc_check_timeout+0x1e0/0x1e0
[  322.845006]  ? call_connect_status+0x180/0x17a
[  322.845009]  out_of_line_wait_on_bit+0x7e/0x80
[  322.845010]  ? var_wake_function+0x30/0x27
[  322.845012]  __rpc_execute+0xe5/0x485
[  322.845014]  rpc_execute+0xa5/0xaa
[  322.845016]  rpc_run_task+0x148/0x187
[  322.845018]  nfs4_do_call_sync+0x61/0x81
[  322.845020]  _nfs4_proc_getattr+0x10a/0x120
[  322.845023]  nfs4_proc_getattr+0x67/0xee
[  322.845026]  __nfs_revalidate_inode+0xb7/0x2df
[  322.845028]  nfs_access_get_cached+0x14d/0x1df
[  322.845030]  nfs_do_access+0x48/0x28e
[  322.845032]  ? nfs4_lookup_revalidate+0x38/0x8d
[  322.845034]  nfs_permission+0xa2/0x1b8
[  322.845036]  inode_permission+0x96/0xca
[  322.845038]  ? unlazy_walk+0x4c/0x80
[  322.845040]  link_path_walk.part.0+0x2a5/0x320
[  322.845042]  ? path_init+0x2c2/0x3e5
[  322.845044]  path_lookupat+0x3f/0x1b1
[  322.845046]  filename_lookup+0x97/0x17f
[  322.845049]  ? kmem_cache_alloc+0x32/0x200
[  322.845051]  user_path_at_empty+0x59/0x8a
[  322.845054]  ? mntput_no_expire+0x4f/0x264
[  322.845055]  vfs_statx+0x64/0xe5
[  322.845057]  ? putname+0x4b/0x53
[  322.845059]  __do_sys_newstat+0x31/0x4c
[  322.845062]  ? __x64_sys_futex+0x13a/0x159
[  322.845064]  ? switch_fpu_return+0x58/0xf2
[  322.845066]  __x64_sys_newstat+0x16/0x18
[  322.845068]  do_syscall_64+0x32/0x45
[  322.845069]  entry_SYSCALL_64_after_hwframe+0x44/0x0
[  322.845071] RIP: 0033:0x7f0d1fd0c9e6
[  322.845072] RSP: 002b:00007f0d00331108 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
[  322.845073] RAX: ffffffffffffffda RBX: 00001605c2fdc0a0 RCX: 00007f0d1fd0c9e6
[  322.845074] RDX: 00007f0d00331118 RSI: 00007f0d00331118 RDI: 00001605c2fdc0a0
[  322.845076] RBP: 00007f0d00331200 R08: 0000000000000001 R09: 00007fffaddd9090
[  322.845077] R10: 000000000005771c R11: 0000000000000246 R12: 00007f0d00331118
[  322.845078] R13: 00001605c3309c40 R14: 00007f0d003311a8 R15: 00007f0d003311c0
[  322.845087] task:192.168.16.8-ma state:D stack: 0 pid: 2044 ppid: 2 flags:0x00004000
[  322.845089] Call Trace:
[  322.845091]  __schedule+0x3db/0x7ee
[  322.845094]  ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
[  322.845095]  schedule+0x68/0xdc
[  322.845098]  rpc_wait_bit_killable+0x11/0x52
[  322.845100]  __wait_on_bit+0x31/0x90
[  322.845102]  ? __rpc_atrun+0x20/0x1a
[  322.845103]  ? __rpc_atrun+0x20/0x1a
[  322.845105]  out_of_line_wait_on_bit+0x7e/0x80
[  322.845107]  ? var_wake_function+0x30/0x27
[  322.845109]  __rpc_execute+0xe5/0x485
[  322.845111]  rpc_execute+0xa5/0xaa
[  322.845113]  rpc_run_task+0x148/0x187
[  322.845115]  nfs4_do_call_sync+0x61/0x81
[  322.845117]  _nfs4_proc_statfs+0xc1/0xcf
[  322.845120]  nfs4_proc_statfs+0x55/0x74
[  322.845122]  nfs_statfs+0x65/0x152
[  322.845124]  ? _nfs4_proc_delegreturn+0x21e/0x2e5
[  322.845127]  statfs_by_dentry+0x55/0x74
[  322.845129]  vfs_statfs+0x24/0x43
[  322.845132]  check_free_space+0x3c/0xe5
[  322.845135]  do_acct_process+0x5d/0x5e1
[  322.845136]  ? nfs_end_delegation_return+0x1cb/0x3aa
[  322.845138]  ? complete+0x49/0x57
[  322.845141]  acct_process+0xe5/0x105
[  322.845143]  do_exit+0x781/0xa1b
[  322.845145]  ? refcount_dec_and_lock+0x22/0x8c
[  322.845148]  __module_put_and_exit+0x1b/0x1b
[  322.845150]  nfs4_run_state_manager+0x293/0x933
[  322.845152]  ? nfs4_do_reclaim+0x820/0x81c
[  322.845155]  kthread+0x142/0x15f
[  322.845157]  ? __kthread_create_worker+0x130/0x122
[  322.845159]  ret_from_fork+0x1f/0x2a
[  369.859385] INFO: task tee:1176 blocked for more than 122 seconds.
[  369.859391]       Not tainted 5.10.4-00021-gde017294a539-dirty #3
[  369.859393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.859395] task:tee state:D stack: 0 pid: 1176 ppid: 1173 flags:0x00000000
[  369.859399] Call Trace:
[  369.859407]  __schedule+0x3db/0x7ee
[  369.859410]  ? io_schedule+0x70/0x6d
[  369.859412]  schedule+0x68/0xdc
[  369.859415]  io_schedule+0x46/0x6d
[  369.859417]  bit_wait_io+0x11/0x52
[  369.859420]  __wait_on_bit+0x31/0x90
[  369.859422]  out_of_line_wait_on_bit+0x7e/0x80
[  369.859426]  ? var_wake_function+0x30/0x27
[  369.859429]  nfs_wait_on_request.part.0+0x31/0x33
[  369.859431]  nfs_page_group_lock_head+0x31/0x78
[  369.859433]  nfs_lock_and_join_requests+0x9d/0x26e
[  369.859436]  nfs_updatepage+0x1be/0x985
[  369.859439]  nfs_write_end+0x128/0x52a
[  369.859443]  ? iov_iter_copy_from_user_atomic+0xe8/0x3ce
[  369.859446]  generic_perform_write+0xed/0x17c
[  369.859449]  nfs_file_write+0x14a/0x29f
[  369.859452]  new_sync_write+0xfb/0x16b
[  369.859455]  vfs_write+0x1ee/0x29a
[  369.859457]  ? vfs_read+0x17b/0x198
[  369.859459]  ksys_write+0x61/0xd0
[  369.859462]  __x64_sys_write+0x1a/0x1c
[  369.859464]  do_syscall_64+0x32/0x45
[  369.859467]  entry_SYSCALL_64_after_hwframe+0x44/0x0
[  369.859469] RIP: 0033:0x7f4781f323a3
[  369.859471] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  369.859474] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: 00007f4781f323a3
[  369.859475] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: 0000000000000003
[  369.859477] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: 0000000000000001
[  369.859478] R10: 0000000000400ad2 R11: 0000000000000246 R12: 0000000000000092
[  369.859480] R13: 0000000002361580 R14: 0000000000000092 R15: 00007f4782201720
[  369.859495] INFO: task CGenericAsyncFi:1927 blocked for more than 122 seconds.
[  369.859497]       Not tainted 5.10.4-00021-gde017294a539-dirty #3
[  369.859498] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.859499] task:CGenericAsyncFi state:D stack: 0 pid: 1927 ppid: 1540 flags:0x20020000
[  369.859501] Call Trace:
[  369.859504]  __schedule+0x3db/0x7ee
[  369.859506]  schedule+0x68/0xdc
[  369.859508]  io_schedule+0x46/0x6d
[  369.859510]  wait_on_page_bit_common+0xf8/0x325
[  369.859512]  ? trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d
[  369.859515]  wait_on_page_bit+0x3f/0x41
[  369.859517]  wait_on_page_writeback.part.0+0x35/0x80
[  369.859519]  wait_on_page_writeback+0x27/0x29
[  369.859521]  __filemap_fdatawait_range+0x88/0xd9
[  369.859523]  filemap_write_and_wait_range+0x45/0x80
[  369.859525]  nfs_wb_all+0x30/0x13d
[  369.859528]  nfs4_file_flush+0x85/0xc0
[  369.859530]  filp_close+0x3e/0x78
[  369.859532]  __close_fd+0x23/0x30
[  369.859534]  __ia32_sys_close+0x22/0x50
[  369.859536]  __do_fast_syscall_32+0x5f/0x8e
[  369.859538]  do_fast_syscall_32+0x3d/0x80
[  369.859540]  entry_SYSCALL_compat_after_hwframe+0x45/0x0
[  369.859541] RIP: 0023:0xf7ee6549
[  369.859543] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 ORIG_RAX: 0000000000000006
[  369.859545] RAX: ffffffffffffffda RBX: 000000000000004d RCX: 0000000000000002
[  369.859546] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: 000000000000004d
[  369.859547] RBP: 000000008c5fe110 R08: 0000000000000000 R09: 0000000000000000
[  369.859548] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  369.859549] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  394.423093] nfs: server loom not responding, still trying

--
NULL && (void)
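(For reference, the blocked-task dump above is the output of the SysRq "show blocked state" trigger, and the lockdep lock dump mentioned earlier is a SysRq trigger too. A sketch, requiring root and sysrq enabled:)

```shell
# 'w' dumps uninterruptible (D-state) tasks to the kernel log, as in
# the trace above; 'd' dumps all currently held locks, but only with
# CONFIG_LOCKDEP=y.
if [ "$(id -u)" -eq 0 ] && [ -w /proc/sysrq-trigger ]; then
    echo w > /proc/sysrq-trigger     # show blocked state
    # echo d > /proc/sysrq-trigger   # show all locks held (lockdep)
    STATE=triggered
else
    STATE=skipped                    # not root, or sysrq unavailable
fi
echo "sysrq: $STATE"
```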
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10)

From: J. Bruce Fields @ 2021-02-23 22:57 UTC
To: Nick Alcock; +Cc: NFS list

On Sun, Jan 03, 2021 at 02:27:51PM +0000, Nick Alcock wrote:
> [...]
> Blocked tasks after the hang suggest that the client is hung waiting
> indefinitely for the server to reply, and I'd guess the server doesn't
> think it needs to. (I can do some server-side debugging dumps too if you
> think the problem might be there.)

I seem to remember there have been some subtle changes to how dropped
replies are handled.

If I remember right: a v4 server is never supposed to drop a reply
without also closing the tcp connection, but it may have done so in the
past. (I *think* all those cases are fixed?) And the client may have
been changed to be more unforgiving about that requirement in recent
years.

But I don't know if that's where the issue is.

I'm not sure how best to go about debugging. I suppose you could try to
capture a network trace and work out if there's an RPC that isn't
getting a reply.
Maybe turning on server debugging and looking for "svc: svc_process
dropit" messages timed with the start of the hang. I think what you
want is "rpcdebug -m rpc -s svcdsp".

Does rpc.mountd look healthy?

--b.

>
> [ 322.843962] sysrq: Show Blocked State
> [ 322.844058] task:tee state:D stack: 0 pid: 1176 ppid: 1173 flags:0x00000000
> [ 322.844064] Call Trace:
> [ 322.844074] __schedule+0x3db/0x7ee
> [ 322.844079] ? io_schedule+0x70/0x6d
> [ 322.844082] schedule+0x68/0xdc
> [ 322.844085] io_schedule+0x46/0x6d
> [ 322.844089] bit_wait_io+0x11/0x52
> [ 322.844092] __wait_on_bit+0x31/0x90
> [ 322.844096] out_of_line_wait_on_bit+0x7e/0x80
> [ 322.844100] ? var_wake_function+0x30/0x27
> [ 322.844104] nfs_wait_on_request.part.0+0x31/0x33
> [ 322.844107] nfs_page_group_lock_head+0x31/0x78
> [ 322.844110] nfs_lock_and_join_requests+0x9d/0x26e
> [ 322.844114] nfs_updatepage+0x1be/0x985
> [ 322.844118] nfs_write_end+0x128/0x52a
> [ 322.844122] ? iov_iter_copy_from_user_atomic+0xe8/0x3ce
> [ 322.844127] generic_perform_write+0xed/0x17c
> [ 322.844131] nfs_file_write+0x14a/0x29f
> [ 322.844135] new_sync_write+0xfb/0x16b
> [ 322.844139] vfs_write+0x1ee/0x29a
> [ 322.844142] ?
vfs_read+0x17b/0x198 > [ 322.844145] ksys_write+0x61/0xd0 > [ 322.844148] __x64_sys_write+0x1a/0x1c > [ 322.844151] do_syscall_64+0x32/0x45 > [ 322.844154] entry_SYSCALL_64_after_hwframe+0x44/0x0 > [ 322.844158] RIP: 0033:0x7f4781f323a3 > [ 322.844160] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > [ 322.844164] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: 00007f4781f323a3 > [ 322.844166] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: 0000000000000003 > [ 322.844168] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: 0000000000000001 > [ 322.844170] R10: 0000000000400ad2 R11: 0000000000000246 R12: 0000000000000092 > [ 322.844172] R13: 0000000002361580 R14: 0000000000000092 R15: 00007f4782201720 > [ 322.844202] task:CIPCServer::Thr state:D stack: 0 pid: 1801 ppid: 1540 flags:0x20020000 > [ 322.844207] Call Trace: > [ 322.844212] __schedule+0x3db/0x7ee > [ 322.844219] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > [ 322.844222] schedule+0x68/0xdc > [ 322.844227] rpc_wait_bit_killable+0x11/0x52 > [ 322.844230] __wait_on_bit+0x31/0x90 > [ 322.844235] ? __rpc_atrun+0x20/0x1a > [ 322.844238] ? __rpc_atrun+0x20/0x1a > [ 322.844242] out_of_line_wait_on_bit+0x7e/0x80 > [ 322.844246] ? var_wake_function+0x30/0x27 > [ 322.844250] __rpc_execute+0xe5/0x485 > [ 322.844254] ? sugov_get_util+0xf0/0xeb > [ 322.844258] rpc_execute+0xa5/0xaa > [ 322.844263] rpc_run_task+0x148/0x187 > [ 322.844267] nfs4_do_call_sync+0x61/0x81 > [ 322.844272] _nfs4_proc_getattr+0x10a/0x120 > [ 322.844278] nfs4_proc_getattr+0x67/0xee > [ 322.844283] __nfs_revalidate_inode+0xb7/0x2df > [ 322.844287] nfs_access_get_cached+0x14d/0x1df > [ 322.844291] nfs_do_access+0x48/0x28e > [ 322.844296] ? step_into+0x2bc/0x6a1 > [ 322.844301] nfs_permission+0xa2/0x1b8 > [ 322.844305] inode_permission+0x96/0xca > [ 322.844308] ? unlazy_walk+0x4c/0x80 > [ 322.844312] link_path_walk.part.0+0x2a5/0x320 > [ 322.844315] ? 
path_init+0x2c2/0x3e5 > [ 322.844318] path_lookupat+0x3f/0x1b1 > [ 322.844322] filename_lookup+0x97/0x17f > [ 322.844326] ? kmem_cache_alloc+0x32/0x200 > [ 322.844331] user_path_at_empty+0x59/0x8a > [ 322.844333] do_faccessat+0x70/0x227 > [ 322.844337] __ia32_sys_access+0x1c/0x1e > [ 322.844340] __do_fast_syscall_32+0x5f/0x8e > [ 322.844342] do_fast_syscall_32+0x3d/0x80 > [ 322.844346] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > [ 322.844348] RIP: 0023:0xf7ee6549 > [ 322.844350] RSP: 002b:00000000eb763b28 EFLAGS: 00200292 ORIG_RAX: 0000000000000021 > [ 322.844353] RAX: ffffffffffffffda RBX: 000000005754d9e0 RCX: 0000000000000004 > [ 322.844355] RDX: 00000000f7ab3e1c RSI: 000000005754d9e0 RDI: 00000000eb763bc8 > [ 322.844357] RBP: 00000000f78a91c0 R08: 0000000000000000 R09: 0000000000000000 > [ 322.844359] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > [ 322.844360] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 322.844364] task:CFileWriterThre state:D stack: 0 pid: 1810 ppid: 1540 flags:0x20024000 > [ 322.844367] Call Trace: > [ 322.844371] __schedule+0x3db/0x7ee > [ 322.844375] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > [ 322.844379] schedule+0x68/0xdc > [ 322.844382] rpc_wait_bit_killable+0x11/0x52 > [ 322.844385] __wait_on_bit+0x31/0x90 > [ 322.844389] ? call_reserveresult+0xa0/0x9f > [ 322.844391] ? call_transmit_status+0x160/0x159 > [ 322.844395] out_of_line_wait_on_bit+0x7e/0x80 > [ 322.844398] ? var_wake_function+0x30/0x27 > [ 322.844401] __rpc_execute+0xe5/0x485 > [ 322.844404] ? 
nfs_check_verifier+0x5b/0xa4 > [ 322.844407] rpc_execute+0xa5/0xaa > [ 322.844410] rpc_run_task+0x148/0x187 > [ 322.844414] nfs4_do_call_sync+0x61/0x81 > [ 322.844417] _nfs4_proc_getattr+0x10a/0x120 > [ 322.844422] nfs4_proc_getattr+0x67/0xee > [ 322.844426] __nfs_revalidate_inode+0xb7/0x2df > [ 322.844430] nfs_getattr+0x159/0x374 > [ 322.844433] vfs_getattr_nosec+0x96/0xa4 > [ 322.844435] vfs_statx+0x7a/0xe5 > [ 322.844438] vfs_fstatat+0x1f/0x21 > [ 322.844443] __do_compat_sys_ia32_lstat64+0x2b/0x4e > [ 322.844448] __ia32_compat_sys_ia32_lstat64+0x14/0x16 > [ 322.844450] __do_fast_syscall_32+0x5f/0x8e > [ 322.844453] do_fast_syscall_32+0x3d/0x80 > [ 322.844457] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > [ 322.844459] RIP: 0023:0xf7ee6549 > [ 322.844460] RSP: 002b:00000000eb561be8 EFLAGS: 00000292 ORIG_RAX: 00000000000000c4 > [ 322.844463] RAX: ffffffffffffffda RBX: 00000000c521bf60 RCX: 00000000eb561e70 > [ 322.844465] RDX: 00000000f7ab3e1c RSI: 00000000eb561e70 RDI: 00000000c521bf60 > [ 322.844467] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > [ 322.844468] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > [ 322.844470] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 322.844477] task:Steam - DirWatc state:D stack: 0 pid: 1924 ppid: 1540 flags:0x20024000 > [ 322.844480] Call Trace: > [ 322.844484] __schedule+0x3db/0x7ee > [ 322.844488] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > [ 322.844492] schedule+0x68/0xdc > [ 322.844495] rpc_wait_bit_killable+0x11/0x52 > [ 322.844498] __wait_on_bit+0x31/0x90 > [ 322.844501] ? rpc_sleep_on_timeout+0x64/0x9c > [ 322.844504] ? rpc_check_timeout+0x1e0/0x1e0 > [ 322.844507] ? call_connect_status+0x180/0x17a > [ 322.844511] out_of_line_wait_on_bit+0x7e/0x80 > [ 322.844513] ? 
var_wake_function+0x30/0x27 > [ 322.844516] __rpc_execute+0xe5/0x485 > [ 322.844520] rpc_execute+0xa5/0xaa > [ 322.844523] rpc_run_task+0x148/0x187 > [ 322.844526] nfs4_do_call_sync+0x61/0x81 > [ 322.844530] _nfs4_proc_getattr+0x10a/0x120 > [ 322.844534] nfs4_proc_getattr+0x67/0xee > [ 322.844538] __nfs_revalidate_inode+0xb7/0x2df > [ 322.844542] nfs_lookup_verify_inode+0x67/0x82 > [ 322.844545] nfs_do_lookup_revalidate+0x19b/0x29c > [ 322.844548] nfs4_do_lookup_revalidate+0x5d/0xc5 > [ 322.844552] nfs4_lookup_revalidate+0x6a/0x8d > [ 322.844555] lookup_fast+0xb8/0x122 > [ 322.844559] path_openat+0xfd/0xfd8 > [ 322.844562] ? unlazy_walk+0x4c/0x80 > [ 322.844565] ? terminate_walk+0x62/0xed > [ 322.844569] ? putname+0x4b/0x53 > [ 322.844572] do_filp_open+0x86/0x110 > [ 322.844576] ? kmem_cache_alloc+0x32/0x200 > [ 322.844579] ? __alloc_fd+0xb2/0x143 > [ 322.844583] do_sys_openat2+0x8d/0x13a > [ 322.844585] ? do_faccessat+0x108/0x227 > [ 322.844588] __ia32_compat_sys_openat+0x48/0x63 > [ 322.844591] __do_fast_syscall_32+0x5f/0x8e > [ 322.844594] do_fast_syscall_32+0x3d/0x80 > [ 322.844597] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > [ 322.844600] RIP: 0023:0xf7ee6549 > [ 322.844601] RSP: 002b:000000008c8fbcb0 EFLAGS: 00200287 ORIG_RAX: 0000000000000127 > [ 322.844604] RAX: ffffffffffffffda RBX: 00000000ffffff9c RCX: 000000008c8fbffc > [ 322.844606] RDX: 0000000000098800 RSI: 0000000000000000 RDI: 00000000f7ab3e1c > [ 322.844608] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > [ 322.844609] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > [ 322.844611] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 322.844614] task:CGenericAsyncFi state:D stack: 0 pid: 1927 ppid: 1540 flags:0x20020000 > [ 322.844617] Call Trace: > [ 322.844621] __schedule+0x3db/0x7ee > [ 322.844625] schedule+0x68/0xdc > [ 322.844628] io_schedule+0x46/0x6d > [ 322.844631] wait_on_page_bit_common+0xf8/0x325 > [ 322.844634] ? 
trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d > [ 322.844638] wait_on_page_bit+0x3f/0x41 > [ 322.844642] wait_on_page_writeback.part.0+0x35/0x80 > [ 322.844645] wait_on_page_writeback+0x27/0x29 > [ 322.844647] __filemap_fdatawait_range+0x88/0xd9 > [ 322.844651] filemap_write_and_wait_range+0x45/0x80 > [ 322.844655] nfs_wb_all+0x30/0x13d > [ 322.844658] nfs4_file_flush+0x85/0xc0 > [ 322.844661] filp_close+0x3e/0x78 > [ 322.844664] __close_fd+0x23/0x30 > [ 322.844667] __ia32_sys_close+0x22/0x50 > [ 322.844670] __do_fast_syscall_32+0x5f/0x8e > [ 322.844673] do_fast_syscall_32+0x3d/0x80 > [ 322.844676] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > [ 322.844678] RIP: 0023:0xf7ee6549 > [ 322.844680] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 ORIG_RAX: 0000000000000006 > [ 322.844682] RAX: ffffffffffffffda RBX: 000000000000004d RCX: 0000000000000002 > [ 322.844684] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: 000000000000004d > [ 322.844686] RBP: 000000008c5fe110 R08: 0000000000000000 R09: 0000000000000000 > [ 322.844688] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > [ 322.844689] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 322.844699] task:ThreadPoolForeg state:D stack: 0 pid: 1768 ppid: 1735 flags:0x00004000 > [ 322.844702] Call Trace: > [ 322.844705] __schedule+0x3db/0x7ee > [ 322.844709] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > [ 322.844712] schedule+0x68/0xdc > [ 322.844716] rpc_wait_bit_killable+0x11/0x52 > [ 322.844719] __wait_on_bit+0x31/0x90 > [ 322.844722] ? rpc_sleep_on_timeout+0x64/0x9c > [ 322.844725] ? rpc_check_timeout+0x1e0/0x1e0 > [ 322.844728] ? call_connect_status+0x180/0x17a > [ 322.844731] out_of_line_wait_on_bit+0x7e/0x80 > [ 322.844734] ? var_wake_function+0x30/0x27 > [ 322.844737] __rpc_execute+0xe5/0x485 > [ 322.844740] ? 
nfs_check_verifier+0x5b/0xa4 > [ 322.844743] rpc_execute+0xa5/0xaa > [ 322.844746] rpc_run_task+0x148/0x187 > [ 322.844750] nfs4_do_call_sync+0x61/0x81 > [ 322.844753] _nfs4_proc_getattr+0x10a/0x120 > [ 322.844758] nfs4_proc_getattr+0x67/0xee > [ 322.844762] __nfs_revalidate_inode+0xb7/0x2df > [ 322.844766] nfs_getattr+0x159/0x374 > [ 322.844769] vfs_getattr_nosec+0x96/0xa4 > [ 322.844771] vfs_statx+0x7a/0xe5 > [ 322.844774] __do_sys_newstat+0x31/0x4c > [ 322.844778] __x64_sys_newstat+0x16/0x18 > [ 322.844781] do_syscall_64+0x32/0x45 > [ 322.844783] entry_SYSCALL_64_after_hwframe+0x44/0x0 > [ 322.844786] RIP: 0033:0x7fe70ebf19e6 > [ 322.844787] RSP: 002b:00007fe6c15b2068 EFLAGS: 00000246 ORIG_RAX: 0000000000000004 > [ 322.844790] RAX: ffffffffffffffda RBX: 00000bd4193e1910 RCX: 00007fe70ebf19e6 > [ 322.844792] RDX: 00007fe6c15b2078 RSI: 00007fe6c15b2078 RDI: 00000bd4193e1910 > [ 322.844794] RBP: 00007fe6c15b2160 R08: 0000000000000001 R09: 00007ffd971ed090 > [ 322.844796] R10: 0000000000056764 R11: 0000000000000246 R12: 00007fe6c15b2078 > [ 322.844797] R13: 00000bd4192f6f60 R14: 00007fe6c15b2108 R15: 00007fe6c15b2120 > [ 322.844801] task:ThreadPoolForeg state:D stack: 0 pid: 1769 ppid: 1735 flags:0x00000000 > [ 322.844804] Call Trace: > [ 322.844807] __schedule+0x3db/0x7ee > [ 322.844811] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > [ 322.844814] schedule+0x68/0xdc > [ 322.844817] rpc_wait_bit_killable+0x11/0x52 > [ 322.844820] __wait_on_bit+0x31/0x90 > [ 322.844823] ? __rpc_atrun+0x20/0x1a > [ 322.844826] ? __rpc_atrun+0x20/0x1a > [ 322.844829] out_of_line_wait_on_bit+0x7e/0x80 > [ 322.844831] ? var_wake_function+0x30/0x27 > [ 322.844834] __rpc_execute+0xe5/0x485 > [ 322.844837] ? 
update_cfs_group+0x9a/0x9e > [ 322.844840] rpc_execute+0xa5/0xaa > [ 322.844842] rpc_run_task+0x148/0x187 > [ 322.844845] nfs4_do_call_sync+0x61/0x81 > [ 322.844849] _nfs4_proc_getattr+0x10a/0x120 > [ 322.844853] nfs4_proc_getattr+0x67/0xee > [ 322.844856] __nfs_revalidate_inode+0xb7/0x2df > [ 322.844859] nfs_file_write+0xfc/0x29f > [ 322.844862] new_sync_write+0xfb/0x16b > [ 322.844866] vfs_write+0x1ee/0x29a > [ 322.844868] ? __fget_light+0x31/0x72 > [ 322.844871] ksys_write+0x61/0xd0 > [ 322.844874] __x64_sys_write+0x1a/0x1c > [ 322.844877] do_syscall_64+0x32/0x45 > [ 322.844880] entry_SYSCALL_64_after_hwframe+0x44/0x0 > [ 322.844881] RIP: 0033:0x7fe71859ab7f > [ 322.844883] RSP: 002b:00007fe6bd5b0d30 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 > [ 322.844885] RAX: ffffffffffffffda RBX: 00007fe6bd5b0d68 RCX: 00007fe71859ab7f > [ 322.844887] RDX: 0000000000000007 RSI: 00007fe6bd5b1090 RDI: 000000000000005b > [ 322.844888] RBP: 00007fe6bd5b0de0 R08: 0000000000000000 R09: 00007ffd971ed090 > [ 322.844890] R10: 0000000000065fd2 R11: 0000000000000293 R12: 00000bd418d95020 > [ 322.844891] R13: 00007fe6bd5b1090 R14: 0000000000000007 R15: 0000000000000000 > [ 322.844900] task:ThreadPoolForeg state:D stack: 0 pid: 1807 ppid: 1773 flags:0x00000000 > [ 322.844902] Call Trace: > [ 322.844904] __schedule+0x3db/0x7ee > [ 322.844907] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > [ 322.844908] schedule+0x68/0xdc > [ 322.844911] rpc_wait_bit_killable+0x11/0x52 > [ 322.844913] __wait_on_bit+0x31/0x90 > [ 322.844915] ? __rpc_atrun+0x20/0x1a > [ 322.844916] ? __rpc_atrun+0x20/0x1a > [ 322.844918] out_of_line_wait_on_bit+0x7e/0x80 > [ 322.844920] ? 
var_wake_function+0x30/0x27 > [ 322.844922] __rpc_execute+0xe5/0x485 > [ 322.844924] rpc_execute+0xa5/0xaa > [ 322.844926] rpc_run_task+0x148/0x187 > [ 322.844928] nfs4_do_call_sync+0x61/0x81 > [ 322.844930] _nfs4_proc_getattr+0x10a/0x120 > [ 322.844933] nfs4_proc_getattr+0x67/0xee > [ 322.844936] __nfs_revalidate_inode+0xb7/0x2df > [ 322.844938] nfs_access_get_cached+0x14d/0x1df > [ 322.844940] nfs_do_access+0x48/0x28e > [ 322.844942] ? nfs4_lookup_revalidate+0x38/0x8d > [ 322.844944] nfs_permission+0xa2/0x1b8 > [ 322.844946] inode_permission+0x96/0xca > [ 322.844948] ? unlazy_walk+0x4c/0x80 > [ 322.844950] link_path_walk.part.0+0x2a5/0x320 > [ 322.844952] ? path_init+0x2c2/0x3e5 > [ 322.844954] path_lookupat+0x3f/0x1b1 > [ 322.844956] filename_lookup+0x97/0x17f > [ 322.844959] ? do_unlk+0x9a/0xca > [ 322.844960] ? kmem_cache_alloc+0x32/0x200 > [ 322.844963] user_path_at_empty+0x59/0x8a > [ 322.844965] vfs_statx+0x64/0xe5 > [ 322.844967] __do_sys_newstat+0x31/0x4c > [ 322.844968] ? do_fcntl+0x113/0x582 > [ 322.844971] ? fput+0x13/0x15 > [ 322.844972] ? __x64_sys_fcntl+0x81/0xa8 > [ 322.844974] __x64_sys_newstat+0x16/0x18 > [ 322.844976] do_syscall_64+0x32/0x45 > [ 322.844977] entry_SYSCALL_64_after_hwframe+0x44/0x0 > [ 322.844979] RIP: 0033:0x7f0d1fd0c9e6 > [ 322.844980] RSP: 002b:00007f0d08e12e18 EFLAGS: 00000246 ORIG_RAX: 0000000000000004 > [ 322.844981] RAX: ffffffffffffffda RBX: 00007f0d08e12edc RCX: 00007f0d1fd0c9e6 > [ 322.844983] RDX: 00007f0d08e12e28 RSI: 00007f0d08e12e28 RDI: 00001605c2ff1169 > [ 322.844984] RBP: 00007f0d08e12ec0 R08: 0000000000000001 R09: 00007fffaddd9090 > [ 322.844985] R10: 00001605c3133480 R11: 0000000000000246 R12: 0000000000000000 > [ 322.844986] R13: 0000000000000000 R14: 00001605c2ff0f00 R15: 0000000000000000 > [ 322.844988] task:ThreadPoolForeg state:D stack: 0 pid: 2031 ppid: 1773 flags:0x00004000 > [ 322.844990] Call Trace: > [ 322.844992] __schedule+0x3db/0x7ee > [ 322.844995] ? 
trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > [ 322.844997] schedule+0x68/0xdc > [ 322.844999] rpc_wait_bit_killable+0x11/0x52 > [ 322.845001] __wait_on_bit+0x31/0x90 > [ 322.845003] ? rpc_sleep_on_timeout+0x64/0x9c > [ 322.845005] ? rpc_check_timeout+0x1e0/0x1e0 > [ 322.845006] ? call_connect_status+0x180/0x17a > [ 322.845009] out_of_line_wait_on_bit+0x7e/0x80 > [ 322.845010] ? var_wake_function+0x30/0x27 > [ 322.845012] __rpc_execute+0xe5/0x485 > [ 322.845014] rpc_execute+0xa5/0xaa > [ 322.845016] rpc_run_task+0x148/0x187 > [ 322.845018] nfs4_do_call_sync+0x61/0x81 > [ 322.845020] _nfs4_proc_getattr+0x10a/0x120 > [ 322.845023] nfs4_proc_getattr+0x67/0xee > [ 322.845026] __nfs_revalidate_inode+0xb7/0x2df > [ 322.845028] nfs_access_get_cached+0x14d/0x1df > [ 322.845030] nfs_do_access+0x48/0x28e > [ 322.845032] ? nfs4_lookup_revalidate+0x38/0x8d > [ 322.845034] nfs_permission+0xa2/0x1b8 > [ 322.845036] inode_permission+0x96/0xca > [ 322.845038] ? unlazy_walk+0x4c/0x80 > [ 322.845040] link_path_walk.part.0+0x2a5/0x320 > [ 322.845042] ? path_init+0x2c2/0x3e5 > [ 322.845044] path_lookupat+0x3f/0x1b1 > [ 322.845046] filename_lookup+0x97/0x17f > [ 322.845049] ? kmem_cache_alloc+0x32/0x200 > [ 322.845051] user_path_at_empty+0x59/0x8a > [ 322.845054] ? mntput_no_expire+0x4f/0x264 > [ 322.845055] vfs_statx+0x64/0xe5 > [ 322.845057] ? putname+0x4b/0x53 > [ 322.845059] __do_sys_newstat+0x31/0x4c > [ 322.845062] ? __x64_sys_futex+0x13a/0x159 > [ 322.845064] ? 
switch_fpu_return+0x58/0xf2 > [ 322.845066] __x64_sys_newstat+0x16/0x18 > [ 322.845068] do_syscall_64+0x32/0x45 > [ 322.845069] entry_SYSCALL_64_after_hwframe+0x44/0x0 > [ 322.845071] RIP: 0033:0x7f0d1fd0c9e6 > [ 322.845072] RSP: 002b:00007f0d00331108 EFLAGS: 00000246 ORIG_RAX: 0000000000000004 > [ 322.845073] RAX: ffffffffffffffda RBX: 00001605c2fdc0a0 RCX: 00007f0d1fd0c9e6 > [ 322.845074] RDX: 00007f0d00331118 RSI: 00007f0d00331118 RDI: 00001605c2fdc0a0 > [ 322.845076] RBP: 00007f0d00331200 R08: 0000000000000001 R09: 00007fffaddd9090 > [ 322.845077] R10: 000000000005771c R11: 0000000000000246 R12: 00007f0d00331118 > [ 322.845078] R13: 00001605c3309c40 R14: 00007f0d003311a8 R15: 00007f0d003311c0 > [ 322.845087] task:192.168.16.8-ma state:D stack: 0 pid: 2044 ppid: 2 flags:0x00004000 > [ 322.845089] Call Trace: > [ 322.845091] __schedule+0x3db/0x7ee > [ 322.845094] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > [ 322.845095] schedule+0x68/0xdc > [ 322.845098] rpc_wait_bit_killable+0x11/0x52 > [ 322.845100] __wait_on_bit+0x31/0x90 > [ 322.845102] ? __rpc_atrun+0x20/0x1a > [ 322.845103] ? __rpc_atrun+0x20/0x1a > [ 322.845105] out_of_line_wait_on_bit+0x7e/0x80 > [ 322.845107] ? var_wake_function+0x30/0x27 > [ 322.845109] __rpc_execute+0xe5/0x485 > [ 322.845111] rpc_execute+0xa5/0xaa > [ 322.845113] rpc_run_task+0x148/0x187 > [ 322.845115] nfs4_do_call_sync+0x61/0x81 > [ 322.845117] _nfs4_proc_statfs+0xc1/0xcf > [ 322.845120] nfs4_proc_statfs+0x55/0x74 > [ 322.845122] nfs_statfs+0x65/0x152 > [ 322.845124] ? _nfs4_proc_delegreturn+0x21e/0x2e5 > [ 322.845127] statfs_by_dentry+0x55/0x74 > [ 322.845129] vfs_statfs+0x24/0x43 > [ 322.845132] check_free_space+0x3c/0xe5 > [ 322.845135] do_acct_process+0x5d/0x5e1 > [ 322.845136] ? nfs_end_delegation_return+0x1cb/0x3aa > [ 322.845138] ? complete+0x49/0x57 > [ 322.845141] acct_process+0xe5/0x105 > [ 322.845143] do_exit+0x781/0xa1b > [ 322.845145] ? 
refcount_dec_and_lock+0x22/0x8c > [ 322.845148] __module_put_and_exit+0x1b/0x1b > [ 322.845150] nfs4_run_state_manager+0x293/0x933 > [ 322.845152] ? nfs4_do_reclaim+0x820/0x81c > [ 322.845155] kthread+0x142/0x15f > [ 322.845157] ? __kthread_create_worker+0x130/0x122 > [ 322.845159] ret_from_fork+0x1f/0x2a > [ 369.859385] INFO: task tee:1176 blocked for more than 122 seconds. > [ 369.859391] Not tainted 5.10.4-00021-gde017294a539-dirty #3 > [ 369.859393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 369.859395] task:tee state:D stack: 0 pid: 1176 ppid: 1173 flags:0x00000000 > [ 369.859399] Call Trace: > [ 369.859407] __schedule+0x3db/0x7ee > [ 369.859410] ? io_schedule+0x70/0x6d > [ 369.859412] schedule+0x68/0xdc > [ 369.859415] io_schedule+0x46/0x6d > [ 369.859417] bit_wait_io+0x11/0x52 > [ 369.859420] __wait_on_bit+0x31/0x90 > [ 369.859422] out_of_line_wait_on_bit+0x7e/0x80 > [ 369.859426] ? var_wake_function+0x30/0x27 > [ 369.859429] nfs_wait_on_request.part.0+0x31/0x33 > [ 369.859431] nfs_page_group_lock_head+0x31/0x78 > [ 369.859433] nfs_lock_and_join_requests+0x9d/0x26e > [ 369.859436] nfs_updatepage+0x1be/0x985 > [ 369.859439] nfs_write_end+0x128/0x52a > [ 369.859443] ? iov_iter_copy_from_user_atomic+0xe8/0x3ce > [ 369.859446] generic_perform_write+0xed/0x17c > [ 369.859449] nfs_file_write+0x14a/0x29f > [ 369.859452] new_sync_write+0xfb/0x16b > [ 369.859455] vfs_write+0x1ee/0x29a > [ 369.859457] ? 
vfs_read+0x17b/0x198 > [ 369.859459] ksys_write+0x61/0xd0 > [ 369.859462] __x64_sys_write+0x1a/0x1c > [ 369.859464] do_syscall_64+0x32/0x45 > [ 369.859467] entry_SYSCALL_64_after_hwframe+0x44/0x0 > [ 369.859469] RIP: 0033:0x7f4781f323a3 > [ 369.859471] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > [ 369.859474] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: 00007f4781f323a3 > [ 369.859475] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: 0000000000000003 > [ 369.859477] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: 0000000000000001 > [ 369.859478] R10: 0000000000400ad2 R11: 0000000000000246 R12: 0000000000000092 > [ 369.859480] R13: 0000000002361580 R14: 0000000000000092 R15: 00007f4782201720 > [ 369.859495] INFO: task CGenericAsyncFi:1927 blocked for more than 122 seconds. > [ 369.859497] Not tainted 5.10.4-00021-gde017294a539-dirty #3 > [ 369.859498] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 369.859499] task:CGenericAsyncFi state:D stack: 0 pid: 1927 ppid: 1540 flags:0x20020000 > [ 369.859501] Call Trace: > [ 369.859504] __schedule+0x3db/0x7ee > [ 369.859506] schedule+0x68/0xdc > [ 369.859508] io_schedule+0x46/0x6d > [ 369.859510] wait_on_page_bit_common+0xf8/0x325 > [ 369.859512] ? 
trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d > [ 369.859515] wait_on_page_bit+0x3f/0x41 > [ 369.859517] wait_on_page_writeback.part.0+0x35/0x80 > [ 369.859519] wait_on_page_writeback+0x27/0x29 > [ 369.859521] __filemap_fdatawait_range+0x88/0xd9 > [ 369.859523] filemap_write_and_wait_range+0x45/0x80 > [ 369.859525] nfs_wb_all+0x30/0x13d > [ 369.859528] nfs4_file_flush+0x85/0xc0 > [ 369.859530] filp_close+0x3e/0x78 > [ 369.859532] __close_fd+0x23/0x30 > [ 369.859534] __ia32_sys_close+0x22/0x50 > [ 369.859536] __do_fast_syscall_32+0x5f/0x8e > [ 369.859538] do_fast_syscall_32+0x3d/0x80 > [ 369.859540] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > [ 369.859541] RIP: 0023:0xf7ee6549 > [ 369.859543] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 ORIG_RAX: 0000000000000006 > [ 369.859545] RAX: ffffffffffffffda RBX: 000000000000004d RCX: 0000000000000002 > [ 369.859546] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: 000000000000004d > [ 369.859547] RBP: 000000008c5fe110 R08: 0000000000000000 R09: 0000000000000000 > [ 369.859548] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > [ 369.859549] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 394.423093] nfs: server loom not responding, still trying > > -- > NULL && (void) ^ permalink raw reply [flat|nested] 12+ messages in thread
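[Editorial aside: a sketch of the server-side check suggested above. The tool is rpcdebug(8); the syslog path is an assumption and will vary by distribution.]

```shell
# Hypothetical sketch: enable sunrpc request-dispatch debugging on the
# server, reproduce the hang, and look for dropped requests.
# Run as root on the server:
#   rpcdebug -m rpc -s svcdsp
# (and "rpcdebug -m rpc -c svcdsp" to switch it off again afterwards).
# Then scan the log for the tell-tale message, with line numbers:
scan_for_drops() {
    grep -n 'svc: svc_process dropit' "$1"
}
# Usage:  scan_for_drops /var/log/syslog
```

A drop logged just as the client stalls would point at the server discarding a request without closing the TCP connection.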
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10)
  2021-02-23 22:57 ` J. Bruce Fields
@ 2021-02-23 23:58 ` Trond Myklebust
  2021-02-24  2:01 ` bfields
  0 siblings, 1 reply; 12+ messages in thread

From: Trond Myklebust @ 2021-02-23 23:58 UTC (permalink / raw)
To: bfields, nix; +Cc: linux-nfs

On Tue, 2021-02-23 at 17:57 -0500, J. Bruce Fields wrote:
> On Sun, Jan 03, 2021 at 02:27:51PM +0000, Nick Alcock wrote:
> > So I bought a game or two on Steam a fortnight ago to while away a
> > Christmas spent stuck indoors with no family, decorations, or presents
> > (stuck at my sister's and she got a perfectly-timed case of covid-19).
> > It turned out this was a very bad idea!
> >
> > I keep my Steam directory on an ext4 fs on the local fileserver and
> > access it via NFSv4.2: over 10GbE: it's actually faster than local
> > spinning-rust storage would be (five-spindle RAID-6 and bcache and
> > 128GiB RAM on the server for caching probably have something to do with
> > that!) though likely slower than the NVMe the local machine actually
> > has: but it doesn't have enough local storage, so I keep all the Steam
> > stuff remotely.
> >
> > (I keep my $HOME on the same fileserver, though that's on xfs: I'd use
> > xfs for Steam, too, but there are still a lot of games out there that
> > fail on filesystems with 64-bit inode numbers.)
> >
> > Unfortunately, what I observe now when kicking up Steam is a spasm of
> > downloading I/O for the game I just bought and then a complete
> > client-side hang when it starts installing: just about all I can do is
> > hard-reboot.
> > This of course stops me using Steam for anything which didn't make
> > me very happy :P
> >
> > The server hums onwards as though nothing went wrong, but the client is
> > in a bad way indeed: not only has NFS hung but there are clearly some
> > important locks taken out since even an ls on a non-NFS dir (with no NFS
> > dirs in the PATH) hangs, as does a while :; ps -e; done loop. I can
> > rebuild with lockdep and do a lock dump if that would be useful. (The
> > kernel is not quite stock: it has DTrace built in, but the hangs happen
> > even if the kernel module has never been loaded, and I'm pretty sure the
> > hangs can't be associated with the not-very-substantial built-in DTrace
> > changes. I can try rebuilding without that as well, but I'll have to rip
> > chunks out of my monitoring machinery to do that so I've avoided doing
> > it so far.)
> >
> > I finally got round to doing a bit of debugging today. NFS debugging
> > dumps covering the last fifteen seconds or so before the hang and a
> > couple of minutes afterwards (taken on the client, logged without
> > incident to the server's syslog, as proof that not *everything* is
> > stalled) are a few megabytes long so are at
> > <http://www.esperi.org.uk/~nix/temporary/nfs-steam-hang.log>. Everything
> > seems to go wrong after an especially large bunch of NFS commits.
> >
> > Both client (silk, 192.168.16.21) and server (loom, 192.168.16.8) are
> > running 5.10.4 with the same NFS config, though the client has no NFS
> > server exports at the time and the server isn't acting as an NFS client
> > at all:
> >
> > CONFIG_NFS_FS=y
> > # CONFIG_NFS_V2 is not set
> > CONFIG_NFS_V3=y
> > CONFIG_NFS_V3_ACL=y
> > CONFIG_NFS_V4=y
> > # CONFIG_NFS_SWAP is not set
> > CONFIG_NFS_V4_1=y
> > CONFIG_NFS_V4_2=y
> > CONFIG_PNFS_FILE_LAYOUT=y
> > CONFIG_PNFS_BLOCK=y
> > CONFIG_PNFS_FLEXFILE_LAYOUT=m
> > CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
> > # CONFIG_NFS_V4_1_MIGRATION is not set
> > # CONFIG_ROOT_NFS is not set
> > # CONFIG_NFS_USE_LEGACY_DNS is not set
> > CONFIG_NFS_USE_KERNEL_DNS=y
> > CONFIG_NFS_DEBUG=y
> > CONFIG_NFS_DISABLE_UDP_SUPPORT=y
> > # CONFIG_NFS_V4_2_READ_PLUS is not set
> > CONFIG_NFSD=y
> > CONFIG_NFSD_V2_ACL=y
> > CONFIG_NFSD_V3=y
> > CONFIG_NFSD_V3_ACL=y
> > CONFIG_NFSD_V4=y
> > # CONFIG_NFSD_BLOCKLAYOUT is not set
> > # CONFIG_NFSD_SCSILAYOUT is not set
> > # CONFIG_NFSD_FLEXFILELAYOUT is not set
> > CONFIG_NFSD_V4_2_INTER_SSC=y
> > CONFIG_GRACE_PERIOD=y
> > CONFIG_LOCKD=y
> > CONFIG_LOCKD_V4=y
> > CONFIG_NFS_ACL_SUPPORT=y
> > CONFIG_NFS_COMMON=y
> > CONFIG_SUNRPC=y
> > CONFIG_SUNRPC_GSS=y
> > CONFIG_SUNRPC_BACKCHANNEL=y
> > CONFIG_SUNRPC_DEBUG=y
> >
> > The server (loom) has a large number of exports: many of the filesystems
> > being exported are moved around via bind mounts and the like. The steam
> > datadir is in /pkg/non-free/steam, which is bind-mounted onto
> > /home/nix/.steam on the client: /home/nix itself is bind-mounted from
> > /home/.loom.srvr.nix/steam, and is NFS-exported from the server under
> > that pathname.
> >
> > Relevant exports, from /proc/fs/nfs/exports:
> >
> > / 192.168.16.0/24,silk.wkstn.nix(ro,insecure,no_root_squash,sync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0a4a4563:00764033:8c827c0e:989cf534,sec=390003:390004:390005:1)
> > /home/.loom.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,sync,wdelay,no_subtree_check,uuid=0a4a4563:00764033:8c827c0e:989cf534,sec=1)

Isn't that trying to export the same filesystem as '/' on the line
above using conflicting export options?

> > /pkg/non-free/steam 192.168.16.0/24,silk.wkstn.nix(rw,root_squash,sync,wdelay,uuid=11b94016:cb6e4b64:beb6c134:9ea3d6b3,sec=1)
> >
> > Blocked tasks after the hang suggest that the client is hung waiting
> > indefinitely for the server to reply, and I'd guess the server doesn't
> > think it needs to. (I can do some server-side debugging dumps too if you
> > think the problem might be there.)
>
> I seem to remember there have been some subtle changes to how dropped
> replies are handled. If I remember right: a v4 server is never supposed
> to drop a reply without also closing the tcp connection, but it may have
> done so in the past. (I *think* all those cases are fixed?)
>
> And the client may have been changed to be more unforgiving about that
> requirement in recent years.
>
> But I don't know if that's where the issue is.
>
> I'm not sure how best to go about debugging. I suppose you could try to
> capture a network trace and work out if there's an RPC that isn't
> getting a reply.
>
> Maybe turning on server debugging and looking for "svc: svc_process
> dropit" messages timed with the start of the hang. I think what you
> want is "rpcdebug -m rpc -s svcdsp".
>
> Does rpc.mountd look healthy?
>
> --b.
>
> >
> > [ 322.843962] sysrq: Show Blocked State
> > [ 322.844058] task:tee state:D stack: 0 pid: 1176 ppid: 1173 flags:0x00000000
> > [ 322.844064] Call Trace:
> > [ 322.844074] __schedule+0x3db/0x7ee
> > [ 322.844079] ?
io_schedule+0x70/0x6d > > [ 322.844082] schedule+0x68/0xdc > > [ 322.844085] io_schedule+0x46/0x6d > > [ 322.844089] bit_wait_io+0x11/0x52 > > [ 322.844092] __wait_on_bit+0x31/0x90 > > [ 322.844096] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844100] ? var_wake_function+0x30/0x27 > > [ 322.844104] nfs_wait_on_request.part.0+0x31/0x33 > > [ 322.844107] nfs_page_group_lock_head+0x31/0x78 > > [ 322.844110] nfs_lock_and_join_requests+0x9d/0x26e > > [ 322.844114] nfs_updatepage+0x1be/0x985 > > [ 322.844118] nfs_write_end+0x128/0x52a > > [ 322.844122] ? iov_iter_copy_from_user_atomic+0xe8/0x3ce > > [ 322.844127] generic_perform_write+0xed/0x17c > > [ 322.844131] nfs_file_write+0x14a/0x29f > > [ 322.844135] new_sync_write+0xfb/0x16b > > [ 322.844139] vfs_write+0x1ee/0x29a > > [ 322.844142] ? vfs_read+0x17b/0x198 > > [ 322.844145] ksys_write+0x61/0xd0 > > [ 322.844148] __x64_sys_write+0x1a/0x1c > > [ 322.844151] do_syscall_64+0x32/0x45 > > [ 322.844154] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 322.844158] RIP: 0033:0x7f4781f323a3 > > [ 322.844160] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000001 > > [ 322.844164] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: > > 00007f4781f323a3 > > [ 322.844166] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: > > 0000000000000003 > > [ 322.844168] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: > > 0000000000000001 > > [ 322.844170] R10: 0000000000400ad2 R11: 0000000000000246 R12: > > 0000000000000092 > > [ 322.844172] R13: 0000000002361580 R14: 0000000000000092 R15: > > 00007f4782201720 > > [ 322.844202] task:CIPCServer::Thr state:D stack: 0 pid: 1801 > > ppid: 1540 flags:0x20020000 > > [ 322.844207] Call Trace: > > [ 322.844212] __schedule+0x3db/0x7ee > > [ 322.844219] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844222] schedule+0x68/0xdc > > [ 322.844227] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844230] __wait_on_bit+0x31/0x90 > > [ 322.844235] ? 
__rpc_atrun+0x20/0x1a > > [ 322.844238] ? __rpc_atrun+0x20/0x1a > > [ 322.844242] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844246] ? var_wake_function+0x30/0x27 > > [ 322.844250] __rpc_execute+0xe5/0x485 > > [ 322.844254] ? sugov_get_util+0xf0/0xeb > > [ 322.844258] rpc_execute+0xa5/0xaa > > [ 322.844263] rpc_run_task+0x148/0x187 > > [ 322.844267] nfs4_do_call_sync+0x61/0x81 > > [ 322.844272] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844278] nfs4_proc_getattr+0x67/0xee > > [ 322.844283] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844287] nfs_access_get_cached+0x14d/0x1df > > [ 322.844291] nfs_do_access+0x48/0x28e > > [ 322.844296] ? step_into+0x2bc/0x6a1 > > [ 322.844301] nfs_permission+0xa2/0x1b8 > > [ 322.844305] inode_permission+0x96/0xca > > [ 322.844308] ? unlazy_walk+0x4c/0x80 > > [ 322.844312] link_path_walk.part.0+0x2a5/0x320 > > [ 322.844315] ? path_init+0x2c2/0x3e5 > > [ 322.844318] path_lookupat+0x3f/0x1b1 > > [ 322.844322] filename_lookup+0x97/0x17f > > [ 322.844326] ? kmem_cache_alloc+0x32/0x200 > > [ 322.844331] user_path_at_empty+0x59/0x8a > > [ 322.844333] do_faccessat+0x70/0x227 > > [ 322.844337] __ia32_sys_access+0x1c/0x1e > > [ 322.844340] __do_fast_syscall_32+0x5f/0x8e > > [ 322.844342] do_fast_syscall_32+0x3d/0x80 > > [ 322.844346] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > [ 322.844348] RIP: 0023:0xf7ee6549 > > [ 322.844350] RSP: 002b:00000000eb763b28 EFLAGS: 00200292 > > ORIG_RAX: 0000000000000021 > > [ 322.844353] RAX: ffffffffffffffda RBX: 000000005754d9e0 RCX: > > 0000000000000004 > > [ 322.844355] RDX: 00000000f7ab3e1c RSI: 000000005754d9e0 RDI: > > 00000000eb763bc8 > > [ 322.844357] RBP: 00000000f78a91c0 R08: 0000000000000000 R09: > > 0000000000000000 > > [ 322.844359] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000000 > > [ 322.844360] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [ 322.844364] task:CFileWriterThre state:D stack: 0 pid: 1810 > > ppid: 1540 flags:0x20024000 > > [ 
322.844367] Call Trace: > > [ 322.844371] __schedule+0x3db/0x7ee > > [ 322.844375] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844379] schedule+0x68/0xdc > > [ 322.844382] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844385] __wait_on_bit+0x31/0x90 > > [ 322.844389] ? call_reserveresult+0xa0/0x9f > > [ 322.844391] ? call_transmit_status+0x160/0x159 > > [ 322.844395] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844398] ? var_wake_function+0x30/0x27 > > [ 322.844401] __rpc_execute+0xe5/0x485 > > [ 322.844404] ? nfs_check_verifier+0x5b/0xa4 > > [ 322.844407] rpc_execute+0xa5/0xaa > > [ 322.844410] rpc_run_task+0x148/0x187 > > [ 322.844414] nfs4_do_call_sync+0x61/0x81 > > [ 322.844417] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844422] nfs4_proc_getattr+0x67/0xee > > [ 322.844426] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844430] nfs_getattr+0x159/0x374 > > [ 322.844433] vfs_getattr_nosec+0x96/0xa4 > > [ 322.844435] vfs_statx+0x7a/0xe5 > > [ 322.844438] vfs_fstatat+0x1f/0x21 > > [ 322.844443] __do_compat_sys_ia32_lstat64+0x2b/0x4e > > [ 322.844448] __ia32_compat_sys_ia32_lstat64+0x14/0x16 > > [ 322.844450] __do_fast_syscall_32+0x5f/0x8e > > [ 322.844453] do_fast_syscall_32+0x3d/0x80 > > [ 322.844457] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > [ 322.844459] RIP: 0023:0xf7ee6549 > > [ 322.844460] RSP: 002b:00000000eb561be8 EFLAGS: 00000292 > > ORIG_RAX: 00000000000000c4 > > [ 322.844463] RAX: ffffffffffffffda RBX: 00000000c521bf60 RCX: > > 00000000eb561e70 > > [ 322.844465] RDX: 00000000f7ab3e1c RSI: 00000000eb561e70 RDI: > > 00000000c521bf60 > > [ 322.844467] RBP: 0000000000000000 R08: 0000000000000000 R09: > > 0000000000000000 > > [ 322.844468] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000000 > > [ 322.844470] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [ 322.844477] task:Steam - DirWatc state:D stack: 0 pid: 1924 > > ppid: 1540 flags:0x20024000 > > [ 322.844480] Call Trace: > > [ 322.844484] 
__schedule+0x3db/0x7ee > > [ 322.844488] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844492] schedule+0x68/0xdc > > [ 322.844495] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844498] __wait_on_bit+0x31/0x90 > > [ 322.844501] ? rpc_sleep_on_timeout+0x64/0x9c > > [ 322.844504] ? rpc_check_timeout+0x1e0/0x1e0 > > [ 322.844507] ? call_connect_status+0x180/0x17a > > [ 322.844511] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844513] ? var_wake_function+0x30/0x27 > > [ 322.844516] __rpc_execute+0xe5/0x485 > > [ 322.844520] rpc_execute+0xa5/0xaa > > [ 322.844523] rpc_run_task+0x148/0x187 > > [ 322.844526] nfs4_do_call_sync+0x61/0x81 > > [ 322.844530] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844534] nfs4_proc_getattr+0x67/0xee > > [ 322.844538] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844542] nfs_lookup_verify_inode+0x67/0x82 > > [ 322.844545] nfs_do_lookup_revalidate+0x19b/0x29c > > [ 322.844548] nfs4_do_lookup_revalidate+0x5d/0xc5 > > [ 322.844552] nfs4_lookup_revalidate+0x6a/0x8d > > [ 322.844555] lookup_fast+0xb8/0x122 > > [ 322.844559] path_openat+0xfd/0xfd8 > > [ 322.844562] ? unlazy_walk+0x4c/0x80 > > [ 322.844565] ? terminate_walk+0x62/0xed > > [ 322.844569] ? putname+0x4b/0x53 > > [ 322.844572] do_filp_open+0x86/0x110 > > [ 322.844576] ? kmem_cache_alloc+0x32/0x200 > > [ 322.844579] ? __alloc_fd+0xb2/0x143 > > [ 322.844583] do_sys_openat2+0x8d/0x13a > > [ 322.844585] ? 
do_faccessat+0x108/0x227 > > [ 322.844588] __ia32_compat_sys_openat+0x48/0x63 > > [ 322.844591] __do_fast_syscall_32+0x5f/0x8e > > [ 322.844594] do_fast_syscall_32+0x3d/0x80 > > [ 322.844597] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > [ 322.844600] RIP: 0023:0xf7ee6549 > > [ 322.844601] RSP: 002b:000000008c8fbcb0 EFLAGS: 00200287 > > ORIG_RAX: 0000000000000127 > > [ 322.844604] RAX: ffffffffffffffda RBX: 00000000ffffff9c RCX: > > 000000008c8fbffc > > [ 322.844606] RDX: 0000000000098800 RSI: 0000000000000000 RDI: > > 00000000f7ab3e1c > > [ 322.844608] RBP: 0000000000000000 R08: 0000000000000000 R09: > > 0000000000000000 > > [ 322.844609] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000000 > > [ 322.844611] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [ 322.844614] task:CGenericAsyncFi state:D stack: 0 pid: 1927 > > ppid: 1540 flags:0x20020000 > > [ 322.844617] Call Trace: > > [ 322.844621] __schedule+0x3db/0x7ee > > [ 322.844625] schedule+0x68/0xdc > > [ 322.844628] io_schedule+0x46/0x6d > > [ 322.844631] wait_on_page_bit_common+0xf8/0x325 > > [ 322.844634] ? 
> > trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d > > [ 322.844638] wait_on_page_bit+0x3f/0x41 > > [ 322.844642] wait_on_page_writeback.part.0+0x35/0x80 > > [ 322.844645] wait_on_page_writeback+0x27/0x29 > > [ 322.844647] __filemap_fdatawait_range+0x88/0xd9 > > [ 322.844651] filemap_write_and_wait_range+0x45/0x80 > > [ 322.844655] nfs_wb_all+0x30/0x13d > > [ 322.844658] nfs4_file_flush+0x85/0xc0 > > [ 322.844661] filp_close+0x3e/0x78 > > [ 322.844664] __close_fd+0x23/0x30 > > [ 322.844667] __ia32_sys_close+0x22/0x50 > > [ 322.844670] __do_fast_syscall_32+0x5f/0x8e > > [ 322.844673] do_fast_syscall_32+0x3d/0x80 > > [ 322.844676] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > [ 322.844678] RIP: 0023:0xf7ee6549 > > [ 322.844680] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 > > ORIG_RAX: 0000000000000006 > > [ 322.844682] RAX: ffffffffffffffda RBX: 000000000000004d RCX: > > 0000000000000002 > > [ 322.844684] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: > > 000000000000004d > > [ 322.844686] RBP: 000000008c5fe110 R08: 0000000000000000 R09: > > 0000000000000000 > > [ 322.844688] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000000 > > [ 322.844689] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [ 322.844699] task:ThreadPoolForeg state:D stack: 0 pid: 1768 > > ppid: 1735 flags:0x00004000 > > [ 322.844702] Call Trace: > > [ 322.844705] __schedule+0x3db/0x7ee > > [ 322.844709] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844712] schedule+0x68/0xdc > > [ 322.844716] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844719] __wait_on_bit+0x31/0x90 > > [ 322.844722] ? rpc_sleep_on_timeout+0x64/0x9c > > [ 322.844725] ? rpc_check_timeout+0x1e0/0x1e0 > > [ 322.844728] ? call_connect_status+0x180/0x17a > > [ 322.844731] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844734] ? var_wake_function+0x30/0x27 > > [ 322.844737] __rpc_execute+0xe5/0x485 > > [ 322.844740] ? 
nfs_check_verifier+0x5b/0xa4 > > [ 322.844743] rpc_execute+0xa5/0xaa > > [ 322.844746] rpc_run_task+0x148/0x187 > > [ 322.844750] nfs4_do_call_sync+0x61/0x81 > > [ 322.844753] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844758] nfs4_proc_getattr+0x67/0xee > > [ 322.844762] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844766] nfs_getattr+0x159/0x374 > > [ 322.844769] vfs_getattr_nosec+0x96/0xa4 > > [ 322.844771] vfs_statx+0x7a/0xe5 > > [ 322.844774] __do_sys_newstat+0x31/0x4c > > [ 322.844778] __x64_sys_newstat+0x16/0x18 > > [ 322.844781] do_syscall_64+0x32/0x45 > > [ 322.844783] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 322.844786] RIP: 0033:0x7fe70ebf19e6 > > [ 322.844787] RSP: 002b:00007fe6c15b2068 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000004 > > [ 322.844790] RAX: ffffffffffffffda RBX: 00000bd4193e1910 RCX: > > 00007fe70ebf19e6 > > [ 322.844792] RDX: 00007fe6c15b2078 RSI: 00007fe6c15b2078 RDI: > > 00000bd4193e1910 > > [ 322.844794] RBP: 00007fe6c15b2160 R08: 0000000000000001 R09: > > 00007ffd971ed090 > > [ 322.844796] R10: 0000000000056764 R11: 0000000000000246 R12: > > 00007fe6c15b2078 > > [ 322.844797] R13: 00000bd4192f6f60 R14: 00007fe6c15b2108 R15: > > 00007fe6c15b2120 > > [ 322.844801] task:ThreadPoolForeg state:D stack: 0 pid: 1769 > > ppid: 1735 flags:0x00000000 > > [ 322.844804] Call Trace: > > [ 322.844807] __schedule+0x3db/0x7ee > > [ 322.844811] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844814] schedule+0x68/0xdc > > [ 322.844817] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844820] __wait_on_bit+0x31/0x90 > > [ 322.844823] ? __rpc_atrun+0x20/0x1a > > [ 322.844826] ? __rpc_atrun+0x20/0x1a > > [ 322.844829] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844831] ? var_wake_function+0x30/0x27 > > [ 322.844834] __rpc_execute+0xe5/0x485 > > [ 322.844837] ? 
update_cfs_group+0x9a/0x9e > > [ 322.844840] rpc_execute+0xa5/0xaa > > [ 322.844842] rpc_run_task+0x148/0x187 > > [ 322.844845] nfs4_do_call_sync+0x61/0x81 > > [ 322.844849] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844853] nfs4_proc_getattr+0x67/0xee > > [ 322.844856] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844859] nfs_file_write+0xfc/0x29f > > [ 322.844862] new_sync_write+0xfb/0x16b > > [ 322.844866] vfs_write+0x1ee/0x29a > > [ 322.844868] ? __fget_light+0x31/0x72 > > [ 322.844871] ksys_write+0x61/0xd0 > > [ 322.844874] __x64_sys_write+0x1a/0x1c > > [ 322.844877] do_syscall_64+0x32/0x45 > > [ 322.844880] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 322.844881] RIP: 0033:0x7fe71859ab7f > > [ 322.844883] RSP: 002b:00007fe6bd5b0d30 EFLAGS: 00000293 > > ORIG_RAX: 0000000000000001 > > [ 322.844885] RAX: ffffffffffffffda RBX: 00007fe6bd5b0d68 RCX: > > 00007fe71859ab7f > > [ 322.844887] RDX: 0000000000000007 RSI: 00007fe6bd5b1090 RDI: > > 000000000000005b > > [ 322.844888] RBP: 00007fe6bd5b0de0 R08: 0000000000000000 R09: > > 00007ffd971ed090 > > [ 322.844890] R10: 0000000000065fd2 R11: 0000000000000293 R12: > > 00000bd418d95020 > > [ 322.844891] R13: 00007fe6bd5b1090 R14: 0000000000000007 R15: > > 0000000000000000 > > [ 322.844900] task:ThreadPoolForeg state:D stack: 0 pid: 1807 > > ppid: 1773 flags:0x00000000 > > [ 322.844902] Call Trace: > > [ 322.844904] __schedule+0x3db/0x7ee > > [ 322.844907] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844908] schedule+0x68/0xdc > > [ 322.844911] rpc_wait_bit_killable+0x11/0x52 > > [ 322.844913] __wait_on_bit+0x31/0x90 > > [ 322.844915] ? __rpc_atrun+0x20/0x1a > > [ 322.844916] ? __rpc_atrun+0x20/0x1a > > [ 322.844918] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.844920] ? 
var_wake_function+0x30/0x27 > > [ 322.844922] __rpc_execute+0xe5/0x485 > > [ 322.844924] rpc_execute+0xa5/0xaa > > [ 322.844926] rpc_run_task+0x148/0x187 > > [ 322.844928] nfs4_do_call_sync+0x61/0x81 > > [ 322.844930] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.844933] nfs4_proc_getattr+0x67/0xee > > [ 322.844936] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.844938] nfs_access_get_cached+0x14d/0x1df > > [ 322.844940] nfs_do_access+0x48/0x28e > > [ 322.844942] ? nfs4_lookup_revalidate+0x38/0x8d > > [ 322.844944] nfs_permission+0xa2/0x1b8 > > [ 322.844946] inode_permission+0x96/0xca > > [ 322.844948] ? unlazy_walk+0x4c/0x80 > > [ 322.844950] link_path_walk.part.0+0x2a5/0x320 > > [ 322.844952] ? path_init+0x2c2/0x3e5 > > [ 322.844954] path_lookupat+0x3f/0x1b1 > > [ 322.844956] filename_lookup+0x97/0x17f > > [ 322.844959] ? do_unlk+0x9a/0xca > > [ 322.844960] ? kmem_cache_alloc+0x32/0x200 > > [ 322.844963] user_path_at_empty+0x59/0x8a > > [ 322.844965] vfs_statx+0x64/0xe5 > > [ 322.844967] __do_sys_newstat+0x31/0x4c > > [ 322.844968] ? do_fcntl+0x113/0x582 > > [ 322.844971] ? fput+0x13/0x15 > > [ 322.844972] ? 
__x64_sys_fcntl+0x81/0xa8 > > [ 322.844974] __x64_sys_newstat+0x16/0x18 > > [ 322.844976] do_syscall_64+0x32/0x45 > > [ 322.844977] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 322.844979] RIP: 0033:0x7f0d1fd0c9e6 > > [ 322.844980] RSP: 002b:00007f0d08e12e18 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000004 > > [ 322.844981] RAX: ffffffffffffffda RBX: 00007f0d08e12edc RCX: > > 00007f0d1fd0c9e6 > > [ 322.844983] RDX: 00007f0d08e12e28 RSI: 00007f0d08e12e28 RDI: > > 00001605c2ff1169 > > [ 322.844984] RBP: 00007f0d08e12ec0 R08: 0000000000000001 R09: > > 00007fffaddd9090 > > [ 322.844985] R10: 00001605c3133480 R11: 0000000000000246 R12: > > 0000000000000000 > > [ 322.844986] R13: 0000000000000000 R14: 00001605c2ff0f00 R15: > > 0000000000000000 > > [ 322.844988] task:ThreadPoolForeg state:D stack: 0 pid: 2031 > > ppid: 1773 flags:0x00004000 > > [ 322.844990] Call Trace: > > [ 322.844992] __schedule+0x3db/0x7ee > > [ 322.844995] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.844997] schedule+0x68/0xdc > > [ 322.844999] rpc_wait_bit_killable+0x11/0x52 > > [ 322.845001] __wait_on_bit+0x31/0x90 > > [ 322.845003] ? rpc_sleep_on_timeout+0x64/0x9c > > [ 322.845005] ? rpc_check_timeout+0x1e0/0x1e0 > > [ 322.845006] ? call_connect_status+0x180/0x17a > > [ 322.845009] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.845010] ? var_wake_function+0x30/0x27 > > [ 322.845012] __rpc_execute+0xe5/0x485 > > [ 322.845014] rpc_execute+0xa5/0xaa > > [ 322.845016] rpc_run_task+0x148/0x187 > > [ 322.845018] nfs4_do_call_sync+0x61/0x81 > > [ 322.845020] _nfs4_proc_getattr+0x10a/0x120 > > [ 322.845023] nfs4_proc_getattr+0x67/0xee > > [ 322.845026] __nfs_revalidate_inode+0xb7/0x2df > > [ 322.845028] nfs_access_get_cached+0x14d/0x1df > > [ 322.845030] nfs_do_access+0x48/0x28e > > [ 322.845032] ? nfs4_lookup_revalidate+0x38/0x8d > > [ 322.845034] nfs_permission+0xa2/0x1b8 > > [ 322.845036] inode_permission+0x96/0xca > > [ 322.845038] ? 
unlazy_walk+0x4c/0x80 > > [ 322.845040] link_path_walk.part.0+0x2a5/0x320 > > [ 322.845042] ? path_init+0x2c2/0x3e5 > > [ 322.845044] path_lookupat+0x3f/0x1b1 > > [ 322.845046] filename_lookup+0x97/0x17f > > [ 322.845049] ? kmem_cache_alloc+0x32/0x200 > > [ 322.845051] user_path_at_empty+0x59/0x8a > > [ 322.845054] ? mntput_no_expire+0x4f/0x264 > > [ 322.845055] vfs_statx+0x64/0xe5 > > [ 322.845057] ? putname+0x4b/0x53 > > [ 322.845059] __do_sys_newstat+0x31/0x4c > > [ 322.845062] ? __x64_sys_futex+0x13a/0x159 > > [ 322.845064] ? switch_fpu_return+0x58/0xf2 > > [ 322.845066] __x64_sys_newstat+0x16/0x18 > > [ 322.845068] do_syscall_64+0x32/0x45 > > [ 322.845069] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 322.845071] RIP: 0033:0x7f0d1fd0c9e6 > > [ 322.845072] RSP: 002b:00007f0d00331108 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000004 > > [ 322.845073] RAX: ffffffffffffffda RBX: 00001605c2fdc0a0 RCX: > > 00007f0d1fd0c9e6 > > [ 322.845074] RDX: 00007f0d00331118 RSI: 00007f0d00331118 RDI: > > 00001605c2fdc0a0 > > [ 322.845076] RBP: 00007f0d00331200 R08: 0000000000000001 R09: > > 00007fffaddd9090 > > [ 322.845077] R10: 000000000005771c R11: 0000000000000246 R12: > > 00007f0d00331118 > > [ 322.845078] R13: 00001605c3309c40 R14: 00007f0d003311a8 R15: > > 00007f0d003311c0 > > [ 322.845087] task:192.168.16.8-ma state:D stack: 0 pid: 2044 > > ppid: 2 flags:0x00004000 > > [ 322.845089] Call Trace: > > [ 322.845091] __schedule+0x3db/0x7ee > > [ 322.845094] ? > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > [ 322.845095] schedule+0x68/0xdc > > [ 322.845098] rpc_wait_bit_killable+0x11/0x52 > > [ 322.845100] __wait_on_bit+0x31/0x90 > > [ 322.845102] ? __rpc_atrun+0x20/0x1a > > [ 322.845103] ? __rpc_atrun+0x20/0x1a > > [ 322.845105] out_of_line_wait_on_bit+0x7e/0x80 > > [ 322.845107] ? 
var_wake_function+0x30/0x27 > > [ 322.845109] __rpc_execute+0xe5/0x485 > > [ 322.845111] rpc_execute+0xa5/0xaa > > [ 322.845113] rpc_run_task+0x148/0x187 > > [ 322.845115] nfs4_do_call_sync+0x61/0x81 > > [ 322.845117] _nfs4_proc_statfs+0xc1/0xcf > > [ 322.845120] nfs4_proc_statfs+0x55/0x74 > > [ 322.845122] nfs_statfs+0x65/0x152 > > [ 322.845124] ? _nfs4_proc_delegreturn+0x21e/0x2e5 > > [ 322.845127] statfs_by_dentry+0x55/0x74 > > [ 322.845129] vfs_statfs+0x24/0x43 > > [ 322.845132] check_free_space+0x3c/0xe5 > > [ 322.845135] do_acct_process+0x5d/0x5e1 > > [ 322.845136] ? nfs_end_delegation_return+0x1cb/0x3aa > > [ 322.845138] ? complete+0x49/0x57 > > [ 322.845141] acct_process+0xe5/0x105 > > [ 322.845143] do_exit+0x781/0xa1b > > [ 322.845145] ? refcount_dec_and_lock+0x22/0x8c > > [ 322.845148] __module_put_and_exit+0x1b/0x1b > > [ 322.845150] nfs4_run_state_manager+0x293/0x933 > > [ 322.845152] ? nfs4_do_reclaim+0x820/0x81c > > [ 322.845155] kthread+0x142/0x15f > > [ 322.845157] ? __kthread_create_worker+0x130/0x122 > > [ 322.845159] ret_from_fork+0x1f/0x2a > > [ 369.859385] INFO: task tee:1176 blocked for more than 122 > > seconds. > > [ 369.859391] Not tainted 5.10.4-00021-gde017294a539-dirty > > #3 > > [ 369.859393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [ 369.859395] task:tee state:D stack: 0 pid: 1176 > > ppid: 1173 flags:0x00000000 > > [ 369.859399] Call Trace: > > [ 369.859407] __schedule+0x3db/0x7ee > > [ 369.859410] ? io_schedule+0x70/0x6d > > [ 369.859412] schedule+0x68/0xdc > > [ 369.859415] io_schedule+0x46/0x6d > > [ 369.859417] bit_wait_io+0x11/0x52 > > [ 369.859420] __wait_on_bit+0x31/0x90 > > [ 369.859422] out_of_line_wait_on_bit+0x7e/0x80 > > [ 369.859426] ? 
var_wake_function+0x30/0x27 > > [ 369.859429] nfs_wait_on_request.part.0+0x31/0x33 > > [ 369.859431] nfs_page_group_lock_head+0x31/0x78 > > [ 369.859433] nfs_lock_and_join_requests+0x9d/0x26e > > [ 369.859436] nfs_updatepage+0x1be/0x985 > > [ 369.859439] nfs_write_end+0x128/0x52a > > [ 369.859443] ? iov_iter_copy_from_user_atomic+0xe8/0x3ce > > [ 369.859446] generic_perform_write+0xed/0x17c > > [ 369.859449] nfs_file_write+0x14a/0x29f > > [ 369.859452] new_sync_write+0xfb/0x16b > > [ 369.859455] vfs_write+0x1ee/0x29a > > [ 369.859457] ? vfs_read+0x17b/0x198 > > [ 369.859459] ksys_write+0x61/0xd0 > > [ 369.859462] __x64_sys_write+0x1a/0x1c > > [ 369.859464] do_syscall_64+0x32/0x45 > > [ 369.859467] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > [ 369.859469] RIP: 0033:0x7f4781f323a3 > > [ 369.859471] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000001 > > [ 369.859474] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: > > 00007f4781f323a3 > > [ 369.859475] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: > > 0000000000000003 > > [ 369.859477] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: > > 0000000000000001 > > [ 369.859478] R10: 0000000000400ad2 R11: 0000000000000246 R12: > > 0000000000000092 > > [ 369.859480] R13: 0000000002361580 R14: 0000000000000092 R15: > > 00007f4782201720 > > [ 369.859495] INFO: task CGenericAsyncFi:1927 blocked for more > > than 122 seconds. > > [ 369.859497] Not tainted 5.10.4-00021-gde017294a539-dirty > > #3 > > [ 369.859498] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [ 369.859499] task:CGenericAsyncFi state:D stack: 0 pid: 1927 > > ppid: 1540 flags:0x20020000 > > [ 369.859501] Call Trace: > > [ 369.859504] __schedule+0x3db/0x7ee > > [ 369.859506] schedule+0x68/0xdc > > [ 369.859508] io_schedule+0x46/0x6d > > [ 369.859510] wait_on_page_bit_common+0xf8/0x325 > > [ 369.859512] ? 
> > trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d > > [ 369.859515] wait_on_page_bit+0x3f/0x41 > > [ 369.859517] wait_on_page_writeback.part.0+0x35/0x80 > > [ 369.859519] wait_on_page_writeback+0x27/0x29 > > [ 369.859521] __filemap_fdatawait_range+0x88/0xd9 > > [ 369.859523] filemap_write_and_wait_range+0x45/0x80 > > [ 369.859525] nfs_wb_all+0x30/0x13d > > [ 369.859528] nfs4_file_flush+0x85/0xc0 > > [ 369.859530] filp_close+0x3e/0x78 > > [ 369.859532] __close_fd+0x23/0x30 > > [ 369.859534] __ia32_sys_close+0x22/0x50 > > [ 369.859536] __do_fast_syscall_32+0x5f/0x8e > > [ 369.859538] do_fast_syscall_32+0x3d/0x80 > > [ 369.859540] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > [ 369.859541] RIP: 0023:0xf7ee6549 > > [ 369.859543] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 > > ORIG_RAX: 0000000000000006 > > [ 369.859545] RAX: ffffffffffffffda RBX: 000000000000004d RCX: > > 0000000000000002 > > [ 369.859546] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: > > 000000000000004d > > [ 369.859547] RBP: 000000008c5fe110 R08: 0000000000000000 R09: > > 0000000000000000 > > [ 369.859548] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000000 > > [ 369.859549] R13: 0000000000000000 R14: 0000000000000000 R15: > > 0000000000000000 > > [ 394.423093] nfs: server loom not responding, still trying > > > > -- > > NULL && (void) -- Trond Myklebust CTO, Hammerspace Inc 4984 El Camino Real, Suite 208 Los Altos, CA 94022 www.hammer.space
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) 2021-02-23 23:58 ` Trond Myklebust @ 2021-02-24 2:01 ` bfields 2021-04-01 13:33 ` Nix 2021-04-05 16:52 ` Nix 0 siblings, 2 replies; 12+ messages in thread From: bfields @ 2021-02-24 2:01 UTC (permalink / raw) To: Trond Myklebust; +Cc: nix, linux-nfs On Tue, Feb 23, 2021 at 11:58:51PM +0000, Trond Myklebust wrote: > On Tue, 2021-02-23 at 17:57 -0500, J. Bruce Fields wrote: > > On Sun, Jan 03, 2021 at 02:27:51PM +0000, Nick Alcock wrote: > > > Relevant exports, from /proc/fs/nfs/exports: > > > > > > / 192.168.16.0/24,silk.wkstn.nix(ro,insecure,no_root_squash,s > > > ync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0a4a4563:00764033 > > > :8c827c0e:989cf534,sec=390003:390004:390005:1) > > > /home/.loom.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,syn > > > c,wdelay,no_subtree_check,uuid=0a4a4563:00764033:8c827c0e:989cf534, > > > sec=1) > > Isn't that trying to export the same filesystem as '/' on the line > above using conflicting export options? Yes, and exporting something on the root filesystem is generally a bad idea for the usual reasons. I don't see any explanation for the write hang there, though. Unless maybe if it leaves mountd unable to answer an upcall for some reason, hm. I think that would show up in /proc/net/rpc/nfsd.fh/content. Try cat'ing that file after 'rpcdebug -m rpc -s cache' and that should show if there's an upcall that's not getting answered. --b. > > > > > /pkg/non- > > > free/steam 192.168.16.0/24,silk.wkstn.nix(rw,root_squash,sync,w > > > delay,uuid=11b94016:cb6e4b64:beb6c134:9ea3d6b3,sec=1) > > > > > > Blocked tasks after the hang suggest that the client is hung > > > waiting > > > indefinitely for the server to reply, and I'd guess the server > > > doesn't > > > think it needs to. (I can do some server-side debugging dumps too > > > if you > > > think the problem might be there.) 
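[Editor's note: the two server-side checks Bruce suggests can be combined into one short script. This is a sketch only, not from the thread: the `-s svcdsp` and `-s cache` flag names come from rpcdebug(8), the cache path from /proc/net/rpc, and the dmesg-style lines at the end are made up for illustration.]

```shell
#!/bin/sh
# Sketch of the server-side checks suggested above (both need root on
# the NFS server, so they are left commented; flag names per rpcdebug(8)).

# 1. Dropped replies: enable svc dispatch debugging, then watch the
#    kernel log for "svc_process dropit" timed with the start of the hang.
#
#   rpcdebug -m rpc -s svcdsp
#   dmesg | grep 'svc_process dropit'

# 2. Stuck mountd upcalls: enable rpc cache debugging, then dump the
#    filehandle cache and look for entries that never get an answer.
#
#   rpcdebug -m rpc -s cache
#   cat /proc/net/rpc/nfsd.fh/content

# Demonstration of the step-1 grep on canned (hypothetical) log lines:
printf '%s\n' \
    'svc: svc_process dropit' \
    'svc: got len=124' |
    grep -c 'svc_process dropit'        # prints 1
```

On a healthy server the step-2 cache entries drain quickly; one that lingers while writes hang would point at an unanswered mountd upcall, matching the hypothesis above.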
> > > > I seem to remember there have been some subtle changes to how dropped > > replies are handled. If I remember right: a v4 server is never > > supposed > > to drop a reply without also closing the tcp connection, but it may > > have > > done so in the past. (I *think* all those cases are fixed?) > > > > And the client may have been changed to be more unforgiving about > > that > > requirement in recent years. > > > > But I don't know if that's where the issue is. > > > > I'm not sure how best to go about debugging. I suppose you could try > > to > > capture a network trace and work out if there's an RPC that isn't > > getting a reply. > > > > Maybe turning on server debugging and looking for "svc: svc_process > > dropit" messages timed with the start of the hang. I think what you > > want is "rpcdebug -m rpc -s svcdsp". > > > > Does rpc.mountd look healthy? > > > > --b. > > > > > > > > > > [ 322.843962] sysrq: Show Blocked State > > > [ 322.844058] task:tee state:D stack: 0 pid: 1176 > > > ppid: 1173 flags:0x00000000 > > > [ 322.844064] Call Trace: > > > [ 322.844074] __schedule+0x3db/0x7ee > > > [ 322.844079] ? io_schedule+0x70/0x6d > > > [ 322.844082] schedule+0x68/0xdc > > > [ 322.844085] io_schedule+0x46/0x6d > > > [ 322.844089] bit_wait_io+0x11/0x52 > > > [ 322.844092] __wait_on_bit+0x31/0x90 > > > [ 322.844096] out_of_line_wait_on_bit+0x7e/0x80 > > > [ 322.844100] ? var_wake_function+0x30/0x27 > > > [ 322.844104] nfs_wait_on_request.part.0+0x31/0x33 > > > [ 322.844107] nfs_page_group_lock_head+0x31/0x78 > > > [ 322.844110] nfs_lock_and_join_requests+0x9d/0x26e > > > [ 322.844114] nfs_updatepage+0x1be/0x985 > > > [ 322.844118] nfs_write_end+0x128/0x52a > > > [ 322.844122] ? iov_iter_copy_from_user_atomic+0xe8/0x3ce > > > [ 322.844127] generic_perform_write+0xed/0x17c > > > [ 322.844131] nfs_file_write+0x14a/0x29f > > > [ 322.844135] new_sync_write+0xfb/0x16b > > > [ 322.844139] vfs_write+0x1ee/0x29a > > > [ 322.844142] ?
vfs_read+0x17b/0x198 > > > [ 322.844145] ksys_write+0x61/0xd0 > > > [ 322.844148] __x64_sys_write+0x1a/0x1c > > > [ 322.844151] do_syscall_64+0x32/0x45 > > > [ 322.844154] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > > [ 322.844158] RIP: 0033:0x7f4781f323a3 > > > [ 322.844160] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 > > > ORIG_RAX: 0000000000000001 > > > [ 322.844164] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: > > > 00007f4781f323a3 > > > [ 322.844166] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: > > > 0000000000000003 > > > [ 322.844168] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: > > > 0000000000000001 > > > [ 322.844170] R10: 0000000000400ad2 R11: 0000000000000246 R12: > > > 0000000000000092 > > > [ 322.844172] R13: 0000000002361580 R14: 0000000000000092 R15: > > > 00007f4782201720 > > > [ 322.844202] task:CIPCServer::Thr state:D stack: 0 pid: 1801 > > > ppid: 1540 flags:0x20020000 > > > [ 322.844207] Call Trace: > > > [ 322.844212] __schedule+0x3db/0x7ee > > > [ 322.844219] ? > > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > > [ 322.844222] schedule+0x68/0xdc > > > [ 322.844227] rpc_wait_bit_killable+0x11/0x52 > > > [ 322.844230] __wait_on_bit+0x31/0x90 > > > [ 322.844235] ? __rpc_atrun+0x20/0x1a > > > [ 322.844238] ? __rpc_atrun+0x20/0x1a > > > [ 322.844242] out_of_line_wait_on_bit+0x7e/0x80 > > > [ 322.844246] ? var_wake_function+0x30/0x27 > > > [ 322.844250] __rpc_execute+0xe5/0x485 > > > [ 322.844254] ? sugov_get_util+0xf0/0xeb > > > [ 322.844258] rpc_execute+0xa5/0xaa > > > [ 322.844263] rpc_run_task+0x148/0x187 > > > [ 322.844267] nfs4_do_call_sync+0x61/0x81 > > > [ 322.844272] _nfs4_proc_getattr+0x10a/0x120 > > > [ 322.844278] nfs4_proc_getattr+0x67/0xee > > > [ 322.844283] __nfs_revalidate_inode+0xb7/0x2df > > > [ 322.844287] nfs_access_get_cached+0x14d/0x1df > > > [ 322.844291] nfs_do_access+0x48/0x28e > > > [ 322.844296] ? 
step_into+0x2bc/0x6a1 > > > [ 322.844301] nfs_permission+0xa2/0x1b8 > > > [ 322.844305] inode_permission+0x96/0xca > > > [ 322.844308] ? unlazy_walk+0x4c/0x80 > > > [ 322.844312] link_path_walk.part.0+0x2a5/0x320 > > > [ 322.844315] ? path_init+0x2c2/0x3e5 > > > [ 322.844318] path_lookupat+0x3f/0x1b1 > > > [ 322.844322] filename_lookup+0x97/0x17f > > > [ 322.844326] ? kmem_cache_alloc+0x32/0x200 > > > [ 322.844331] user_path_at_empty+0x59/0x8a > > > [ 322.844333] do_faccessat+0x70/0x227 > > > [ 322.844337] __ia32_sys_access+0x1c/0x1e > > > [ 322.844340] __do_fast_syscall_32+0x5f/0x8e > > > [ 322.844342] do_fast_syscall_32+0x3d/0x80 > > > [ 322.844346] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > > [ 322.844348] RIP: 0023:0xf7ee6549 > > > [ 322.844350] RSP: 002b:00000000eb763b28 EFLAGS: 00200292 > > > ORIG_RAX: 0000000000000021 > > > [ 322.844353] RAX: ffffffffffffffda RBX: 000000005754d9e0 RCX: > > > 0000000000000004 > > > [ 322.844355] RDX: 00000000f7ab3e1c RSI: 000000005754d9e0 RDI: > > > 00000000eb763bc8 > > > [ 322.844357] RBP: 00000000f78a91c0 R08: 0000000000000000 R09: > > > 0000000000000000 > > > [ 322.844359] R10: 0000000000000000 R11: 0000000000000000 R12: > > > 0000000000000000 > > > [ 322.844360] R13: 0000000000000000 R14: 0000000000000000 R15: > > > 0000000000000000 > > > [ 322.844364] task:CFileWriterThre state:D stack: 0 pid: 1810 > > > ppid: 1540 flags:0x20024000 > > > [ 322.844367] Call Trace: > > > [ 322.844371] __schedule+0x3db/0x7ee > > > [ 322.844375] ? > > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > > [ 322.844379] schedule+0x68/0xdc > > > [ 322.844382] rpc_wait_bit_killable+0x11/0x52 > > > [ 322.844385] __wait_on_bit+0x31/0x90 > > > [ 322.844389] ? call_reserveresult+0xa0/0x9f > > > [ 322.844391] ? call_transmit_status+0x160/0x159 > > > [ 322.844395] out_of_line_wait_on_bit+0x7e/0x80 > > > [ 322.844398] ? var_wake_function+0x30/0x27 > > > [ 322.844401] __rpc_execute+0xe5/0x485 > > > [ 322.844404] ? 
nfs_check_verifier+0x5b/0xa4 > > > [ 322.844407] rpc_execute+0xa5/0xaa > > > [ 322.844410] rpc_run_task+0x148/0x187 > > > [ 322.844414] nfs4_do_call_sync+0x61/0x81 > > > [ 322.844417] _nfs4_proc_getattr+0x10a/0x120 > > > [ 322.844422] nfs4_proc_getattr+0x67/0xee > > > [ 322.844426] __nfs_revalidate_inode+0xb7/0x2df > > > [ 322.844430] nfs_getattr+0x159/0x374 > > > [ 322.844433] vfs_getattr_nosec+0x96/0xa4 > > > [ 322.844435] vfs_statx+0x7a/0xe5 > > > [ 322.844438] vfs_fstatat+0x1f/0x21 > > > [ 322.844443] __do_compat_sys_ia32_lstat64+0x2b/0x4e > > > [ 322.844448] __ia32_compat_sys_ia32_lstat64+0x14/0x16 > > > [ 322.844450] __do_fast_syscall_32+0x5f/0x8e > > > [ 322.844453] do_fast_syscall_32+0x3d/0x80 > > > [ 322.844457] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > > [ 322.844459] RIP: 0023:0xf7ee6549 > > > [ 322.844460] RSP: 002b:00000000eb561be8 EFLAGS: 00000292 > > > ORIG_RAX: 00000000000000c4 > > > [ 322.844463] RAX: ffffffffffffffda RBX: 00000000c521bf60 RCX: > > > 00000000eb561e70 > > > [ 322.844465] RDX: 00000000f7ab3e1c RSI: 00000000eb561e70 RDI: > > > 00000000c521bf60 > > > [ 322.844467] RBP: 0000000000000000 R08: 0000000000000000 R09: > > > 0000000000000000 > > > [ 322.844468] R10: 0000000000000000 R11: 0000000000000000 R12: > > > 0000000000000000 > > > [ 322.844470] R13: 0000000000000000 R14: 0000000000000000 R15: > > > 0000000000000000 > > > [ 322.844477] task:Steam - DirWatc state:D stack: 0 pid: 1924 > > > ppid: 1540 flags:0x20024000 > > > [ 322.844480] Call Trace: > > > [ 322.844484] __schedule+0x3db/0x7ee > > > [ 322.844488] ? > > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > > [ 322.844492] schedule+0x68/0xdc > > > [ 322.844495] rpc_wait_bit_killable+0x11/0x52 > > > [ 322.844498] __wait_on_bit+0x31/0x90 > > > [ 322.844501] ? rpc_sleep_on_timeout+0x64/0x9c > > > [ 322.844504] ? rpc_check_timeout+0x1e0/0x1e0 > > > [ 322.844507] ? call_connect_status+0x180/0x17a > > > [ 322.844511] out_of_line_wait_on_bit+0x7e/0x80 > > > [ 322.844513] ? 
var_wake_function+0x30/0x27 > > > [ 322.844516] __rpc_execute+0xe5/0x485 > > > [ 322.844520] rpc_execute+0xa5/0xaa > > > [ 322.844523] rpc_run_task+0x148/0x187 > > > [ 322.844526] nfs4_do_call_sync+0x61/0x81 > > > [ 322.844530] _nfs4_proc_getattr+0x10a/0x120 > > > [ 322.844534] nfs4_proc_getattr+0x67/0xee > > > [ 322.844538] __nfs_revalidate_inode+0xb7/0x2df > > > [ 322.844542] nfs_lookup_verify_inode+0x67/0x82 > > > [ 322.844545] nfs_do_lookup_revalidate+0x19b/0x29c > > > [ 322.844548] nfs4_do_lookup_revalidate+0x5d/0xc5 > > > [ 322.844552] nfs4_lookup_revalidate+0x6a/0x8d > > > [ 322.844555] lookup_fast+0xb8/0x122 > > > [ 322.844559] path_openat+0xfd/0xfd8 > > > [ 322.844562] ? unlazy_walk+0x4c/0x80 > > > [ 322.844565] ? terminate_walk+0x62/0xed > > > [ 322.844569] ? putname+0x4b/0x53 > > > [ 322.844572] do_filp_open+0x86/0x110 > > > [ 322.844576] ? kmem_cache_alloc+0x32/0x200 > > > [ 322.844579] ? __alloc_fd+0xb2/0x143 > > > [ 322.844583] do_sys_openat2+0x8d/0x13a > > > [ 322.844585] ? 
do_faccessat+0x108/0x227 > > > [ 322.844588] __ia32_compat_sys_openat+0x48/0x63 > > > [ 322.844591] __do_fast_syscall_32+0x5f/0x8e > > > [ 322.844594] do_fast_syscall_32+0x3d/0x80 > > > [ 322.844597] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > > [ 322.844600] RIP: 0023:0xf7ee6549 > > > [ 322.844601] RSP: 002b:000000008c8fbcb0 EFLAGS: 00200287 > > > ORIG_RAX: 0000000000000127 > > > [ 322.844604] RAX: ffffffffffffffda RBX: 00000000ffffff9c RCX: > > > 000000008c8fbffc > > > [ 322.844606] RDX: 0000000000098800 RSI: 0000000000000000 RDI: > > > 00000000f7ab3e1c > > > [ 322.844608] RBP: 0000000000000000 R08: 0000000000000000 R09: > > > 0000000000000000 > > > [ 322.844609] R10: 0000000000000000 R11: 0000000000000000 R12: > > > 0000000000000000 > > > [ 322.844611] R13: 0000000000000000 R14: 0000000000000000 R15: > > > 0000000000000000 > > > [ 322.844614] task:CGenericAsyncFi state:D stack: 0 pid: 1927 > > > ppid: 1540 flags:0x20020000 > > > [ 322.844617] Call Trace: > > > [ 322.844621] __schedule+0x3db/0x7ee > > > [ 322.844625] schedule+0x68/0xdc > > > [ 322.844628] io_schedule+0x46/0x6d > > > [ 322.844631] wait_on_page_bit_common+0xf8/0x325 > > > [ 322.844634] ? 
> > > trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d > > > [ 322.844638] wait_on_page_bit+0x3f/0x41 > > > [ 322.844642] wait_on_page_writeback.part.0+0x35/0x80 > > > [ 322.844645] wait_on_page_writeback+0x27/0x29 > > > [ 322.844647] __filemap_fdatawait_range+0x88/0xd9 > > > [ 322.844651] filemap_write_and_wait_range+0x45/0x80 > > > [ 322.844655] nfs_wb_all+0x30/0x13d > > > [ 322.844658] nfs4_file_flush+0x85/0xc0 > > > [ 322.844661] filp_close+0x3e/0x78 > > > [ 322.844664] __close_fd+0x23/0x30 > > > [ 322.844667] __ia32_sys_close+0x22/0x50 > > > [ 322.844670] __do_fast_syscall_32+0x5f/0x8e > > > [ 322.844673] do_fast_syscall_32+0x3d/0x80 > > > [ 322.844676] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > > [ 322.844678] RIP: 0023:0xf7ee6549 > > > [ 322.844680] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 > > > ORIG_RAX: 0000000000000006 > > > [ 322.844682] RAX: ffffffffffffffda RBX: 000000000000004d RCX: > > > 0000000000000002 > > > [ 322.844684] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: > > > 000000000000004d > > > [ 322.844686] RBP: 000000008c5fe110 R08: 0000000000000000 R09: > > > 0000000000000000 > > > [ 322.844688] R10: 0000000000000000 R11: 0000000000000000 R12: > > > 0000000000000000 > > > [ 322.844689] R13: 0000000000000000 R14: 0000000000000000 R15: > > > 0000000000000000 > > > [ 322.844699] task:ThreadPoolForeg state:D stack: 0 pid: 1768 > > > ppid: 1735 flags:0x00004000 > > > [ 322.844702] Call Trace: > > > [ 322.844705] __schedule+0x3db/0x7ee > > > [ 322.844709] ? > > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > > [ 322.844712] schedule+0x68/0xdc > > > [ 322.844716] rpc_wait_bit_killable+0x11/0x52 > > > [ 322.844719] __wait_on_bit+0x31/0x90 > > > [ 322.844722] ? rpc_sleep_on_timeout+0x64/0x9c > > > [ 322.844725] ? rpc_check_timeout+0x1e0/0x1e0 > > > [ 322.844728] ? call_connect_status+0x180/0x17a > > > [ 322.844731] out_of_line_wait_on_bit+0x7e/0x80 > > > [ 322.844734] ? 
var_wake_function+0x30/0x27 > > > [ 322.844737] __rpc_execute+0xe5/0x485 > > > [ 322.844740] ? nfs_check_verifier+0x5b/0xa4 > > > [ 322.844743] rpc_execute+0xa5/0xaa > > > [ 322.844746] rpc_run_task+0x148/0x187 > > > [ 322.844750] nfs4_do_call_sync+0x61/0x81 > > > [ 322.844753] _nfs4_proc_getattr+0x10a/0x120 > > > [ 322.844758] nfs4_proc_getattr+0x67/0xee > > > [ 322.844762] __nfs_revalidate_inode+0xb7/0x2df > > > [ 322.844766] nfs_getattr+0x159/0x374 > > > [ 322.844769] vfs_getattr_nosec+0x96/0xa4 > > > [ 322.844771] vfs_statx+0x7a/0xe5 > > > [ 322.844774] __do_sys_newstat+0x31/0x4c > > > [ 322.844778] __x64_sys_newstat+0x16/0x18 > > > [ 322.844781] do_syscall_64+0x32/0x45 > > > [ 322.844783] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > > [ 322.844786] RIP: 0033:0x7fe70ebf19e6 > > > [ 322.844787] RSP: 002b:00007fe6c15b2068 EFLAGS: 00000246 > > > ORIG_RAX: 0000000000000004 > > > [ 322.844790] RAX: ffffffffffffffda RBX: 00000bd4193e1910 RCX: > > > 00007fe70ebf19e6 > > > [ 322.844792] RDX: 00007fe6c15b2078 RSI: 00007fe6c15b2078 RDI: > > > 00000bd4193e1910 > > > [ 322.844794] RBP: 00007fe6c15b2160 R08: 0000000000000001 R09: > > > 00007ffd971ed090 > > > [ 322.844796] R10: 0000000000056764 R11: 0000000000000246 R12: > > > 00007fe6c15b2078 > > > [ 322.844797] R13: 00000bd4192f6f60 R14: 00007fe6c15b2108 R15: > > > 00007fe6c15b2120 > > > [ 322.844801] task:ThreadPoolForeg state:D stack: 0 pid: 1769 > > > ppid: 1735 flags:0x00000000 > > > [ 322.844804] Call Trace: > > > [ 322.844807] __schedule+0x3db/0x7ee > > > [ 322.844811] ? > > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > > [ 322.844814] schedule+0x68/0xdc > > > [ 322.844817] rpc_wait_bit_killable+0x11/0x52 > > > [ 322.844820] __wait_on_bit+0x31/0x90 > > > [ 322.844823] ? __rpc_atrun+0x20/0x1a > > > [ 322.844826] ? __rpc_atrun+0x20/0x1a > > > [ 322.844829] out_of_line_wait_on_bit+0x7e/0x80 > > > [ 322.844831] ? var_wake_function+0x30/0x27 > > > [ 322.844834] __rpc_execute+0xe5/0x485 > > > [ 322.844837] ? 
update_cfs_group+0x9a/0x9e > > > [ 322.844840] rpc_execute+0xa5/0xaa > > > [ 322.844842] rpc_run_task+0x148/0x187 > > > [ 322.844845] nfs4_do_call_sync+0x61/0x81 > > > [ 322.844849] _nfs4_proc_getattr+0x10a/0x120 > > > [ 322.844853] nfs4_proc_getattr+0x67/0xee > > > [ 322.844856] __nfs_revalidate_inode+0xb7/0x2df > > > [ 322.844859] nfs_file_write+0xfc/0x29f > > > [ 322.844862] new_sync_write+0xfb/0x16b > > > [ 322.844866] vfs_write+0x1ee/0x29a > > > [ 322.844868] ? __fget_light+0x31/0x72 > > > [ 322.844871] ksys_write+0x61/0xd0 > > > [ 322.844874] __x64_sys_write+0x1a/0x1c > > > [ 322.844877] do_syscall_64+0x32/0x45 > > > [ 322.844880] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > > [ 322.844881] RIP: 0033:0x7fe71859ab7f > > > [ 322.844883] RSP: 002b:00007fe6bd5b0d30 EFLAGS: 00000293 > > > ORIG_RAX: 0000000000000001 > > > [ 322.844885] RAX: ffffffffffffffda RBX: 00007fe6bd5b0d68 RCX: > > > 00007fe71859ab7f > > > [ 322.844887] RDX: 0000000000000007 RSI: 00007fe6bd5b1090 RDI: > > > 000000000000005b > > > [ 322.844888] RBP: 00007fe6bd5b0de0 R08: 0000000000000000 R09: > > > 00007ffd971ed090 > > > [ 322.844890] R10: 0000000000065fd2 R11: 0000000000000293 R12: > > > 00000bd418d95020 > > > [ 322.844891] R13: 00007fe6bd5b1090 R14: 0000000000000007 R15: > > > 0000000000000000 > > > [ 322.844900] task:ThreadPoolForeg state:D stack: 0 pid: 1807 > > > ppid: 1773 flags:0x00000000 > > > [ 322.844902] Call Trace: > > > [ 322.844904] __schedule+0x3db/0x7ee > > > [ 322.844907] ? > > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > > [ 322.844908] schedule+0x68/0xdc > > > [ 322.844911] rpc_wait_bit_killable+0x11/0x52 > > > [ 322.844913] __wait_on_bit+0x31/0x90 > > > [ 322.844915] ? __rpc_atrun+0x20/0x1a > > > [ 322.844916] ? __rpc_atrun+0x20/0x1a > > > [ 322.844918] out_of_line_wait_on_bit+0x7e/0x80 > > > [ 322.844920] ? 
var_wake_function+0x30/0x27 > > > [ 322.844922] __rpc_execute+0xe5/0x485 > > > [ 322.844924] rpc_execute+0xa5/0xaa > > > [ 322.844926] rpc_run_task+0x148/0x187 > > > [ 322.844928] nfs4_do_call_sync+0x61/0x81 > > > [ 322.844930] _nfs4_proc_getattr+0x10a/0x120 > > > [ 322.844933] nfs4_proc_getattr+0x67/0xee > > > [ 322.844936] __nfs_revalidate_inode+0xb7/0x2df > > > [ 322.844938] nfs_access_get_cached+0x14d/0x1df > > > [ 322.844940] nfs_do_access+0x48/0x28e > > > [ 322.844942] ? nfs4_lookup_revalidate+0x38/0x8d > > > [ 322.844944] nfs_permission+0xa2/0x1b8 > > > [ 322.844946] inode_permission+0x96/0xca > > > [ 322.844948] ? unlazy_walk+0x4c/0x80 > > > [ 322.844950] link_path_walk.part.0+0x2a5/0x320 > > > [ 322.844952] ? path_init+0x2c2/0x3e5 > > > [ 322.844954] path_lookupat+0x3f/0x1b1 > > > [ 322.844956] filename_lookup+0x97/0x17f > > > [ 322.844959] ? do_unlk+0x9a/0xca > > > [ 322.844960] ? kmem_cache_alloc+0x32/0x200 > > > [ 322.844963] user_path_at_empty+0x59/0x8a > > > [ 322.844965] vfs_statx+0x64/0xe5 > > > [ 322.844967] __do_sys_newstat+0x31/0x4c > > > [ 322.844968] ? do_fcntl+0x113/0x582 > > > [ 322.844971] ? fput+0x13/0x15 > > > [ 322.844972] ? 
__x64_sys_fcntl+0x81/0xa8 > > > [ 322.844974] __x64_sys_newstat+0x16/0x18 > > > [ 322.844976] do_syscall_64+0x32/0x45 > > > [ 322.844977] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > > [ 322.844979] RIP: 0033:0x7f0d1fd0c9e6 > > > [ 322.844980] RSP: 002b:00007f0d08e12e18 EFLAGS: 00000246 > > > ORIG_RAX: 0000000000000004 > > > [ 322.844981] RAX: ffffffffffffffda RBX: 00007f0d08e12edc RCX: > > > 00007f0d1fd0c9e6 > > > [ 322.844983] RDX: 00007f0d08e12e28 RSI: 00007f0d08e12e28 RDI: > > > 00001605c2ff1169 > > > [ 322.844984] RBP: 00007f0d08e12ec0 R08: 0000000000000001 R09: > > > 00007fffaddd9090 > > > [ 322.844985] R10: 00001605c3133480 R11: 0000000000000246 R12: > > > 0000000000000000 > > > [ 322.844986] R13: 0000000000000000 R14: 00001605c2ff0f00 R15: > > > 0000000000000000 > > > [ 322.844988] task:ThreadPoolForeg state:D stack: 0 pid: 2031 > > > ppid: 1773 flags:0x00004000 > > > [ 322.844990] Call Trace: > > > [ 322.844992] __schedule+0x3db/0x7ee > > > [ 322.844995] ? > > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > > [ 322.844997] schedule+0x68/0xdc > > > [ 322.844999] rpc_wait_bit_killable+0x11/0x52 > > > [ 322.845001] __wait_on_bit+0x31/0x90 > > > [ 322.845003] ? rpc_sleep_on_timeout+0x64/0x9c > > > [ 322.845005] ? rpc_check_timeout+0x1e0/0x1e0 > > > [ 322.845006] ? call_connect_status+0x180/0x17a > > > [ 322.845009] out_of_line_wait_on_bit+0x7e/0x80 > > > [ 322.845010] ? var_wake_function+0x30/0x27 > > > [ 322.845012] __rpc_execute+0xe5/0x485 > > > [ 322.845014] rpc_execute+0xa5/0xaa > > > [ 322.845016] rpc_run_task+0x148/0x187 > > > [ 322.845018] nfs4_do_call_sync+0x61/0x81 > > > [ 322.845020] _nfs4_proc_getattr+0x10a/0x120 > > > [ 322.845023] nfs4_proc_getattr+0x67/0xee > > > [ 322.845026] __nfs_revalidate_inode+0xb7/0x2df > > > [ 322.845028] nfs_access_get_cached+0x14d/0x1df > > > [ 322.845030] nfs_do_access+0x48/0x28e > > > [ 322.845032] ? 
nfs4_lookup_revalidate+0x38/0x8d > > > [ 322.845034] nfs_permission+0xa2/0x1b8 > > > [ 322.845036] inode_permission+0x96/0xca > > > [ 322.845038] ? unlazy_walk+0x4c/0x80 > > > [ 322.845040] link_path_walk.part.0+0x2a5/0x320 > > > [ 322.845042] ? path_init+0x2c2/0x3e5 > > > [ 322.845044] path_lookupat+0x3f/0x1b1 > > > [ 322.845046] filename_lookup+0x97/0x17f > > > [ 322.845049] ? kmem_cache_alloc+0x32/0x200 > > > [ 322.845051] user_path_at_empty+0x59/0x8a > > > [ 322.845054] ? mntput_no_expire+0x4f/0x264 > > > [ 322.845055] vfs_statx+0x64/0xe5 > > > [ 322.845057] ? putname+0x4b/0x53 > > > [ 322.845059] __do_sys_newstat+0x31/0x4c > > > [ 322.845062] ? __x64_sys_futex+0x13a/0x159 > > > [ 322.845064] ? switch_fpu_return+0x58/0xf2 > > > [ 322.845066] __x64_sys_newstat+0x16/0x18 > > > [ 322.845068] do_syscall_64+0x32/0x45 > > > [ 322.845069] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > > [ 322.845071] RIP: 0033:0x7f0d1fd0c9e6 > > > [ 322.845072] RSP: 002b:00007f0d00331108 EFLAGS: 00000246 > > > ORIG_RAX: 0000000000000004 > > > [ 322.845073] RAX: ffffffffffffffda RBX: 00001605c2fdc0a0 RCX: > > > 00007f0d1fd0c9e6 > > > [ 322.845074] RDX: 00007f0d00331118 RSI: 00007f0d00331118 RDI: > > > 00001605c2fdc0a0 > > > [ 322.845076] RBP: 00007f0d00331200 R08: 0000000000000001 R09: > > > 00007fffaddd9090 > > > [ 322.845077] R10: 000000000005771c R11: 0000000000000246 R12: > > > 00007f0d00331118 > > > [ 322.845078] R13: 00001605c3309c40 R14: 00007f0d003311a8 R15: > > > 00007f0d003311c0 > > > [ 322.845087] task:192.168.16.8-ma state:D stack: 0 pid: 2044 > > > ppid: 2 flags:0x00004000 > > > [ 322.845089] Call Trace: > > > [ 322.845091] __schedule+0x3db/0x7ee > > > [ 322.845094] ? > > > trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e > > > [ 322.845095] schedule+0x68/0xdc > > > [ 322.845098] rpc_wait_bit_killable+0x11/0x52 > > > [ 322.845100] __wait_on_bit+0x31/0x90 > > > [ 322.845102] ? __rpc_atrun+0x20/0x1a > > > [ 322.845103] ? 
__rpc_atrun+0x20/0x1a > > > [ 322.845105] out_of_line_wait_on_bit+0x7e/0x80 > > > [ 322.845107] ? var_wake_function+0x30/0x27 > > > [ 322.845109] __rpc_execute+0xe5/0x485 > > > [ 322.845111] rpc_execute+0xa5/0xaa > > > [ 322.845113] rpc_run_task+0x148/0x187 > > > [ 322.845115] nfs4_do_call_sync+0x61/0x81 > > > [ 322.845117] _nfs4_proc_statfs+0xc1/0xcf > > > [ 322.845120] nfs4_proc_statfs+0x55/0x74 > > > [ 322.845122] nfs_statfs+0x65/0x152 > > > [ 322.845124] ? _nfs4_proc_delegreturn+0x21e/0x2e5 > > > [ 322.845127] statfs_by_dentry+0x55/0x74 > > > [ 322.845129] vfs_statfs+0x24/0x43 > > > [ 322.845132] check_free_space+0x3c/0xe5 > > > [ 322.845135] do_acct_process+0x5d/0x5e1 > > > [ 322.845136] ? nfs_end_delegation_return+0x1cb/0x3aa > > > [ 322.845138] ? complete+0x49/0x57 > > > [ 322.845141] acct_process+0xe5/0x105 > > > [ 322.845143] do_exit+0x781/0xa1b > > > [ 322.845145] ? refcount_dec_and_lock+0x22/0x8c > > > [ 322.845148] __module_put_and_exit+0x1b/0x1b > > > [ 322.845150] nfs4_run_state_manager+0x293/0x933 > > > [ 322.845152] ? nfs4_do_reclaim+0x820/0x81c > > > [ 322.845155] kthread+0x142/0x15f > > > [ 322.845157] ? __kthread_create_worker+0x130/0x122 > > > [ 322.845159] ret_from_fork+0x1f/0x2a > > > [ 369.859385] INFO: task tee:1176 blocked for more than 122 > > > seconds. > > > [ 369.859391] Not tainted 5.10.4-00021-gde017294a539-dirty > > > #3 > > > [ 369.859393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > > disables this message. > > > [ 369.859395] task:tee state:D stack: 0 pid: 1176 > > > ppid: 1173 flags:0x00000000 > > > [ 369.859399] Call Trace: > > > [ 369.859407] __schedule+0x3db/0x7ee > > > [ 369.859410] ? io_schedule+0x70/0x6d > > > [ 369.859412] schedule+0x68/0xdc > > > [ 369.859415] io_schedule+0x46/0x6d > > > [ 369.859417] bit_wait_io+0x11/0x52 > > > [ 369.859420] __wait_on_bit+0x31/0x90 > > > [ 369.859422] out_of_line_wait_on_bit+0x7e/0x80 > > > [ 369.859426] ? 
var_wake_function+0x30/0x27 > > > [ 369.859429] nfs_wait_on_request.part.0+0x31/0x33 > > > [ 369.859431] nfs_page_group_lock_head+0x31/0x78 > > > [ 369.859433] nfs_lock_and_join_requests+0x9d/0x26e > > > [ 369.859436] nfs_updatepage+0x1be/0x985 > > > [ 369.859439] nfs_write_end+0x128/0x52a > > > [ 369.859443] ? iov_iter_copy_from_user_atomic+0xe8/0x3ce > > > [ 369.859446] generic_perform_write+0xed/0x17c > > > [ 369.859449] nfs_file_write+0x14a/0x29f > > > [ 369.859452] new_sync_write+0xfb/0x16b > > > [ 369.859455] vfs_write+0x1ee/0x29a > > > [ 369.859457] ? vfs_read+0x17b/0x198 > > > [ 369.859459] ksys_write+0x61/0xd0 > > > [ 369.859462] __x64_sys_write+0x1a/0x1c > > > [ 369.859464] do_syscall_64+0x32/0x45 > > > [ 369.859467] entry_SYSCALL_64_after_hwframe+0x44/0x0 > > > [ 369.859469] RIP: 0033:0x7f4781f323a3 > > > [ 369.859471] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 > > > ORIG_RAX: 0000000000000001 > > > [ 369.859474] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: > > > 00007f4781f323a3 > > > [ 369.859475] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: > > > 0000000000000003 > > > [ 369.859477] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: > > > 0000000000000001 > > > [ 369.859478] R10: 0000000000400ad2 R11: 0000000000000246 R12: > > > 0000000000000092 > > > [ 369.859480] R13: 0000000002361580 R14: 0000000000000092 R15: > > > 00007f4782201720 > > > [ 369.859495] INFO: task CGenericAsyncFi:1927 blocked for more > > > than 122 seconds. > > > [ 369.859497] Not tainted 5.10.4-00021-gde017294a539-dirty > > > #3 > > > [ 369.859498] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > > disables this message. > > > [ 369.859499] task:CGenericAsyncFi state:D stack: 0 pid: 1927 > > > ppid: 1540 flags:0x20020000 > > > [ 369.859501] Call Trace: > > > [ 369.859504] __schedule+0x3db/0x7ee > > > [ 369.859506] schedule+0x68/0xdc > > > [ 369.859508] io_schedule+0x46/0x6d > > > [ 369.859510] wait_on_page_bit_common+0xf8/0x325 > > > [ 369.859512] ? 
> > > trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d > > > [ 369.859515] wait_on_page_bit+0x3f/0x41 > > > [ 369.859517] wait_on_page_writeback.part.0+0x35/0x80 > > > [ 369.859519] wait_on_page_writeback+0x27/0x29 > > > [ 369.859521] __filemap_fdatawait_range+0x88/0xd9 > > > [ 369.859523] filemap_write_and_wait_range+0x45/0x80 > > > [ 369.859525] nfs_wb_all+0x30/0x13d > > > [ 369.859528] nfs4_file_flush+0x85/0xc0 > > > [ 369.859530] filp_close+0x3e/0x78 > > > [ 369.859532] __close_fd+0x23/0x30 > > > [ 369.859534] __ia32_sys_close+0x22/0x50 > > > [ 369.859536] __do_fast_syscall_32+0x5f/0x8e > > > [ 369.859538] do_fast_syscall_32+0x3d/0x80 > > > [ 369.859540] entry_SYSCALL_compat_after_hwframe+0x45/0x0 > > > [ 369.859541] RIP: 0023:0xf7ee6549 > > > [ 369.859543] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 > > > ORIG_RAX: 0000000000000006 > > > [ 369.859545] RAX: ffffffffffffffda RBX: 000000000000004d RCX: > > > 0000000000000002 > > > [ 369.859546] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: > > > 000000000000004d > > > [ 369.859547] RBP: 000000008c5fe110 R08: 0000000000000000 R09: > > > 0000000000000000 > > > [ 369.859548] R10: 0000000000000000 R11: 0000000000000000 R12: > > > 0000000000000000 > > > [ 369.859549] R13: 0000000000000000 R14: 0000000000000000 R15: > > > 0000000000000000 > > > [ 394.423093] nfs: server loom not responding, still trying > > > > > > -- > > > NULL && (void) > > -- > Trond Myklebust > CTO, Hammerspace Inc > 4984 El Camino Real, Suite 208 > Los Altos, CA 94022 > > www.hammer.space > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) 2021-02-24 2:01 ` bfields @ 2021-04-01 13:33 ` Nix 2021-04-01 13:44 ` bfields 2021-04-05 16:52 ` Nix 1 sibling, 1 reply; 12+ messages in thread From: Nix @ 2021-04-01 13:33 UTC (permalink / raw) To: bfields; +Cc: Trond Myklebust, linux-nfs [Sorry about the huge delay: your replies got accidentally marked as read in a MUA snafu. I'll be getting some more debugging dumps -- and seeing if this still happens! -- when I reboot into 5.11 this weekend.] On 24 Feb 2021, bfields@fieldses.org said: > On Tue, Feb 23, 2021 at 11:58:51PM +0000, Trond Myklebust wrote: >> On Tue, 2021-02-23 at 17:57 -0500, J. Bruce Fields wrote: >> > On Sun, Jan 03, 2021 at 02:27:51PM +0000, Nick Alcock wrote: >> > > Relevant exports, from /proc/fs/nfs/exports: >> > > >> > > / 192.168.16.0/24,silk.wkstn.nix(ro,insecure,no_root_squash,s >> > > ync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0a4a4563:00764033 >> > > :8c827c0e:989cf534,sec=390003:390004:390005:1) >> > > /home/.loom.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,syn >> > > c,wdelay,no_subtree_check,uuid=0a4a4563:00764033:8c827c0e:989cf534, >> > > sec=1) >> >> Isn't that trying to export the same filesystem as '/' on the line >> above using conflicting export options? Hmm. I don't actually have a / mount in /etc/exports (and haven't had one since I finished building this machine, last year), and looking at /proc/fs/nfs/exports on the server now, it's not there. The 'v4root' makes me wonder if this was automatically synthesised? 
Exports explicitly named in /etc/exports for that machine are: /home/.loom.srvr.nix -rw,no_subtree_check,sync fold.srvr.nix(root_squash) mutilate(no_root_squash) silk(no_root_squash) /.nfs/nix/Graphics/Private -no_root_squash,sync,no_subtree_check mutilate(rw) silk(rw) /.nfs/nix/share/phones -root_squash,async,no_subtree_check fold.srvr.nix(rw) mutilate(rw) silk(rw) /.nfs/nix/.cache -root_squash,async,no_subtree_check fold.srvr.nix(rw) mutilate(rw) silk(rw) /.nfs/nix/Mail/nnmh -root_squash,async,no_subtree_check fold.srvr.nix(ro) mutilate(rw) silk(rw) /.nfs/compiler/.ccache -root_squash,async,no_subtree_check mutilate(rw) silk(rw) /.nfs/compiler/.cache -root_squash,async,no_subtree_check fold.srvr.nix(rw) mutilate(rw) silk(rw) /usr/doc -root_squash,no_subtree_check,async mutilate(ro) silk(ro) /usr/info -root_squash,no_subtree_check,async mutilate(ro) silk(ro) /usr/share/texlive -no_root_squash,no_subtree_check,async mutilate(rw) silk(rw) /usr/share/xemacs -no_root_squash,no_subtree_check,async mutilate(rw) silk(rw) /usr/share/xplanet -root_squash,no_subtree_check,async mutilate(ro) silk(ro) /usr/share/nethack -root_squash,no_subtree_check,async mutilate(rw) silk(rw) /pkg/non-free -no_root_squash,no_subtree_check,async mutilate(rw) chronix.wkstn.nix(rw) silk(rw) /usr/lib/X11/fonts -root_squash,no_subtree_check,async mutilate(ro) silk(ro) /usr/share/wine -root_squash,no_subtree_check,async mutilate(rw) silk(rw) /usr/share/clamav -root_squash,no_subtree_check,async mutilate(ro) silk(ro) /usr/share/emacs/site-lisp -no_root_squash,no_subtree_check,async mutilate(rw) silk(rw) /usr/archive -root_squash,async,subtree_check mutilate(rw,root_squash,insecure) cinema.srvr.nix(ro,root_squash,insecure) chronix.wkstn.nix(ro) silk(rw,root_squash,insecure) /usr/archive/music/Pete -root_squash,async,subtree_check mutilate(rw,root_squash,insecure) cinema.srvr.nix(ro,root_squash,insecure) chronix.wkstn.nix(ro) silk(rw,root_squash,insecure) /var/log.real 
-root_squash,no_subtree_check,async mutilate(ro) silk(ro) /etc/shai-hulud -no_root_squash,no_subtree_check,async mutilate(rw) silk(rw) /usr/src -no_root_squash,no_subtree_check,async mutilate(rw) oracle.vm.nix(rw) 192.168.20.0/24(ro) scratch.vm.nix(rw) ubuntu.vm.nix(rw) cinema.srvr.nix(rw,root_squash) chronix.wkstn.nix(rw,root_squash) silk(rw) /var/cache/CPAN -no_root_squash,no_subtree_check,async mutilate(rw) silk(rw) /usr/share/flightgear -root_squash,async,no_subtree_check mutilate(ro) silk(ro) /usr/local/tmp/encoding/mkv -no_subtree_check,root_squash,ro mutilate.wkstn.nix silk.wkstn.nix /pkg/non-free/steam -rw,subtree_check,root_squash mutilate.wkstn.nix loom.srvr.nix chronix.wkstn.nix silk.wkstn.nix(insecure) /.transient/workstations/silk -no_root_squash,async,subtree_check silk(rw) /trees/mirrors/mutilate silk.wkstn.nix(no_root_squash,async,no_subtree_check,rw) /trees/mirrors/silk silk.wkstn.nix(no_root_squash,async,no_subtree_check,rw) ... and, well, it's a bit of a mess and there are a lot of them, but there are no nested mounts there, at least, not outside /usr/archive which is probably not relevant for this. (The /.nfs stuff is particularly ugly: if I try to export e.g. /home/nix/.cache rather than /.nfs/nix/.cache the export simply never appears: /.nfs/$foo is just built out of bind mounts for this purpose, but then /home/$name is also a bind mount from /home/.loom.srvr.nix/$name on this network in any case. /nfs/$foo is usually bind-mounted *back* under /home on the client, too, but still from NFS's perspective I guess this means they are distinct? I hope so...) ^ permalink raw reply [flat|nested] 12+ messages in thread
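[Editorial sketch of the /.nfs bind-mount scheme described above, for readers reconstructing the setup: each exported /.nfs/<user>/<dir> path is a bind mount of the real location under /home/.loom.srvr.nix. The helper below only prints the mount command so the path mapping can be checked without root; the exact source paths are an assumption based on the exports listed in this mail, not confirmed by the author.]

```shell
# Print (not run) the bind-mount command that would back one /.nfs export.
# Source prefix /home/.loom.srvr.nix is assumed from the exports above.
nfs_bind_cmd() {
  local user=$1 dir=$2
  printf 'mount --bind /home/.loom.srvr.nix/%s/%s /.nfs/%s/%s\n' \
    "$user" "$dir" "$user" "$dir"
}

# Example: nfs_bind_cmd nix .cache
# emits the command that would make /.nfs/nix/.cache exportable.
```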
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) 2021-04-01 13:33 ` Nix @ 2021-04-01 13:44 ` bfields 2021-04-01 21:52 ` Nix 0 siblings, 1 reply; 12+ messages in thread From: bfields @ 2021-04-01 13:44 UTC (permalink / raw) To: Nix; +Cc: Trond Myklebust, linux-nfs On Thu, Apr 01, 2021 at 02:33:02PM +0100, Nix wrote: > [Sorry about the huge delay: your replies got accidentally marked as > read in a MUA snafu. I'll be getting some more debugging dumps -- and > seeing if this still happens! -- when I reboot into 5.11 this weekend.] > > On 24 Feb 2021, bfields@fieldses.org said: > > > On Tue, Feb 23, 2021 at 11:58:51PM +0000, Trond Myklebust wrote: > >> On Tue, 2021-02-23 at 17:57 -0500, J. Bruce Fields wrote: > >> > On Sun, Jan 03, 2021 at 02:27:51PM +0000, Nick Alcock wrote: > >> > > Relevant exports, from /proc/fs/nfs/exports: > >> > > > >> > > / 192.168.16.0/24,silk.wkstn.nix(ro,insecure,no_root_squash,s > >> > > ync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0a4a4563:00764033 > >> > > :8c827c0e:989cf534,sec=390003:390004:390005:1) > >> > > /home/.loom.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,syn > >> > > c,wdelay,no_subtree_check,uuid=0a4a4563:00764033:8c827c0e:989cf534, > >> > > sec=1) > >> > >> Isn't that trying to export the same filesystem as '/' on the line > >> above using conflicting export options? > > Hmm. I don't actually have a / mount in /etc/exports (and haven't had > one since I finished building this machine, last year), and looking at > /proc/fs/nfs/exports on the server now, it's not there. Right, but even though you're not exporting /, you're exporting /home/.loom.srvr.nix, and that's on the same filesystem as /, isn't it? --b. > > The 'v4root' makes me wonder if this was automatically synthesised? 
> > Exports explicitly named in /etc/exports for that machine are: > > /home/.loom.srvr.nix -rw,no_subtree_check,sync fold.srvr.nix(root_squash) mutilate(no_root_squash) silk(no_root_squash) > /.nfs/nix/Graphics/Private -no_root_squash,sync,no_subtree_check mutilate(rw) silk(rw) > /.nfs/nix/share/phones -root_squash,async,no_subtree_check fold.srvr.nix(rw) mutilate(rw) silk(rw) > /.nfs/nix/.cache -root_squash,async,no_subtree_check fold.srvr.nix(rw) mutilate(rw) silk(rw) > /.nfs/nix/Mail/nnmh -root_squash,async,no_subtree_check fold.srvr.nix(ro) mutilate(rw) silk(rw) > /.nfs/compiler/.ccache -root_squash,async,no_subtree_check mutilate(rw) silk(rw) > /.nfs/compiler/.cache -root_squash,async,no_subtree_check fold.srvr.nix(rw) mutilate(rw) silk(rw) > /usr/doc -root_squash,no_subtree_check,async mutilate(ro) silk(ro) > /usr/info -root_squash,no_subtree_check,async mutilate(ro) silk(ro) > /usr/share/texlive -no_root_squash,no_subtree_check,async mutilate(rw) silk(rw) > /usr/share/xemacs -no_root_squash,no_subtree_check,async mutilate(rw) silk(rw) > /usr/share/xplanet -root_squash,no_subtree_check,async mutilate(ro) silk(ro) > /usr/share/nethack -root_squash,no_subtree_check,async mutilate(rw) silk(rw) > /pkg/non-free -no_root_squash,no_subtree_check,async mutilate(rw) chronix.wkstn.nix(rw) silk(rw) > /usr/lib/X11/fonts -root_squash,no_subtree_check,async mutilate(ro) silk(ro) > /usr/share/wine -root_squash,no_subtree_check,async mutilate(rw) silk(rw) > /usr/share/clamav -root_squash,no_subtree_check,async mutilate(ro) silk(ro) > /usr/share/emacs/site-lisp -no_root_squash,no_subtree_check,async mutilate(rw) silk(rw) > /usr/archive -root_squash,async,subtree_check mutilate(rw,root_squash,insecure) cinema.srvr.nix(ro,root_squash,insecure) chronix.wkstn.nix(ro) silk(rw,root_squash,insecure) > /usr/archive/music/Pete -root_squash,async,subtree_check mutilate(rw,root_squash,insecure) cinema.srvr.nix(ro,root_squash,insecure) chronix.wkstn.nix(ro) silk(rw,root_squash,insecure) 
> /var/log.real -root_squash,no_subtree_check,async mutilate(ro) silk(ro) > /etc/shai-hulud -no_root_squash,no_subtree_check,async mutilate(rw) silk(rw) > /usr/src -no_root_squash,no_subtree_check,async mutilate(rw) oracle.vm.nix(rw) 192.168.20.0/24(ro) scratch.vm.nix(rw) ubuntu.vm.nix(rw) cinema.srvr.nix(rw,root_squash) chronix.wkstn.nix(rw,root_squash) silk(rw) > /var/cache/CPAN -no_root_squash,no_subtree_check,async mutilate(rw) silk(rw) > /usr/share/flightgear -root_squash,async,no_subtree_check mutilate(ro) silk(ro) > /usr/local/tmp/encoding/mkv -no_subtree_check,root_squash,ro mutilate.wkstn.nix silk.wkstn.nix > /pkg/non-free/steam -rw,subtree_check,root_squash mutilate.wkstn.nix loom.srvr.nix chronix.wkstn.nix silk.wkstn.nix(insecure) > /.transient/workstations/silk -no_root_squash,async,subtree_check silk(rw) > /trees/mirrors/mutilate silk.wkstn.nix(no_root_squash,async,no_subtree_check,rw) > /trees/mirrors/silk silk.wkstn.nix(no_root_squash,async,no_subtree_check,rw) > > ... and, well, it's a bit of a mess and there are a lot of them, but > there are no nested mounts there, at least, not outside /usr/archive > which is probably not relevant for this. (The /.nfs stuff is > particularly ugly: if I try to export e.g. /home/nix/.cache rather than > /.nfs/nix/.cache the export simply never appears: /.nfs/$foo is just > built out of bind mounts for this purpose, but then /home/$name is also > a bind mount from /home/.loom.srvr.nix/$name on this network in any > case. /nfs/$foo is usually bind-mounted *back* under /home on the > client, too, but still from NFS's perspective I guess this means they > are distinct? I hope so...) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) 2021-04-01 13:44 ` bfields @ 2021-04-01 21:52 ` Nix 2021-04-02 19:20 ` bfields 0 siblings, 1 reply; 12+ messages in thread From: Nix @ 2021-04-01 21:52 UTC (permalink / raw) To: bfields; +Cc: Trond Myklebust, linux-nfs On 1 Apr 2021, bfields@fieldses.org outgrape: > On Thu, Apr 01, 2021 at 02:33:02PM +0100, Nix wrote: >> [Sorry about the huge delay: your replies got accidentally marked as >> read in a MUA snafu. I'll be getting some more debugging dumps -- and >> seeing if this still happens! -- when I reboot into 5.11 this weekend.] >> >> On 24 Feb 2021, bfields@fieldses.org said: >> >> > On Tue, Feb 23, 2021 at 11:58:51PM +0000, Trond Myklebust wrote: >> >> On Tue, 2021-02-23 at 17:57 -0500, J. Bruce Fields wrote: >> >> > On Sun, Jan 03, 2021 at 02:27:51PM +0000, Nick Alcock wrote: >> >> > > Relevant exports, from /proc/fs/nfs/exports: >> >> > > >> >> > > / 192.168.16.0/24,silk.wkstn.nix(ro,insecure,no_root_squash,s >> >> > > ync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0a4a4563:00764033 >> >> > > :8c827c0e:989cf534,sec=390003:390004:390005:1) >> >> > > /home/.loom.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,syn >> >> > > c,wdelay,no_subtree_check,uuid=0a4a4563:00764033:8c827c0e:989cf534, >> >> > > sec=1) >> >> >> >> Isn't that trying to export the same filesystem as '/' on the line >> >> above using conflicting export options? >> >> Hmm. I don't actually have a / mount in /etc/exports (and haven't had >> one since I finished building this machine, last year), and looking at >> /proc/fs/nfs/exports on the server now, it's not there. > > Right, but even though you're not exporting /, you're exporting > /home/.loom.srvr.nix, and that's on the same filesystem as /, isn't it? Yes, but I'm *not* exporting /. (I just checked my backups, and no such export existed at the time I sent the original mail, nor was I importing it on the client). 
This export is presumably automatically generated, and likely indicates nothing more than that I am exporting from different subtrees off the root (which I am: various subdirectories of /home, /usr, /trees, /.transient, /.nfs, and /pkg are exported). ^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) 2021-04-01 21:52 ` Nix @ 2021-04-02 19:20 ` bfields 2021-04-03 22:41 ` Nix 2021-04-05 11:48 ` Nix 0 siblings, 2 replies; 12+ messages in thread From: bfields @ 2021-04-02 19:20 UTC (permalink / raw) To: Nix; +Cc: Trond Myklebust, linux-nfs On Thu, Apr 01, 2021 at 10:52:47PM +0100, Nix wrote: > On 1 Apr 2021, bfields@fieldses.org outgrape: > > > On Thu, Apr 01, 2021 at 02:33:02PM +0100, Nix wrote: > >> [Sorry about the huge delay: your replies got accidentally marked as > >> read in a MUA snafu. I'll be getting some more debugging dumps -- and > >> seeing if this still happens! -- when I reboot into 5.11 this weekend.] > >> > >> On 24 Feb 2021, bfields@fieldses.org said: > >> > >> > On Tue, Feb 23, 2021 at 11:58:51PM +0000, Trond Myklebust wrote: > >> >> On Tue, 2021-02-23 at 17:57 -0500, J. Bruce Fields wrote: > >> >> > On Sun, Jan 03, 2021 at 02:27:51PM +0000, Nick Alcock wrote: > >> >> > > Relevant exports, from /proc/fs/nfs/exports: > >> >> > > > >> >> > > / 192.168.16.0/24,silk.wkstn.nix(ro,insecure,no_root_squash,s > >> >> > > ync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0a4a4563:00764033 > >> >> > > :8c827c0e:989cf534,sec=390003:390004:390005:1) > >> >> > > /home/.loom.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,syn > >> >> > > c,wdelay,no_subtree_check,uuid=0a4a4563:00764033:8c827c0e:989cf534, > >> >> > > sec=1) > >> >> > >> >> Isn't that trying to export the same filesystem as '/' on the line > >> >> above using conflicting export options? > >> > >> Hmm. I don't actually have a / mount in /etc/exports (and haven't had > >> one since I finished building this machine, last year), and looking at > >> /proc/fs/nfs/exports on the server now, it's not there. > > > > Right, but even though you're not exporting /, you're exporting > > /home/.loom.srvr.nix, and that's on the same filesystem as /, isn't it? > > Yes, but I'm *not* exporting /. 
(I just checked my backups, and no such > export existed at the time I sent the original mail, nor was I importing > it on the client). > > This export is prsumably automatically generated, and likely indicates > nothing more than that I am exporting from different subtrees off the > root (which I am: various subdirectories of /home, /usr, /trees, > /.transient, /.nfs, and /pkg are exported). Right, the "/" export is automatically generated. I don't have any real hypothesis here, I'm just thinking if the server's failing to reply, one possible culprit is that it's waiting for rpc.mountd. The server depends on rpc.mountd for export information. And it sounds like you have a relatively complicated export setup, so there's perhaps more that could go wrong there. Might be worth, on the server: rpcdebug -m rpc -s cache and then dumping the contents of all the files /proc/net/rpc/*/content. That should show us whether the server's waiting on userspace helpers for anything. Sorry, did you say whether nfsd threads or rpc.mountd are blocked? --b. ^ permalink raw reply [flat|nested] 12+ messages in thread
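[Editorial note: the cache-dump step suggested above can be sketched as follows. The `rpcdebug -m rpc -s cache` invocation and the /proc/net/rpc/*/content path are the ones named in the mail; the `dump_rpc_caches` helper name and the base-directory parameter are illustrative additions.]

```shell
# Dump every sunrpc cache content file with a header so the caches can
# be told apart. Takes an optional base directory (defaults to the real
# /proc/net/rpc path) so the loop itself can be exercised anywhere.
dump_rpc_caches() {
  local base=${1:-/proc/net/rpc}
  for f in "$base"/*/content; do
    [ -r "$f" ] || continue
    echo "==> $f <=="
    cat "$f"
  done
}

# On the server, as root:
#   rpcdebug -m rpc -s cache   # enable sunrpc cache debugging
#   dump_rpc_caches            # entries left pending would suggest the
#                              # server is stuck waiting on an rpc.mountd upcall
```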
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) 2021-04-02 19:20 ` bfields @ 2021-04-03 22:41 ` Nix 2021-04-05 11:48 ` Nix 1 sibling, 0 replies; 12+ messages in thread From: Nix @ 2021-04-03 22:41 UTC (permalink / raw) To: bfields; +Cc: Trond Myklebust, linux-nfs On 2 Apr 2021, bfields@fieldses.org said: > Sorry, did you say whether nfsd threads or rpc.mountd are blocked? I'm not sure. I will endeavour to find out (I think I did look to see if anything was blocked, but I can't remember the result: it's been a long time now). I'll probably do some more debugging on Monday. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) 2021-04-02 19:20 ` bfields 2021-04-03 22:41 ` Nix @ 2021-04-05 11:48 ` Nix 1 sibling, 0 replies; 12+ messages in thread From: Nix @ 2021-04-05 11:48 UTC (permalink / raw) To: bfields; +Cc: Trond Myklebust, linux-nfs On 2 Apr 2021, bfields@fieldses.org said: > Sorry, did you say whether nfsd threads or rpc.mountd are blocked? ... just about to switch into debugging this, but it does seem to me that if nfsd threads or (especially) mountd on the server side were blocked, I'd see misbehaviour with mounts from every client, not just a few of them. This doesn't happen. While this is going on, my firewall and other clients not engaging in the problematic Steam-related activity can talk NFSv4 to the server perfectly happily: indeed this is actually a problem when debugging because I have to quiesce the bloody things as much as I can to stop their RPC traffic flooding the log with irrelevant junk :) Recovery from this consists only of rebooting the stuck client: the server and all other clients don't need touching (indeed, I'm typing this in an emacs on that server, and since it was last rebooted it's been hit by a client experiencing this hang at least five times: the mailserver also keeps its mailspool on that server as well, and no problems there either). (The server also has fairly silly amounts of RAM compared to the load it's placed under. I'm not concerned about the possibility of rpc.mountd getting swapped out. It just doesn't happen. Even things like git gc of the entire Chromium git repo proceed without swapping.) 
btw, the filesystem layout on this machine is, in part:

/dev/main/root             xfs   4294993920  738953092 3556040828  18% /
/dev/mapper/main-vms       xfs   1073231868  406045460  667186408  38% /vm
/dev/mapper/main-steam     ext4  1055852896   85367140  916781564   9% /pkg/non-free/steam
/dev/mapper/main-archive   xfs   3219652608 2761922796  457729812  86% /usr/archive
/dev/mapper/main-pete      xfs   2468405656 2216785448  251620208  90% /usr/archive/music/Pete
/dev/mapper/main-phones    xfs     52411388    4354092   48057296   9% /.nfs/nix/share/phones
/dev/mapper/main-unifi     xfs     10491804    1130228    9361576  11% /var/lib/unifi
/dev/mapper/oracrypt-plain       2147510784  144030636 2003480148   7% /home/oranix/oracle/private

... and you'll note that the exported fs I'm seeing hangs on is actually *not* the $HOME on the root fs: it's /pkg/non-free/steam, which is ext4 purely because so many games on x86 still fail horribly when 64-bit inodes are used, and ext4 can emit 32-bit inodes on biggish fses without horrible performance consequences, unlike xfs. The relevant import line:

loom:/pkg/non-free/steam/.steam /home/nix/.steam nfs defaults,_netdev

(so it is imported to a *subdirectory* of a directory which is a mounted NFS export, and *that* one is exported from /). The hang also happens when using nfusr as the NFS client for the .steam import, so whatever it is isn't just down to the client... The primary reason I'm using one big fs for almost everything on this server build is, uh, NFSv4. My last machine had lots of little filesystems, and the result somehow confused the NFS pseudoroot construction process so badly that most of the things I tried to export never appeared on NFSv4, only on v3: only those exports which *were* on the root filesystem were ever available for NFSv4 mounting, so I was stuck with v3 on that machine.
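[The 64-bit-inode worry above is easy to quantify: any file whose inode number does not fit in 32 bits will trip up LFS-unaware 32-bit stat() callers. A hedged sketch — the helper name is invented, and the path in the comment comes from the mount table above:]

```shell
# big_inode_count: read inode numbers on stdin, count those that need
# more than 32 bits (the ones that break LFS-unaware binaries).
big_inode_count() {
    awk 'BEGIN { n = 0 } $1 > 4294967295 { n++ } END { print n }'
}
# e.g., on the Steam filesystem from the layout above:
#   find /pkg/non-free/steam -xdev -printf '%i\n' | big_inode_count
```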
At (IIRC) Chuck Lever's suggestion (many years ago, so he probably won't remember) I varied things when I built a new server and was happy to find that with a less baroque setup and a bigger rootfs with more stuff on it, NFSv4 seemed perfectly happy and the pseudoroot was populated fine. OK let's collect some logs so we're not reasoning in the absence of data any more. Back soon! (I hope.) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) 2021-02-24 2:01 ` bfields 2021-04-01 13:33 ` Nix @ 2021-04-05 16:52 ` Nix 1 sibling, 0 replies; 12+ messages in thread From: Nix @ 2021-04-05 16:52 UTC (permalink / raw) To: bfields; +Cc: Trond Myklebust, linux-nfs, linux-net [linux-net added: a stubborn NFS client hang doing uncompression of some Steam games over NFS now looks like it might be TCP-stack or even network-device-related rather than NFS-related] On 24 Feb 2021, bfields@fieldses.org said: > On Tue, Feb 23, 2021 at 11:58:51PM +0000, Trond Myklebust wrote: >> On Tue, 2021-02-23 at 17:57 -0500, J. Bruce Fields wrote: >> > On Sun, Jan 03, 2021 at 02:27:51PM +0000, Nick Alcock wrote: >> > > Relevant exports, from /proc/fs/nfs/exports: >> > > >> > > / 192.168.16.0/24,silk.wkstn.nix(ro,insecure,no_root_squash,s >> > > ync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0a4a4563:00764033 >> > > :8c827c0e:989cf534,sec=390003:390004:390005:1) >> > > /home/.loom.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,syn >> > > c,wdelay,no_subtree_check,uuid=0a4a4563:00764033:8c827c0e:989cf534, >> > > sec=1) >> >> Isn't that trying to export the same filesystem as '/' on the line >> above using conflicting export options? > > Yes, and exporting something on the root filesystem is generally a bad > idea for the usual reasons. > > I don't see any explanation for the write hang there, though. > > Unless maybe if it leaves mountd unable to answer an upcall for some > reason, hm. > > I think that would show up in /proc/net/rpc/nfsd.fh/content. Try > cat'ing that file after 'rpcdebug -m rpc -s cache' and that should show > if there's an upcall that's not getting answered. OK, I got some debugging: this is still happening on 5.11, but it's starting to look more like a problem below NFS. 
There are no unanswered upcalls, and though I did eventually manage to get a multimegabyte pile of NFS debugging info (after fighting with the problem that the Steam client has internal timeouts that are silently exceeded and break things if you just leave debugging on)... I suspect I don't need to bother you with it, because the packet capture was more informative. I have a complete capture of everything on the wire from the moment Steam started, but it's nearly 150MiB xz'ed and includes a lot of boring nonsense related to Steam's revalidation of everything because the last startup crashed: it probably also includes things like my Steam account password, getdents of my home directory etc so if you want that I can send you the details privately: but this capture of only the last few thousand NFS packets is interesting enough: <http://www.esperi.org.uk/~nix/temporary/nfs-trouble.cap>. Looking at this in Wireshark, everything goes fine (ignoring the usual offloading-associated checksum whining) until packet 1644, when we suddenly start seeing out-of-order packets and retransmissions (on an otherwise totally idle 10GbE subnet). They get more and more common, and at packet 1711 the client loses patience and hits the host with a RST. This is... not likely to do NFS any good at all, and with $HOME served over the same connection I'm not surprised the client goes fairly unresponsive after that and the hangcheck timer starts screaming about processes blocked on I/O to some of the odder filesystems I'm getting over NFS on that client, like /var/tmp and /var/account/acct (so, yes, all process termination likely blocks forever on this). So it looks like NFS is being betrayed by TCP and/or some lower layer. (Also, NFS doesn't seem to try too hard to recover -- or perhaps, when it does, the new session goes just as wrong). 
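[For anyone reproducing this, the interesting frames can be pulled out of the capture with a display filter rather than by eyeball. A sketch: the helper name is invented; the filter fields are standard Wireshark/tshark ones.]

```shell
# tcp_trouble CAPFILE: list only the symptoms described above --
# retransmitted, out-of-order, and RST frames -- with their endpoints.
tcp_trouble() {
    tshark -r "$1" \
        -Y 'tcp.analysis.retransmission || tcp.analysis.out_of_order || tcp.flags.reset == 1' \
        -T fields -e frame.number -e ip.src -e ip.dst -e tcp.flags
}
# e.g. tcp_trouble nfs-trouble.cap
```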
The socket statistics on the server report 22 rx drops since boot (out of 11903K) and no errors of any kind: possibly each of these drops corresponds to the test runs I've been doing, but there can't be more than one or two drops per test (I must have crashed the client over ten times trying to get dumps) which if true seems more fragile than I expect NFS to be :) so they might well just be coincidental. I simplified my network for this test run, so the link has literally no hosts on it right now other than the NFS server and client at issue and a 10GbE Netgear GS110EMX switch, which reports zero errors and no packets dropped since it was rebooted months ago. Both ends are the same Intel X540-AT2 driven by ixgbe. The network cabling has never given me the least cause for concern, and I've never seen things getting hit by RSTs when talking to this host before. I've certainly never seen *NFS* getting a RST. It happens consistently, anyway, which argues strongly against the cabling :) Neither client nor server are using any sort of iptables (it's not even compiled in), and while the server is doing a bit of policy-based routing and running a couple of bridges for VMs it's not doing anything that should cause this (and this was replicated when the bridges had nothing on them other than the host in any case). Syncookies are compiled out on both hosts (they're still on on the firewall, but that's not participating in this and isn't even on the same subnet). Just in case: complete network-related sysctls changed from default on the client: net.ipv6.conf.default.keep_addr_on_down = 1 and on the server: net.ipv4.ip_forward = 1 net.ipv6.conf.all.forwarding = 1 net.ipv6.conf.default.keep_addr_on_down = 1 net.ipv4.ping_group_range = 73 73 net.core.bpf_jit_enable = 1 None of these seem too likely to cause *this*. I wonder how I could track down where this mess is coming from... ^ permalink raw reply [flat|nested] 12+ messages in thread
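[The per-NIC and kernel counters mentioned above can be snapshotted in one place and diffed before and after a reproduction run. A sketch: the helper name is invented, and the interface name is an assumption — the ixgbe ports here will have their own names.]

```shell
# drop_stats IFACE: summarize the drop/error counters worth diffing
# across a reproduction run (ethtool needs root).
drop_stats() {
    echo "== NIC counters ($1) =="
    ethtool -S "$1" | grep -Ei 'drop|err|miss'
    echo "== kernel TCP/IP counters =="
    nstat -az | grep -Ei 'retrans|drop'
}
# e.g. drop_stats enp1s0f0
```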
* Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) @ 2021-02-13 15:21 Nick Alcock 0 siblings, 0 replies; 12+ messages in thread From: Nick Alcock @ 2021-02-13 15:21 UTC (permalink / raw) To: NFS list (I can't get References: right on this mail due to the original aging out of my mailbox: archive URL, https://www.spinics.net/lists/linux-nfs/msg81430.html). I now have a little lockdep info from this hang (and reports from at least two others that they've seen similar-looking hangs dating back to 4.19, though much harder to reproduce, taking many hours rather than five minutes: in one case they report not using NFS in production any more because of this). Unfortunately the lockdep info isn't much use:

Feb 13 14:13:12 silk warning: : [  888.834464] Showing all locks held in the system:
Feb 13 14:13:12 silk warning: : [  888.834501] 1 lock held by dmesg/1152:
Feb 13 14:13:12 silk warning: : [  888.834508]  #0: ffff980c3b7200d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x49/0x2d1
Feb 13 14:13:12 silk warning: : [  888.834540] 2 locks held by tee/1322:
Feb 13 14:13:12 silk warning: : [  888.834546]  #0: ffff980c0809a430 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x6a/0xdc
Feb 13 14:13:12 silk warning: : [  888.834573]  #1: ffff980c3ca7b5e8 (&sb->s_type->i_mutex_key#16){++++}-{3:3}, at: nfs_start_io_write+0x1a/0x45
Feb 13 14:13:12 silk warning: : [  888.834632] 1 lock held by 192.168.16.8-ma/2302:
Feb 13 14:13:12 silk warning: : [  888.834638]  #0: ffff980c0fe6b700 (&acct->lock#2){+.+.}-{3:3}, at: acct_process+0x102/0x2bc

The first of those is my ongoing dmesg -w. The last is process accounting. The middle one is an ongoing, always-active Xsession-errors tee over the same NFSv4 connection, which says nothing more than that writes to this NFS server from this client have hung, which we already know. There are no signs of locks held by the Steam client which has hung in the middle of installation. So whateverthehell this is, it's not blocked on a lock.
The NFS client is hanging all on its own. (I have no idea how clients can block in the middle of writing if a lock is *not* involved somehow, but that is what it looks like from the lockdep output.) Does anyone know how I might start debugging this sod? ^ permalink raw reply [flat|nested] 12+ messages in thread
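[As a starting point for the debugging asked about above: with lockdep built in, the magic-sysrq keys give two cheap views of a hung client — 'd' dumps all locks held (the same report as the lockdep output above) and 'w' dumps every uninterruptible task with a stack trace, usually the fastest way to see where stuck writers are sleeping. A sketch; the helper name is invented.]

```shell
# trigger_sysrq KEY: fire a magic-sysrq command; output lands in dmesg.
# Needs root, and kernel.sysrq must permit the command class.
trigger_sysrq() {
    echo "$1" > /proc/sysrq-trigger
}
# e.g. trigger_sysrq d   # show all locks held (needs lockdep)
#      trigger_sysrq w   # show blocked (D-state) tasks with stacks
```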
end of thread, other threads:[~2021-04-05 16:52 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-01-03 14:27 steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) Nick Alcock 2021-02-23 22:57 ` J. Bruce Fields 2021-02-23 23:58 ` Trond Myklebust 2021-02-24 2:01 ` bfields 2021-04-01 13:33 ` Nix 2021-04-01 13:44 ` bfields 2021-04-01 21:52 ` Nix 2021-04-02 19:20 ` bfields 2021-04-03 22:41 ` Nix 2021-04-05 11:48 ` Nix 2021-04-05 16:52 ` Nix 2021-02-13 15:21 Nick Alcock