On 16/04/2018 04:40 PM, Jan Kara wrote:
> On Mon 16-04-18 15:25:50, Guillaume Morin wrote:
>> Fwiw, there have been already reports of similar soft lockups in
>> fsnotify() on 4.14: https://lkml.org/lkml/2018/3/2/1038
>>
>> We have also noticed similar softlockups with 4.14.22 here.
>
> Yeah.
>
>> On 16 Apr 13:54, Pavlos Parissis wrote:
>>>
>>> Hi all,
>>> [..snip..]
>>> [373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [kube-apiserver:24261]
>>> [373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
>>> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
>>> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
>>> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
>>> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
>>> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
>>> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
>>> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
>>> pps_core scsi_transport_sas
>>> [373782.516807] dm_mirror dm_region_hash dm_log dm_mod dax
>>> [373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 4.14.32-1.el7.x86_64 #1
>>> [373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
>>> [373782.567486] task: ffff882f66d28000 task.stack: ffffc9002120c000
>>> [373782.583441] RIP: 0010:fsnotify+0x197/0x510
>>> [373782.597319] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
>>> [373782.615308] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
>>> [373782.632950] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
>>> [373782.650616] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
>>> [373782.668287] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>> [373782.685918] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>> [373782.703302] FS: 000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
>>> [373782.721887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [373782.737741] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
>>> [373782.755247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [373782.772722] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [373782.790043] Call Trace:
>>> [373782.802041] vfs_write+0x151/0x1b0
>>> [373782.815081] ? syscall_trace_enter+0x1cd/0x2b0
>>> [373782.829175] SyS_write+0x55/0xc0
>>> [373782.841870] do_syscall_64+0x79/0x1b0
>>> [373782.855073] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> Can you please run RIP through ./scripts/faddr2line to see where exactly
> are we looping? I expect the loop iterating over marks to notify but better
> be sure.
>

I am very new to this, and I tried the following:

../repo/Linux/linux/scripts/faddr2line ./vmlinuz-4.14.32-1.el7.x86_64 0010:fsnotify+0x197/0x510
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
no match for 0010:fsnotify+0x197/0x510

Obviously, I am doing something very wrong.

> How easily can you hit this?

Very easily; I only need to wait 1-2 days for a crash to occur.

> Are you able to run debug kernels

Well, I was under the impression that I already do, as I have:

grep -E 'DEBUG_KERNEL|DEBUG_INFO' /boot/config-4.14.32-1.el7.x86_64
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
CONFIG_DEBUG_KERNEL=y

Do you think that my kernel doesn't produce a proper crash dump?
I have a production cluster where I can run any kernel we need, so if I need
to compile again with different settings, I can certainly do that.

> / inspect
> crash dumps when the issue occurs?

I can't do that, as the server isn't responsive and I can only power cycle it.

> Also testing with the latest mainline
> kernel (4.16) would be welcome whether this isn't just an issue with the
> backport of fsnotify fixes from Miklos.

I can try the kernel-ml-4.16.2 from elrepo (we use CentOS 7).

Thanks a lot for your reply.
Pavlos Parissis
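
A note on the faddr2line attempt above: the readelf error suggests two likely causes. First, vmlinuz is the compressed boot image rather than an ELF file, so the script probably needs the uncompressed vmlinux (built with debug info) instead. Second, the "0010:" in front of the symbol is the code-segment selector from the oops, while faddr2line expects plain symbol+offset/length. The sketch below shows one way to handle both. The helper name rip_to_faddr and the default paths ./vmlinux and ./scripts/faddr2line are assumptions for illustration, not something taken from this thread.

```shell
#!/bin/sh
# Strip the "0010:" code-segment prefix that the oops prints in front of the
# symbol; faddr2line expects plain <symbol>+<offset>/<length>.
# (rip_to_faddr is a hypothetical helper name.)
rip_to_faddr() {
    printf '%s\n' "${1#*:}"
}

# Assumed locations; adjust both to your own kernel build tree.
VMLINUX=${VMLINUX:-./vmlinux}                   # uncompressed ELF image, not vmlinuz
FADDR2LINE=${FADDR2LINE:-./scripts/faddr2line}

ADDR=$(rip_to_faddr "0010:fsnotify+0x197/0x510")
echo "address to resolve: $ADDR"

# Only attempt the lookup when both files are actually present.
if [ -x "$FADDR2LINE" ] && [ -f "$VMLINUX" ]; then
    "$FADDR2LINE" "$VMLINUX" "$ADDR"
fi
```

If the lookup runs, the vmlinux must come from exactly the same build as the running 4.14.32-1.el7.x86_64 kernel, otherwise the offsets will not correspond to the right source lines.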