On 17/04/2018 01:31 AM, Pavlos Parissis wrote:
> On 16/04/2018 04:40 PM, Jan Kara wrote:
>> On Mon 16-04-18 15:25:50, Guillaume Morin wrote:
>>> Fwiw, there have been already reports of similar soft lockups in
>>> fsnotify() on 4.14: https://lkml.org/lkml/2018/3/2/1038
>>>
>>> We have also noticed similar softlockups with 4.14.22 here.
>>
>> Yeah.
>>
>>> On 16 Apr 13:54, Pavlos Parissis wrote:
>>>>
>>>> Hi all,
>>>>
>
> [..snip..]
>
>>>> [373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [kube-apiserver:24261]
>>>> [373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
>>>> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
>>>> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
>>>> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
>>>> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
>>>> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
>>>> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
>>>> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
>>>> pps_core scsi_transport_sas
>>>> [373782.516807] dm_mirror dm_region_hash dm_log dm_mod dax
>>>> [373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 4.14.32-1.el7.x86_64 #1
>>>> [373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
>>>> [373782.567486] task: ffff882f66d28000 task.stack: ffffc9002120c000
>>>> [373782.583441] RIP: 0010:fsnotify+0x197/0x510
>>>> [373782.597319] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
>>>> [373782.615308] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
>>>> [373782.632950] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
>>>> [373782.650616] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
>>>> [373782.668287] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>> [373782.685918] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>> [373782.703302] FS: 000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
>>>> [373782.721887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [373782.737741] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
>>>> [373782.755247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [373782.772722] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [373782.790043] Call Trace:
>>>> [373782.802041] vfs_write+0x151/0x1b0
>>>> [373782.815081] ? syscall_trace_enter+0x1cd/0x2b0
>>>> [373782.829175] SyS_write+0x55/0xc0
>>>> [373782.841870] do_syscall_64+0x79/0x1b0
>>>> [373782.855073] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>>
>> Can you please run RIP through ./scripts/faddr2line to see where exactly
>> are we looping? I expect the loop iterating over marks to notify but better
>> be sure.
>>
>
> I am very newbie on this and I tried with:
> ../repo/Linux/linux/scripts/faddr2line ./vmlinuz-4.14.32-1.el7.x86_64
> 0010:fsnotify+0x197/0x510
> readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
> size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
> size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
> no match for 0010:fsnotify+0x197/0x510
>
> Obviously, I am doing something very wrong.
>

The error above was caused by passing a compressed image to faddr2line. I have now produced an uncompressed image by compiling 4.14.32 with the config we use in production, and faddr2line reports:

../repo/Linux/linux/scripts/faddr2line ./vmlinux 0010:fsnotify+0x197/0x510
no match for 0010:fsnotify+0x197/0x510

../repo/Linux/linux/scripts/faddr2line ./vmlinux fsnotify+0x197/0x510
skipping fsnotify address at 0xffffffff8129baf7 due to size mismatch (0x510 != 0x520)
no match for fsnotify+0x197/0x510

What am I doing wrong?

Cheers,
Pavlos
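For anyone following the thread, here is a minimal Python sketch of how the `func+offset/size` token from the oops appears to be interpreted. It is written for illustration and is not code from scripts/faddr2line. The helper names `parse_func_addr` and `size_matches` are hypothetical, and the sketch assumes the script behaves as its two messages above suggest.

The `0010:` prefix is the CS selector printed on the `RIP:` line (compare `CS: 0010` in the register dump). If it is left in, the symbol name becomes `0010:fsnotify`, which matches nothing. The trailing `/0x510` is the size of fsnotify in the kernel that oopsed. A symbol of a different size in the inspected vmlinux (0x520 here) is skipped.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for an oops token such as "fsnotify+0x197/0x510".
_TOKEN = re.compile(
    r"(?P<func>[\w.$]+)\+(?P<offset>0x[0-9a-fA-F]+)(?:/(?P<size>0x[0-9a-fA-F]+))?"
)


def parse_func_addr(token: str) -> Tuple[str, int, Optional[int]]:
    """Return (symbol, offset, size) for 'func+offset[/size]'.

    Everything up to the last ':' (e.g. the '0010:' CS selector copied
    from the 'RIP: 0010:...' line) is dropped first, because it is not
    part of the symbol name.
    """
    token = token.rpartition(":")[2]
    m = _TOKEN.fullmatch(token)
    if m is None:
        raise ValueError(f"bad func+offset {token}")
    size = m.group("size")
    return (m.group("func"), int(m.group("offset"), 16),
            int(size, 16) if size else None)


def size_matches(reported_size: Optional[int], symbol_size: int) -> bool:
    """Hypothetical check modelled on the 'size mismatch' message.

    When the oops supplies /size, the symbol in the vmlinux being
    inspected must have exactly that size; without /size there is
    nothing to compare.
    """
    return reported_size is None or reported_size == symbol_size


if __name__ == "__main__":
    func, offset, size = parse_func_addr("0010:fsnotify+0x197/0x510")
    print(func, hex(offset), hex(size))  # fsnotify 0x197 0x510
    print(size_matches(size, 0x520))     # False: 0x510 != 0x520, as in the log
```

Under this reading, dropping `0010:` only fixes the first "no match". The size mismatch suggests the rebuilt vmlinux is not identical to the running 4.14.32-1.el7 kernel. A vmlinux with debug info from the exact build that produced the oops would probably be needed for faddr2line to resolve the offset reliably.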