Hi Alexander Viro and dear Linux filesystem maintainers,

We recently encountered a NULL pointer dereference Oops in production. Over the past few weeks we have analyzed the core dump and compared it against the source code, but we still cannot understand why `dentry->d_inode` became NULL while the other fields of the dentry look normal.

Here is the call trace of this Oops:

```
[19521409.363839] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
[19521409.372016] IP: __atime_needs_update+0x5/0x190
[19521409.376757] PGD 80000020326ad067 P4D 80000020326ad067 PUD 200fd06067 PMD 0
[19521409.384025] Oops: 0000 [#1] SMP PTI
[19521409.387796] Modules linked in: veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat_ipv4 nf_nat br_netfilter bridge stp llc aufs overlay cpuid iptable_filter ip_tables cls_cgroup sch_htb xt_multiport ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_conntrack x_tables bonding nls_utf8 isofs ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi toa(OE) nf_conntrack lp parport intel_rapl skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate intel_rapl_perf ipmi_ssif ipmi_si dcdbas mei_me mei ipmi_devintf lpc_ich shpchp ipmi_msghandler acpi_power_meter mac_hid autofs4 btrfs zstd_compress
[19521409.458627]  raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul mgag200 ghash_clmulni_intel ttm pcbc drm_kms_helper aesni_intel syscopyarea aes_x86_64 sysfillrect ixgbe igb sysimgblt crypto_simd fb_sys_fops dca i2c_algo_bit glue_helper ptp megaraid_sas ahci drm cryptd mdio pps_core libahci [last unloaded: ip_tables]
[19521409.496053] CPU: 46 PID: 10855 Comm: node-exporter Tainted: G OE 4.15.0-42-generic #46~16.04.1+4
[19521409.506851] Hardware name: Dell Inc. PowerEdge R740xd/08D89F, BIOS 1.4.9 06/29/2018
[19521409.514784] RIP: 0010:__atime_needs_update+0x5/0x190
[19521409.520026] RSP: 0018:ffff9dee09c2fc48 EFLAGS: 00010202
[19521409.525528] RAX: ffff8a4281d01ec0 RBX: fefefefefefefeff RCX: 0000000000000040
[19521409.532942] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9dee09c2fde8
[19521409.540354] RBP: ffff9dee09c2fca8 R08: ffff9dee09c2fbf4 R09: ffff9dee09c2fd90
[19521409.547761] R10: ffff8a34397b4022 R11: 6b636f732f74656e R12: 2f2f2f2f2f2f2f2f
[19521409.555176] R13: 0000000000000000 R14: ffff8a34397b4026 R15: ffff9dee09c2fde8
[19521409.562592] FS: 000000c000218090(0000) GS:ffff8a3b401c0000(0000) knlGS:0000000000000000
[19521409.570976] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19521409.577001] CR2: 000000000000000c CR3: 000000203ad22005 CR4: 00000000007606e0
[19521409.584415] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19521409.592937] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19521409.601419] PKRU: 55555554
[19521409.605464] Call Trace:
[19521409.609235]  ? link_path_walk+0x3e4/0x5a0
[19521409.614546]  ? path_init+0x177/0x2f0
[19521409.619423]  path_openat+0xe4/0x1770
[19521409.624282]  ? ttwu_do_wakeup+0x1e/0x140
[19521409.629465]  ? ttwu_do_activate+0x77/0x80
[19521409.634713]  ? try_to_wake_up+0x59/0x480
[19521409.639864]  do_filp_open+0x9b/0x110
[19521409.644638]  ? __check_object_size+0xaf/0x1b0
[19521409.650176]  ? path_get+0x27/0x30
[19521409.654652]  do_sys_open+0x1bb/0x2c0
[19521409.659372]  ? do_sys_open+0x1bb/0x2c0
[19521409.664254]  SyS_openat+0x14/0x20
[19521409.668677]  do_syscall_64+0x73/0x130
[19521409.673479]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[19521409.679602] RIP: 0033:0x4a5c9a
[19521409.683706] RSP: 002b:000000c000304ab0 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[19521409.692316] RAX: ffffffffffffffda RBX: 000000c00002f400 RCX: 00000000004a5c9a
[19521409.700483] RDX: 0000000000080000 RSI: 000000c00175a020 RDI: ffffffffffffff9c
[19521409.708640] RBP: 000000c000304b28 R08: 0000000000000000 R09: 0000000000000000
[19521409.716806] R10: 0000000000000000 R11: 0000000000000202 R12: ffffffffffffffff
[19521409.724962] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000100
[19521409.733145] Code: 83 ec 08 0f 0d 8f 80 05 00 00 e8 87 ff ff ff 48 85 c0 74 10 48 89 c7 48 89 45 f8 e8 56 d4 ff ff 48 8b 45 f8 c9 c3 0f 1f 44 00 00 <f6> 46 0c 02 0f 85 9b 00 00 00 83 7e 04 ff 0f 84 91 00 00 00 83
[19521409.753799] RIP: __atime_needs_update+0x5/0x190 RSP: ffff9dee09c2fc48
[19521409.761228] CR2: 000000000000000c
```

Using the core dump, we tried to work out how this NULL pointer dereference happened. It appears that when the program `node-exporter` opens `/proc/net/sockstat`, `walk_component()` on the `net` component of `/proc/net` obtained a dentry whose `d_inode` is NULL while its other fields contain data:

```
struct dentry {
  ...
  d_name = {
    {
      {
        hash = 2805607892,
        len = 3
      },
      hash_len = 15690509780
    },
    name = 0xffff8a4281d01ef8 "net"
  },
  struct inode *d_inode = 0x0 <======= d_inode is NULL and causes the Oops!
    -> NULL
  d_iname = "net\000:01:00.0\000\000sage_in_bytes\000B\212\377",
  ...
```

We also extracted the nameidata from the crash dump. Its `link_inode` is NULL as well, so it looks like either `lookup_slow()` or `lookup_fast()` returned a dentry whose inode is NULL while its other fields look normal.
```
struct nameidata {
  last = {
    {
      {
        hash = 2805607892,
        len = 3
      },
      hash_len = 15690509780
    },
    name = 0xffff8a34397b4022 "net/sockstat"
  },
  struct filename *name = 0xffff8a34397b4000 -> {
    name = 0xffff8a34397b401c "/proc/net/sockstat",
    uptr = 0xc00175a020,
    aname = 0xffff8a3b2f3c9860,
    refcnt = 2,
    iname = 0xffff8a34397b401c "/proc/net/sockstat"
  }
  struct nameidata *saved = 0x0
    -> NULL
  struct inode *link_inode = 0x0 <======= link_inode is NULL as well!
    -> NULL
}
```

We first tried to reproduce the issue, but it seems hard to trigger: we kept running `while true; do cat /proc/net/sockstat; done` and have not reproduced it so far. Over the past year we have seen only two similar crashes across thousands of production servers. `link_inode` should always be set on this path, and our small bpftrace program confirms that it is non-NULL in normal operation:

```
// /tmp/trace_walk_component.bt
kprobe:walk_component
{
  $p = ((struct nameidata *) arg0);
  printf("nameidata->last.name: %s, nameidata->link_inode: %p\n", str($p->last.name), $p->link_inode);
}
```

```
# Output
nameidata->last.name: net/sockstat, nameidata->link_inode: 0xffffffffab299966
nameidata->last.name: net/sockstat, nameidata->link_inode: 0xffff9a4efe813ab8
nameidata->last.name: net/sockstat, nameidata->link_inode: 0xffffffffab299966
nameidata->last.name: net/sockstat, nameidata->link_inode: 0xffff9a4efe813ab8
nameidata->last.name: net/sockstat, nameidata->link_inode: 0xffffffffab299966
nameidata->last.name: net/sockstat, nameidata->link_inode: 0xffffffffab299966
```

We searched past kernel mailing-list threads and could not find a similar crash, but we did find a similar case in another user's blog:
https://utcc.utoronto.ca/~cks/space/blog/linux/Ubuntu1804OddKernelPanic

That author did not find the cause either, although their call trace is exactly the same as ours.

Is this a known bug that can corrupt a dentry? Because we have not been able to reproduce the issue, it is difficult to verify whether it has been fixed in mainline. We are writing to ask whether other Linux developers have any insight; any reply would be appreciated.

Thank you in advance.
The attached files contain the dentry and nameidata extracted from the core dump, in case they are helpful for analyzing this Oops.

--
Best Regards,
Haosdent Huang