On 2019-12-29, Linus Torvalds wrote: > On Sun, Dec 29, 2019 at 11:30 PM Aleksa Sarai wrote: > > > > BUG: kernel NULL pointer dereference, address: 0000000000000000 > > Would you mind building with debug info, and then running the oops through > > scripts/decode_stacktrace.sh > > which makes those addresses much more legible. Will do. > > #PF: supervisor instruction fetch in kernel mode > > #PF: error_code(0x0010) - not-present page > > Somebody jumped through a NULL pointer. > > > RAX: 0000000000000000 RBX: ffff906d0cc3bb40 RCX: 0000000000000abc > > RDX: 0000000000000089 RSI: ffff906d74623cc0 RDI: ffff906d74475df0 > > RBP: ffff906d74475df0 R08: ffffd70b7fb24c20 R09: ffff906d066a5000 > > R10: 0000000000000000 R11: 8080807fffffffff R12: ffff906d74623cc0 > > R13: 0000000000000089 R14: ffffb70b82963dc0 R15: 0000000000000080 > > FS: 00007fbc2a8f0540(0000) GS:ffff906dcf500000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: ffffffffffffffd6 CR3: 00000003c68f8001 CR4: 00000000003606e0 > > Call Trace: > > __lookup_slow+0x94/0x160 > > And "__lookup_slow()" has two indirect calls (they aren't obvious with > retpoline, but look for something like > > call __x86_indirect_thunk_rax > > which is the modern sad way of doing "call *%rax"). One is for > revalidatinging an old dentry, but the one I _suspect_ you trigger is > this one: > > old = inode->i_op->lookup(inode, dentry, flags); > > but I thought we only could get here if we know it's a directory. > > How did we miss the "d_can_lookup()", which is what should check that > yes, we can call that ->lookup() routine. I'll try applying a trivial patch to add d_can_lookup() to see if it fixes the immediate issue. > This is why I have that suspicion that it's somehow that O_PATH fd > opened in another process without O_PATH causes confusion... > > So what I think has happened is that because of the O_PATH thing, > we've ended up with an inode that has never been truly opened (because > O_PATH skips that part), but then with the /proc//fd/xyz open, we > now have a file descriptor that _looks_ like it is valid, and we're > treating that inode as if it can be used. I'm not sure I agree -- as I mentioned in my other mail, re-opening through /proc/self/fd/$n works *very* well and has for a long time (in fact, both LXC and runc depend on this working). -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH