From: Bruce Fields <bfields@fieldses.org>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Jonathan Woithe <jwoithe@just42.net>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [Bug report] Recurring oops, 5.15.x, possibly during or soon after client mount
Date: Mon, 17 Jan 2022 10:47:33 -0500
Message-ID: <20220117154733.GC28708@fieldses.org>
In-Reply-To: <C7A57602-4DDD-4952-BA38-03F819DBD296@oracle.com>

On Fri, Jan 14, 2022 at 03:18:01PM +0000, Chuck Lever III wrote:
> Hi Jonathan-
> 
> > On Jan 14, 2022, at 5:39 AM, Jonathan Woithe <jwoithe@just42.net> wrote:
> > 
> > Hi all
> > 
> > Recently we migrated an NFS server from a 32-bit environment running 
> > kernel 4.14.128 to a 64-bit 5.15.x kernel.  The NFS configuration remained
> > unchanged between the two systems.
> > 
> > On two separate occasions since the upgrade (5 Jan under 5.15.10, 14 Jan
> > under 5.15.12) the kernel has oopsed at around the time that an NFS client
> > machine is turned on for the day.  On both occasions the call trace was
> > essentially identical.  The full oops sequence is at the end of this email. 
> > The oops was not observed when running the 4.14.128 kernel.
> > 
> > Is there anything more I can provide to help track down the cause of the
> > oops?
> 
> A possible culprit is 7f024fcd5c97 ("Keep read and write fds with each nlm_file"),
> which was introduced in or around v5.15.

Almost definitely it, yeah.

We should really have nlm reboot tests.  I test nlm and v4 reboot but
not nlm reboot....

> You could try a simple test and back
> the server down to v5.14.y to see if the problem persists.
> 
> Otherwise, Bruce, can you have a look at this?

Yep, just catching up....

Given my lack of nlm reboot testing (sorry) I wouldn't be surprised if
it's reproducible with something really simple, like: take a lock, then
restart the client (so that it notifies the server).  Could still be
rare in production if rebooting while holding a lock is rare.
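
A minimal sketch of the client side of that reproduction attempt, not a
confirmed reproducer.  It assumes the export is mounted with NFSv3 and
without the "nolock" option, so that fcntl locks are forwarded to the
server's lockd over NLM.  The helper names take_lock and hold_lock, and
the lock file path in the usage note, are hypothetical.

```python
import fcntl
import os
import time


def take_lock(path):
    """Open (creating if needed) path and take an exclusive fcntl lock.

    Returns the open file descriptor; the lock is held until the
    descriptor is closed or the process exits.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking whole-file POSIX lock.  On an NFSv3 mount without
        # "nolock" this request is sent to the server via NLM.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd


def hold_lock(path, poll_seconds=60):
    """Take the lock and keep the process alive until it is killed."""
    fd = take_lock(path)
    print("lock held on", path, "- restart this client now")
    try:
        while True:
            time.sleep(poll_seconds)
    finally:
        os.close(fd)
```

Calling hold_lock("/mnt/nfs/lockfile") on the client and then restarting
that client while the lock is held should cause it to notify the server
when it comes back up, which is the path the quoted trace below goes
through (nlmsvc_proc_sm_notify -> nlm_host_rebooted).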

--b.

> 
> 
> > Regards
> >  jonathan
> > 
> > Oops under 5.15.12:
> > 
> > Jan 14 08:48:30 nfssvr kernel: BUG: kernel NULL pointer dereference, address: 0000000000000110
> > Jan 14 08:48:30 nfssvr kernel: #PF: supervisor read access in kernel mode
> > Jan 14 08:48:30 nfssvr kernel: #PF: error_code(0x0000) - not-present page
> > Jan 14 08:48:30 nfssvr kernel: Oops: 0000 [#1] PREEMPT SMP PTI
> > Jan 14 08:48:30 nfssvr kernel: CPU: 0 PID: 2935 Comm: lockd Not tainted 5.15.12 #1
> > Jan 14 08:48:30 nfssvr kernel: Hardware name:  /DG31PR, BIOS PRG3110H.86A.0038.2007.1221.1757 12/21/2007
> > Jan 14 08:48:30 nfssvr kernel: RIP: 0010:vfs_lock_file+0x5/0x30
> > Jan 14 08:48:30 nfssvr kernel: Code: ff ff 41 89 c4 85 c0 0f 84 42 ff ff ff e9 f8 fe ff ff 0f 0b e8 2c bc d2 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 <48> 8b 47 28 49 89 d0 48 8b 80 98 00 00 00 48 85 c0 74 05 e9 f3 dc
> > Jan 14 08:48:30 nfssvr kernel: RSP: 0018:ffffa478401a3c38 EFLAGS: 00010246
> > Jan 14 08:48:30 nfssvr kernel: RAX: 7fffffffffffffff RBX: 00000000000000e8 RCX: 0000000000000000
> > Jan 14 08:48:30 nfssvr kernel: RDX: ffffa478401a3c40 RSI: 0000000000000006 RDI: 00000000000000e8
> > Jan 14 08:48:30 nfssvr kernel: RBP: ffff946ead1ecc00 R08: ffff946f88ab1000 R09: ffff946f88b33a00
> > Jan 14 08:48:30 nfssvr kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa657ff30
> > Jan 14 08:48:30 nfssvr kernel: R13: ffff946e99df7c40 R14: ffff946e82fb0510 R15: ffff946ead1ecc00
> > Jan 14 08:48:30 nfssvr kernel: FS:  0000000000000000(0000) GS:ffff946fabc00000(0000) knlGS:0000000000000000
> > Jan 14 08:48:30 nfssvr kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Jan 14 08:48:30 nfssvr kernel: CR2: 0000000000000110 CR3: 000000010083a000 CR4: 00000000000006f0
> > Jan 14 08:48:30 nfssvr kernel: Call Trace:
> > Jan 14 08:48:30 nfssvr kernel:  <TASK>
> > Jan 14 08:48:30 nfssvr kernel:  nlm_unlock_files+0x6e/0xb0
> > Jan 14 08:48:30 nfssvr kernel:  ? __skb_recv_udp+0x198/0x330
> > Jan 14 08:48:30 nfssvr kernel:  ? _raw_spin_lock+0x13/0x2e
> > Jan 14 08:48:30 nfssvr kernel:  ? nlmsvc_traverse_blocks+0x36/0x120
> > Jan 14 08:48:30 nfssvr kernel:  ? preempt_count_add+0x68/0xa0
> > Jan 14 08:48:30 nfssvr kernel:  nlm_traverse_files+0x152/0x280
> > Jan 14 08:48:30 nfssvr kernel:  nlmsvc_free_host_resources+0x27/0x40
> > Jan 14 08:48:30 nfssvr kernel:  nlm_host_rebooted+0x23/0x90
> > Jan 14 08:48:30 nfssvr kernel:  nlmsvc_proc_sm_notify+0xae/0x110
> > Jan 14 08:48:30 nfssvr kernel:  ? nlmsvc_decode_reboot+0x8b/0xc0
> > Jan 14 08:48:30 nfssvr kernel:  nlmsvc_dispatch+0x89/0x180
> > Jan 14 08:48:30 nfssvr kernel:  svc_process_common+0x3ce/0x6f0
> > Jan 14 08:48:30 nfssvr kernel:  ? lockd_inet6addr_event+0xf0/0xf0
> > Jan 14 08:48:30 nfssvr kernel:  svc_process+0xb7/0xf0
> > Jan 14 08:48:30 nfssvr kernel:  lockd+0xca/0x1b0
> > Jan 14 08:48:30 nfssvr kernel:  ? preempt_count_add+0x68/0xa0
> > Jan 14 08:48:30 nfssvr kernel:  ? _raw_spin_lock_irqsave+0x19/0x40
> > Jan 14 08:48:30 nfssvr kernel:  ? set_grace_period+0x90/0x90
> > Jan 14 08:48:30 nfssvr kernel:  kthread+0x141/0x170
> > Jan 14 08:48:30 nfssvr kernel:  ? set_kthread_struct+0x40/0x40
> > Jan 14 08:48:30 nfssvr kernel:  ret_from_fork+0x22/0x30
> > Jan 14 08:48:30 nfssvr kernel:  </TASK>
> > Jan 14 08:48:30 nfssvr kernel: Modules linked in: tun nf_nat_ftp nf_conntrack_ftp xt_REDIRECT xt_nat xt_conntrack xt_tcpudp xt_NFLOG nfnetlink_log nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables ipv6 hid_generic usbhid hi
> > Jan 14 08:48:30 nfssvr kernel: CR2: 0000000000000110
> > Jan 14 08:48:30 nfssvr kernel: ---[ end trace f8f28acee6f24340 ]---
> > Jan 14 08:48:30 nfssvr kernel: RIP: 0010:vfs_lock_file+0x5/0x30
> > Jan 14 08:48:30 nfssvr kernel: Code: ff ff 41 89 c4 85 c0 0f 84 42 ff ff ff e9 f8 fe ff ff 0f 0b e8 2c bc d2 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 <48> 8b 47 28 49 89 d0 48 8b 80 98 00 00 00 48 85 c0 74 05 e9 f3 dc
> > Jan 14 08:48:30 nfssvr kernel: RSP: 0018:ffffa478401a3c38 EFLAGS: 00010246
> > Jan 14 08:48:30 nfssvr kernel: RAX: 7fffffffffffffff RBX: 00000000000000e8 RCX: 0000000000000000
> > Jan 14 08:48:30 nfssvr kernel: RDX: ffffa478401a3c40 RSI: 0000000000000006 RDI: 00000000000000e8
> > Jan 14 08:48:30 nfssvr kernel: RBP: ffff946ead1ecc00 R08: ffff946f88ab1000 R09: ffff946f88b33a00
> > Jan 14 08:48:30 nfssvr kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa657ff30
> > Jan 14 08:48:30 nfssvr kernel: R13: ffff946e99df7c40 R14: ffff946e82fb0510 R15: ffff946ead1ecc00
> > Jan 14 08:48:30 nfssvr kernel: FS:  0000000000000000(0000) GS:ffff946fabc00000(0000) knlGS:0000000000000000
> > Jan 14 08:48:30 nfssvr kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Jan 14 08:48:30 nfssvr kernel: CR2: 0000000000000110 CR3: 000000010083a000 CR4: 00000000000006f0
> 
> --
> Chuck Lever
> 
> 
