From: Jonathan Woithe <jwoithe@just42.net>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Bruce Fields <bfields@fieldses.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [Bug report] Recurring oops, 5.15.x, possibly during or soon after client mount
Date: Mon, 17 Jan 2022 18:14:30 +1030
Message-ID: <20220117074430.GA22026@marvin.atrad.com.au>
In-Reply-To: <1E71316C-9EE8-4C71-ADA1-71E2910CA070@oracle.com>

Hi Chuck

On Sun, Jan 16, 2022 at 10:30:43PM +0000, Chuck Lever III wrote:
> > On Jan 16, 2022, at 5:06 PM, Jonathan Woithe <jwoithe@just42.net> wrote:
> > 
> > On Sun, Jan 16, 2022 at 07:53:36AM +1030, Jonathan Woithe wrote:
> >> On Sat, Jan 15, 2022 at 07:46:06PM +0000, Chuck Lever III wrote:
> >>>> On Jan 15, 2022, at 3:14 AM, Jonathan Woithe <jwoithe@just42.net> wrote:
> >>>> On Fri, Jan 14, 2022 at 03:18:01PM +0000, Chuck Lever III wrote:
> >>>>>> Recently we migrated an NFS server from a 32-bit environment running 
> >>>>>> kernel 4.14.128 to a 64-bit 5.15.x kernel.  The NFS configuration remained
> >>>>>> unchanged between the two systems.
> >>>>>> 
> >>>>>> On two separate occasions since the upgrade (5 Jan under 5.15.10, 14 Jan
> >>>>>> under 5.15.12) the kernel has oopsed at around the time that an NFS client
> >>>>>> machine is turned on for the day.  On both occasions the call trace was
> >>>>>> essentially identical.  The full oops sequence is at the end of this email. 
> >>>>>> The oops was not observed when running the 4.14.128 kernel.
> >>>>>> 
> >>>>>> Is there anything more I can provide to help track down the cause of the
> >>>>>> oops?
> >>>>> 
> >>>>> A possible culprit is 7f024fcd5c97 ("Keep read and write fds with each
> >>>>> nlm_file"), which was introduced in or around v5.15.  You could try a
> >>>>> simple test and back the server down to v5.14.y to see if the problem
> >>>>> persists.
> > 
> > FYI I have now put the kernel.org 5.14.21 kernel on the affected system and
> > booted it.  Since the oops has taken between 1 and 2 weeks to be triggered
> > in the past, we may have to wait a few weeks to be certain of an outcome. 
> > If there's anything else you need from me in the interim please ask.
> 
> If you identify a particular client that triggers the issue, it would be
> helpful to know:
> 
> - The client's kernel version
> - What was running on the client before it was shut down
> - Whether the application and client shut down was clean

I have been able to identify the client involved.  It was the same client
on both occasions.  That client is running the 4.4.14 kernel.

Unfortunately I have no way to determine what was running on the client when
it was shut down.  However, the logs do tell me that the client was NOT
cleanly shut down prior to both oopses being triggered on the server with
the next boot.  These are the only times when the client wasn't shut down
cleanly; the client WAS shut down cleanly on every other day since 23
December (when the server was moved to the 5.15.x kernel).  It is therefore
possible that the server oops was triggered only when the client was not
shut down cleanly.

I will ask the user if they remember anything happening differently on the
days of the server oops.  My suspicion is that there wasn't anything, and
that the power to the bench which supplies the client was turned off
accidentally before shutting the computer down.  We have a new staff member
who knows the correct procedure, but maybe they forgot on a couple of
occasions.  If this is the case the PC is unlikely to have had much running
at the time of the shutdown.  The xfce4 desktop is perhaps a given.  Other
possibilities are firefox, thunderbird and libreoffice.

With the server running 5.14.21, I did a reset of the client (that is,
unclean shutdown) just before I left this evening.  The server did not oops
when the client was rebooted a minute or so later.  I will see if I can
repeat the test with 5.15.12 tomorrow morning before others get in if you
think that will be helpful in light of the above observations.

Regards
  jonathan
