All of lore.kernel.org
From: Mike Javorski <mike.javorski@gmail.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
Date: Sat, 14 Aug 2021 18:23:46 -0700	[thread overview]
Message-ID: <CAOv1SKDDOj5UeUwztrMSNJnLgSoEgD8OU55hqtLHffHvaCQzzA@mail.gmail.com> (raw)
In-Reply-To: <CAOv1SKB_dsam7P9pzzh_SKCtA8uE9cyFdJ=qquEfhLT42-szPA@mail.gmail.com>

I managed to get a capture with several discrete freezes in it, and I
included a chunk with 5 of them in the span of ~3000 frames. I added
packet comments at each frame that the tshark command reported as having
an RPC wait of more than 1 second. Just search for "Freeze" in the packet
details in Wireshark or tshark. This is with kernel 5.13.10 provided by
Arch (see
https://github.com/archlinux/linux/compare/a37da2be8e6c85...v5.13.10-arch1
for the diff against mainline; nothing in it is NFS/RPC related as far as
I can identify).

I was unable to reproduce any freezes with the 5.12.15 kernel.

https://drive.google.com/file/d/1T42iX9xCdF9Oe4f7JXsnWqD8oJPrpMqV/view?usp=sharing

File should be downloadable anonymously.

- mike

On Thu, Aug 12, 2021 at 7:53 PM Mike Javorski <mike.javorski@gmail.com> wrote:
>
> The "semi-known-good" has been the client. I tried updating the server
> multiple times to a 5.13 kernel and each time had to downgrade to the
> last 5.12 kernel that ArchLinux released (5.12.15) to stabilize
> performance. At each attempt, the client was running the same 5.13
> kernel that was being deployed to the server. I never downgraded the
> client.
>
> Thank you for the scripts and all the details, I will test things out
> this weekend when I can dedicate time to it.
>
> - mike
>
> On Thu, Aug 12, 2021 at 7:39 PM NeilBrown <neilb@suse.de> wrote:
> >
> > On Fri, 13 Aug 2021, Mike Javorski wrote:
> > > Neil:
> > >
> > > Apologies for the delay, your message didn't get properly flagged in my email.
> >
> > :-)
> >
> > >
> > > To answer your questions, both client (my Desktop PC) and server (my
> > > NAS) are running ArchLinux; client w/ current kernel (5.13.9), server
> > > w/ current or alternate testing kernels (see below).
> >
> > So the bug could be in the server or the client.  I assume you are
> > careful to test a client against a known-good server, or a server against
> > a known-good client.
> >
> > >                                                                 I
> > > intend to spend some time this weekend attempting to get the tcpdump.
> > > My initial attempts wound up with 400+ MB files which would be
> > > difficult to ship and use for diagnostics.
> >
> > Rather than you sending me the dump, I'll send you the code.
> >
> > Run
> >   tshark -r filename -d tcp.port==2049,rpc -Y 'tcp.port==2049 && rpc.time > 1'
> >
> > This will ensure the NFS traffic is actually decoded as NFS and then
> > report only NFS(rpc) replies that arrive more than 1 second after the
> > request.
> > You can add
> >
> >     -T fields -e frame.number -e rpc.time
> >
> > to find out what the actual delay was.
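With -T fields, tshark prints one tab-separated line per slow reply (frame number, then rpc.time in seconds). When adjusting the threshold to get a modest number of hits, a short awk filter can condense that list into a count and the worst delay. A minimal sketch, assuming exactly those two fields in that order; the function name summarise_delays is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads "frame rpc.time" lines on stdin (the output of
# -T fields -e frame.number -e rpc.time) and reports how many slow replies
# there were and which frame had the longest wait.
summarise_delays() {
    awk '
        NF == 2 {
            n++
            # Track the largest delay seen so far and the frame it came from.
            if ($2 + 0 > max) { max = $2 + 0; frame = $1 }
        }
        END {
            if (n == 0) { print "no slow replies"; exit }
            printf "%d slow replies; slowest frame %s (%.3f s)\n", n, frame, max
        }'
}

# Example use against a capture (file name as in the commands above):
#   tshark -r filename -d tcp.port==2049,rpc \
#     -Y 'tcp.port==2049 && rpc.time > 1' \
#     -T fields -e frame.number -e rpc.time | summarise_delays
```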
> >
> > If it reports any, that will be interesting.  Try with a larger time if
> > necessary to get a modest number of hits.  Using editcap and the given
> > frame number you can select out 1000 packets either side of the problem
> > and that should compress to be small enough to transport.
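The extraction step can be sketched as a small helper that turns a reported frame number into the packet range editcap expects. This assumes editcap's -r option (keep, rather than delete, the selected packets) and 1-based frame numbers; the function name frame_window and the frame number 4711 are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the "first-last" packet range covering WIDTH
# packets (default 1000) either side of frame N, never starting below 1.
frame_window() {
    n=$1
    width=${2:-1000}
    start=$((n - width))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    echo "$start-$((n + width))"
}

# Example: keep frames 3711-5711 around a freeze reported at frame 4711.
#   editcap -r filename freeze-4711.pcapng "$(frame_window 4711)"
```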
> >
> > However it might not find anything.  If the reply never arrives, you'll
> > never get a reply with a long timeout.  So we need to check that
> > everything got a reply...
> >
> >  tshark -r filename -d tcp.port==2049,rpc  \
> >    -Y 'tcp.port==2049 && rpc.msg == 0' -T fields \
> >    -e rpc.xid -e frame.number | sort > /tmp/requests
> >
> >  tshark -r filename -d tcp.port==2049,rpc  \
> >    -Y 'tcp.port==2049 && rpc.msg == 1' -T fields \
> >    -e rpc.xid -e frame.number | sort > /tmp/replies
> >
> >  join -a1 /tmp/requests /tmp/replies | awk 'NF==2'
> >
> > This should list the xid and frame number of all requests that didn't
> > get a reply.  Again, editcap can extract a range of frames into a file of
> > manageable size.
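The join step above relies on matched lines having three fields (xid, request frame, reply frame) while unmatched requests keep only two. A minimal sketch of the same check using join -v 1, which prints only the unmatched lines of the first file; it re-sorts both inputs under LC_ALL=C so that sort and join agree on collation. The function name unreplied is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print "xid frame" for every RPC request with no reply.
# $1 and $2 are files of "xid frame" lines (requests, then replies), such as
# those written by the two tshark -T fields commands above.
unreplied() {
    tmp=$(mktemp -d) || return 1
    # join needs both inputs sorted on the key with the same collation,
    # so sort them here under the C locale rather than trusting the caller.
    LC_ALL=C sort -k1,1 "$1" > "$tmp/requests"
    LC_ALL=C sort -k1,1 "$2" > "$tmp/replies"
    # -v 1 prints only the lines of file 1 that have no match in file 2,
    # the same result as "join -a1 ... | awk 'NF==2'" for two-column input.
    LC_ALL=C join -v 1 "$tmp/requests" "$tmp/replies"
    rm -rf "$tmp"
}

# Example, using the files written by the tshark commands above:
#   unreplied /tmp/requests /tmp/replies
```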
> >
> > Another possibility is that requests are getting replies, but the reply
> > says "NFS4ERR_DELAY"
> >
> >  tshark -r filename -d tcp.port==2049,rpc -Y nfs.nfsstat4==10008
> >
> > should report any reply with that error code.
> >
> > Hopefully something there will be interesting.
> >
> > NeilBrown


Thread overview: 47+ messages
2021-08-08 22:37 NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
2021-08-08 22:47 ` Chuck Lever III
2021-08-08 23:23   ` Mike Javorski
2021-08-09  0:01 ` NeilBrown
2021-08-09  0:28   ` Mike Javorski
2021-08-10  0:50     ` Mike Javorski
2021-08-10  1:28       ` NeilBrown
2021-08-10 11:54         ` Daire Byrne
2021-08-13  1:51         ` Mike Javorski
2021-08-13  2:39           ` NeilBrown
2021-08-13  2:53             ` Mike Javorski
2021-08-15  1:23               ` Mike Javorski [this message]
2021-08-16  1:20                 ` NeilBrown
2021-08-16 13:21                   ` Chuck Lever III
2021-08-16 16:25                     ` Mike Javorski
2021-08-16 23:01                       ` NeilBrown
2021-08-20  0:31                         ` NeilBrown
2021-08-20  0:52                           ` Mike Javorski
2021-08-22  0:17                             ` Mike Javorski
2021-08-22  3:41                               ` NeilBrown
2021-08-22  4:05                                 ` Mike Javorski
2021-08-22 22:00                                   ` NeilBrown
2021-08-26 19:34                                     ` Mike Javorski
2021-08-26 21:44                                       ` NeilBrown
2021-08-27  0:07                                         ` Mike Javorski
2021-08-27  5:27                                           ` NeilBrown
2021-08-27  6:11                                             ` Mike Javorski
2021-08-27  7:14                                               ` NeilBrown
2021-08-27 14:13                                                 ` Chuck Lever III
2021-08-27 17:07                                                   ` Mike Javorski
2021-08-27 22:00                                                     ` Mike Javorski
2021-08-27 23:49                                                       ` Chuck Lever III
2021-08-28  3:22                                                         ` Mike Javorski
2021-08-28 18:23                                                           ` Chuck Lever III
2021-08-29 22:28                                                             ` [PATCH] MM: clarify effort used in alloc_pages_bulk_*() NeilBrown
2021-08-30  9:11                                                               ` Mel Gorman
2021-08-29 22:36                                                             ` [PATCH] SUNRPC: don't pause on incomplete allocation NeilBrown
2021-08-30  9:12                                                               ` Mel Gorman
2021-08-30 20:46                                                               ` J. Bruce Fields
2021-09-04 17:41                                                             ` NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
2021-09-05  2:02                                                               ` Chuck Lever III
2021-09-16  2:45                                                                 ` Mike Javorski
2021-09-16 18:58                                                                   ` Chuck Lever III
2021-09-16 19:21                                                                     ` Mike Javorski
2021-09-17 14:41                                                                       ` J. Bruce Fields
2021-08-16 16:09                   ` Mike Javorski
2021-08-16 23:04                     ` NeilBrown
