All of lore.kernel.org
 help / color / mirror / Atom feed
From: "NeilBrown" <neilb@suse.de>
To: "Mike Javorski" <mike.javorski@gmail.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
Date: Mon, 16 Aug 2021 11:20:19 +1000	[thread overview]
Message-ID: <162907681945.1695.10796003189432247877@noble.neil.brown.name> (raw)
In-Reply-To: <CAOv1SKDDOj5UeUwztrMSNJnLgSoEgD8OU55hqtLHffHvaCQzzA@mail.gmail.com>

On Sun, 15 Aug 2021, Mike Javorski wrote:
> I managed to get a cap with several discreet freezes in it, and I
> included a chunk with 5 of them in the span of ~3000 frames. I added
> packet comments at each frame that the tshark command reported as > 1
> sec RPC wait. Just search for "Freeze" in (wire|t)shark in packet
> details. This is with kernel 5.13.10 provided by Arch (See
> https://github.com/archlinux/linux/compare/a37da2be8e6c85...v5.13.10-arch1
> for diff vs mainline, nothing NFS/RPC related I can identify).
> 
> I tried unsuccessfully to get any failures with the 5.12.15 kernel.
> 
> https://drive.google.com/file/d/1T42iX9xCdF9Oe4f7JXsnWqD8oJPrpMqV/view?usp=sharing
> 
> File should be downloadable anonymously.

Got it, thanks.

There are 3 RPC replies that are more than 1 second after the request.
The replies are in frames 983, 1005, 1777 These roughly correspond to
where you added the "Freeze" annotation (I didn't know you could do that!).

983:
  This is the (Start of the) reply to a READ Request in frame 980.
  It is a 128K read.  The whole reply has arrived by frame 1004, 2ms
  later.
  The request (Frame 980) is followed 13ms later by a TCP retransmit
  of the request and the (20usec later) a TCP ACK from the server.

  The fact that the client needed to retransmit seems a little odd
  but as it is the only retransmit in the whole capture, I don't think
  we can read much into it.

1005:
  This is the reply to a 128K READ request in frame 793 - earlier than
  previous one.
  So there were two READ requests, then a 2 second delay then both
  replies in reverse order.

1777:
  This is a similar READ reply - to 1761.
  There were READs in 1760, 1761, and 1775
  1760 is replied to almost immediately
  1761 gets a reply in 4 seconds (1777)
  1775 never gets a reply (in the available packet capture).

Looking at other delays ... most READs get a reply in under a millisec.
There are about 20 where the reply is more than 1ms - the longest delay
not already mentioned is 90ms with reply 1857.
The pattern here is 
   READ req (1)
   GETATTR req
   GETATTR reply
   READ req (2)
   READ reply (1)
  pause
   READ reply (2)

I suspect this is the same problem occurring, but it isn't so
noticeable.

My first thought was that the reply might be getting stuck in the TCP
transmit queue on the server, but checking the TSval in the TCP
timestamp option shows that - for frame 983 which shows a 2second delay
- the TSval is also 2seconds later than the previous packet.  So the
delay happens before the TCP-level decision to create the packet.

So I cannot see any real evidence to suggest a TCP-level problem.
The time of 2 or 4 seconds - and maybe even 90ms - seem unlikely to be
caused by an NFSd problem.

So my guess is that the delay comes from the filesystem.  Maybe.
What filesystem are you exporting?

How can we check this? Probably by turning on nfsd tracing.
There are a bunch of tracepoints that related to reading:

	trace_nfsd_read_start
	trace_nfsd_read_done
	trace_nfsd_read_io_done
	trace_nfsd_read_err
	trace_nfsd_read_splice
	trace_nfsd_read_vector
	trace_nfsd_read_start
	trace_nfsd_read_done

Maybe enabling them might be useful as you should be able to see if the
delay was within one read request, or between two read requests.
But I don't have much (if any) experience in enabling trace points.  I
really should try that some day.  Maybe you can find guidance on using
these tracepoint somewhere ... or maybe you already know how :-)

NeilBrown

  reply	other threads:[~2021-08-16  1:20 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-08 22:37 NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
2021-08-08 22:47 ` Chuck Lever III
2021-08-08 23:23   ` Mike Javorski
2021-08-09  0:01 ` NeilBrown
2021-08-09  0:28   ` Mike Javorski
2021-08-10  0:50     ` Mike Javorski
2021-08-10  1:28       ` NeilBrown
2021-08-10 11:54         ` Daire Byrne
2021-08-13  1:51         ` Mike Javorski
2021-08-13  2:39           ` NeilBrown
2021-08-13  2:53             ` Mike Javorski
2021-08-15  1:23               ` Mike Javorski
2021-08-16  1:20                 ` NeilBrown [this message]
2021-08-16 13:21                   ` Chuck Lever III
2021-08-16 16:25                     ` Mike Javorski
2021-08-16 23:01                       ` NeilBrown
2021-08-20  0:31                         ` NeilBrown
2021-08-20  0:52                           ` Mike Javorski
2021-08-22  0:17                             ` Mike Javorski
2021-08-22  3:41                               ` NeilBrown
2021-08-22  4:05                                 ` Mike Javorski
2021-08-22 22:00                                   ` NeilBrown
2021-08-26 19:34                                     ` Mike Javorski
2021-08-26 21:44                                       ` NeilBrown
2021-08-27  0:07                                         ` Mike Javorski
2021-08-27  5:27                                           ` NeilBrown
2021-08-27  6:11                                             ` Mike Javorski
2021-08-27  7:14                                               ` NeilBrown
2021-08-27 14:13                                                 ` Chuck Lever III
2021-08-27 17:07                                                   ` Mike Javorski
2021-08-27 22:00                                                     ` Mike Javorski
2021-08-27 23:49                                                       ` Chuck Lever III
2021-08-28  3:22                                                         ` Mike Javorski
2021-08-28 18:23                                                           ` Chuck Lever III
2021-08-29 22:28                                                             ` [PATCH] MM: clarify effort used in alloc_pages_bulk_*() NeilBrown
2021-08-30  9:11                                                               ` Mel Gorman
2021-08-29 22:36                                                             ` [PATCH] SUNRPC: don't pause on incomplete allocation NeilBrown
2021-08-30  9:12                                                               ` Mel Gorman
2021-08-30 20:46                                                               ` J. Bruce Fields
2021-09-04 17:41                                                             ` NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
2021-09-05  2:02                                                               ` Chuck Lever III
2021-09-16  2:45                                                                 ` Mike Javorski
2021-09-16 18:58                                                                   ` Chuck Lever III
2021-09-16 19:21                                                                     ` Mike Javorski
2021-09-17 14:41                                                                       ` J. Bruce Fields
2021-08-16 16:09                   ` Mike Javorski
2021-08-16 23:04                     ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=162907681945.1695.10796003189432247877@noble.neil.brown.name \
    --to=neilb@suse.de \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mike.javorski@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.