From: Andrew Klaassen <andrew.klaassen@boatrocker.com>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: RE: Trying to reduce NFSv4 timeouts to a few seconds on an established connection
Date: Mon, 23 Jan 2023 16:41:59 +0000
Message-ID: <YQBPR01MB107243E2377AA2B52379225D486C89@YQBPR01MB10724.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <A00E49F5-9BD7-498F-95A9-FA2D5572694F@oracle.com>

> From: Chuck Lever III <chuck.lever@oracle.com>
> Sent: Monday, January 23, 2023 11:35 AM
> 
> > On Jan 23, 2023, at 11:31 AM, Andrew Klaassen <andrew.klaassen@boatrocker.com> wrote:
> >
> > Hello,
> >
> > There's a specific NFSv4 mount on a specific machine which we'd like to
> > time out and return an error after a few seconds if the server goes away.
> >
> > I've confirmed the following on two different kernels,
> > 4.18.0-348.12.2.el8_5.x86_64 and 6.1.7-200.fc37.x86_64.
> >
> > I've been able to get both autofs and the mount command to cooperate,
> > so that the mount attempt fails after an arbitrary number of seconds. This
> > mount command, for example, will fail after 6 seconds, as expected based on
> > the timeo=20,retrans=2,retry=0 options:
> >
> > $ time sudo mount -t nfs4 -o \
> >     rw,relatime,sync,vers=4.2,rsize=131072,wsize=131072,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,soft,noac,proto=tcp,timeo=20,retrans=2,retry=0,sec=sys \
> >     thor04:/mnt/thorfs04 /mnt/thor04
> > mount.nfs4: Connection timed out
> >
> > real    0m6.084s
> > user    0m0.007s
> > sys     0m0.015s
> >
> > However, if the share is already mounted and the server goes away, the
> > timeout is always 2 minutes plus the time I expect based on timeo and
> > retrans. In this case, that's 2 minutes and 6 seconds:
> >
> > $ time ls /mnt/thor04
> > ls: cannot access '/mnt/thor04': Connection timed out
> >
> > real    2m6.025s
> > user    0m0.003s
> > sys     0m0.000s
> >
> > Watching the outgoing packets in the second case, I always see the
> > same pattern:
> > - 0.2 seconds between the first two, then doubling each time until the
> >   two-minute mark is exceeded (so the last NFS packet, which is always the
> >   11th packet, is sent around 1:45 after the first).
> > - Then TCP SYN packets (new connection attempts from port 942) that start
> >   almost exactly at the two-minute mark, 1 second between the first two,
> >   then doubling each time. (By this time the NFS command has given up.)
> >
> > 11:10:21.898305 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834889483 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:10:22.105189 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834889690 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:10:22.313290 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834889898 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:10:22.721269 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834890306 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:10:23.569192 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834891154 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:10:25.233212 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834892818 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:10:28.497282 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834896082 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:10:35.025219 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834902610 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:10:48.337201 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834915922 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:11:14.449303 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834942034 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:12:08.721251 IP 10.30.13.2.916 > 10.31.3.13.2049: Flags [P.], seq
> > 14452:14652, ack 18561, win 501, options [nop,nop,TS val 834996306 ecr
> > 1589769203], length 200: NFS request xid 3614904256 196 getattr fh
> > 0,2/53
> > 11:12:22.545394 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq
> > 1375256951, win 64240, options [mss 1460,sackOK,TS val 835010130 ecr
> > 0,nop,wscale 7], length 0
> > 11:12:23.570199 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq
> > 1375256951, win 64240, options [mss 1460,sackOK,TS val 835011155 ecr
> > 0,nop,wscale 7], length 0
> > 11:12:25.617284 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq
> > 1375256951, win 64240, options [mss 1460,sackOK,TS val 835013202 ecr
> > 0,nop,wscale 7], length 0
> > 11:12:29.649219 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq
> > 1375256951, win 64240, options [mss 1460,sackOK,TS val 835017234 ecr
> > 0,nop,wscale 7], length 0
> > 11:12:37.905274 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq
> > 1375256951, win 64240, options [mss 1460,sackOK,TS val 835025490 ecr
> > 0,nop,wscale 7], length 0
> > 11:12:54.289212 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq
> > 1375256951, win 64240, options [mss 1460,sackOK,TS val 835041874 ecr
> > 0,nop,wscale 7], length 0
> > 11:13:26.545304 IP 10.30.13.2.942 > 10.31.3.13.2049: Flags [S], seq
> > 1375256951, win 64240, options [mss 1460,sackOK,TS val 835074130 ecr
> > 0,nop,wscale 7], length 0
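
A quick way to check the retransmission spacing described above against the capture is a short Python sketch (not part of the original report). The timestamps are copied from the eleven "NFS request xid 3614904256" packets; the helper name to_seconds is illustrative only:

```python
"""Inter-packet gaps of the retransmitted NFS GETATTR in the capture above."""

# Timestamps of the eleven "NFS request xid 3614904256" packets, copied
# from the tcpdump output (HH:MM:SS.microseconds).
STAMPS = [
    "11:10:21.898305", "11:10:22.105189", "11:10:22.313290",
    "11:10:22.721269", "11:10:23.569192", "11:10:25.233212",
    "11:10:28.497282", "11:10:35.025219", "11:10:48.337201",
    "11:11:14.449303", "11:12:08.721251",
]


def to_seconds(stamp: str) -> float:
    """Convert an HH:MM:SS.ffffff tcpdump timestamp to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


times = [to_seconds(s) for s in STAMPS]
# Gap between each packet and the one before it.
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

for gap in gaps:
    print(f"{gap:8.3f} s")
print(f"last retransmission {times[-1] - times[0]:.1f} s after the first send")
```

With these timestamps the first two gaps are both about 0.2 s, each later gap is roughly double the one before it, and the last retransmission lands about 106.8 s after the first send.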
> >
> > I tried changing tcp_retries2 as suggested in another thread from this list:
> >
> > # echo 3 > /proc/sys/net/ipv4/tcp_retries2
> >
> > ...but it made no difference on either kernel. The 2-minute timeout also
> > doesn't seem to match what I'd calculate from the initial value of
> > tcp_retries2, which should give a much higher timeout.
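
For reference, the kernel's ip-sysctl documentation describes tcp_retries2 in terms of a hypothetical timeout, stating that the default of 15 yields 924.6 seconds. A minimal Python sketch of that model, assuming a 200 ms starting RTO (TCP_RTO_MIN) that doubles on each retransmission and is capped at 120 s (TCP_RTO_MAX); the real RTO is derived from the measured RTT, so this is only the documented approximation:

```python
"""Hypothetical TCP timeout implied by net.ipv4.tcp_retries2 (sketch)."""

# Assumed constants, matching the Linux defaults for TCP_RTO_MIN and
# TCP_RTO_MAX. The real RTO starts from a measured RTT, not a constant.
RTO_MIN = 0.2    # seconds
RTO_MAX = 120.0  # seconds


def retries2_timeout(retries: int, rto_base: float = RTO_MIN) -> float:
    """Sum the waits after the original send and after each of `retries`
    retransmissions, doubling from rto_base and capping each at RTO_MAX."""
    total = 0.0
    rto = rto_base
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, RTO_MAX)
    return total


for value in (3, 15):
    print(f"tcp_retries2={value}: ~{retries2_timeout(value):.1f} s")
```

Under this model the default of 15 gives about 924.6 s and tcp_retries2=3 gives about 3.0 s; neither is close to the roughly 120 s delay observed, which fits the report that changing the sysctl made no visible difference.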
> >
> > The only clue I've been able to find is in the retry=n entry in the NFS
> > man page:
> >
> > "For TCP the default is 3 minutes, but system TCP connection timeouts will
> > sometimes limit the timeout of each retransmission to around 2 minutes."
> >
> > What I'm not able to make sense of:
> > - The retry option says that it applies to mount operations, not
> >   read/write operations. However, in this case I'm seeing the 2-minute
> >   delay on read/write operations but *not* mount operations.
> > - A couple of hours of searching didn't lead me to any kernel settings
> >   that would result in a 2-minute timeout.
> >
> > Does anyone have any clues about a) what's happening and b) how to get
> > our desired behaviour of being able to control both mount and read/write
> > timeouts down to a few seconds?
> 
> If the server is already mounted on that client at another mount point, then
> the client will share the transport amongst mounts of the same server.
> 
> The first mount's options take precedence, and subsequent mounts re-use
> that mount's transport and the mount options that control it.

That's good to know, Chuck, thanks.

In this case, though, I'm seeing the behaviour with only this single NFS mount on my test client.

Andrew