Re: NFSv4.x behavior of 'soft' in unresponsive server cases - bounded time for application to wait on NFS?

From: Trond Myklebust @ 2020-02-13 17:16 UTC
  To: dwysocha; +Cc: linux-nfs

Hi Dave,

On Thu, 2020-02-13 at 09:44 -0500, David Wysochanski wrote:
> Hi Trond,
> 
> I'm getting up to speed on your patchset from last year titled "Fix
> up soft mounts for NFSv4.x"
> https://spinics.net/lists/linux-nfs/msg72467.html
> 
> Specifically I have concerns about this patch because after it, so
> far I cannot find any way that an application can achieve a bounded
> wait on an NFS4 operation:
> e4ec48d3cc61 SUNRPC: Make "no retrans timeout" soft tasks behave like
> softconn for timeouts
> 
> The patchset changed 'soft' semantics and I want to be sure I
> understand this and what the intended behavior is in the case of an
> unresponsive server.  Specifically I am investigating TCP and two
> cases:
> a) server is responsive at the TCP level but not at the NFS level to
> some operations (slow I/O: a read or a write)

This is the case that the NFSv4 protocol covers in RFC7530 Section
3.1.1. ("Client Retransmission Behavior") and RFC5661 Section 2.9.2.
The behaviour we are adopting here for 'soft' is specifically designed
to be compliant with those two sections.

> b) server is not responsive at the TCP level (network partition)
> 
> Primarily I am testing kernel v5.5 with 'a' since I think 'b' is
> covered by a reset of the connection after what looks like 2 minutes.
> I realize the NFS4 client cannot retransmit an RPC per the NFS4 RFC.
> However, is there some way to achieve a bounded wait of, say, 'T' for an
> application after this patch in both of those instances, basically
> something like "soft,retrans=0,timeo=T"?  Is there an option to force
> a reset of the TCP connection in case 'a' after a specified time, or
> is this impossible for some reason?  Or is the minimum timeout not
> bounded by anything specified on the mount options but by other
> factors such as server responsiveness to an operation (case 'a') or
> TCP connection timeout / reset (case 'b')?
> 
> Thanks.

Yes, there is the option of closing the TCP socket on the client.
However, that breaks replay semantics on NFSv4.0 and it just forces us
into a livelock situation in the case where the server is responsive,
but slow/congested. This is particularly true of NFSv4.x (x>0), where
the session semantics mean that we cannot send a new request on the
slot before the old one has completed being processed by the server.
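
To make the slot constraint concrete, here is a minimal toy model in
Python. It is only a sketch: the names Slot, SlotBusy, retransmit and
try_new_request are hypothetical and do not correspond to the kernel
client's data structures. It assumes a single slot that carries at most
one outstanding request, and that only a server reply frees the slot.

```python
class SlotBusy(Exception):
    """Raised when a new request is issued on a slot that is still in use."""


class Slot:
    """Toy model of one session slot (hypothetical, not the kernel's structure)."""

    def __init__(self):
        self.seqid = 0          # sequence ID of the last request sent on this slot
        self.in_flight = False  # True until the server replies to that request

    def send(self):
        # A slot carries at most one outstanding request at a time.
        if self.in_flight:
            raise SlotBusy("previous request on this slot has not completed")
        self.seqid += 1
        self.in_flight = True
        return self.seqid

    def retransmit(self):
        # After a reconnect the only thing that can go out on a busy slot
        # is the same request again, with the same sequence ID.
        if not self.in_flight:
            raise ValueError("nothing outstanding to retransmit")
        return self.seqid

    def complete(self):
        # Only the server's reply frees the slot for a new request.
        self.in_flight = False


def try_new_request(slot):
    """Return "sent" if a new request could be issued, else "blocked"."""
    try:
        slot.send()
        return "sent"
    except SlotBusy:
        return "blocked"
```

In this model, resetting the connection while a request is outstanding
changes nothing about the slot: try_new_request() still reports
"blocked", and retransmit() just hands the same request back to a server
that is already slow.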

So yes, the new semantics are a compromise, but they are designed to
address the situations where the server really has gone away, and to
avoid overloading the server further in situations where it is already
congested.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
