* question about handling off an unresponsive server during lease renewal
From: Olga Kornievskaia @ 2020-07-13 17:59 UTC
  To: Trond Myklebust, linux-nfs

Hi Trond,

To the best of your knowledge, does the client implement the part of
the spec that deals with a server that isn't responding while the
lease is timing out?

RFC 5661, Section 8.3, says:

Transport retransmission delays might become so large as to
      approach or exceed the length of the lease period.  This may be
      particularly likely when the server is unresponsive due to a
      restart; see Section 8.4.2.1.  If the client implementation is not
      careful, transport retransmission delays can result in the client
      failing to detect a server restart before the grace period ends.
      The scenario is that the client is using a transport with
      exponential backoff, such that the maximum retransmission timeout
      exceeds both the grace period and the lease_time attribute.  A
      network partition causes the client's connection's retransmission
      interval to back off, and even after the partition heals, the next
      transport-level retransmission is sent after the server has
      restarted and its grace period ends.

      The client MUST either recover from the ensuing NFS4ERR_NO_GRACE
      errors or it MUST ensure that, despite transport-level
      retransmission intervals that exceed the lease_time, a SEQUENCE
      operation is sent that renews the lease before expiration.  The
      client can achieve this by associating a new connection with the
      session, and sending a SEQUENCE operation on it.  However, if the
      attempt to establish a new connection is delayed for some reason
      (e.g., exponential backoff of the connection establishment
      packets), the client will have to abort the connection
      establishment attempt before the lease expires, and attempt to
      reconnect.
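The requirement in the second paragraph reduces to a deadline check: if the transport's next connection attempt is scheduled after the last moment a renewing SEQUENCE could still go out, waiting on it guarantees the lease lapses. The following is a minimal user-space sketch of that comparison, not Linux client code; the function names, the millisecond clock and the safety margin are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; all times are milliseconds on a monotonic clock. */

/* Latest time a lease-renewing SEQUENCE should be sent: the lease runs out
 * lease_ms after the last successful renewal, and margin_ms is kept back
 * for the SEQUENCE round trip itself. */
static uint64_t renewal_deadline_ms(uint64_t last_renewal_ms,
                                    uint64_t lease_ms, uint64_t margin_ms)
{
    assert(margin_ms < lease_ms);
    return last_renewal_ms + lease_ms - margin_ms;
}

/* True if the transport's next (re)connection attempt is scheduled too late
 * to renew the lease, i.e. the pending attempt should be aborted and a new
 * connection started now instead of waiting out the backoff. */
static bool should_abort_connect(uint64_t last_renewal_ms, uint64_t lease_ms,
                                 uint64_t margin_ms, uint64_t next_attempt_ms)
{
    return next_attempt_ms > renewal_deadline_ms(last_renewal_ms, lease_ms,
                                                 margin_ms);
}
```

For example, with the last renewal at t = 0, a 90 s lease and a 5 s margin, the deadline is 85 s: an attempt scheduled at 102.2 s should be abandoned, while one scheduled at 51 s can be left alone.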

Here is the scenario: a SEQUENCE op is sent and the server reboots; it
is coming back up but not yet responding. At the TCP layer, TCP backs
off exponentially between retransmissions, and at some point the
retransmission timeout exceeds 100s. That means that by the time the
client resends, the server is up and already out of its grace period.
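To make the "more than 100s" figure concrete, here is a rough model of TCP retransmission backoff. It assumes the retransmission timeout starts at 200 ms (the Linux minimum RTO), doubles on every retry and is capped at 120 s; the real RTO depends on the measured round-trip time, so treat the numbers as illustrative only. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: the retransmission timeout (RTO) starts at 200 ms,
 * doubles on each retry and is capped at 120 s. */
#define RTO_INITIAL_MS 200u
#define RTO_MAX_MS     120000u

/* Interval (ms) waited before retransmission n + 1 (n counts from 0). */
static uint32_t rto_ms(unsigned n)
{
    /* Clamp the shift so it cannot overflow; the cap applies long before. */
    uint64_t rto = (uint64_t)RTO_INITIAL_MS << (n < 20 ? n : 20);
    return rto > RTO_MAX_MS ? RTO_MAX_MS : (uint32_t)rto;
}

/* Time (ms) from the original send until retransmission k goes out. */
static uint64_t elapsed_before_retransmit_ms(unsigned k)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < k; i++)
        total += rto_ms(i);
    return total;
}

/* Index n of the first backoff interval longer than limit_ms. */
static unsigned first_interval_over(uint32_t limit_ms)
{
    assert(limit_ms < RTO_MAX_MS); /* otherwise no interval ever exceeds it */
    unsigned n = 0;
    while (rto_ms(n) <= limit_ms)
        n++;
    return n;
}
```

Under these assumptions the ninth retransmission goes out 102.2 s after the original send, and the tenth is preceded by a 102.4 s wait (`rto_ms(9)`). That single gap is longer than a 90 s grace period, so a server that restarts just after the ninth retransmission can be out of grace before the client tries again.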

Does the client have any control to keep TCP from waiting longer than
the lease period, so that it can instead abort the connection and start
a new one? The 2nd paragraph seems to me to contradict the rule that the
client must never give up waiting for a reply from the server. But
maybe this is a special case where the client knows its lease hasn't
been renewed, so it's OK to give up?


* Re: question about handling off an unresponsive server during lease renewal
From: Trond Myklebust @ 2020-07-13 18:15 UTC
  To: linux-nfs, aglo

Hi Olga

On Mon, 2020-07-13 at 13:59 -0400, Olga Kornievskaia wrote:
> Does the client have any control over not letting the TCP wait for
> longer than the lease period and instead, it needs to abort the
> connection and start the new one? I mean I sort of find the 2nd
> paragraph in contradiction to the fact that the client must never
> give up on waiting for a reply from the server? But maybe this is a
> special case where the client is supposed to know its lease hasn't
> been renewed and it's OK to give up?

That is what this code is supposed to ensure:

/**
 * nfs4_set_lease_period - Sets the lease period on a nfs_client
 *
 * @clp: pointer to nfs_client
 * @lease: new value for lease period
 */
void nfs4_set_lease_period(struct nfs_client *clp,
                unsigned long lease)
{
        spin_lock(&clp->cl_lock);
        clp->cl_lease_time = lease;
        spin_unlock(&clp->cl_lock);

        /* Cap maximum reconnect timeout at 1/2 lease period */
        rpc_set_connect_timeout(clp->cl_rpcclient, lease, lease >> 1);
}

The call to rpc_set_connect_timeout() iterates through all of the
transports associated with that server, and calls
xprt->ops->set_connect_timeout() with the appropriate connect and
reconnect timeouts.
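A simplified user-space model of the dispatch described above: a client-level call walks every transport and lets each one apply the new limits through its own ops table, with the connect timeout set to the lease and the reconnect cap to lease >> 1, as in nfs4_set_lease_period(). The model_* types are hypothetical stand-ins for the kernel's transport structures, the lease is in seconds rather than jiffies, and the "only ever lower the limits" behaviour of the example op is an assumption, not a verified copy of the TCP transport.

```c
#include <assert.h>
#include <stddef.h>

struct model_xprt;

/* Hypothetical ops table: each transport type supplies its own handler. */
struct model_xprt_ops {
    void (*set_connect_timeout)(struct model_xprt *xprt,
                                unsigned long connect_timeout,
                                unsigned long reconnect_timeout);
};

struct model_xprt {
    const struct model_xprt_ops *ops;
    unsigned long connect_timeout;       /* seconds */
    unsigned long max_reconnect_timeout; /* seconds */
};

/* Example handler (assumed behaviour): only ever lowers the stored limits. */
static void model_tcp_set_connect_timeout(struct model_xprt *xprt,
                                          unsigned long connect_timeout,
                                          unsigned long reconnect_timeout)
{
    if (reconnect_timeout < xprt->max_reconnect_timeout)
        xprt->max_reconnect_timeout = reconnect_timeout;
    if (connect_timeout < xprt->connect_timeout)
        xprt->connect_timeout = connect_timeout;
}

static const struct model_xprt_ops model_tcp_ops = {
    .set_connect_timeout = model_tcp_set_connect_timeout,
};

/* Walk every transport; those without the op are skipped. */
static void model_set_connect_timeout(struct model_xprt *xprts, size_t n,
                                      unsigned long connect_timeout,
                                      unsigned long reconnect_timeout)
{
    assert(xprts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct model_xprt_ops *ops = xprts[i].ops;
        if (ops != NULL && ops->set_connect_timeout != NULL)
            ops->set_connect_timeout(&xprts[i], connect_timeout,
                                     reconnect_timeout);
    }
}

/* Connect timeout = lease, reconnect cap = half the lease. */
static void model_set_lease_period(struct model_xprt *xprts, size_t n,
                                   unsigned long lease)
{
    model_set_connect_timeout(xprts, n, lease, lease >> 1);
}
```

With a 90 s lease, every transport in the model ends up with a 90 s connect timeout and a 45 s reconnect cap.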

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: question about handling off an unresponsive server during lease renewal
From: Dai Ngo @ 2020-08-26 16:24 UTC
  To: Trond Myklebust, linux-nfs, aglo

Hi Olga and Trond,

On 7/13/20 11:15 AM, Trond Myklebust wrote:
> The call to rpc_set_connect_timeout() iterates through all of the
> transports associated with that server, and calls
> xprt->ops->set_connect_timeout() with the appropriate connect and
> reconnect timeouts.

xs_tcp_set_connect_timeout() is called to set up the rpc_timeout
structure in sock_xprt based on lease and lease >> 1. With the v4 lease
period of 90 secs, to_initval and to_maxval are both set to 30000ms and
to_retries is set to 2 (the default).
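The 30000ms figure is consistent with spreading the 90 s connect timeout evenly over the first attempt plus two retries (90,000 / (2 + 1) = 30,000). This arithmetic sketch assumes that relationship and uses hypothetical model_* names; it ignores any lower bound the real xs_tcp_set_connect_timeout() may apply.

```c
#include <assert.h>

/* Ceiling division, like the kernel's DIV_ROUND_UP() macro. */
static unsigned long div_round_up(unsigned long n, unsigned long d)
{
    assert(d > 0);
    return (n + d - 1) / d;
}

/* Hypothetical stand-in for the rpc_timeout fields discussed above. */
struct model_timeout {
    unsigned long to_initval_ms;
    unsigned long to_maxval_ms;
    unsigned int  to_retries;
};

/* Assumed derivation: spread the connect timeout over the first attempt
 * plus to_retries retries, and use the same value as the maximum so the
 * per-attempt interval never grows. */
static struct model_timeout model_connect_timeout(unsigned long connect_timeout_ms,
                                                  unsigned int retries)
{
    struct model_timeout to;
    unsigned long initval = div_round_up(connect_timeout_ms, retries + 1ul);

    to.to_initval_ms = initval;
    to.to_maxval_ms = initval;
    to.to_retries = retries;
    return to;
}
```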

xs_tcp_set_socket_timeouts uses the rpc_timeout in sock_xprt to set up
the TCP keep-alive timer and the TCP_USER_TIMEOUT option for the socket.

Currently, with the v4 lease of 90 secs, TCP_USER_TIMEOUT is set to
90,000ms, which is the same as the lease period. Because the two values
are equal, there will be cases where the client does not have enough
time to reclaim its locks. Should TCP_USER_TIMEOUT be less than the
lease period, perhaps equal to the lease renewal period of 60 secs?
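A back-of-the-envelope check of this concern. It assumes TCP_USER_TIMEOUT is derived as to_initval * (to_retries + 1) (30,000ms * 3 = 90,000ms, matching the figure above) and that the client renews after two-thirds of the lease, which gives the 60 secs renewal period mentioned; both formulas and the model_* names are illustrative assumptions, not copied from the kernel.

```c
#include <assert.h>

/* Assumed: TCP_USER_TIMEOUT = to_initval * (to_retries + 1). */
static unsigned long model_user_timeout_ms(unsigned long to_initval_ms,
                                           unsigned int to_retries)
{
    return to_initval_ms * (to_retries + 1ul);
}

/* Assumed: the client renews its lease after two-thirds of the period. */
static unsigned long model_renewal_ms(unsigned long lease_ms)
{
    return lease_ms * 2 / 3;
}

/* Time (ms) left to reconnect and send SEQUENCE if the connection stalls
 * stall_start_ms after the last successful renewal and the stall is only
 * noticed when TCP_USER_TIMEOUT fires. Zero or negative: none left. */
static long model_slack_ms(unsigned long lease_ms, unsigned long stall_start_ms,
                           unsigned long user_timeout_ms)
{
    return (long)lease_ms - (long)(stall_start_ms + user_timeout_ms);
}
```

If the connection stalls right after a successful renewal, a 90,000ms TCP_USER_TIMEOUT leaves no time before the 90 s lease expires, whereas a 60,000ms timeout would leave 30 s to reconnect and send SEQUENCE.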

Thanks,
-Dai

