* [PATCH 1/1] SUNRPC dont update timeout value on connection reset
@ 2020-06-23 15:24 Olga Kornievskaia
  2020-06-28 18:03 ` Olga Kornievskaia
  0 siblings, 1 reply; 13+ messages in thread
From: Olga Kornievskaia @ 2020-06-23 15:24 UTC
  To: trond.myklebust, anna.schumaker; +Cc: linux-nfs

Current behaviour: every time a v3 operation is re-sent to the server
we update (double) the timeout. No distinction is made as to whether
the previous timer had expired before the re-send happened.

Here's the scenario:
1. Client sends a v3 operation
2. Server RSTs the connection before the timeout expires (e.g., the
connection is reset immediately)
3. Client re-sends the v3 operation, but the timeout is now 120sec.

As a result, the application sees a 2-minute pause before the next
retry if the server again does not reply. If a connection reset did
not change the timeout value, the client would instead have retried
(for the 3rd time) after 60secs.
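
The difference between the current and the desired behaviour can be
sketched in a few lines of userspace C. This is only a toy model, not
the kernel code: both function names are made up for illustration,
time is counted in plain seconds, and the timeout is doubled as in the
description above.

```c
/*
 * Toy model of the behaviour described above; not kernel code.
 * Both function names are hypothetical and times are plain seconds.
 */

/* Current behaviour: the timeout grows on every re-send. */
unsigned long resend_timeout_current(unsigned long timeout)
{
	return timeout * 2;
}

/*
 * Desired behaviour: the timeout grows only if the previous timer
 * actually expired, i.e. at least 'timeout' seconds have elapsed
 * since the operation was sent.
 */
unsigned long resend_timeout_guarded(unsigned long timeout,
				     unsigned long elapsed)
{
	if (elapsed < timeout)
		return timeout;		/* e.g. an early RST: leave it alone */
	return timeout * 2;
}
```

With a 60sec initial timeout and an immediate RST (elapsed = 0), the
first function yields 120 while the second stays at 60.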

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>

--- I have a question: should we also leave the number of retries
unchanged when the connection is RST-ed? That would allow the client
to still weather 6mins (60+120+180) of an unresponsive server. After
this patch the client can handle only 3mins (60+120) of an
unresponsive server after the initial RST ---
---
 net/sunrpc/clnt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index a91d1cd..65517cf 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2405,7 +2405,8 @@ void rpc_force_rebind(struct rpc_clnt *clnt)
 		goto out_exit;
 	}
 	task->tk_action = call_encode;
-	rpc_check_timeout(task);
+	if (status != -ECONNRESET && status != -ECONNABORTED)
+		rpc_check_timeout(task);
 	return;
 out_exit:
 	rpc_call_rpcerror(task, status);
-- 
1.8.3.1



* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-06-23 15:24 [PATCH 1/1] SUNRPC dont update timeout value on connection reset Olga Kornievskaia
@ 2020-06-28 18:03 ` Olga Kornievskaia
  2020-06-28 21:16   ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Olga Kornievskaia @ 2020-06-28 18:03 UTC
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Trond/Anna,

Any comments on this patch?

On Tue, Jun 23, 2020 at 11:22 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> Current behaviour: every time a v3 operation is re-sent to the server
> we update (double) the timeout. No distinction is made as to whether
> the previous timer had expired before the re-send happened.
>
> Here's the scenario:
> 1. Client sends a v3 operation
> 2. Server RSTs the connection before the timeout expires (e.g., the
> connection is reset immediately)
> 3. Client re-sends the v3 operation, but the timeout is now 120sec.
>
> As a result, the application sees a 2-minute pause before the next
> retry if the server again does not reply. If a connection reset did
> not change the timeout value, the client would instead have retried
> (for the 3rd time) after 60secs.
>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>
> --- I have a question: should we also leave the number of retries
> unchanged when the connection is RST-ed? That would allow the client
> to still weather 6mins (60+120+180) of an unresponsive server. After
> this patch the client can handle only 3mins (60+120) of an
> unresponsive server after the initial RST ---
> ---
>  net/sunrpc/clnt.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index a91d1cd..65517cf 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -2405,7 +2405,8 @@ void rpc_force_rebind(struct rpc_clnt *clnt)
>                 goto out_exit;
>         }
>         task->tk_action = call_encode;
> -       rpc_check_timeout(task);
> +       if (status != -ECONNRESET && status != -ECONNABORTED)
> +               rpc_check_timeout(task);
>         return;
>  out_exit:
>         rpc_call_rpcerror(task, status);
> --
> 1.8.3.1
>


* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-06-28 18:03 ` Olga Kornievskaia
@ 2020-06-28 21:16   ` Trond Myklebust
  2020-07-08 21:04     ` Olga Kornievskaia
  0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2020-06-28 21:16 UTC
  To: anna.schumaker, olga.kornievskaia; +Cc: linux-nfs

On Sun, 2020-06-28 at 14:03 -0400, Olga Kornievskaia wrote:
> Trond/Anna,
> 
> Any comments on this patch?
> 
> On Tue, Jun 23, 2020 at 11:22 AM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> > Current behaviour: every time a v3 operation is re-sent to the
> > server we update (double) the timeout. No distinction is made as
> > to whether the previous timer had expired before the re-send
> > happened.
> >
> > Here's the scenario:
> > 1. Client sends a v3 operation
> > 2. Server RSTs the connection before the timeout expires (e.g.,
> > the connection is reset immediately)
> > 3. Client re-sends the v3 operation, but the timeout is now 120sec.

Ah... The problem here is clearly '3.' incrementing the timeout value
before we've actually hit a minor or major timeout...

So I think we want to look carefully at xprt_adjust_timeout(). The
first rule there should be that if we're below the threshold for a
minor timeout, we just want to exit without changing anything.

The second rule is then that if we're below the threshold for a major
timeout, then we adjust the timeout value by doubling it (if
to->to_exponential) or adding the value to->to_increment (if
!to->to_exponential) and then exit.

Finally, if this is a major timeout, we reset req->rq_timeout to
to->to_initval, reset req->rq_retries, call xprt_reset_majortimeo(),
reset the RTT counters and return ETIMEDOUT.

None of this should be specific to your connection reset case. This is
how we want timeouts to work in the generic case, so we need to fix
that.
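
The three rules above can be sketched as a small userspace model. This
is not the kernel's xprt_adjust_timeout(): the struct and function
names are hypothetical, times are plain seconds compared with "<" (so
jiffies wraparound is ignored), the major timeout span is passed in as
a field rather than computed, the RTT counter reset is left out, and
the way the next minor deadline is advanced is an assumption made for
the sake of the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rpc_timeout policy; times in seconds. */
struct timeo_policy {
	unsigned long initval;		/* initial timeout */
	unsigned long increment;	/* step used for linear backoff */
	unsigned long maxval;		/* cap on the timeout, 0 = no cap */
	unsigned long major_span;	/* assumed: start to major timeout */
	bool exponential;
};

/* Hypothetical stand-in for the per-request timer state. */
struct req_timer {
	unsigned long timeout;		/* current timeout value */
	unsigned long minor_deadline;	/* when the next minor timeout is due */
	unsigned long major_deadline;	/* when the major timeout is due */
	unsigned int retries;
};

void req_timer_init(struct req_timer *rq, const struct timeo_policy *to,
		    unsigned long start)
{
	rq->timeout = to->initval;
	rq->minor_deadline = start + rq->timeout;
	rq->major_deadline = start + to->major_span;
	rq->retries = 0;
}

int req_timer_adjust(struct req_timer *rq, const struct timeo_policy *to,
		     unsigned long now)
{
	int status = 0;

	/* Rule 1: nothing has expired yet, so change nothing. */
	if (now < rq->minor_deadline)
		return 0;

	if (now < rq->major_deadline) {
		/* Rule 2: minor timeout, back off and count a retry. */
		if (to->exponential)
			rq->timeout <<= 1;
		else
			rq->timeout += to->increment;
		if (to->maxval && rq->timeout > to->maxval)
			rq->timeout = to->maxval;
		rq->retries++;
	} else {
		/* Rule 3: major timeout, start over and report it. */
		rq->timeout = to->initval;
		rq->retries = 0;
		rq->major_deadline += to->major_span;
		status = -ETIMEDOUT;
	}

	/* Assumption: the next minor deadline is relative to the old one. */
	rq->minor_deadline += rq->timeout;
	return status;
}
```

In this model an RST that arrives 5 seconds after the first send hits
rule 1 and leaves the 60-second timeout untouched, which is the
behaviour the original report asks for.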

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-06-28 21:16   ` Trond Myklebust
@ 2020-07-08 21:04     ` Olga Kornievskaia
  0 siblings, 0 replies; 13+ messages in thread
From: Olga Kornievskaia @ 2020-07-08 21:04 UTC
  To: Trond Myklebust; +Cc: anna.schumaker, linux-nfs

On Sun, Jun 28, 2020 at 5:16 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Sun, 2020-06-28 at 14:03 -0400, Olga Kornievskaia wrote:
> > Trond/Anna,
> >
> > Any comments on this patch?
> >
> > On Tue, Jun 23, 2020 at 11:22 AM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > > Current behaviour: every time a v3 operation is re-sent to the
> > > server we update (double) the timeout. No distinction is made as
> > > to whether the previous timer had expired before the re-send
> > > happened.
> > >
> > > Here's the scenario:
> > > 1. Client sends a v3 operation
> > > 2. Server RSTs the connection before the timeout expires (e.g.,
> > > the connection is reset immediately)
> > > 3. Client re-sends the v3 operation, but the timeout is now 120sec.
>
> Ah... The problem here is clearly '3.' incrementing the timeout value
> before we've actually hit a minor or major timeout...
>
> So I think we want to look carefully at xprt_adjust_timeout(). The
> first rule there should be that if we're below the threshold for a
> minor timeout, we just want to exit without changing anything.
>
> The second rule is then that if we're below the threshold for a major
> timeout, then we adjust the timeout value by doubling it (if
> to->to_exponential) or adding the value to->to_increment (if
> !to->to_exponential) and then exit.
>
> Finally, if this is a major timeout, we reset req->rq_timeout to
> to->to_initval, reset req->rq_retries, call xprt_reset_majortimeo(),
> reset the RTT counters and return ETIMEDOUT.
>
> None of this should be specific to your connection reset case. This is
> how we want timeouts to work in the generic case, so we need to fix
> that.
>

OK, thanks for the comments. I don't know if I got it right, but I
submitted a new version.

> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>


* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-07-13 13:47             ` Trond Myklebust
@ 2020-07-13 16:18               ` Olga Kornievskaia
  0 siblings, 0 replies; 13+ messages in thread
From: Olga Kornievskaia @ 2020-07-13 16:18 UTC
  To: Trond Myklebust; +Cc: linux-nfs, anna.schumaker

On Mon, Jul 13, 2020 at 9:47 AM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> Hi Olga
>
> On Fri, 2020-07-10 at 14:40 -0400, Olga Kornievskaia wrote:
> > On Fri, Jul 10, 2020 at 1:35 PM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > > On Thu, Jul 9, 2020 at 5:07 PM Olga Kornievskaia
> > > <olga.kornievskaia@gmail.com> wrote:
> > > > On Thu, Jul 9, 2020 at 1:19 PM Trond Myklebust <
> > > > trondmy@hammerspace.com> wrote:
> > > > > On Thu, 2020-07-09 at 11:43 -0400, Olga Kornievskaia wrote:
> > > > > > On Thu, Jul 9, 2020 at 8:08 AM Trond Myklebust <
> > > > > > trondmy@hammerspace.com> wrote:
> > > > > > > Hi Olga
> > > > > > >
> > > > > > > On Wed, 2020-07-08 at 17:05 -0400, Olga Kornievskaia wrote:
> > > > > > > > Current behaviour: every time a v3 operation is re-sent
> > > > > > > > to the server we update (double) the timeout. No
> > > > > > > > distinction is made as to whether the previous timer had
> > > > > > > > expired before the re-send happened.
> > > > > > > >
> > > > > > > > Here's the scenario:
> > > > > > > > 1. Client sends a v3 operation
> > > > > > > > 2. Server RSTs the connection before the timeout expires
> > > > > > > > (e.g., the connection is reset immediately)
> > > > > > > > 3. Client re-sends the v3 operation, but the timeout is
> > > > > > > > now 120sec.
> > > > > > > >
> > > > > > > > As a result, the application sees a 2-minute pause before
> > > > > > > > the next retry if the server again does not reply.
> > > > > > > >
> > > > > > > > Instead, this patch proposes to keep track of when the
> > > > > > > > minor timeout should happen and, if it has not happened
> > > > > > > > yet, to leave the timeout unchanged.
> > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > > > ---
> > > > > > > >  include/linux/sunrpc/xprt.h |  1 +
> > > > > > > >  net/sunrpc/xprt.c           | 11 +++++++++++
> > > > > > > >  2 files changed, 12 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> > > > > > > > index e64bd82..a603d48 100644
> > > > > > > > --- a/include/linux/sunrpc/xprt.h
> > > > > > > > +++ b/include/linux/sunrpc/xprt.h
> > > > > > > > @@ -101,6 +101,7 @@ struct rpc_rqst {
> > > > > > > >                                                        * used in the softirq.
> > > > > > > >                                                        */
> > > > > > > >       unsigned long           rq_majortimeo;  /* major timeout alarm */
> > > > > > > > +     unsigned long           rq_minortimeo;  /* minor timeout alarm */
> > > > > > > >       unsigned long           rq_timeout;     /* Current timeout value */
> > > > > > > >       ktime_t                 rq_rtt;         /* round-trip time */
> > > > > > > >       unsigned int            rq_retries;     /* # of retries */
> > > > > > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > > > > > > index d5cc5db..c0ce232 100644
> > > > > > > > --- a/net/sunrpc/xprt.c
> > > > > > > > +++ b/net/sunrpc/xprt.c
> > > > > > > > @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct rpc_rqst *req)
> > > > > > > >       req->rq_majortimeo += xprt_calc_majortimeo(req);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> > > > > > > > +{
> > > > > > > > +     req->rq_minortimeo = jiffies + req->rq_timeout;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
> > > > > > > >  {
> > > > > > > >       unsigned long time_init;
> > > > > > > > @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
> > > > > > > >               time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
> > > > > > > >       req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> > > > > > > >       req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> > > > > > > > +     req->rq_minortimeo = time_init + req->rq_timeout;
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  /**
> > > > > > > > @@ -631,6 +637,10 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> > > > > > > >       const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
> > > > > > > >       int status = 0;
> > > > > > > >
> > > > > > > > +     if (time_before(jiffies, req->rq_minortimeo)) {
> > > > > > > > +             xprt_reset_minortimeo(req);
> > > > > > > > +             return status;
> > > > > > >
> > > > > > > Shouldn't this case be just returning without updating the
> > > > > > > timeout?
> > > > > > > After all, this is the case where nothing has expired yet.
> > > > > >
> > > > > > I think we perhaps should readjust the minor timeout every
> > > > > > time here, but I can't figure out what the desired behaviour
> > > > > > should be. When should we consider it appropriate to double
> > > > > > the timer? Consider the following:
> > > > > >
> > > > > > time1: v3 op sent
> > > > > > time1+50s: server RSTs
> > > > > > We check that it's not yet the minor timeout (time1+60s)
> > > > > > time1+50s: v3 op re-sent (say we don't reset the minor
> > > > > > timeout to be current time+60s)
> > > > > > time1+60s: server RSTs
> > > > > > Client will resend the op but now it's past the initial minor
> > > > > > timeout, so the timeout will be doubled. Is that what we
> > > > > > really want? Maybe it is.
> > > > > > Say now the server RSTs the connection again (shortly after,
> > > > > > or in less than 60s). Since we are not updating the minor
> > > > > > timeout value, the client will again modify the timeout
> > > > > > before resending. Is that OK?
> > > > > >
> > > > > > That's why my reasoning was that at every re-evaluation of
> > > > > > the timeout value, we set the minor timeout to current
> > > > > > time+60s, and if we get an RST within it then we don't modify
> > > > > > the timeout value.
> > > > >
> > > > > So a couple of issues with that:
> > > > >
> > > > > The first is that a series of RST calls could cause the timeout
> > > > > to get pushed to the max value fairly quickly (btw,
> > > > > xprt_reset_minortimeo() does not enforce a limit right now).
> > > > >
> > > > > The second is that we end up pushing out the major timeout
> > > > > value, since the major timeout cannot occur unless the value of
> > > > > jiffies is after the minor timeout (which keeps changing on each
> > > > > pass).
> > > >
> > > > But don't we want to push out the major timeout?
> > > >
> > > > Actually, I think, back in my example of getting the RST at
> > > > (time1+50s), shouldn't minor_timeo and majortimeo be reset to
> > > > current time + the appropriate minor/major value? If we are
> > > > evaluating the timer and the time difference between when the
> > > > operation was sent and now is less than 60s, we shouldn't say a
> > > > timeout has occurred (it's a premature timeout) and thus its
> > > > value shouldn't be modified.
> > > >
> > > > Thoughts?
> > >
> > > Do you feel that the following approach is incorrect? Sorry, it's
> > > just cut-and-paste but the logic is there. Thank you.
> >
> > Scratch this... So with this we'd never timeout an operation at all.
>
> I think the ideal solution would respect the fact that most sysadmins
> who read the nfs manpage assume that timeouts are a predictable
> feature, and that if I set timeo=600, retrans=2, for a TCP mount, then
> the minor timeouts will occur 60s and 180s (60+120) after the first
> attempt to send the RPC call, and the first major timeout will occur
> 360s (60+120+180) after that first attempt.
> i.e. the timeouts are calculated relative to the time at which
> transmission of the RPC call was first attempted.
>
> If we start extending any one of those timeouts, then things like soft
> mounts become unpredictable, and we no longer control when the EIO is
> going to be reported to the application. This has been a source of
> complaints from users in the past.

Thanks Trond. I re-submitted the patch with your initial suggestion.


> > > diff --git a/include/linux/sunrpc/xprt.h
> > > b/include/linux/sunrpc/xprt.h
> > > index e64bd82..a603d48 100644
> > > --- a/include/linux/sunrpc/xprt.h
> > > +++ b/include/linux/sunrpc/xprt.h
> > > @@ -101,6 +101,7 @@ struct rpc_rqst {
> > >   * used in the softirq.
> > >   */
> > >   unsigned long rq_majortimeo; /* major timeout alarm */
> > > + unsigned long rq_minortimeo; /* minor timeout alarm */
> > >   unsigned long rq_timeout; /* Current timeout value */
> > >   ktime_t rq_rtt; /* round-trip time */
> > >   unsigned int rq_retries; /* # of retries */
> > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > index d5cc5db..66d412b 100644
> > > --- a/net/sunrpc/xprt.c
> > > +++ b/net/sunrpc/xprt.c
> > > @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct rpc_rqst *req)
> > >   req->rq_majortimeo += xprt_calc_majortimeo(req);
> > >  }
> > >
> > > +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> > > +{
> > > + req->rq_minortimeo = jiffies + req->rq_timeout;
> > > +}
> > > +
> > >  static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
> > >  {
> > >   unsigned long time_init;
> > > @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
> > >   time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
> > >   req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> > >   req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> > > + req->rq_minortimeo = time_init + req->rq_timeout;
> > >  }
> > >
> > >  /**
> > > @@ -631,6 +637,11 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> > >   const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
> > >   int status = 0;
> > >
> > > + if (time_before(jiffies, req->rq_minortimeo)) {
> > > + req->rq_majortimeo = jiffies + xprt_calc_majortimeo(req);
> > > + req->rq_minortimeo = jiffies + req->rq_timeout;
> > > + return status;
> > > + }
> > >   if (time_before(jiffies, req->rq_majortimeo)) {
> > >   if (to->to_exponential)
> > >   req->rq_timeout <<= 1;
> > > @@ -649,6 +660,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> > >   spin_unlock(&xprt->transport_lock);
> > >   status = -ETIMEDOUT;
> > >   }
> > > + xprt_reset_minortimeo(req);
> > >
> > >   if (req->rq_timeout == 0) {
> > >   printk(KERN_WARNING "xprt_adjust_timeout: rq_timeout = 0!\n");
> > > --
> > >
> > > > > > > > +     }
> > > > > > > >       if (time_before(jiffies, req->rq_majortimeo)) {
> > > > > > > >               if (to->to_exponential)
> > > > > > > >                       req->rq_timeout <<= 1;
> > > > > > > > @@ -638,6 +648,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> > > > > > > >                       req->rq_timeout += to->to_increment;
> > > > > > > >               if (to->to_maxval && req->rq_timeout >= to->to_maxval)
> > > > > > > >                       req->rq_timeout = to->to_maxval;
> > > > > > > > +             xprt_reset_minortimeo(req);
> > > > > > >
> > > > > > > ...and then perhaps this can just be moved out of the
> > > > > > > time_before() condition, since it looks to me as if we also
> > > > > > > want to reset req->rq_minortimeo when a major timeout occurs.
> > > > > > > >               req->rq_retries++;
> > > > > > > >       } else {
> > > > > > > >               req->rq_timeout = to->to_initval;
> > > > >
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>


* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-07-10 18:40           ` Olga Kornievskaia
@ 2020-07-13 13:47             ` Trond Myklebust
  2020-07-13 16:18               ` Olga Kornievskaia
  0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2020-07-13 13:47 UTC
  To: olga.kornievskaia; +Cc: linux-nfs, anna.schumaker

Hi Olga

On Fri, 2020-07-10 at 14:40 -0400, Olga Kornievskaia wrote:
> On Fri, Jul 10, 2020 at 1:35 PM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> > On Thu, Jul 9, 2020 at 5:07 PM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > > On Thu, Jul 9, 2020 at 1:19 PM Trond Myklebust <
> > > trondmy@hammerspace.com> wrote:
> > > > On Thu, 2020-07-09 at 11:43 -0400, Olga Kornievskaia wrote:
> > > > > On Thu, Jul 9, 2020 at 8:08 AM Trond Myklebust <
> > > > > trondmy@hammerspace.com> wrote:
> > > > > > Hi Olga
> > > > > > 
> > > > > > On Wed, 2020-07-08 at 17:05 -0400, Olga Kornievskaia wrote:
> > > > > > > Current behaviour: every time a v3 operation is re-sent
> > > > > > > to the server we update (double) the timeout. No
> > > > > > > distinction is made as to whether the previous timer had
> > > > > > > expired before the re-send happened.
> > > > > > >
> > > > > > > Here's the scenario:
> > > > > > > 1. Client sends a v3 operation
> > > > > > > 2. Server RSTs the connection before the timeout expires
> > > > > > > (e.g., the connection is reset immediately)
> > > > > > > 3. Client re-sends the v3 operation, but the timeout is
> > > > > > > now 120sec.
> > > > > > >
> > > > > > > As a result, the application sees a 2-minute pause before
> > > > > > > the next retry if the server again does not reply.
> > > > > > >
> > > > > > > Instead, this patch proposes to keep track of when the
> > > > > > > minor timeout should happen and, if it has not happened
> > > > > > > yet, to leave the timeout unchanged.
> > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > > ---
> > > > > > >  include/linux/sunrpc/xprt.h |  1 +
> > > > > > >  net/sunrpc/xprt.c           | 11 +++++++++++
> > > > > > >  2 files changed, 12 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> > > > > > > index e64bd82..a603d48 100644
> > > > > > > --- a/include/linux/sunrpc/xprt.h
> > > > > > > +++ b/include/linux/sunrpc/xprt.h
> > > > > > > @@ -101,6 +101,7 @@ struct rpc_rqst {
> > > > > > >                                                        * used in the softirq.
> > > > > > >                                                        */
> > > > > > >       unsigned long           rq_majortimeo;  /* major timeout alarm */
> > > > > > > +     unsigned long           rq_minortimeo;  /* minor timeout alarm */
> > > > > > >       unsigned long           rq_timeout;     /* Current timeout value */
> > > > > > >       ktime_t                 rq_rtt;         /* round-trip time */
> > > > > > >       unsigned int            rq_retries;     /* # of retries */
> > > > > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > > > > > index d5cc5db..c0ce232 100644
> > > > > > > --- a/net/sunrpc/xprt.c
> > > > > > > +++ b/net/sunrpc/xprt.c
> > > > > > > @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct rpc_rqst *req)
> > > > > > >       req->rq_majortimeo += xprt_calc_majortimeo(req);
> > > > > > >  }
> > > > > > >
> > > > > > > +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> > > > > > > +{
> > > > > > > +     req->rq_minortimeo = jiffies + req->rq_timeout;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
> > > > > > >  {
> > > > > > >       unsigned long time_init;
> > > > > > > @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
> > > > > > >               time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
> > > > > > >       req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> > > > > > >       req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> > > > > > > +     req->rq_minortimeo = time_init + req->rq_timeout;
> > > > > > >  }
> > > > > > >
> > > > > > >  /**
> > > > > > > @@ -631,6 +637,10 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> > > > > > >       const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
> > > > > > >       int status = 0;
> > > > > > >
> > > > > > > +     if (time_before(jiffies, req->rq_minortimeo)) {
> > > > > > > +             xprt_reset_minortimeo(req);
> > > > > > > +             return status;
> > > > > > 
> > > > > > Shouldn't this case be just returning without updating the
> > > > > > timeout?
> > > > > > After all, this is the case where nothing has expired yet.
> > > > > 
> > > > > I think we perhaps should readjust the minor timeout every
> > > > > time here, but I can't figure out what the desired behaviour
> > > > > should be. When should we consider it appropriate to double
> > > > > the timer? Consider the following:
> > > > >
> > > > > time1: v3 op sent
> > > > > time1+50s: server RSTs
> > > > > We check that it's not yet the minor timeout (time1+60s)
> > > > > time1+50s: v3 op re-sent (say we don't reset the minor
> > > > > timeout to be current time+60s)
> > > > > time1+60s: server RSTs
> > > > > Client will resend the op but now it's past the initial minor
> > > > > timeout, so the timeout will be doubled. Is that what we
> > > > > really want? Maybe it is.
> > > > > Say now the server RSTs the connection again (shortly after,
> > > > > or in less than 60s). Since we are not updating the minor
> > > > > timeout value, the client will again modify the timeout
> > > > > before resending. Is that OK?
> > > > >
> > > > > That's why my reasoning was that at every re-evaluation of
> > > > > the timeout value, we set the minor timeout to current
> > > > > time+60s, and if we get an RST within it then we don't modify
> > > > > the timeout value.
> > > > 
> > > > So a couple of issues with that:
> > > >
> > > > The first is that a series of RST calls could cause the timeout
> > > > to get pushed to the max value fairly quickly (btw,
> > > > xprt_reset_minortimeo() does not enforce a limit right now).
> > > >
> > > > The second is that we end up pushing out the major timeout
> > > > value, since the major timeout cannot occur unless the value of
> > > > jiffies is after the minor timeout (which keeps changing on each
> > > > pass).
> > > 
> > > But don't we want to push out the major timeout?
> > >
> > > Actually, I think, back in my example of getting the RST at
> > > (time1+50s), shouldn't minor_timeo and majortimeo be reset to
> > > current time + the appropriate minor/major value? If we are
> > > evaluating the timer and the time difference between when the
> > > operation was sent and now is less than 60s, we shouldn't say a
> > > timeout has occurred (it's a premature timeout) and thus its
> > > value shouldn't be modified.
> > >
> > > Thoughts?
> >
> > Do you feel that the following approach is incorrect? Sorry, it's
> > just cut-and-paste but the logic is there. Thank you.
> 
> Scratch this... So with this we'd never timeout an operation at all.

I think the ideal solution would respect the fact that most sysadmins
who read the nfs manpage assume that timeouts are a predictable
feature, and that if I set timeo=600, retrans=2, for a TCP mount, then
the minor timeouts will occur 60s and 180s (60+120) after the first
attempt to send the RPC call, and the first major timeout will occur
360s (60+120+180) after that first attempt.
i.e. the timeouts are calculated relative to the time at which
transmission of the RPC call was first attempted.

If we start extending any one of those timeouts, then things like soft
mounts become unpredictable, and we no longer control when the EIO is
going to be reported to the application. This has been a source of
complaints from users in the past.
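
To make that arithmetic concrete, here is a minimal userspace sketch
that derives the schedule from the mount options. It assumes linear
backoff in which each interval grows by the initial timeout (which is
what the 60, 120, 180 figures above imply) and takes timeo in tenths
of a second, as the nfs manpage defines it. The function name and its
signature are made up for illustration; this is not kernel code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: compute when the minor timeouts and the first
 * major timeout fall, in seconds after the first transmission attempt.
 *
 * timeo_ds  - the timeo= mount option, in tenths of a second
 * retrans   - the retrans= mount option
 * minor     - receives up to minor_len minor timeout offsets
 *
 * Returns the offset of the first major timeout.
 */
unsigned long linear_timeout_schedule(unsigned long timeo_ds,
				      unsigned int retrans,
				      unsigned long *minor, size_t minor_len)
{
	unsigned long step = timeo_ds / 10;	/* tenths of a second -> seconds */
	unsigned long elapsed = 0;
	unsigned int i;

	/* Interval i lasts (i + 1) * step; each one ends in a minor timeout. */
	for (i = 0; i < retrans; i++) {
		elapsed += step * (i + 1);
		if (i < minor_len)
			minor[i] = elapsed;
	}

	/* The final interval ends in the major timeout instead. */
	return elapsed + step * ((unsigned long)retrans + 1);
}
```

For timeo=600, retrans=2 this gives minor timeouts at 60s and 180s and
a major timeout at 360s. Because every offset is measured from the
first transmission attempt, a connection reset in between does not
move any of them.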

> 
> > diff --git a/include/linux/sunrpc/xprt.h
> > b/include/linux/sunrpc/xprt.h
> > index e64bd82..a603d48 100644
> > --- a/include/linux/sunrpc/xprt.h
> > +++ b/include/linux/sunrpc/xprt.h
> > @@ -101,6 +101,7 @@ struct rpc_rqst {
> >   * used in the softirq.
> >   */
> >   unsigned long rq_majortimeo; /* major timeout alarm */
> > + unsigned long rq_minortimeo; /* minor timeout alarm */
> >   unsigned long rq_timeout; /* Current timeout value */
> >   ktime_t rq_rtt; /* round-trip time */
> >   unsigned int rq_retries; /* # of retries */
> > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > index d5cc5db..66d412b 100644
> > --- a/net/sunrpc/xprt.c
> > +++ b/net/sunrpc/xprt.c
> > @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct rpc_rqst *req)
> >   req->rq_majortimeo += xprt_calc_majortimeo(req);
> >  }
> > 
> > +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> > +{
> > + req->rq_minortimeo = jiffies + req->rq_timeout;
> > +}
> > +
> >  static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
> >  {
> >   unsigned long time_init;
> > @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
> >   time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
> >   req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> >   req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> > + req->rq_minortimeo = time_init + req->rq_timeout;
> >  }
> > 
> >  /**
> > @@ -631,6 +637,11 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> >   const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
> >   int status = 0;
> > 
> > + if (time_before(jiffies, req->rq_minortimeo)) {
> > + req->rq_majortimeo = jiffies + xprt_calc_majortimeo(req);
> > + req->rq_minortimeo = jiffies + req->rq_timeout;
> > + return status;
> > + }
> >   if (time_before(jiffies, req->rq_majortimeo)) {
> >   if (to->to_exponential)
> >   req->rq_timeout <<= 1;
> > @@ -649,6 +660,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> >   spin_unlock(&xprt->transport_lock);
> >   status = -ETIMEDOUT;
> >   }
> > + xprt_reset_minortimeo(req);
> > 
> >   if (req->rq_timeout == 0) {
> >   printk(KERN_WARNING "xprt_adjust_timeout: rq_timeout = 0!\n");
> > --
> > 
> > > > > > > +     }
> > > > > > >       if (time_before(jiffies, req->rq_majortimeo)) {
> > > > > > >               if (to->to_exponential)
> > > > > > >                       req->rq_timeout <<= 1;
> > > > > > > @@ -638,6 +648,7 @@ int xprt_adjust_timeout(struct
> > > > > > > rpc_rqst *req)
> > > > > > >                       req->rq_timeout += to-
> > > > > > > >to_increment;
> > > > > > >               if (to->to_maxval && req->rq_timeout >= to-
> > > > > > > > to_maxval)
> > > > > > >                       req->rq_timeout = to->to_maxval;
> > > > > > > +             xprt_reset_minortimeo(req);
> > > > > > 
> > > > > > ...and then perhaps this can just be moved out of the
> > > > > > time_before()
> > > > > > condition, since it looks to me as if we also want to reset
> > > > > > req-
> > > > > > > rq_minortimeo when a major timeout occurs.
> > > > > > >               req->rq_retries++;
> > > > > > >       } else {
> > > > > > >               req->rq_timeout = to->to_initval;
> > > > 
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-07-10 17:35         ` Olga Kornievskaia
@ 2020-07-10 18:40           ` Olga Kornievskaia
  2020-07-13 13:47             ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Olga Kornievskaia @ 2020-07-10 18:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, anna.schumaker

On Fri, Jul 10, 2020 at 1:35 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Thu, Jul 9, 2020 at 5:07 PM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > On Thu, Jul 9, 2020 at 1:19 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
> > >
> > > On Thu, 2020-07-09 at 11:43 -0400, Olga Kornievskaia wrote:
> > > > On Thu, Jul 9, 2020 at 8:08 AM Trond Myklebust <
> > > > trondmy@hammerspace.com> wrote:
> > > > > Hi Olga
> > > > >
> > > > > On Wed, 2020-07-08 at 17:05 -0400, Olga Kornievskaia wrote:
> > > > > > Current behaviour: every time a v3 operation is re-sent to the
> > > > > > server
> > > > > > we update (double) the timeout. There is no distinction between
> > > > > > whether
> > > > > > or not the previous timer had expired before the re-sent
> > > > > > happened.
> > > > > >
> > > > > > Here's the scenario:
> > > > > > 1. Client sends a v3 operation
> > > > > > 2. Server RST-s the connection (prior to the timeout) (eg.,
> > > > > > connection
> > > > > > is immediately reset)
> > > > > > 3. Client re-sends a v3 operation but the timeout is now 120sec.
> > > > > >
> > > > > > As a result, an application sees 2mins pause before a retry in
> > > > > > case
> > > > > > server again does not reply.
> > > > > >
> > > > > > > Instead, this patch proposes to keep track of when the minor
> > > > > > timeout
> > > > > > should happen and if it didn't, then don't update the new
> > > > > > timeout.
> > > > > >
> > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > ---
> > > > > >  include/linux/sunrpc/xprt.h |  1 +
> > > > > >  net/sunrpc/xprt.c           | 11 +++++++++++
> > > > > >  2 files changed, 12 insertions(+)
> > > > > >
> > > > > > diff --git a/include/linux/sunrpc/xprt.h
> > > > > > b/include/linux/sunrpc/xprt.h
> > > > > > index e64bd82..a603d48 100644
> > > > > > --- a/include/linux/sunrpc/xprt.h
> > > > > > +++ b/include/linux/sunrpc/xprt.h
> > > > > > @@ -101,6 +101,7 @@ struct rpc_rqst {
> > > > > >                                                        * used in
> > > > > > the
> > > > > > softirq.
> > > > > >                                                        */
> > > > > >       unsigned long           rq_majortimeo;  /* major timeout
> > > > > > alarm */
> > > > > > +     unsigned long           rq_minortimeo;  /* minor timeout
> > > > > > alarm */
> > > > > >       unsigned long           rq_timeout;     /* Current timeout
> > > > > > value */
> > > > > >       ktime_t                 rq_rtt;         /* round-trip time
> > > > > > */
> > > > > >       unsigned int            rq_retries;     /* # of retries */
> > > > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > > > > index d5cc5db..c0ce232 100644
> > > > > > --- a/net/sunrpc/xprt.c
> > > > > > +++ b/net/sunrpc/xprt.c
> > > > > > @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct
> > > > > > rpc_rqst *req)
> > > > > >       req->rq_majortimeo += xprt_calc_majortimeo(req);
> > > > > >  }
> > > > > >
> > > > > > +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> > > > > > +{
> > > > > > +     req->rq_minortimeo = jiffies + req->rq_timeout;
> > > > > > +}
> > > > > > +
> > > > > >  static void xprt_init_majortimeo(struct rpc_task *task, struct
> > > > > > rpc_rqst *req)
> > > > > >  {
> > > > > >       unsigned long time_init;
> > > > > > @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct
> > > > > > rpc_task
> > > > > > *task, struct rpc_rqst *req)
> > > > > >               time_init = xprt_abs_ktime_to_jiffies(task-
> > > > > > >tk_start);
> > > > > >       req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> > > > > >       req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> > > > > > +     req->rq_minortimeo = time_init + req->rq_timeout;
> > > > > >  }
> > > > > >
> > > > > >  /**
> > > > > > @@ -631,6 +637,10 @@ int xprt_adjust_timeout(struct rpc_rqst
> > > > > > *req)
> > > > > >       const struct rpc_timeout *to = req->rq_task->tk_client-
> > > > > > > cl_timeout;
> > > > > >       int status = 0;
> > > > > >
> > > > > > +     if (time_before(jiffies, req->rq_minortimeo)) {
> > > > > > +             xprt_reset_minortimeo(req);
> > > > > > +             return status;
> > > > >
> > > > > Shouldn't this case be just returning without updating the timeout?
> > > > > After all, this is the case where nothing has expired yet.
> > > >
> > > > I think we perhaps should readjust the minor timeout every time
> > > > here, but I can't figure out what the desired behaviour should be.
> > > > When should we consider it appropriate to double the timer?
> > > > Consider the following:
> > > >
> > > > time1: v3 op sent
> > > > time1+50s: server RSTs
> > > > We check that it's not yet the minor timeout (time1+60s)
> > > > time1+50s: v3 op re-sent  (say we don't reset the minor timeout to be
> > > > current time+60s)
> > > > time1+60s: server RSTs
> > > > Client will resend the op but now it's past the initial minor timeout
> > > > so the timeout will be doubled. Is that what we really want? Maybe it
> > > > is.
> > > > Say now the server RSTs the connection again (shortly after or in
> > > > less
> > > > than 60s), since we are not updating the minor timeout value, then
> > > > the
> > > > client will again modify the timeout before resending. Is that Ok?
> > > >
> > > > That's why my reasoning was that at every re-evaluation of the
> > > > timeout
> > > > value, we have the minor timeout set for current time+60s and we get
> > > > an RST within it then we don't modify the timeout value.
> > >
> > > So a couple of issues with that:
> > >
> > > The first is that a series of RST calls could cause the timeout to get
> > > pushed to the max value fairly quickly (btw, xprt_reset_minortimeo()
> > > does not enforce a limit right now).
> > >
> > > The second is that we end up pushing out the major timeout value, since
> > > the major timeout cannot occur unless the value of jiffies is after the
> > > minor timeout (which keeps changing on each pass).
> >
> > But don't we want to push out the major timeout?
> >
> > Actually, I think, back in my example of getting the RST at
> > (time1+50s), shouldn't rq_minortimeo and rq_majortimeo be reset to
> > current time + the appropriate minor/major value?  If we are evaluating
> > the timer and the time difference between when the operation was sent
> > and now is less than 60s, we shouldn't say a timeout has occurred
> > (it's a premature timeout), and thus its value shouldn't be modified.
> >
> > Thoughts?
>
> Do you feel that the following approach is incorrect? Sorry, it's just
> cut-and-paste, but the logic is there. Thank you.

Scratch this... With this approach we'd never time out an operation at all.
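
One rough way to see the risk is a userspace model of the early-return
branch in the hunk quoted below. This is only a sketch under assumptions:
times are plain seconds rather than jiffies, the struct and function names
(sketch_rqst, sketch_adjust, resets_until_major) are hypothetical, the
major span is a fixed 180s stand-in for xprt_calc_majortimeo(), and the
doubling of rq_timeout is left out. It assumes the function keeps being
entered before the minor deadline, for example because of repeated RSTs.

```c
/* Hypothetical userspace model; times are plain seconds, not jiffies. */
struct sketch_rqst {
	unsigned long timeout;     /* retransmit interval (like rq_timeout) */
	unsigned long major_span;  /* stand-in for xprt_calc_majortimeo() */
	unsigned long minortimeo;  /* absolute minor deadline */
	unsigned long majortimeo;  /* absolute major deadline */
};

/*
 * Models only the control flow of the quoted hunk.
 * Returns 1 when a major timeout would be declared, 0 otherwise.
 * A real implementation would compare with time_before() so that
 * jiffies wrap-around is handled; plain '<' is used here for brevity.
 */
static int sketch_adjust(struct sketch_rqst *req, unsigned long now)
{
	if (now < req->minortimeo) {
		/* Early entry: both deadlines are pushed out from "now". */
		req->majortimeo = now + req->major_span;
		req->minortimeo = now + req->timeout;
		return 0;
	}
	if (now < req->majortimeo) {
		/* Minor timeout: interval growth omitted in this sketch. */
		req->minortimeo = now + req->timeout;
		return 0;
	}
	return 1;
}

/*
 * The server resets the connection every 'gap' seconds. Returns the
 * number of resets seen before a major timeout is declared, or 0 if
 * none is declared within 'limit' resets.
 */
static unsigned long resets_until_major(unsigned long gap, unsigned long limit)
{
	struct sketch_rqst req = {
		.timeout = 60, .major_span = 180,
		.minortimeo = 60, .majortimeo = 180,
	};
	unsigned long now = 0, n;

	for (n = 1; n <= limit; n++) {
		now += gap;
		if (sketch_adjust(&req, now))
			return n;
	}
	return 0;
}
```

In this model, resets that arrive every 50s (inside the 60s interval) keep
moving the major deadline, so no major timeout is ever declared, while
resets every 70s still reach it.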

> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index e64bd82..a603d48 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -101,6 +101,7 @@ struct rpc_rqst {
>   * used in the softirq.
>   */
>   unsigned long rq_majortimeo; /* major timeout alarm */
> + unsigned long rq_minortimeo; /* minor timeout alarm */
>   unsigned long rq_timeout; /* Current timeout value */
>   ktime_t rq_rtt; /* round-trip time */
>   unsigned int rq_retries; /* # of retries */
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index d5cc5db..66d412b 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct rpc_rqst *req)
>   req->rq_majortimeo += xprt_calc_majortimeo(req);
>  }
>
> +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> +{
> + req->rq_minortimeo = jiffies + req->rq_timeout;
> +}
> +
>  static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
>  {
>   unsigned long time_init;
> @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct rpc_task
> *task, struct rpc_rqst *req)
>   time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
>   req->rq_timeout = task->tk_client->cl_timeout->to_initval;
>   req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> + req->rq_minortimeo = time_init + req->rq_timeout;
>  }
>
>  /**
> @@ -631,6 +637,11 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
>   const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
>   int status = 0;
>
> + if (time_before(jiffies, req->rq_minortimeo)) {
> + req->rq_majortimeo = jiffies + xprt_calc_majortimeo(req);
> + req->rq_minortimeo = jiffies + req->rq_timeout;
> + return status;
> + }
>   if (time_before(jiffies, req->rq_majortimeo)) {
>   if (to->to_exponential)
>   req->rq_timeout <<= 1;
> @@ -649,6 +660,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
>   spin_unlock(&xprt->transport_lock);
>   status = -ETIMEDOUT;
>   }
> + xprt_reset_minortimeo(req);
>
>   if (req->rq_timeout == 0) {
>   printk(KERN_WARNING "xprt_adjust_timeout: rq_timeout = 0!\n");
> --
>
> > > > > > +     }
> > > > > >       if (time_before(jiffies, req->rq_majortimeo)) {
> > > > > >               if (to->to_exponential)
> > > > > >                       req->rq_timeout <<= 1;
> > > > > > @@ -638,6 +648,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> > > > > >                       req->rq_timeout += to->to_increment;
> > > > > >               if (to->to_maxval && req->rq_timeout >= to-
> > > > > > >to_maxval)
> > > > > >                       req->rq_timeout = to->to_maxval;
> > > > > > +             xprt_reset_minortimeo(req);
> > > > >
> > > > > ...and then perhaps this can just be moved out of the time_before()
> > > > > condition, since it looks to me as if we also want to reset req-
> > > > > > rq_minortimeo when a major timeout occurs.
> > > > > >               req->rq_retries++;
> > > > > >       } else {
> > > > > >               req->rq_timeout = to->to_initval;
> > >
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > trond.myklebust@hammerspace.com
> > >
> > >


* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-07-09 21:07       ` Olga Kornievskaia
@ 2020-07-10 17:35         ` Olga Kornievskaia
  2020-07-10 18:40           ` Olga Kornievskaia
  0 siblings, 1 reply; 13+ messages in thread
From: Olga Kornievskaia @ 2020-07-10 17:35 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, anna.schumaker

On Thu, Jul 9, 2020 at 5:07 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Thu, Jul 9, 2020 at 1:19 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
> >
> > On Thu, 2020-07-09 at 11:43 -0400, Olga Kornievskaia wrote:
> > > On Thu, Jul 9, 2020 at 8:08 AM Trond Myklebust <
> > > trondmy@hammerspace.com> wrote:
> > > > Hi Olga
> > > >
> > > > On Wed, 2020-07-08 at 17:05 -0400, Olga Kornievskaia wrote:
> > > > > Current behaviour: every time a v3 operation is re-sent to the
> > > > > server
> > > > > we update (double) the timeout. There is no distinction between
> > > > > whether
> > > > > or not the previous timer had expired before the re-sent
> > > > > happened.
> > > > >
> > > > > Here's the scenario:
> > > > > 1. Client sends a v3 operation
> > > > > 2. Server RST-s the connection (prior to the timeout) (eg.,
> > > > > connection
> > > > > is immediately reset)
> > > > > 3. Client re-sends a v3 operation but the timeout is now 120sec.
> > > > >
> > > > > As a result, an application sees 2mins pause before a retry in
> > > > > case
> > > > > server again does not reply.
> > > > >
> > > > > > Instead, this patch proposes to keep track of when the minor
> > > > > timeout
> > > > > should happen and if it didn't, then don't update the new
> > > > > timeout.
> > > > >
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  include/linux/sunrpc/xprt.h |  1 +
> > > > >  net/sunrpc/xprt.c           | 11 +++++++++++
> > > > >  2 files changed, 12 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/sunrpc/xprt.h
> > > > > b/include/linux/sunrpc/xprt.h
> > > > > index e64bd82..a603d48 100644
> > > > > --- a/include/linux/sunrpc/xprt.h
> > > > > +++ b/include/linux/sunrpc/xprt.h
> > > > > @@ -101,6 +101,7 @@ struct rpc_rqst {
> > > > >                                                        * used in
> > > > > the
> > > > > softirq.
> > > > >                                                        */
> > > > >       unsigned long           rq_majortimeo;  /* major timeout
> > > > > alarm */
> > > > > +     unsigned long           rq_minortimeo;  /* minor timeout
> > > > > alarm */
> > > > >       unsigned long           rq_timeout;     /* Current timeout
> > > > > value */
> > > > >       ktime_t                 rq_rtt;         /* round-trip time
> > > > > */
> > > > >       unsigned int            rq_retries;     /* # of retries */
> > > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > > > index d5cc5db..c0ce232 100644
> > > > > --- a/net/sunrpc/xprt.c
> > > > > +++ b/net/sunrpc/xprt.c
> > > > > @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct
> > > > > rpc_rqst *req)
> > > > >       req->rq_majortimeo += xprt_calc_majortimeo(req);
> > > > >  }
> > > > >
> > > > > +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> > > > > +{
> > > > > +     req->rq_minortimeo = jiffies + req->rq_timeout;
> > > > > +}
> > > > > +
> > > > >  static void xprt_init_majortimeo(struct rpc_task *task, struct
> > > > > rpc_rqst *req)
> > > > >  {
> > > > >       unsigned long time_init;
> > > > > @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct
> > > > > rpc_task
> > > > > *task, struct rpc_rqst *req)
> > > > >               time_init = xprt_abs_ktime_to_jiffies(task-
> > > > > >tk_start);
> > > > >       req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> > > > >       req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> > > > > +     req->rq_minortimeo = time_init + req->rq_timeout;
> > > > >  }
> > > > >
> > > > >  /**
> > > > > @@ -631,6 +637,10 @@ int xprt_adjust_timeout(struct rpc_rqst
> > > > > *req)
> > > > >       const struct rpc_timeout *to = req->rq_task->tk_client-
> > > > > > cl_timeout;
> > > > >       int status = 0;
> > > > >
> > > > > +     if (time_before(jiffies, req->rq_minortimeo)) {
> > > > > +             xprt_reset_minortimeo(req);
> > > > > +             return status;
> > > >
> > > > Shouldn't this case be just returning without updating the timeout?
> > > > After all, this is the case where nothing has expired yet.
> > >
> > > I think we perhaps should readjust the minor timeout every time
> > > here, but I can't figure out what the desired behaviour should be.
> > > When should we consider it appropriate to double the timer?
> > > Consider the following:
> > >
> > > time1: v3 op sent
> > > time1+50s: server RSTs
> > > We check that it's not yet the minor timeout (time1+60s)
> > > time1+50s: v3 op re-sent  (say we don't reset the minor timeout to be
> > > current time+60s)
> > > time1+60s: server RSTs
> > > Client will resend the op but now it's past the initial minor timeout
> > > so the timeout will be doubled. Is that what we really want? Maybe it
> > > is.
> > > Say now the server RSTs the connection again (shortly after or in
> > > less
> > > than 60s), since we are not updating the minor timeout value, then
> > > the
> > > client will again modify the timeout before resending. Is that Ok?
> > >
> > > That's why my reasoning was that at every re-evaluation of the
> > > timeout
> > > value, we have the minor timeout set for current time+60s and we get
> > > an RST within it then we don't modify the timeout value.
> >
> > So a couple of issues with that:
> >
> > The first is that a series of RST calls could cause the timeout to get
> > pushed to the max value fairly quickly (btw, xprt_reset_minortimeo()
> > does not enforce a limit right now).
> >
> > The second is that we end up pushing out the major timeout value, since
> > the major timeout cannot occur unless the value of jiffies is after the
> > minor timeout (which keeps changing on each pass).
>
> But don't we want to push out the major timeout?
>
> Actually, I think, back in my example of getting the RST at
> (time1+50s), shouldn't rq_minortimeo and rq_majortimeo be reset to
> current time + the appropriate minor/major value?  If we are evaluating
> the timer and the time difference between when the operation was sent
> and now is less than 60s, we shouldn't say a timeout has occurred
> (it's a premature timeout), and thus its value shouldn't be modified.
>
> Thoughts?

Do you feel that the following approach is incorrect? Sorry, it's just
cut-and-paste, but the logic is there. Thank you.

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index e64bd82..a603d48 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -101,6 +101,7 @@ struct rpc_rqst {
 						 * used in the softirq.
 						 */
 	unsigned long		rq_majortimeo;	/* major timeout alarm */
+	unsigned long		rq_minortimeo;	/* minor timeout alarm */
 	unsigned long		rq_timeout;	/* Current timeout value */
 	ktime_t			rq_rtt;		/* round-trip time */
 	unsigned int		rq_retries;	/* # of retries */
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d5cc5db..66d412b 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct rpc_rqst *req)
 	req->rq_majortimeo += xprt_calc_majortimeo(req);
 }

+static void xprt_reset_minortimeo(struct rpc_rqst *req)
+{
+	req->rq_minortimeo = jiffies + req->rq_timeout;
+}
+
 static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
 {
 	unsigned long time_init;
@@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
 		time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
 	req->rq_timeout = task->tk_client->cl_timeout->to_initval;
 	req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
+	req->rq_minortimeo = time_init + req->rq_timeout;
 }

 /**
@@ -631,6 +637,11 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
 	const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
 	int status = 0;

+	if (time_before(jiffies, req->rq_minortimeo)) {
+		req->rq_majortimeo = jiffies + xprt_calc_majortimeo(req);
+		req->rq_minortimeo = jiffies + req->rq_timeout;
+		return status;
+	}
 	if (time_before(jiffies, req->rq_majortimeo)) {
 		if (to->to_exponential)
 			req->rq_timeout <<= 1;
@@ -649,6 +660,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
 		spin_unlock(&xprt->transport_lock);
 		status = -ETIMEDOUT;
 	}
+	xprt_reset_minortimeo(req);

 	if (req->rq_timeout == 0) {
 		printk(KERN_WARNING "xprt_adjust_timeout: rq_timeout = 0!\n");
-- 

> > > > > +     }
> > > > >       if (time_before(jiffies, req->rq_majortimeo)) {
> > > > >               if (to->to_exponential)
> > > > >                       req->rq_timeout <<= 1;
> > > > > @@ -638,6 +648,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> > > > >                       req->rq_timeout += to->to_increment;
> > > > >               if (to->to_maxval && req->rq_timeout >= to-
> > > > > >to_maxval)
> > > > >                       req->rq_timeout = to->to_maxval;
> > > > > +             xprt_reset_minortimeo(req);
> > > >
> > > > ...and then perhaps this can just be moved out of the time_before()
> > > > condition, since it looks to me as if we also want to reset req-
> > > > > rq_minortimeo when a major timeout occurs.
> > > > >               req->rq_retries++;
> > > > >       } else {
> > > > >               req->rq_timeout = to->to_initval;
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> >
> >


* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-07-09 17:19     ` Trond Myklebust
@ 2020-07-09 21:07       ` Olga Kornievskaia
  2020-07-10 17:35         ` Olga Kornievskaia
  0 siblings, 1 reply; 13+ messages in thread
From: Olga Kornievskaia @ 2020-07-09 21:07 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, anna.schumaker

On Thu, Jul 9, 2020 at 1:19 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Thu, 2020-07-09 at 11:43 -0400, Olga Kornievskaia wrote:
> > On Thu, Jul 9, 2020 at 8:08 AM Trond Myklebust <
> > trondmy@hammerspace.com> wrote:
> > > Hi Olga
> > >
> > > On Wed, 2020-07-08 at 17:05 -0400, Olga Kornievskaia wrote:
> > > > Current behaviour: every time a v3 operation is re-sent to the
> > > > server
> > > > we update (double) the timeout. There is no distinction between
> > > > whether
> > > > or not the previous timer had expired before the re-sent
> > > > happened.
> > > >
> > > > Here's the scenario:
> > > > 1. Client sends a v3 operation
> > > > 2. Server RST-s the connection (prior to the timeout) (eg.,
> > > > connection
> > > > is immediately reset)
> > > > 3. Client re-sends a v3 operation but the timeout is now 120sec.
> > > >
> > > > As a result, an application sees 2mins pause before a retry in
> > > > case
> > > > server again does not reply.
> > > >
> > > > > Instead, this patch proposes to keep track of when the minor
> > > > timeout
> > > > should happen and if it didn't, then don't update the new
> > > > timeout.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  include/linux/sunrpc/xprt.h |  1 +
> > > >  net/sunrpc/xprt.c           | 11 +++++++++++
> > > >  2 files changed, 12 insertions(+)
> > > >
> > > > diff --git a/include/linux/sunrpc/xprt.h
> > > > b/include/linux/sunrpc/xprt.h
> > > > index e64bd82..a603d48 100644
> > > > --- a/include/linux/sunrpc/xprt.h
> > > > +++ b/include/linux/sunrpc/xprt.h
> > > > @@ -101,6 +101,7 @@ struct rpc_rqst {
> > > >                                                        * used in
> > > > the
> > > > softirq.
> > > >                                                        */
> > > >       unsigned long           rq_majortimeo;  /* major timeout
> > > > alarm */
> > > > +     unsigned long           rq_minortimeo;  /* minor timeout
> > > > alarm */
> > > >       unsigned long           rq_timeout;     /* Current timeout
> > > > value */
> > > >       ktime_t                 rq_rtt;         /* round-trip time
> > > > */
> > > >       unsigned int            rq_retries;     /* # of retries */
> > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > > index d5cc5db..c0ce232 100644
> > > > --- a/net/sunrpc/xprt.c
> > > > +++ b/net/sunrpc/xprt.c
> > > > @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct
> > > > rpc_rqst *req)
> > > >       req->rq_majortimeo += xprt_calc_majortimeo(req);
> > > >  }
> > > >
> > > > +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> > > > +{
> > > > +     req->rq_minortimeo = jiffies + req->rq_timeout;
> > > > +}
> > > > +
> > > >  static void xprt_init_majortimeo(struct rpc_task *task, struct
> > > > rpc_rqst *req)
> > > >  {
> > > >       unsigned long time_init;
> > > > @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct
> > > > rpc_task
> > > > *task, struct rpc_rqst *req)
> > > >               time_init = xprt_abs_ktime_to_jiffies(task-
> > > > >tk_start);
> > > >       req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> > > >       req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> > > > +     req->rq_minortimeo = time_init + req->rq_timeout;
> > > >  }
> > > >
> > > >  /**
> > > > @@ -631,6 +637,10 @@ int xprt_adjust_timeout(struct rpc_rqst
> > > > *req)
> > > >       const struct rpc_timeout *to = req->rq_task->tk_client-
> > > > > cl_timeout;
> > > >       int status = 0;
> > > >
> > > > +     if (time_before(jiffies, req->rq_minortimeo)) {
> > > > +             xprt_reset_minortimeo(req);
> > > > +             return status;
> > >
> > > Shouldn't this case be just returning without updating the timeout?
> > > After all, this is the case where nothing has expired yet.
> >
> > I think we perhaps should readjust the minor timeout every time
> > here, but I can't figure out what the desired behaviour should be.
> > When should we consider it appropriate to double the timer?
> > Consider the following:
> >
> > time1: v3 op sent
> > time1+50s: server RSTs
> > We check that it's not yet the minor timeout (time1+60s)
> > time1+50s: v3 op re-sent  (say we don't reset the minor timeout to be
> > current time+60s)
> > time1+60s: server RSTs
> > Client will resend the op but now it's past the initial minor timeout
> > so the timeout will be doubled. Is that what we really want? Maybe it
> > is.
> > Say now the server RSTs the connection again (shortly after or in
> > less
> > than 60s), since we are not updating the minor timeout value, then
> > the
> > client will again modify the timeout before resending. Is that Ok?
> >
> > That's why my reasoning was that at every re-evaluation of the
> > timeout
> > value, we have the minor timeout set for current time+60s and we get
> > an RST within it then we don't modify the timeout value.
>
> So a couple of issues with that:
>
> The first is that a series of RST calls could cause the timeout to get
> pushed to the max value fairly quickly (btw, xprt_reset_minortimeo()
> does not enforce a limit right now).
>
> The second is that we end up pushing out the major timeout value, since
> the major timeout cannot occur unless the value of jiffies is after the
> minor timeout (which keeps changing on each pass).

But don't we want to push out the major timeout?

Actually, I think, back in my example of getting the RST at
(time1+50s), shouldn't rq_minortimeo and rq_majortimeo be reset to
current time + the appropriate minor/major value?  If we are evaluating
the timer and the time difference between when the operation was sent
and now is less than 60s, we shouldn't say a timeout has occurred
(it's a premature timeout), and thus its value shouldn't be modified.

Thoughts?
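
To make the idea concrete, here is a minimal userspace sketch of the rule
"an RST that arrives before the minor deadline must not change the
timeout value". It is not the kernel code: the struct and function names
(model_rqst, model_init, model_adjust) are hypothetical, times are plain
seconds rather than jiffies, and only exponential backoff with an
optional cap is modelled.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the timeout fields of rpc_rqst. */
struct model_rqst {
	unsigned long timeout;     /* current retransmit interval, seconds */
	unsigned long minortimeo;  /* absolute time of the next minor timeout */
	unsigned long maxval;      /* cap on the interval, 0 means no cap */
};

static void model_init(struct model_rqst *req, unsigned long now,
		       unsigned long initval, unsigned long maxval)
{
	req->timeout = initval;
	req->maxval = maxval;
	req->minortimeo = now + initval;
}

/*
 * Returns true if the interval was changed (a genuine minor timeout),
 * false if the call was premature (e.g. triggered by a connection reset).
 * The kernel would use time_before() to cope with jiffies wrap-around;
 * plain '<' is used here for brevity.
 */
static bool model_adjust(struct model_rqst *req, unsigned long now)
{
	if (now < req->minortimeo) {
		/* Premature: re-arm from "now" but keep the interval. */
		req->minortimeo = now + req->timeout;
		return false;
	}
	/* Genuine minor timeout: back off exponentially, honour the cap. */
	req->timeout <<= 1;
	if (req->maxval && req->timeout > req->maxval)
		req->timeout = req->maxval;
	req->minortimeo = now + req->timeout;
	return true;
}
```

With a 60s initial interval, a call at time1+50s leaves the interval at
60s and re-arms the deadline to time1+110s; a call at time1+110s is then
treated as a genuine timeout and doubles the interval to 120s.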


>
> >
> >
> > > > +     }
> > > >       if (time_before(jiffies, req->rq_majortimeo)) {
> > > >               if (to->to_exponential)
> > > >                       req->rq_timeout <<= 1;
> > > > @@ -638,6 +648,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> > > >                       req->rq_timeout += to->to_increment;
> > > >               if (to->to_maxval && req->rq_timeout >= to-
> > > > >to_maxval)
> > > >                       req->rq_timeout = to->to_maxval;
> > > > +             xprt_reset_minortimeo(req);
> > >
> > > ...and then perhaps this can just be moved out of the time_before()
> > > condition, since it looks to me as if we also want to reset req-
> > > > rq_minortimeo when a major timeout occurs.
> > > >               req->rq_retries++;
> > > >       } else {
> > > >               req->rq_timeout = to->to_initval;
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>


* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-07-09 15:43   ` Olga Kornievskaia
@ 2020-07-09 17:19     ` Trond Myklebust
  2020-07-09 21:07       ` Olga Kornievskaia
  0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2020-07-09 17:19 UTC (permalink / raw)
  To: olga.kornievskaia; +Cc: linux-nfs, anna.schumaker

On Thu, 2020-07-09 at 11:43 -0400, Olga Kornievskaia wrote:
> On Thu, Jul 9, 2020 at 8:08 AM Trond Myklebust <
> trondmy@hammerspace.com> wrote:
> > Hi Olga
> > 
> > On Wed, 2020-07-08 at 17:05 -0400, Olga Kornievskaia wrote:
> > > Current behaviour: every time a v3 operation is re-sent to the
> > > server
> > > we update (double) the timeout. There is no distinction between
> > > whether
> > > or not the previous timer had expired before the re-sent
> > > happened.
> > > 
> > > Here's the scenario:
> > > 1. Client sends a v3 operation
> > > 2. Server RST-s the connection (prior to the timeout) (eg.,
> > > connection
> > > is immediately reset)
> > > 3. Client re-sends a v3 operation but the timeout is now 120sec.
> > > 
> > > As a result, an application sees 2mins pause before a retry in
> > > case
> > > server again does not reply.
> > > 
> > > > Instead, this patch proposes to keep track of when the minor
> > > timeout
> > > should happen and if it didn't, then don't update the new
> > > timeout.
> > > 
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  include/linux/sunrpc/xprt.h |  1 +
> > >  net/sunrpc/xprt.c           | 11 +++++++++++
> > >  2 files changed, 12 insertions(+)
> > > 
> > > diff --git a/include/linux/sunrpc/xprt.h
> > > b/include/linux/sunrpc/xprt.h
> > > index e64bd82..a603d48 100644
> > > --- a/include/linux/sunrpc/xprt.h
> > > +++ b/include/linux/sunrpc/xprt.h
> > > @@ -101,6 +101,7 @@ struct rpc_rqst {
> > >                                                        * used in
> > > the
> > > softirq.
> > >                                                        */
> > >       unsigned long           rq_majortimeo;  /* major timeout
> > > alarm */
> > > +     unsigned long           rq_minortimeo;  /* minor timeout
> > > alarm */
> > >       unsigned long           rq_timeout;     /* Current timeout
> > > value */
> > >       ktime_t                 rq_rtt;         /* round-trip time
> > > */
> > >       unsigned int            rq_retries;     /* # of retries */
> > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > index d5cc5db..c0ce232 100644
> > > --- a/net/sunrpc/xprt.c
> > > +++ b/net/sunrpc/xprt.c
> > > @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct
> > > rpc_rqst *req)
> > >       req->rq_majortimeo += xprt_calc_majortimeo(req);
> > >  }
> > > 
> > > +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> > > +{
> > > +     req->rq_minortimeo = jiffies + req->rq_timeout;
> > > +}
> > > +
> > >  static void xprt_init_majortimeo(struct rpc_task *task, struct
> > > rpc_rqst *req)
> > >  {
> > >       unsigned long time_init;
> > > @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct
> > > rpc_task
> > > *task, struct rpc_rqst *req)
> > >               time_init = xprt_abs_ktime_to_jiffies(task-
> > > >tk_start);
> > >       req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> > >       req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> > > +     req->rq_minortimeo = time_init + req->rq_timeout;
> > >  }
> > > 
> > >  /**
> > > @@ -631,6 +637,10 @@ int xprt_adjust_timeout(struct rpc_rqst
> > > *req)
> > >       const struct rpc_timeout *to = req->rq_task->tk_client-
> > > > cl_timeout;
> > >       int status = 0;
> > > 
> > > +     if (time_before(jiffies, req->rq_minortimeo)) {
> > > +             xprt_reset_minortimeo(req);
> > > +             return status;
> > 
> > Shouldn't this case be just returning without updating the timeout?
> > After all, this is the case where nothing has expired yet.
> 
> > I think we perhaps should readjust the minor timeout every time here,
> > but I can't figure out what the desired behaviour should be. When
> > should we consider it appropriate to double the timer? Consider the
> > following:
> 
> time1: v3 op sent
> time1+50s: server RSTs
> We check that it's not yet the minor timeout (time1+60s)
> time1+50s: v3 op re-sent  (say we don't reset the minor timeout to be
> current time+60s)
> time1+60s: server RSTs
> Client will resend the op but now it's past the initial minor timeout
> so the timeout will be doubled. Is that what we really want? Maybe it
> is.
> > Say now the server RSTs the connection again (shortly after, or in
> > less than 60s). Since we are not updating the minor timeout value,
> > the client will again modify the timeout before resending. Is that OK?
> > 
> > That's why my reasoning was that at every re-evaluation of the
> > timeout value, we set the minor timeout to current time+60s, and if
> > we get an RST within it then we don't modify the timeout value.

So a couple of issues with that:

The first is that a series of RST calls could cause the timeout to get
pushed to the max value fairly quickly (btw, xprt_reset_minortimeo()
does not enforce a limit right now).

The second is that we end up pushing out the major timeout value, since
the major timeout cannot occur unless the value of jiffies is after the
minor timeout (which keeps changing on each pass).
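
To make the second point concrete, here is a rough userspace model of
the control flow in the posted xprt_adjust_timeout() hunk. It is only a
sketch under stated assumptions: times are plain seconds rather than
jiffies, a plain '<' stands in for time_before(), only the exponential
case is modelled, and the struct, the function names and the
60s/180s/600s values are made up for illustration. It counts how many
evenly spaced resets are absorbed before the major-timeout branch is
reached.

```c
#include <assert.h>

/* Hypothetical userspace model; seconds instead of jiffies. */
struct model_req {
	unsigned long timeout;   /* models rq_timeout */
	unsigned long minor_due; /* models rq_minortimeo */
	unsigned long major_due; /* models rq_majortimeo */
	unsigned int retries;    /* models rq_retries */
};

enum model_result { MODEL_EARLY, MODEL_MINOR, MODEL_MAJOR };

/* Mirrors the branch structure of the posted hunk (exponential case only). */
static enum model_result model_adjust(struct model_req *req,
				      unsigned long now,
				      unsigned long maxval)
{
	if (now < req->minor_due) {
		/* Nothing expired yet: the patch re-arms the minor deadline. */
		req->minor_due = now + req->timeout;
		return MODEL_EARLY;
	}
	if (now < req->major_due) {
		req->timeout <<= 1;
		if (maxval && req->timeout >= maxval)
			req->timeout = maxval;
		req->minor_due = now + req->timeout;
		req->retries++;
		return MODEL_MINOR;
	}
	return MODEL_MAJOR;
}

/*
 * Feed the model one reset every 'interval' seconds. Returns the index of
 * the first event that reaches the major-timeout branch, or 0 if none of
 * the first 'max_events' events does.
 */
static unsigned int model_resets_until_major(unsigned long interval,
					     unsigned int max_events)
{
	/* Assumed values: 60s initial timeout, 180s major deadline. */
	struct model_req req = { .timeout = 60, .minor_due = 60,
				 .major_due = 180 };
	unsigned long now = 0;
	unsigned int n;

	assert(interval > 0);
	for (n = 1; n <= max_events; n++) {
		now += interval;
		if (model_adjust(&req, now, 600) == MODEL_MAJOR)
			return n;
	}
	return 0;
}
```

With a reset every 50s, model_resets_until_major(50, 100) returns 0:
each pass lands before the freshly re-armed minor deadline, so the major
branch is never reached even though the 180s mark has long passed. A
single event at 200s reaches it immediately.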

> 
> 
> > > +     }
> > >       if (time_before(jiffies, req->rq_majortimeo)) {
> > >               if (to->to_exponential)
> > >                       req->rq_timeout <<= 1;
> > > @@ -638,6 +648,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> > >                       req->rq_timeout += to->to_increment;
> > > >               if (to->to_maxval && req->rq_timeout >= to->to_maxval)
> > >                       req->rq_timeout = to->to_maxval;
> > > +             xprt_reset_minortimeo(req);
> > 
> > > ...and then perhaps this can just be moved out of the time_before()
> > > condition, since it looks to me as if we also want to reset
> > > req->rq_minortimeo when a major timeout occurs.
> > >               req->rq_retries++;
> > >       } else {
> > >               req->rq_timeout = to->to_initval;

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-07-09 12:08 ` Trond Myklebust
@ 2020-07-09 15:43   ` Olga Kornievskaia
  2020-07-09 17:19     ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Olga Kornievskaia @ 2020-07-09 15:43 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: anna.schumaker, linux-nfs

On Thu, Jul 9, 2020 at 8:08 AM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> Hi Olga
>
> On Wed, 2020-07-08 at 17:05 -0400, Olga Kornievskaia wrote:
> > Current behaviour: every time a v3 operation is re-sent to the server
> > we update (double) the timeout. There is no distinction between
> > whether
> > or not the previous timer had expired before the re-send happened.
> >
> > Here's the scenario:
> > 1. Client sends a v3 operation
> > 2. Server RST-s the connection (prior to the timeout) (eg.,
> > connection
> > is immediately reset)
> > 3. Client re-sends a v3 operation but the timeout is now 120sec.
> >
> > As a result, an application sees 2mins pause before a retry in case
> > server again does not reply.
> >
> > Instead, this patch proposes to keep track of when the minor timeout
> > should happen and if it didn't, then don't update the new timeout.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  include/linux/sunrpc/xprt.h |  1 +
> >  net/sunrpc/xprt.c           | 11 +++++++++++
> >  2 files changed, 12 insertions(+)
> >
> > diff --git a/include/linux/sunrpc/xprt.h
> > b/include/linux/sunrpc/xprt.h
> > index e64bd82..a603d48 100644
> > --- a/include/linux/sunrpc/xprt.h
> > +++ b/include/linux/sunrpc/xprt.h
> > @@ -101,6 +101,7 @@ struct rpc_rqst {
> >                                                        * used in the
> > softirq.
> >                                                        */
> >       unsigned long           rq_majortimeo;  /* major timeout
> > alarm */
> > +     unsigned long           rq_minortimeo;  /* minor timeout
> > alarm */
> >       unsigned long           rq_timeout;     /* Current timeout
> > value */
> >       ktime_t                 rq_rtt;         /* round-trip time */
> >       unsigned int            rq_retries;     /* # of retries */
> > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > index d5cc5db..c0ce232 100644
> > --- a/net/sunrpc/xprt.c
> > +++ b/net/sunrpc/xprt.c
> > @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct
> > rpc_rqst *req)
> >       req->rq_majortimeo += xprt_calc_majortimeo(req);
> >  }
> >
> > +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> > +{
> > +     req->rq_minortimeo = jiffies + req->rq_timeout;
> > +}
> > +
> >  static void xprt_init_majortimeo(struct rpc_task *task, struct
> > rpc_rqst *req)
> >  {
> >       unsigned long time_init;
> > @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct rpc_task
> > *task, struct rpc_rqst *req)
> >               time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
> >       req->rq_timeout = task->tk_client->cl_timeout->to_initval;
> >       req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> > +     req->rq_minortimeo = time_init + req->rq_timeout;
> >  }
> >
> >  /**
> > @@ -631,6 +637,10 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> > >       const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
> >       int status = 0;
> >
> > +     if (time_before(jiffies, req->rq_minortimeo)) {
> > +             xprt_reset_minortimeo(req);
> > +             return status;
>
> Shouldn't this case be just returning without updating the timeout?
> After all, this is the case where nothing has expired yet.

I think we perhaps should readjust the minor timeout every time here,
but I can't figure out what the desired behaviour should be. When should
we consider it appropriate to double the timer? Consider the following:

time1: v3 op sent
time1+50s: server RSTs
We check that it's not yet the minor timeout (time1+60s)
time1+50s: v3 op re-sent  (say we don't reset the minor timeout to be
current time+60s)
time1+60s: server RSTs
Client will resend the op but now it's past the initial minor timeout
so the timeout will be doubled. Is that what we really want? Maybe it
is.
Say now the server RSTs the connection again (shortly after, or in less
than 60s). Since we are not updating the minor timeout value, the
client will again modify the timeout before resending. Is that OK?

That's why my reasoning was that at every re-evaluation of the timeout
value, we set the minor timeout to current time+60s, and if we get an
RST within it then we don't modify the timeout value.
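
The timeline above can be replayed with a small userspace sketch. This
is not kernel code: times are plain seconds, a plain '<' stands in for
time_before(), the major timeout is left out entirely, and the tl_*
names and the rearm_on_early switch are hypothetical. The switch selects
between re-arming the minor deadline on an early reset (as the posted
patch does) and leaving it alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one request; seconds instead of jiffies. */
struct tl_req {
	unsigned long timeout;   /* models rq_timeout */
	unsigned long minor_due; /* models rq_minortimeo */
};

/* Handle one connection reset observed at time 'now'. */
static void tl_on_reset(struct tl_req *req, unsigned long now,
			bool rearm_on_early)
{
	if (now < req->minor_due) {
		/* Minor timeout has not expired: keep the timeout value. */
		if (rearm_on_early)
			req->minor_due = now + req->timeout;
		return;
	}
	/* Minor timeout expired: double the timeout and re-arm. */
	req->timeout <<= 1;
	req->minor_due = now + req->timeout;
}

/* Replay a list of reset times and return the resulting timeout. */
static unsigned long tl_final_timeout(const unsigned long *resets, size_t n,
				      bool rearm_on_early)
{
	/* Assumed starting point: op sent at t=0 with a 60s timeout. */
	struct tl_req req = { .timeout = 60, .minor_due = 60 };
	size_t i;

	assert(resets != NULL || n == 0);
	for (i = 0; i < n; i++)
		tl_on_reset(&req, resets[i], rearm_on_early);
	return req.timeout;
}
```

For resets at 50s and 60s, the re-arming variant leaves the timeout at
60s, while the variant that keeps the original deadline doubles it to
120s at the second reset, which is the case questioned above.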


>
> > +     }
> >       if (time_before(jiffies, req->rq_majortimeo)) {
> >               if (to->to_exponential)
> >                       req->rq_timeout <<= 1;
> > @@ -638,6 +648,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
> >                       req->rq_timeout += to->to_increment;
> >               if (to->to_maxval && req->rq_timeout >= to->to_maxval)
> >                       req->rq_timeout = to->to_maxval;
> > +             xprt_reset_minortimeo(req);
>
> ...and then perhaps this can just be moved out of the time_before()
> condition, since it looks to me as if we also want to reset
> req->rq_minortimeo when a major timeout occurs.
>
> >               req->rq_retries++;
> >       } else {
> >               req->rq_timeout = to->to_initval;
>
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>


* Re: [PATCH 1/1] SUNRPC dont update timeout value on connection reset
  2020-07-08 21:05 Olga Kornievskaia
@ 2020-07-09 12:08 ` Trond Myklebust
  2020-07-09 15:43   ` Olga Kornievskaia
  0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2020-07-09 12:08 UTC (permalink / raw)
  To: anna.schumaker, olga.kornievskaia; +Cc: linux-nfs

Hi Olga

On Wed, 2020-07-08 at 17:05 -0400, Olga Kornievskaia wrote:
> Current behaviour: every time a v3 operation is re-sent to the server
> we update (double) the timeout. There is no distinction between
> whether
> or not the previous timer had expired before the re-send happened.
> 
> Here's the scenario:
> 1. Client sends a v3 operation
> 2. Server RST-s the connection (prior to the timeout) (eg.,
> connection
> is immediately reset)
> 3. Client re-sends a v3 operation but the timeout is now 120sec.
> 
> As a result, an application sees 2mins pause before a retry in case
> server again does not reply.
> 
> Instead, this patch proposes to keep track of when the minor timeout
> should happen and if it didn't, then don't update the new timeout.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  include/linux/sunrpc/xprt.h |  1 +
>  net/sunrpc/xprt.c           | 11 +++++++++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/include/linux/sunrpc/xprt.h
> b/include/linux/sunrpc/xprt.h
> index e64bd82..a603d48 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -101,6 +101,7 @@ struct rpc_rqst {
>  							 * used in the softirq.
>  							 */
>  	unsigned long		rq_majortimeo;	/* major timeout alarm */
> +	unsigned long		rq_minortimeo;	/* minor timeout alarm */
>  	unsigned long		rq_timeout;	/* Current timeout value */
>  	ktime_t			rq_rtt;		/* round-trip time */
>  	unsigned int		rq_retries;	/* # of retries */
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index d5cc5db..c0ce232 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct
> rpc_rqst *req)
>  	req->rq_majortimeo += xprt_calc_majortimeo(req);
>  }
>  
> +static void xprt_reset_minortimeo(struct rpc_rqst *req)
> +{
> +	req->rq_minortimeo = jiffies + req->rq_timeout;
> +}
> +
>  static void xprt_init_majortimeo(struct rpc_task *task, struct
> rpc_rqst *req)
>  {
>  	unsigned long time_init;
> @@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct rpc_task
> *task, struct rpc_rqst *req)
>  		time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
>  	req->rq_timeout = task->tk_client->cl_timeout->to_initval;
>  	req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
> +	req->rq_minortimeo = time_init + req->rq_timeout;
>  }
>  
>  /**
> @@ -631,6 +637,10 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
>  	const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
>  	int status = 0;
>  
> +	if (time_before(jiffies, req->rq_minortimeo)) {
> +		xprt_reset_minortimeo(req);
> +		return status;

Shouldn't this case be just returning without updating the timeout?
After all, this is the case where nothing has expired yet.

> +	}
>  	if (time_before(jiffies, req->rq_majortimeo)) {
>  		if (to->to_exponential)
>  			req->rq_timeout <<= 1;
> @@ -638,6 +648,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
>  			req->rq_timeout += to->to_increment;
>  		if (to->to_maxval && req->rq_timeout >= to->to_maxval)
>  			req->rq_timeout = to->to_maxval;
> +		xprt_reset_minortimeo(req);

...and then perhaps this can just be moved out of the time_before()
condition, since it looks to me as if we also want to reset
req->rq_minortimeo when a major timeout occurs.

>  		req->rq_retries++;
>  	} else {
>  		req->rq_timeout = to->to_initval;


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* [PATCH 1/1] SUNRPC dont update timeout value on connection reset
@ 2020-07-08 21:05 Olga Kornievskaia
  2020-07-09 12:08 ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Olga Kornievskaia @ 2020-07-08 21:05 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker; +Cc: linux-nfs

Current behaviour: every time a v3 operation is re-sent to the server
we update (double) the timeout. There is no distinction between whether
or not the previous timer had expired before the re-send happened.

Here's the scenario:
1. Client sends a v3 operation
2. Server RST-s the connection (prior to the timeout) (eg., connection
is immediately reset)
3. Client re-sends a v3 operation but the timeout is now 120sec.

As a result, an application sees 2mins pause before a retry in case
server again does not reply.

Instead, this patch proposes to keep track of when the minor timeout
should happen and if it didn't, then don't update the new timeout.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 include/linux/sunrpc/xprt.h |  1 +
 net/sunrpc/xprt.c           | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index e64bd82..a603d48 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -101,6 +101,7 @@ struct rpc_rqst {
 							 * used in the softirq.
 							 */
 	unsigned long		rq_majortimeo;	/* major timeout alarm */
+	unsigned long		rq_minortimeo;	/* minor timeout alarm */
 	unsigned long		rq_timeout;	/* Current timeout value */
 	ktime_t			rq_rtt;		/* round-trip time */
 	unsigned int		rq_retries;	/* # of retries */
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d5cc5db..c0ce232 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -607,6 +607,11 @@ static void xprt_reset_majortimeo(struct rpc_rqst *req)
 	req->rq_majortimeo += xprt_calc_majortimeo(req);
 }
 
+static void xprt_reset_minortimeo(struct rpc_rqst *req)
+{
+	req->rq_minortimeo = jiffies + req->rq_timeout;
+}
+
 static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
 {
 	unsigned long time_init;
@@ -618,6 +623,7 @@ static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
 		time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
 	req->rq_timeout = task->tk_client->cl_timeout->to_initval;
 	req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
+	req->rq_minortimeo = time_init + req->rq_timeout;
 }
 
 /**
@@ -631,6 +637,10 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
 	const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
 	int status = 0;
 
+	if (time_before(jiffies, req->rq_minortimeo)) {
+		xprt_reset_minortimeo(req);
+		return status;
+	}
 	if (time_before(jiffies, req->rq_majortimeo)) {
 		if (to->to_exponential)
 			req->rq_timeout <<= 1;
@@ -638,6 +648,7 @@ int xprt_adjust_timeout(struct rpc_rqst *req)
 			req->rq_timeout += to->to_increment;
 		if (to->to_maxval && req->rq_timeout >= to->to_maxval)
 			req->rq_timeout = to->to_maxval;
+		xprt_reset_minortimeo(req);
 		req->rq_retries++;
 	} else {
 		req->rq_timeout = to->to_initval;
-- 
1.8.3.1



end of thread, other threads:[~2020-07-13 16:18 UTC | newest]

Thread overview: 13+ messages
2020-06-23 15:24 [PATCH 1/1] SUNRPC dont update timeout value on connection reset Olga Kornievskaia
2020-06-28 18:03 ` Olga Kornievskaia
2020-06-28 21:16   ` Trond Myklebust
2020-07-08 21:04     ` Olga Kornievskaia
2020-07-08 21:05 Olga Kornievskaia
2020-07-09 12:08 ` Trond Myklebust
2020-07-09 15:43   ` Olga Kornievskaia
2020-07-09 17:19     ` Trond Myklebust
2020-07-09 21:07       ` Olga Kornievskaia
2020-07-10 17:35         ` Olga Kornievskaia
2020-07-10 18:40           ` Olga Kornievskaia
2020-07-13 13:47             ` Trond Myklebust
2020-07-13 16:18               ` Olga Kornievskaia
