From: Brian Mancuso <bmancuso@akamai.com>
To: nfs@lists.sourceforge.net
Subject: Re: NFS/UDP slow read, lost fragments
Date: Thu, 25 Sep 2003 17:44:26 -0400
Message-ID: <20030925174426.B787@muppet.kendall.corp.akamai.com>
In-Reply-To: <Pine.LNX.4.44.0309251019170.28586-100000@localhost.localdomain>; from rmillner@transmeta.com on Thu, Sep 25, 2003 at 10:59:43AM -0700

On Thu, Sep 25, 2003 at 10:59:43AM -0700, Robert L. Millner wrote:
: Hello,
: 
: The problem I am seeing is similar to what was posted by Larry Sendlosky
: on Jun 27, "2.4.20-pre3 -> 2.4.21 : nfs client read performance broken"
: though I have not done as thorough a drill-down into the nature of the
: problem.
: 
: Somewhere between 2.4.19 and 2.4.20, NFS/UDP read performance began to
: suck because of a large number of request retransmits.  From tcpdump, the
: retransmits are for read transactions which return data in a reasonable
: time frame but are missing one or more fragments of the return packet.

This is because the code that exponentially backs off the RTO for UDP
RPC was backported from the 2.[56] series into 2.4.20, and that code
is completely broken. Trond has a patch in his patchset for 2.6 that
fixes most of these problems, but it still has one problem that can
result in a large number of unnecessary retransmits for RPC sessions
that have low variance in RTT.

RTO is calculated as the filtered round trip time plus a small
constant times the mean deviation of round trip times. However,
because the RTT calculation code implements Karn's algorithm (from
TCP: no RTT calculation is done for responses to RPC requests that
have been retransmitted), the measured RTT is never allowed to
increase. If a response takes longer than the measured RTT plus the
(assumed small) deviation, the request is retransmitted and the
calculation that would have increased the measured RTT is not done.
Thus if a server's real RTT increases over time, initial RTO values
never grow (the measured RTT never grows beyond the minimum ever
measured), and RPC requests will frequently be retransmitted at least
once.

This can be easily remedied by TCP's technique of inheriting RTO
backoff from previous transactions. Create a new variable somewhere in
the clnt structure called, say, cl_backoff. Each time an RPC
transaction completes, store the number of retransmits for that
transaction (req->rq_nresend) in cl_backoff. Then calculate RTO as
rpc_calc_rto() left-shifted by the sum of the number of retransmits
for this transaction (initially 0) and clnt->cl_backoff (the number of
retransmits for the last completed transaction).
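
A rough user-space sketch of the scheme described above, to make the
arithmetic concrete. This is not the kernel code and not the promised
patch: the sketch_* names, the millisecond units, the factor of 4 on
the deviation, the 1/8 and 1/4 filter gains, and the shift and RTO
caps are all assumptions chosen for illustration. Only cl_backoff and
rq_nresend are taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real RPC structures. */
struct sketch_rtt {
    uint32_t srtt;  /* filtered round trip time, in ms */
    uint32_t mdev;  /* mean deviation of round trip times, in ms */
};

struct sketch_clnt {
    struct sketch_rtt rtt;
    unsigned int cl_backoff;  /* retransmits of the last completed transaction */
};

struct sketch_req {
    unsigned int rq_nresend;  /* retransmits so far for this transaction */
};

#define SKETCH_MAX_SHIFT  8u      /* assumed cap so the shift stays sane */
#define SKETCH_MAX_RTO_MS 60000u  /* assumed upper bound on RTO */

/* Base RTO: filtered RTT plus a small constant (4, assumed) times mdev. */
uint32_t sketch_calc_rto(const struct sketch_rtt *rtt)
{
    return rtt->srtt + 4u * rtt->mdev;
}

/* RTO for a request: base RTO shifted left by this transaction's
 * retransmits plus the backoff inherited from the last transaction. */
uint32_t sketch_request_rto(const struct sketch_clnt *clnt,
                            const struct sketch_req *req)
{
    unsigned int shift;
    uint64_t rto;

    assert(clnt && req);
    shift = req->rq_nresend + clnt->cl_backoff;
    if (shift > SKETCH_MAX_SHIFT)
        shift = SKETCH_MAX_SHIFT;
    rto = (uint64_t)sketch_calc_rto(&clnt->rtt) << shift;
    return rto > SKETCH_MAX_RTO_MS ? SKETCH_MAX_RTO_MS : (uint32_t)rto;
}

/* Called when a transaction completes with a measured round trip. */
void sketch_request_done(struct sketch_clnt *clnt,
                         const struct sketch_req *req, uint32_t sample_ms)
{
    assert(clnt && req);
    /* Karn's algorithm: only sample RTT if nothing was retransmitted. */
    if (req->rq_nresend == 0) {
        int32_t srtt = (int32_t)clnt->rtt.srtt;
        int32_t mdev = (int32_t)clnt->rtt.mdev;
        int32_t err = (int32_t)sample_ms - srtt;
        int32_t abs_err = err < 0 ? -err : err;

        clnt->rtt.srtt = (uint32_t)(srtt + err / 8);
        clnt->rtt.mdev = (uint32_t)(mdev + (abs_err - mdev) / 4);
    }
    /* Inherit this transaction's backoff for the next one. */
    clnt->cl_backoff = req->rq_nresend;
}
```

With srtt = 10 and mdev = 2, the base RTO in this sketch is 18 ms. If
the server starts answering in 30 ms, the first request is retransmitted
once and its sample is discarded, but the next request inherits
cl_backoff = 1 and starts with a 36 ms RTO. It completes without a
retransmit, so its 30 ms sample is finally fed into the filter and the
measured RTT can grow.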

The backported code mentioned above will also result in significantly
more EIO errors for users with soft UDP mounts. Users seeing lots of
EIOs should see them diminish once these problems are fixed.

: Is this a known problem?  Is there a patch already out there or in the
: works that fixes this?  What other data would help drill into this
: problem?

I will post a patch tomorrow for 2.4.20.

Brian Mancuso


