From: "bfields@fieldses.org" <bfields@fieldses.org>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: "neilb@suse.de" <neilb@suse.de>,
	"fsorenso@redhat.com" <fsorenso@redhat.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"aglo@umich.edu" <aglo@umich.edu>,
	"bcodding@redhat.com" <bcodding@redhat.com>,
	"jshivers@redhat.com" <jshivers@redhat.com>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Subject: Re: Re: unsharing tcp connections from different NFS mounts
Date: Tue, 4 May 2021 12:51:20 -0400
Message-ID: <20210504165120.GA18746@fieldses.org>
In-Reply-To: <4033e1e8b52c27503abe5855f81b7d12b2e46eec.camel@hammerspace.com>

Thanks very much to all of you for the explanations and concrete
suggestions for things to look at. I feel much less stuck!

--b.

On Tue, May 04, 2021 at 02:27:04PM +0000, Trond Myklebust wrote:
> On Tue, 2021-05-04 at 12:08 +1000, NeilBrown wrote:
> > On Tue, 04 May 2021, bfields@fieldses.org wrote:
> > > On Wed, Jan 20, 2021 at 10:07:37AM -0500, bfields@fieldses.org wrote:
> > > > 
> > > > So mainly:
> > > > 
> > > > > > > Why is there a performance regression being seen by these setups
> > > > > > > when they share the same connection? Is it really the connection,
> > > > > > > or is it the fact that they all share the same fixed-slot session?
> > > > 
> > > > I don't know.  Any pointers on how we might go about finding the
> > > > answer?
> > > 
> > > I set this aside and then get bugged about it again.
> > > 
> > > I apologize, I don't understand what you're asking for here, but it
> > > seemed obvious to you and Tom, so I'm sure the problem is me.  Are you
> > > free for a call sometime maybe?  Or do you have any suggestions for how
> > > you'd go about investigating this?
> > 
> > I think a useful first step would be to understand what is getting in
> > the way of the small requests.
> >  - are they in the client waiting for slots which are all consumed by
> >    large writes?
> >  - are they in TCP stream behind megabytes of writes that need to be
> >    consumed before they can even be seen by the server?
> >  - are they in a socket buffer on the server waiting to be served
> > >    while all the nfsd threads are busy handling writes?
> > 
> > I cannot see an easy way to measure which it is.
> 
> The nfs4_sequence_done tracepoint will give you a running count of the
> highest slot id in use.
> 
> The mountstats 'execute time' will give you the time between the
> request being created and the time a reply was received. That time
> includes the time spent waiting for a NFSv4 session slot.
> 
> The mountstats 'backlog wait' will tell you the time spent waiting for
> an RPC slot after obtaining the NFSv4 session slot.
> 
> The mountstats 'RTT' will give you the time spent waiting for the RPC
> request to be received, processed and replied to by the server.
> 
> Finally, the mountstats also tell you average per-op bytes sent/bytes
> received.
> 
> IOW: The mountstats really gives you almost all the information you
> need here, particularly if you use it in the 'interval reporting' mode.
> The only thing it does not tell you is whether or not the NFSv4 session
> slot table is full (which is why you want the tracepoint).
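
A rough sketch of how the counters described above could be combined for
one operation. It assumes the per-op line layout of /proc/self/mountstats
(ops, transmissions, timeouts, bytes sent, bytes received, then cumulative
queue, RTT and execute times in milliseconds); the function name and the
"other" bucket are illustrative only. Since execute time includes the
session slot wait while backlog wait and RTT do not, whatever is left over
after subtracting those two is where a session slot wait would show up,
along with any other client-side delay.

```python
def parse_op_line(line):
    """Parse one per-op mountstats line into per-operation averages.

    Assumed field order (an assumption, check it against your kernel):
    ops, transmissions, timeouts, bytes_sent, bytes_recv,
    queue_ms (backlog wait), rtt_ms, execute_ms, [errors].
    """
    name, _, rest = line.strip().partition(":")
    fields = [int(x) for x in rest.split()]
    ops, _trans, _timeouts, sent, recv, queue_ms, rtt_ms, exec_ms = fields[:8]
    n = max(ops, 1)  # avoid dividing by zero for ops that never ran
    return {
        "op": name,
        "ops": ops,
        "avg_backlog_ms": queue_ms / n,
        "avg_rtt_ms": rtt_ms / n,
        "avg_execute_ms": exec_ms / n,
        # Client-side time not explained by backlog wait or RTT; per the
        # description above this includes NFSv4 session slot waits.
        "avg_other_ms": (exec_ms - queue_ms - rtt_ms) / n,
        "avg_bytes_sent": sent / n,
        "avg_bytes_recv": recv / n,
    }


if __name__ == "__main__":
    # Made-up sample line, not real measurements.
    print(parse_op_line("WRITE: 100 100 0 1000000 20000 500 3000 9000 0"))
```

The counters are cumulative since mount, so for interval reporting you
would subtract two samples field by field before averaging.
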
> 
> > I guess monitoring how much of the time the client has no free slots
> > might give hints about the first.  If there are always free slots,
> > the first case cannot be the problem.
> > 
> > With NFSv3, the slot management happened at the RPC layer and there were
> > several queues (RPC_PRIORITY_LOW/NORMAL/HIGH/PRIVILEGED) where requests
> > could wait for a free slot.  Since we gained dynamic slot allocation -
> > up to 65536 by default - I wonder if that has much effect any more.
> > 
> > For NFSv4.1+ the slot management is at the NFS level.  The server sets a
> > maximum which defaults to (maybe is limited to) 1024 by the Linux server.
> > So there are always free RPC slots.
> > The Linux client only has a single queue for each slot table, and I
> > think there is one slot table for the forward channel of a session.
> > So it seems we no longer get any priority management (sync writes used
> > to get priority over async writes).
> > 
> > Increasing the number of slots advertised by the server might be
> > interesting.  It is unlikely to fix anything, but it might move the
> > bottleneck.
> > 
> > Decreasing the maximum number of TCP slots might also be interesting
> > (below the number of NFS slots at least).
> > That would allow the RPC priority infrastructure to work, and if the
> > large-file writes are async, they might get slowed down.
> > 
> > If the problem is in the TCP stream (which is possible if the relevant
> > network buffers are bloated), then you'd really need multiple TCP streams
> > (which can certainly improve throughput in some cases).  That is what
> > nconnect gives you.  nconnect does minimal balancing.  In general it will
> > round-robin, but if the number of requests (not bytes) queued on one
> > socket is below average, that socket is likely to get the next request.
> 
> It's not round-robin. Transports are allocated to a new RPC request
> based on a measure of their queue length in order to skip over those
> that show signs of above average congestion.
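
To make the distinction from plain round-robin concrete, here is a toy
model of that kind of selection. It is not the kernel's code: the function
name, the use of a simple mean as the congestion threshold, and the cursor
handling are all assumptions made for illustration.

```python
def pick_transport(queue_lens, cursor):
    """Return the index of the transport to use for the next request.

    queue_lens: number of requests currently queued on each transport.
    cursor: index of the transport chosen for the previous request.

    Walk the transports in order starting after the cursor, skipping any
    whose queue is longer than the average queue length.
    """
    n = len(queue_lens)
    average = sum(queue_lens) / n
    for step in range(1, n + 1):
        i = (cursor + step) % n
        if queue_lens[i] <= average:
            return i
    # Not reachable: at least one queue is always <= the average.
    return (cursor + 1) % n


if __name__ == "__main__":
    # Transport 0 is congested, so the walk from cursor 3 skips it.
    print(pick_transport([10, 1, 1, 1], cursor=3))
```

With equal queue lengths this degenerates to round-robin, which may be why
the two are easy to confuse when the load is light.
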
> 
> > So just adding more connections with nconnect is unlikely to help.
> > You would need to add a policy engine (struct rpc_xprt_iter_ops) which
> > reserves some connections for small requests.  That should be fairly
> > easy to write a proof-of-concept for.
> 
> Ideally we would want to tie into cgroups as the control mechanism so
> that NFS can be treated like any other I/O resource.
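
For discussion, a user-space toy of the "reserve some connections for
small requests" policy might look like the following. Everything here is
hypothetical: the function name, the 4096-byte cutoff, and the choice of
"shortest queue" within the allowed set are placeholders, and a real
policy would live behind the kernel's transport iterator rather than in
Python.

```python
def pick_with_reservation(request_bytes, queue_lens, reserved, small_limit=4096):
    """Choose a transport index, keeping the first `reserved` transports
    free of large requests.

    request_bytes: payload size of the request being queued.
    queue_lens: number of requests currently queued on each transport.
    reserved: how many transports (indices 0..reserved-1) only carry
              small requests.
    small_limit: hypothetical cutoff below which a request counts as small.
    """
    if not 0 <= reserved < len(queue_lens):
        raise ValueError("need at least one unreserved transport")
    if request_bytes <= small_limit:
        candidates = range(len(queue_lens))            # small: any transport
    else:
        candidates = range(reserved, len(queue_lens))  # large: unreserved only
    # Among the allowed transports, take the one with the shortest queue.
    return min(candidates, key=lambda i: queue_lens[i])


if __name__ == "__main__":
    queues = [0, 5, 7, 6]
    print(pick_with_reservation(1 << 20, queues, reserved=1))  # large write
    print(pick_with_reservation(128, queues, reserved=1))      # small getattr
```

The point of the sketch is only that small requests never queue behind
large writes on the reserved connections; it says nothing about how the
limits would be configured.
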
> 
> > 
> > NeilBrown
> > 
> > 
> > > 
> > > Would it be worth experimenting with giving some sort of advantage to
> > > readers?  (E.g., reserving a few slots for reads and getattrs and such?)
> > > 
> > > --b.
> > > 
> > > > It's easy to test the case of entirely separate state & tcp
> > > > connections.
> > > > 
> > > > If we want to test with a shared connection but separate slots I guess
> > > > we'd need to create a separate session for each nfs4_server, and a lot
> > > > of functions that currently take an nfs4_client would need to take an
> > > > nfs4_server?
> > > > 
> > > > --b.
> > > 
> > > 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 
