From: Chuck Lever III <chuck.lever@oracle.com>
To: Nagendra Tomar <Nagendra.Tomar@microsoft.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Anna Schumaker <anna.schumaker@netapp.com>
Subject: Re: [PATCH 0/5] nfs: Add mount option for forcing RPC requests for one file over one connection
Date: Wed, 24 Mar 2021 14:34:45 +0000	[thread overview]
Message-ID: <A0D817BC-3F15-4A4C-BC9D-AD2238754F96@oracle.com> (raw)
In-Reply-To: <SG2P153MB03616FAC8BFEAF305A10A71C9E649@SG2P153MB0361.APCP153.PROD.OUTLOOK.COM>



> On Mar 23, 2021, at 7:31 PM, Nagendra Tomar <Nagendra.Tomar@microsoft.com> wrote:
>> 
>>> I was hoping that such a client-side change could be useful to possibly more
>>> users with similar setups; after all, file->connection affinity doesn't sound too
>>> arcane, and one can think of benefits of one node processing one file. No?
>> 
>> That's where I'm getting hung up (outside the personal preference
>> that we not introduce yet another mount option). While I understand
>> what's going on now (thanks!), I'm not sure this is a common usage
>> scenario for NFSv3. Other opinions welcome here!
>> 
>> Nor does it seem like one that we want to encourage over solutions
>> like pNFS. Generally the Linux community has taken the position
>> that server bugs should be addressed on the server, and this seems
>> like a problem that is introduced by your middlebox and server
>> combination. 
> 
> I would like to look at it not as a problem created by our server setup,
> but rather as "one more scenario" which the client can easily and
> generically handle, hence the patch.
> 
>> The client is working properly and is complying with spec.
> 
> The nconnect round-robin distribution is just one way of utilizing multiple
> connections, which happens to be limiting for this specific use case.
> My patch proposes another way of distributing RPCs over the connections,
> which is more suitable for this use case and maybe others.

Indeed, the nconnect work isn't quite complete, and the client will
need some way to specify how to schedule RPCs over several connections
to the same server. There seem to be two somewhat orthogonal
components to your proposal:

A. The introduction of a mount option to specify an RPC connection
scheduling mechanism

B. The use of a file handle hash to do that scheduling
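
To make B concrete: as I read it, the policy boils down to hashing the
opaque file handle bytes and reducing the result modulo the number of
connections, so every RPC for a given file is steered to the same
transport. A rough user-space sketch (the names and the FNV-1a hash
here are mine, for illustration only, not the actual patch):

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative stand-in hash (FNV-1a) over the opaque FH bytes. */
    static uint32_t fh_hash(const uint8_t *fh, size_t len)
    {
            uint32_t h = 2166136261u;       /* FNV-1a offset basis */

            while (len--) {
                    h ^= *fh++;
                    h *= 16777619u;         /* FNV-1a prime */
            }
            return h;
    }

    /* Map a file handle onto one of the nconnect transports. */
    static unsigned int fh_pick_xprt(const uint8_t *fh, size_t len,
                                     unsigned int nconnect)
    {
            return fh_hash(fh, len) % nconnect;
    }

The hash itself isn't really the contentious part; where that policy
is selected and configured (A) is.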


For A: Again, I'd rather avoid adding more mount options, for reasons
I've described most recently over in the d_type/READDIR thread. There
are other options here. Anna has proposed a sysfs API that exposes
each kernel RPC connection for fine-grained control. See this thread:

https://lore.kernel.org/linux-nfs/20210312211826.360959-1-Anna.Schumaker@Netapp.com/

Dan Aloni has proposed an additional mechanism that enables user space
to associate an NFS mount point to its underlying RPC connections.

These approaches might be suitable for your purpose, or they might
simply provide a little inspiration to get creative.


For B: I agree with Tom that leaving this up to client system
administrators is a punt, and usually not a scalable or future-proof
solution.

And I maintain you will be better off with a centralized and easily
configurable mechanism for balancing load, not a fixed algorithm that
you have to introduce to your clients via code changes or repeated
distributed changes to mount options.


There are other ways to utilize your LB. Since this is NFSv3, you
might expose your back-end NFSv3 servers by destination port (that is,
with a set of NAT rules).

MDS NFSv4 server: clients get to it at the VIP address, port 2049
DS NFSv3 server A: clients get to it at the VIP address, port i
DS NFSv3 server B: clients get to it at the VIP address, port j
DS NFSv3 server C: clients get to it at the VIP address, port k

The LB translates [VIP]:i into [server A]:2049, [VIP]:j into
[server B]:2049, and so on.
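
Purely to illustrate (I don't know what your LB's configuration looks
like, so the addresses and ports below are made up), in Linux
netfilter terms that translation is just a couple of DNAT rules:

    # 192.0.2.10 is the VIP; 10.0.0.1 and 10.0.0.2 are back-end servers;
    # 20491 and 20492 stand in for the client-facing ports i and j.
    iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 20491 \
            -j DNAT --to-destination 10.0.0.1:2049
    iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 20492 \
            -j DNAT --to-destination 10.0.0.2:2049

The forwarding then stays entirely below the NFS layer; nothing in the
rules needs to understand RPC.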

I'm not sure whether the flexfiles layout carries universal addresses
with port information, though. If it did, that would enable you to
expose all your back-end data servers directly to clients via a single
VIP, and yet the LB would still be just a Layer 3/4 forwarding service
and not application-aware.


--
Chuck Lever



