From: Chuck Lever <chuck.lever@oracle.com>
To: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <Anna.Schumaker@netapp.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Tom Talpey <tom@talpey.com>
Subject: Re: [PATCH v1 13/16] NFS: Add sidecar RPC client support
Date: Wed, 22 Oct 2014 13:20:03 -0400
Message-ID: <F1F06936-AB16-4E64-A484-B8D4B628E51A@oracle.com>
In-Reply-To: <CAHQdGtTgE3MdkcW9+NFgKCg81nUC0bGW8Mp26EKcu0RAT2TiPA@mail.gmail.com>


> On Oct 22, 2014, at 4:39 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
>> On Tue, Oct 21, 2014 at 8:11 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>>> On Oct 21, 2014, at 3:45 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>>> 
>>>> On Tue, Oct 21, 2014 at 4:06 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>>> 
>>>> There is no show-stopper (see Section 5.1, after all). It’s
>>>> simply a matter of development effort: a side-car is much
>>>> less work than implementing full RDMA backchannel support on
>>>> both the client and server, especially since the TCP backchannel
>>>> already works and can be used immediately.
>>>> 
>>>> Also, no problem with eventually implementing RDMA backchannel
>>>> if the complexity, and any performance overhead it introduces in
>>>> the forward channel, can be justified. The client can use the
>>>> CREATE_SESSION flags to detect what a server supports.
>>> 
>>> What complexity and performance overhead does it introduce in the
>>> forward channel?
>> 
>> The benefit of RDMA is that there are opportunities to
>> reduce host CPU interaction with incoming data.
>> Bi-directional operation requires that the transport look
>> at the RPC header to determine the direction of each
>> message. That could have an impact on the forward channel,
>> but to my knowledge it has never been measured.
>> 
>> The reason this is more of an issue for RPC/RDMA is that
>> a copy of the XID appears in the RPC/RDMA header to avoid
>> the need to look at the RPC header. That’s typically what
>> implementations use to steer RPC reply processing.
>> 
>> Often the RPC/RDMA header and RPC header land in
>> disparate buffers. The RPC/RDMA reply handler looks
>> strictly at the RPC/RDMA header, and runs in a tasklet,
>> usually on a different CPU. Adding bi-directional support
>> would mean the transport has to peek into the upper layer
>> headers, possibly resulting in cache line bouncing.
> 
> Under what circumstances would you expect to receive a valid NFSv4.1
> callback with an RDMA header that spans multiple cache lines?

The RPC header and RPC/RDMA header are separate entities, but
together can span multiple cache lines if the server has returned a
chunk list containing multiple entries.

For example, RDMA_NOMSG would send the RPC/RDMA header
via RDMA SEND with a chunk list that represents the RPC and NFS
payload. That list could make the header larger than 32 bytes.

I expect that any callback that involves more than 1024 bytes of
RPC payload will need to use RDMA_NOMSG. A long device
info list might fall into that category?
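
To make the size arithmetic concrete, here is a back-of-the-envelope
sketch based on the RFC 5666 wire format. The constants and names
are mine, for illustration only; this is not xprtrdma's code:

    /*
     * Rough sizing of an RPC-over-RDMA header.
     * Fixed part: xid, vers, credits, proc (4 bytes each).
     * Empty chunk lists cost one 4-byte discriminator apiece.
     * Each read segment adds a 4-byte "value follows" word plus
     * position, handle, length, and a 64-bit offset.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define RPCRDMA_FIXED_HDR   16U
    #define RPCRDMA_EMPTY_LISTS 12U
    #define RPCRDMA_SEG_SIZE    20U
    #define RPCRDMA_ITEM_DISC    4U

    static unsigned int rpcrdma_hdr_size(unsigned int nsegs)
    {
            return RPCRDMA_FIXED_HDR + RPCRDMA_EMPTY_LISTS +
                   nsegs * (RPCRDMA_ITEM_DISC + RPCRDMA_SEG_SIZE);
    }

    int main(void)
    {
            /* 28 bytes with no chunks, 52 with one read segment,
             * 76 with two -- already past a 64-byte cache line. */
            for (unsigned int n = 0; n < 3; n++)
                    printf("%u segment(s): %u bytes\n",
                           n, rpcrdma_hdr_size(n));
            return 0;
    }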

>> The complexity would be the addition of over a hundred
>> new lines of code on the client, and possibly a similar
>> amount of new code on the server. Small, perhaps, but
>> not insignificant.
> 
> Until there are RDMA users, I care a lot less about code changes to
> xprtrdma than to NFS.
> 
>>>>> 2) Why do we instead have to solve the whole backchannel problem in
>>>>> the NFSv4.1 layer, and where is the discussion of the merits for and
>>>>> against that particular solution? As far as I can tell, it imposes at
>>>>> least 2 extra requirements:
>>>>> a) NFSv4.1 client+server must have support either for session
>>>>> trunking or for clientid trunking
>>>> 
>>>> Very minimal trunking support. The only operation allowed on
>>>> the TCP side-car's forward channel is BIND_CONN_TO_SESSION.
>>>> 
>>>> Bruce told me that associating multiple transports to a
>>>> clientid/session should not be an issue for his server (his
>>>> words were “if that doesn’t work, it’s a bug”).
>>>> 
>>>> Would this restrictive form of trunking present a problem?
>>>> 
>>>>> b) NFSv4.1 client must be able to set up a TCP connection to the
>>>>> server (that can be session/clientid trunked with the existing RDMA
>>>>> channel)
>>>> 
>>>> Also very minimal. The changes are already done,
>>>> posted in v1 of this patch series.
>>> 
>>> I'm not asking for details on the size of the changesets, but for a
>>> justification of the design itself.
>> 
>> The size of the changeset _is_ the justification. It’s
>> a much less invasive change to add a TCP side-car than
>> it is to implement RDMA backchannel on both server and
>> client.
> 
> Please define your use of the word "invasive" in the above context. To
> me "invasive" means "will affect code that is in use by others".

The server side, then, is non-invasive. The client side makes minor
changes to state management.

> 
>> Most servers would require almost no change. Linux needs
>> only a bug fix or two. It is effectively zero impact for
>> servers that already support NFSv4.0 on RDMA to get
>> NFSv4.1 and pNFS on RDMA, with working callbacks.
>> 
>> That’s really all there is to it. It’s almost entirely a
>> practical consideration: we have the infrastructure and
>> can make it work in just a few lines of code.
>> 
>>> If it is possible to confine all
>>> the changes to the RPC/RDMA layer, then why consider patches that
>>> change the NFSv4.1 layer at all?
>> 
>> The biggest win is probably faster bring-up of new
>> transports. A TCP side-car makes bringing up any new
>> transport implementation simpler.
> 
> That's an assertion that assumes:
> - we actually want to implement more transports aside from RDMA

So you no longer consider RPC/SCTP a possibility?

> - implementing bi-directional transports in the RPC layer is non-simple

I don't care to generalize about that. In the RPC/RDMA case, there
are some complications that make it non-simple, but not impossible.
So we have an example of a non-simple case, IMO.
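
To illustrate the non-simple part: with bi-direction, the receive
handler can no longer stop at the RPC/RDMA header. It has to peek
into the RPC message itself to classify what arrived. A minimal
sketch of that peek, assuming the RFC 5531 message layout; this is
illustrative only, not the xprtrdma implementation:

    #include <stdint.h>
    #include <stdbool.h>
    #include <arpa/inet.h>

    enum rpc_msg_type { RPC_CALL = 0, RPC_REPLY = 1 };

    /*
     * 'rpc_msg' points at the RPC header that follows the
     * RPC/RDMA header in the receive buffer. Word 0 is the XID
     * (what reply steering uses today); word 1 is the direction.
     * Returns true for a backchannel CALL, false for a
     * forward-channel REPLY.
     */
    static bool rpcrdma_is_bcall(const uint32_t *rpc_msg)
    {
            return ntohl(rpc_msg[1]) == RPC_CALL;
    }

It is that extra touch of the upper layer header, on a hot path
that today never looks past the RPC/RDMA header, that raises the
cache line concern mentioned above.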

> Right now, the benefit is only to RDMA users. Nobody else is asking
> for such a change.
> 
>> And, RPC/RDMA offers zero performance benefit for
>> backchannel traffic, especially since CB traffic would
>> never move via RDMA READ/WRITE (as per RFC 5667 section
>> 5.1).
>> 
>> The primary benefit of an RPC/RDMA-only solution
>> is that there is no upper layer impact. Is that a design
>> requirement?

Based on your objections, it appears that "no upper layer
impact" is a hard design requirement. I will take this as a
NACK for the side-car approach.
