From: Tom Talpey <tom@talpey.com>
To: Chuck Lever <chuck.lever@oracle.com>,
	linux-rdma@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Cc: Shirley Ma <shirley.ma@oracle.com>
Subject: Re: Proposal for simplifying NFS/RDMA client memory registration
Date: Fri, 28 Feb 2014 13:41:27 -0800
Message-ID: <53110287.9000400@talpey.com>
In-Reply-To: <01C4496A-F074-4F72-9DF0-6076C05E8A1F@oracle.com>

On 2/26/2014 8:44 AM, Chuck Lever wrote:
> Hi-
>
> Shirley Ma and I are reviving work on the NFS/RDMA client code base in the Linux kernel.  So far we’ve built and run functional tests to determine what is working and what is broken.
>
> One complication is the number of memory registration modes supported by the RPC/RDMA transport: there are seven.  These were added over the years to support particular HCAs or as proof-of-concept.  The transport chooses a registration mode at mount time based on what the link HCA supports.
>
> Not all HCAs support all memory registration modes, so our test matrix is quite large.  I’d like to propose removing support for one or more of these memory registration modes in the name of making it easier to change this code and test it without breaking something that we can’t test.
>
> BOUNCEBUFFERS - All HCAs support this mode.  Does not use RDMA READ and WRITE, and the client end copies data into place.  RDMA is offloaded, but data copy is not.  I’m told it was never intended for production use.
>
> REGISTER - Safe but relatively slow.  Uses reg_phys_mr verb which is not supported in mlx4/mlx5, but all other HCAs/providers can use this mode.
>
> MEM_WINDOWS - Uses bind_mr verb.  Safe, but supports only a narrow range of HCAs.
>
> MEM_WINDOWS_ASYNC - Not always safe, and only a narrow range of HCAs is supported.
>
> MTHCA_FMR - Uses alloc_fmr verb.  Safe, reasonably fast, but only a narrow range of older HCAs is supported.

The MTHCA FMR is not completely safe: it protects only on page
boundaries, so the neighboring bytes on the same pages are vulnerable
to silent corruption (reads) and exposure (writes).

It is quite correct that they are supported on only a specific
set of legacy Mellanox HCAs. You should consider removing the
code that looks for this PCI ID and attempts to alter the device's
wire MTU to overcome another of its limitations.

>
> FRMR - Safe, generally fast.  Currently the preferred registration mode, but is not supported with some older HCAs/providers.

This should be, by far, the preferred mode. Also, if I recall
correctly, the server depends on this mode being available/supported.
However, it may not be supported by Soft iWARP. Physical addressing
is used.

>
> ALLPHYSICAL - Usually fast, but not safe as it exposes client memory.  All HCAs support this mode.

"Not safe" is an understatement. It exposes all of client physical
memory to the peer, for both read and write. A simple pointer error
on the server will silently corrupt the client. This mode was
intended only for testing and for experimental deployments.


Tom.

>
>
> I propose removing BOUNCEBUFFERS since it is not intended for production use.
>
> I propose removing ALLPHYSICAL and MEM_WINDOWS_ASYNC as they are not generally safe.  RFC 5666 suggests that unsafe memory registration modes be avoided.
>
> I propose removing MEM_WINDOWS as it adds complexity without adding a lot of HCA compatibility.
>
> I propose removing MTHCA_FMR as I’m told it is hard to obtain HCAs we would need for testing this registration mode, and these are all old adapters anyway.
>
> This leaves NFS/RDMA client support for REGISTER and FRMR, which should cover all existing HCAs, and it is easy to test both of these memory registration modes with just one or two well-picked HCAs.
>
> We would contribute these changes to the client code base.  The NFS/RDMA server code could use similar attention, but we are not volunteering to change it at this time.
>
> Thoughts/comments?
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
