From: Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
To: Tom Talpey <tom-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
Cc: Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>, Linux NFS Mailing List <linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v1 00/16] NFS/RDMA patches proposed for 4.1
Date: Tue, 5 May 2015 14:06:27 -0700
Message-ID: <20150505210627.GA5941@infradead.org>
In-Reply-To: <55492ED3.7000507-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>

On Tue, May 05, 2015 at 04:57:55PM -0400, Tom Talpey wrote:
> Actually, I strongly disagree that the in-kernel consumers want to
> register a struct page. They want to register a list of pages, often
> a rather long one. They want this because it allows the RDMA layer to
> address the list with a single memory handle. This is where things
> get tricky.

Yes, I agree - my wording was wrong, and if you look at the next point
it should be obvious that I meant multiple struct pages.

> So the "pinned" or "wired" term is because in order to do RDMA, the
> page needs to have a fixed mapping to this handle. Usually, that means
> a physical address. There are some new approaches that allow the NIC
> to raise a fault and/or walk kernel page tables, but one way or the
> other the page had better be resident. RDMA NICs, generally speaking,
> don't buffer in-flight RDMA data, nor do you want them to.

But that whole pain point only exists for userspace ib verbs consumers.
Any in-kernel consumer fits into the "pinned" or "wired" category, as
any local DMA requires it.

> > - In many but not all cases we might need an offset/length for each
> >   page (think struct bvec, paged sk_buffs, or scatterlists of some
> >   sort); in others an offset/len for the whole set of pages is fine,
> >   but that's a superset of the one above.
>
> Yep, RDMA calls this FBO and length, and further, the protocol requires
> that the data itself be contiguous within the registration, that is, the
> FBO can be non-zero, but no other holes be present.

The contiguity requirement isn't something we can always guarantee.
While a lot of I/O will have that form, the form where there are holes
can happen, although it's not common.

> > - we usually want it to be as fast as possible
>
> In the case of file protocols such as NFS/RDMA and SMB Direct, as well
> as block protocols such as iSER, these registrations are set up and
> torn down on a per-I/O basis, in order to protect the data from
> misbehaving peers or misbehaving hardware. So to me as a storage
> protocol provider, "usually" means "always".

Yes. As I said, I haven't actually found anything yet that doesn't fit
the pattern, but the RDMA in-kernel API is such a mess that I didn't
want to put my hand in the fire and say "always".

> I totally get where you're coming from; my main question is whether
> it's possible to nail the requirements of some useful common API.
> It has been tried before, shall I say.

Do you have any information on these attempts and why they failed? Note
that the only interesting ones would be for in-kernel consumers.
Userspace verbs are another order of magnitude more problems, so
they're not too interesting.
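[Editor's note: the FBO rule discussed above - a registration may start at a non-zero first byte offset, but may contain no interior holes - can be sketched as a standalone check over a list of per-page (offset, length) segments. The struct and function names below are illustrative only, not a real kernel or verbs API.]

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Illustrative segment: one page plus the byte range used within it. */
struct seg {
	size_t offset;	/* start of data within the page */
	size_t len;	/* bytes of data in this page */
};

/*
 * Returns true if the segment list is registrable as one contiguous
 * region: the first segment may begin at a non-zero offset (the FBO)
 * and the last may end early, but every interior boundary must fall
 * exactly on a page edge -- no holes in the middle.
 */
bool contiguous_for_reg(const struct seg *segs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* Interior segments must start at offset 0 ... */
		if (i > 0 && segs[i].offset != 0)
			return false;
		/* ... and all but the last must run to the page end. */
		if (i < n - 1 && segs[i].offset + segs[i].len != PAGE_SIZE)
			return false;
	}
	return true;
}
```

So a list like {offset 100, rest of page} + {full page} + {first 50 bytes} passes, while any gap between pages - the "form where there are holes" mentioned above - fails and would need to be split into multiple registrations.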