All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: Jason Gunthorpe
	<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Cc: Sagi Grimberg
	<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Steve Wise
	<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>,
	Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Oren Duer <oren-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>,
	Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	"Hefty,
	Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Tom Talpey <tom-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
Subject: Re: Kernel fast memory registration API proposal [RFC]
Date: Thu, 16 Jul 2015 10:45:46 -0400	[thread overview]
Message-ID: <F0518DEF-D43C-4CB6-89ED-CA3E94A4DD72@oracle.com> (raw)
In-Reply-To: <20150715224928.GA941-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>


On Jul 15, 2015, at 6:49 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:

> On Wed, Jul 15, 2015 at 05:25:11PM -0400, Chuck Lever wrote:
> 
>> NFS READ and WRITE data payloads are mapped with ib_map_phys_mr()
>> just before the RPC is sent, and those payloads are unmapped
>> with ib_unmap_fmr() as soon as the client sees the server’s RPC
>> reply.
> 
> Okay.. but.. ib_unmap_fmr is the thing that sleeps, so you must
> already have a sleepable context when you call it?

The RPC scheduler operates on the assumption that the processing
during each step does not sleep.

We’re not holding a lock, so a short sleep here works. In general
this kind of thing can deadlock pretty easily, but right at this
step I think it’s avoiding deadlock "by accident."

For some time, I’ve been considering deferring ib_unmap_fmr() to
a work queue, but FMR is operational and is a bit of an antique
so I haven’t put much effort into bettering it.

The point is, this is not something that should be perpetuated
into a new API, and certainly the other initiators have a hard
intolerance for a sleep.


> I was poking around to see how NFS is working (to see how we might fit
> a different API under here), I didn't find the call to ro_unmap I'd
> expect? xprt_rdma_free is presumbly the place, but how it relates to
> rpcrdma_reply_handler I could not obviously see. Does the upper layer
> call back to xprt_rdma_free before any of the RDMA buffers are
> touched?  Can you clear up the call chain for me?

The server performs RDMA READ and WRITE operations, then SENDs the
RPC reply.

On the client, rpcrdma_recvcq_upcall() is invoked when the RPC
reply arrives and the RECV completes.

rpcrdma_schedule_tasklet() queues the incoming RPC reply on a
global list and spanks our reply tasklet.

The tasklet invokes rpcrdma_reply_handler() for each reply on the
list.

The reply handler parses the incoming reply, looks up the XID and
matches it to a waiting RPC request (xprt_lookup_rqst). It then
wakes that request (xprt_complete_rqst). The tasklet goes to the
next reply on the global list.

The RPC scheduler sees the awoken RPC request and steps the
finished request through to completion, at which point
xprt_release() is invoked to retire the request slot.

Here resources allocated to the RPC request are freed. For
RPC/RDMA transports, xprt->ops->buf_free is xprt_rdma_free().
xprt_rdma_free() invokes the ro_unmap method to unmap/invalidate
the MRs involved with the RPC request.


> Second, the FRWR stuff looks deeply suspicious, it is posting a
> IB_WR_LOCAL_INV, but the completion of that (in frwr_sendcompletion)
> triggers nothing. Handoff to the kernel must be done only after seeing
> IB_WC_LOCAL_INV, never before.

I don’t understand. Our LOCAL_INV is typically unsignalled because
there’s nothing to do in the normal case. frwr_sendcompletion is
there to handle only flushed sends.

Send queue ordering and the mw_list prevent each MR from being
reused before it is truly invalidated.


> Third all the unmaps do something like this:
> 
> frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
> {
> 	invalidate_wr.opcode = IB_WR_LOCAL_INV;
>   [..]
> 	while (seg1->mr_nsegs--)
> 		rpcrdma_unmap_one(ia->ri_device, seg++);
> 	read_lock(&ia->ri_qplock);
> 	rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
> 
> That is the wrong order, the DMA unmap of rpcrdma_unmap_one must only
> be done once the invalidate is complete. For FR this is ib_unmap_fmr
> returning, for FRWR it is when you see IB_WC_LOCAL_INV.

I’m assuming you mean the DMA unmap has to be done after LINV
completes.

I’m not sure it matters here, because when the RPC reply shows
up at the client, it already means the server isn’t going to
access that MR/rkey again. (If the server does access that MR
again, it would be a protocol violation).

Can you provide an example in another kernel ULP?


> Finally, where is the flow control for posting the IB_WR_LOCAL_INV to
> the SQ? I'm guessing there is some kind of implicit flow control here
> where the SEND buffer is recycled during RECV of the response, and
> this limits the SQ usage, then there are guarenteed 3x as many SQEs as
> SEND buffers to accommodate the REG_MR and INVALIDATE WRs??

RPC/RDMA provides flow control via credits. The peers agree on
a maximum number of concurrent outstanding RPC requests.
Typically that is 32, though implementations are increasing that
default to 128.

There’s a comment in frwr_op_open that explains how we calculate
the maximum number of send queue entries for each credit.


>> These memory regions require an rkey, which is sent in the RPC
>> call to the server. The server performs RDMA READ or WRITE on
>> these regions.
>> 
>> I don’t think the server ever uses FMR to register the target
>> memory regions for RDMA READ and WRITE.
> 
> What happens if you hit the SGE limit when constructing the RDMA
> READ/WRITE? Upper layer forbids that? What about iWARP, how do you
> avoid the 1 SGE limit on RDMA READ?

I’m much less familiar with the server side. Maybe Steve knows,
but I suspect the RPC/RDMA code is careful to construct more
READ Work Requests if it runs out of sges.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2015-07-16 14:45 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-10  9:09 Kernel fast memory registration API proposal [RFC] Sagi Grimberg
     [not found] ` <559F8BD1.9080308-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-11 10:39   ` Christoph Hellwig
     [not found]     ` <20150711103920.GE14741-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-12  7:57       ` Sagi Grimberg
     [not found]         ` <55A21DF6.6090909-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-12 18:15           ` Chuck Lever
     [not found]             ` <96901C8F-D916-4ECF-8DA4-C5C67FB8539E-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-13  6:47               ` Christoph Hellwig
     [not found]                 ` <20150713064701.GB31842-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-13 14:16                   ` Chuck Lever
     [not found]                     ` <1D9C0527-E277-4C3F-A80D-C4FBAA3D82E9-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-14  8:50                       ` Sagi Grimberg
     [not found]                         ` <55A4CD5B.9030000-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-14 18:58                           ` Chuck Lever
2015-07-13 16:30   ` Jason Gunthorpe
     [not found]     ` <20150713163015.GA23832-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-14  8:39       ` Sagi Grimberg
     [not found]         ` <55A4CABC.5050807-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-14 14:42           ` Steve Wise
2015-07-14 15:33           ` Christoph Hellwig
     [not found]             ` <20150714153347.GA11026-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-14 15:53               ` Jason Gunthorpe
     [not found]                 ` <20150714155340.GA7399-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-14 16:46                   ` Sagi Grimberg
     [not found]                     ` <55A53CFA.7070509-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-14 17:08                       ` Jason Gunthorpe
     [not found]                         ` <20150714170808.GA19814-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-14 18:07                           ` Steve Wise
2015-07-15  3:05                           ` Doug Ledford
     [not found]                             ` <55A5CDE2.4060904-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-07-15  8:52                               ` Sagi Grimberg
2015-07-14 16:12               ` Sagi Grimberg
     [not found]                 ` <55A534D1.6030008-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-14 16:16                   ` Steve Wise
2015-07-14 17:29                     ` Tom Talpey
2015-07-14 16:35                   ` Jason Gunthorpe
     [not found]                     ` <20150714163506.GC7399-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-14 16:55                       ` Sagi Grimberg
     [not found]                         ` <55A53F0B.5050009-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-14 17:09                           ` Jason Gunthorpe
     [not found]                             ` <20150714170859.GB19814-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-15  8:01                               ` Sagi Grimberg
     [not found]                                 ` <55A6136A.8010204-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-15 14:32                                   ` Chuck Lever
     [not found]                                     ` <A9EF2F26-E737-4E80-B2E3-F8D6406F9893-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-15 14:39                                       ` Chuck Lever
2015-07-15 17:19                                       ` Jason Gunthorpe
     [not found]                                         ` <20150715171926.GB23588-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-15 18:39                                           ` Steve Wise
2015-07-15 21:25                                           ` Chuck Lever
     [not found]                                             ` <F2C64EE9-38A5-4DEE-B60E-AD8430FE1049-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-15 22:49                                               ` Jason Gunthorpe
     [not found]                                                 ` <20150715224928.GA941-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-16 14:45                                                   ` Chuck Lever [this message]
     [not found]                                                     ` <F0518DEF-D43C-4CB6-89ED-CA3E94A4DD72-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-16 14:56                                                       ` Steve Wise
2015-07-16 17:40                                                       ` Jason Gunthorpe
     [not found]                                                         ` <20150716174046.GB3680-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-16 20:07                                                           ` Chuck Lever
     [not found]                                                             ` <F8484ABB-BED9-463F-8AEA-EB898EBDD93C-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-16 20:49                                                               ` Jason Gunthorpe
     [not found]                                                                 ` <20150716204932.GA10638-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-17 15:03                                                                   ` Chuck Lever
     [not found]                                                                     ` <62F9F5B8-0A18-4DF8-B47E-7408BFFE9904-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-17 17:21                                                                       ` Jason Gunthorpe
     [not found]                                                                         ` <20150717172141.GA15808-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-17 19:26                                                                           ` Chuck Lever
     [not found]                                                                             ` <9A70883F-9963-42D0-9F5C-EF49F822A037-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-17 20:36                                                                               ` Jason Gunthorpe
2015-07-16  6:52                                       ` Sagi Grimberg
     [not found]                                         ` <55A754BC.6010706-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-16  8:07                                           ` Christoph Hellwig
     [not found]                                             ` <20150716080702.GD9093-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-16  8:29                                               ` Sagi Grimberg
     [not found]                                                 ` <55A76B84.30504-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-16 14:25                                                   ` Steve Wise
2015-07-16 14:40                                                     ` Sagi Grimberg
2015-07-15 18:31                                   ` Jason Gunthorpe
     [not found]                                     ` <20150715183129.GC23588-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-15 18:50                                       ` Steve Wise
2015-07-15 19:09                                         ` Jason Gunthorpe
     [not found]                                           ` <20150715190947.GE23588-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-15 19:26                                             ` Steve Wise
2015-07-16  8:02                                       ` Christoph Hellwig
2015-07-15  7:32   ` Christoph Hellwig
     [not found]     ` <20150715073233.GA11535-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-15  8:33       ` Sagi Grimberg
     [not found]         ` <55A61AE3.8020609-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-15  9:07           ` Christoph Hellwig
2015-07-15 19:15           ` Jason Gunthorpe
2015-07-15 17:07       ` Jason Gunthorpe
     [not found]         ` <20150715170750.GA23588-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-16 12:21           ` Sagi Grimberg
     [not found]             ` <55A7A1B0.5000808-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-16 18:08               ` Jason Gunthorpe
     [not found]                 ` <20150716180806.GC3680-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-19  5:33                   ` Sagi Grimberg
     [not found]                     ` <55AB36A4.1070102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-20 16:23                       ` Jason Gunthorpe
     [not found]                         ` <20150720162340.GB18336-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-20 16:29                           ` Sagi Grimberg
2015-07-19  5:45   ` Sagi Grimberg
     [not found]     ` <55AB3976.7060202-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-20 16:18       ` Jason Gunthorpe
     [not found]         ` <20150720161821.GA18336-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-20 16:27           ` Sagi Grimberg
     [not found]             ` <55AD2188.50708-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-20 17:00               ` Jason Gunthorpe
     [not found]                 ` <20150720170033.GA20350-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-20 17:07                   ` Sagi Grimberg
     [not found]                     ` <55AD2AB4.8010209-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-20 19:50                       ` Jason Gunthorpe
     [not found]                         ` <20150720195027.GA24162-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-21 11:40                           ` Sagi Grimberg
     [not found]                             ` <55AE2FA2.3000601-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-21 16:00                               ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F0518DEF-D43C-4CB6-89ED-CA3E94A4DD72@oracle.com \
    --to=chuck.lever-qhclzuegtsvqt0dzr+alfa@public.gmane.org \
    --cc=bvanassche-HInyCGIudOg@public.gmane.org \
    --cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=oren-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org \
    --cc=sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org \
    --cc=tom-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.