From: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
To: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: Sagi Grimberg
	<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Steve Wise
	<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>,
	Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Oren Duer <oren-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>,
	Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	"Hefty,
	Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Tom Talpey <tom-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
Subject: Re: Kernel fast memory registration API proposal [RFC]
Date: Fri, 17 Jul 2015 11:21:41 -0600
Message-ID: <20150717172141.GA15808@obsidianresearch.com>
In-Reply-To: <62F9F5B8-0A18-4DF8-B47E-7408BFFE9904-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

On Fri, Jul 17, 2015 at 11:03:45AM -0400, Chuck Lever wrote:
> 
> On Jul 16, 2015, at 4:49 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
> 
> > On Thu, Jul 16, 2015 at 04:07:04PM -0400, Chuck Lever wrote:
> > 
> >> The MRs are registered only for remote read. I don’t think
> >> catastrophic harm can occur on the client in this case if the
> >> invalidation and DMA sync comes late. In fact, I’m unsure why
> >> a DMA sync is even necessary as the MR is invalidated in this
> >> case.
> > 
> > For RDMA, the worst case would be some kind of information leakage or
> > machine check halt.
> > 
> > For read side the DMA API should be called before posting the FRWR, no
> > completion side issues.
> 
> It is: rpcrdma_map_one() is done by .ro_map in both the RDMA READ
> and WRITE cases.
> 
> Just to confirm: you’re saying that for MRs that are read-accessed,
> no matching ib_dma_unmap_{page,single}() is required ?

Sorry, I wasn't clear. dma_map/unmap must always be paired, and they
should ideally be in the right order:
 dma_map(.)
 create MR
 invalidate MR
 dma_unmap()

Remember the DMA API could spin up IOMMU mappings or otherwise,
pairing is critical, ordering is critical, and I'd have some concern
around timeliness too..
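
The four-step ordering above can be modelled as a small state machine. This
is only a userspace sketch under stated assumptions: `buf_track`, `buf_step`
and the `buf_*` helpers are hypothetical names, not kernel APIs, and the
model is deliberately strict about the sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lifecycle of one buffer; these names are not kernel APIs. */
enum buf_state {
    BUF_IDLE,           /* not DMA mapped */
    BUF_DMA_MAPPED,     /* dma_map() done, no MR covers the buffer yet */
    BUF_MR_VALID,       /* MR created, the HCA may still access the buffer */
    BUF_MR_INVALIDATED  /* the invalidate is known to be complete */
};

struct buf_track {
    enum buf_state state;
};

/* Advance only if the buffer is in the expected predecessor state. */
static bool buf_step(struct buf_track *b, enum buf_state from,
                     enum buf_state to)
{
    if (b->state != from)
        return false;
    b->state = to;
    return true;
}

static bool buf_dma_map(struct buf_track *b)
{
    return buf_step(b, BUF_IDLE, BUF_DMA_MAPPED);
}

static bool buf_create_mr(struct buf_track *b)
{
    return buf_step(b, BUF_DMA_MAPPED, BUF_MR_VALID);
}

/* Call from completion processing once the invalidate is known complete. */
static bool buf_invalidate_done(struct buf_track *b)
{
    return buf_step(b, BUF_MR_VALID, BUF_MR_INVALIDATED);
}

/* Unmap is accepted only after the invalidate has completed. */
static bool buf_dma_unmap(struct buf_track *b)
{
    return buf_step(b, BUF_MR_INVALIDATED, BUF_IDLE);
}
```

In this model an attempt to unmap while the MR is still valid simply returns
false; a real ULP would more likely treat that as a bug and warn.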

When I said no issues, I was talking about running the MR invalidate
async with RPC processing. That should be fine.

But the dma unmap should be done from the SCQ processing loop, after
it is known the INVALIDATE for the ACCESS_REMOTE_READ MR is
complete. You could perhaps suppress completion for the invalidate, as
long as there is a scheme to track the needed invalidate (see my last
email)

The ACCESS_REMOTE_WRITE MR side is basically the same, except the
INVALIDATE should be signaled and RPC processing should resume from
the SCQ side.

This is where you'd put a 'server trust' performance option to run
even the write invalidate async, then the dma_unmap should be done
when the invalidate is posted.

> Sure. It might be possible to move both the DMA unmap and the
> invalidate into the reply handler without a lot of surgery.
> We’ll see.
> 
> There would be some performance cost. That’s unfortunate because
> the scenarios we’re guarding against are exceptionally rare.

NFS needs to learn to do SEND WITH INVALIDATE to mitigate the
invalidate cost...

> > Use a scheme where you suppress signaling and use the SQE accounting to
> > request a completion entry and signal around every 1/2 length of the
> > SQ.
> 
> Actually Sagi and I have found we can’t leave more than about 80
> sends unsignalled, no matter how long the pre-allocated SQ is.

Hum, I'm pretty sure I've done more than that before on mlx4 and
mthca. Certainly, I can't think of any reason (spec wise) for the
above to be true. Sagi, do you know what this is?

The fact you see unexplained problems like this is more likely to be a
reflection of NFS not following the rules for running the SQ, than a
driver bug. QP blow ups and posting failures are exactly the symptoms
of not following the rules :)

Only once the ULP is absolutely certain, by direct accounting of
consumed SQEs, that it is not over-posting would I look for a
driver/hw problem.

> Since most send completions are silenced, xprtrdma relies on seeing
> the completion of a _subsequent_ WR.

Right, since you don't care about the sends, you only need enough
information and signalling to flow control the SQ/SCQ. But, a SEND
that would otherwise be silenced should be signaled if it falls at
the 1/2 mark, or is the last WR placed into a becoming full SQ. That
minimum basic mandatory signalling is required to avoid deadlocking.

> So, if my reply handler were to issue a LOCAL_INV WR and wait for
> its completion, then the completion of send WRs submitted before
> that one, even if they are silent, is guaranteed.

Yes, the SQ is strongly ordered.
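
Because of that ordering, one signaled completion can retire every earlier
unsignaled WR on the same SQ. A minimal userspace sketch, assuming each WR is
tagged with a sequence number; `sq_ring` and its helpers are hypothetical
names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

struct sq_ring {
    uint32_t head; /* sequence number the next posted WR will get */
    uint32_t tail; /* oldest WR not yet known to be complete */
};

/* Post one WR and return its sequence number. */
static uint32_t sq_post_wr(struct sq_ring *sq)
{
    return sq->head++;
}

/* Number of WRs posted but not yet known to be complete. */
static uint32_t sq_outstanding(const struct sq_ring *sq)
{
    return sq->head - sq->tail;
}

/*
 * A completion for 'seq' proves that every WR in [tail, seq] is done,
 * signaled or not. Returns how many SQEs were retired.
 */
static uint32_t sq_complete_through(struct sq_ring *sq, uint32_t seq)
{
    uint32_t retired = seq - sq->tail + 1;

    /* 'seq' must name a WR that is still outstanding. */
    assert(retired <= sq_outstanding(sq));
    sq->tail = seq + 1;
    return retired;
}
```

The unsigned subtraction keeps the arithmetic valid across sequence number
wraparound, and a stale `seq` trips the assertion instead of corrupting the
count.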

> In the cases where the reply handler issues a LOCAL_INV, waiting
> for its completion before allowing the next RPC to be sent is
> enough to guarantee space on the SQ, I would think.

> For FMR and smaller RPCs that don’t need RDMA, we’d probably
> have to wait on the completion of the RDMA SEND of the RPC call
> message.

> So, we could get away with signalling only the last send WR issued
> for each RPC.

I think I see you thinking about how to bolt on a different implicit
accounting scheme, again using inference about X completing meaning Y
is available?

I'm sure that can be made to work (and I think you've got the right
reasoning), but I strongly don't recommend it - it is complicated and
brittle to maintain. I.e., perhaps NFS had a reasonable scheme like this
once, but the FRWR additions appear to have damaged its basic
unstated assumptions.

Directly track the number of SQEs used and available, use WARN_ON
before every post to make sure the invariant isn't violated.
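
As a rough illustration of that invariant check, here is a userspace sketch.
`WARN_ON_SKETCH` is only a stand-in for the kernel's WARN_ON(), and
`sq_account` and its helpers are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): report the condition and return it. */
#define WARN_ON_SKETCH(cond) \
    ((cond) ? (fprintf(stderr, "WARN: %s\n", #cond), true) : false)

struct sq_account {
    unsigned int depth; /* SQEs the SQ was created with */
    unsigned int used;  /* SQEs posted and not yet retired */
};

/* Check the invariant before every post; refuse to overpost. */
static bool sq_account_post(struct sq_account *a)
{
    if (WARN_ON_SKETCH(a->used >= a->depth))
        return false;
    a->used++;
    return true;
}

/* Called from SCQ processing with the number of SQEs just retired. */
static void sq_account_complete(struct sq_account *a, unsigned int n)
{
    assert(n <= a->used);
    a->used -= n;
}
```

The point of the sketch is that availability is counted directly, rather
than inferred from some other event having completed.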

Because NFS has a mixed scheme where only INVALIDATE is required to be
synchronous, I'd optimize for free flow without requiring SEND to be
signaled.

Based on your comments, I think an accounting scheme like this makes
sense:
 0. Broadly we want to have three pools for an RPC slot:
     - Submitted to the upper layer and available for immediate use
     - Submitted to the network and currently executing
     - Waiting for resources to recycle
       * A recv buffer is posted to the local RQ
       * The far end has posted its recv buffer to its RQ
       * The SQ/SCQ has available space to issue any RPC
 1. Each RPC slot takes a maximum of N SQE credits. Figure this
    constant out at the start of time. I suspect it is 3 when using FRWR.
 2. When you pass an RPC slot to the upper layer, either at the start
    of time, or when completing recvs, decrease the SQE accounting
    by N. I.e., the upper layer is now free to use that RPC slot at any
    moment; the maximum N SQEs it could require are guaranteed
    available and nothing can steal them.

    If N SQEs are not available then do not give the slot to the
    upper layer.
 3. When the RPC is actually submitted figure out how many SQEs it
    really needs and adjust the accounting. I.e., if only 1 is needed then
    return 2 SQE credits.
 4. Track SQE credit use at SCQ time using some scheme, and return
    credit for explicitly and implicitly completed SQEs.
 5. Figure out the right place to inject the 3rd pool of #0. This can
    absolutely be done by deferring advancing the recvQ until the RPC
    recycling conditions are all met, but it would be better
    (latency wise) to process the recv and then defer recycling the
    empty RPC slot.
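
Steps 1 through 4 can be sketched as a single credit pool. This is a
userspace model under stated assumptions: `SQE_PER_SLOT` is N, set to 3 only
because of the guess in step 1, and `sqe_pool` and its helpers are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define SQE_PER_SLOT 3 /* N: assumed worst case per RPC slot */

struct sqe_pool {
    unsigned int credits; /* SQEs not reserved by any slot */
};

/* Step 2: hand a slot to the upper layer only if N credits can be set aside. */
static bool slot_reserve(struct sqe_pool *p)
{
    if (p->credits < SQE_PER_SLOT)
        return false; /* keep the slot in the "waiting" pool */
    p->credits -= SQE_PER_SLOT;
    return true;
}

/* Step 3: at submit time, give back the credits this RPC will not use. */
static void slot_submit(struct sqe_pool *p, unsigned int needed)
{
    assert(needed >= 1 && needed <= SQE_PER_SLOT);
    p->credits += SQE_PER_SLOT - needed;
}

/* Step 4: SCQ processing returns credits for the SQEs it has retired. */
static void sqe_completed(struct sqe_pool *p, unsigned int n)
{
    p->credits += n;
}
```

With a pool of 6 credits, two slots can be handed out and a third has to
wait until enough credits come back from submit-time adjustment and SCQ
processing.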

Use signaling when necessary: at the 1/2 point, for all SQEs when free
space is < N (deadlock avoidance) and when NFS needs to wait for a
sync invalidate.
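
Those three conditions reduce to one predicate evaluated before each post.
A minimal sketch, assuming the ULP already tracks the counters shown;
`sq_view` and `wr_must_signal` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

struct sq_view {
    unsigned int depth;      /* total SQEs in the SQ */
    unsigned int free;       /* SQEs currently available */
    unsigned int unsignaled; /* WRs posted since the last signaled WR */
    unsigned int n_per_slot; /* N: worst-case SQEs per RPC slot */
};

/* Decide whether the WR about to be posted must request a completion. */
static bool wr_must_signal(const struct sq_view *sq, bool sync_invalidate)
{
    assert(sq->free > 0); /* the caller must already have room to post */

    if (sync_invalidate)
        return true; /* the ULP has to wait for this WR */
    if (sq->free < sq->n_per_slot)
        return true; /* deadlock avoidance: signal everything */
    /* This WR reaches the half-way point since the last signaled WR. */
    return sq->unsignaled + 1 >= sq->depth / 2;
}
```

The caller would reset `unsignaled` to zero whenever the predicate returns
true and increment it otherwise.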

It sounds more complicated than it is. :)

If you have a work load with no sync-invalidates then the above still
functions at full speed without requiring extra SEND signaling.
Sync-invalidates cause SQE credits to recycle faster and guarantee we
won't do the deferral in #5.

Size the SQ length to be at least something like 2*N*(the # of RPC
slots).
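
As a worked example of that sizing rule: with N = 3 and a hypothetical 32 RPC
slots, the SQ would need at least 2*3*32 = 192 entries. A small helper
(`sq_min_depth` is a made-up name) that also guards the multiplication:

```c
#include <assert.h>
#include <limits.h>

/* Returns 2 * N * slots, or 0 if the product would overflow unsigned int. */
static unsigned int sq_min_depth(unsigned int n_per_slot, unsigned int slots)
{
    assert(n_per_slot > 0 && slots > 0);
    if (slots > UINT_MAX / 2 / n_per_slot)
        return 0;
    return 2 * n_per_slot * slots;
}
```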

I'd say the above is broadly typical of what I'd consider correct use
of an RDMA QP. The three flow control loops of #0 should be fairly obvious
and explicit in the code.

> > kfree/dma unmap/etc may only be done on a SEND buffer after seeing a
> > SCQE proving that buffer is done, or tearing down the QP and halting
> > the send side.
> 
> The buffers the client uses to send an RPC call are DMA mapped once
> when the transport is created, and a local lkey is used in the SEND
> WR.
> 
> They are re-used for the next RPCs in the pipe, but as far as I can
> tell the client’s send buffer contains the RPC call data until the
> RPC request slot is retired (xprt_release).

It is what I'd expect based on your past descriptions - just making
sure you are aware :)

Jason
