All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Cc: Steve Wise
	<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>,
	Linux NFS Mailing List
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH 7/8] xprtrdma: Split the completion queue
Date: Sat, 19 Apr 2014 12:31:29 -0400	[thread overview]
Message-ID: <593D9BFA-714E-417F-ACA0-05594290C4D1@oracle.com> (raw)
In-Reply-To: <5350277C.20608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

Hi Sagi-

On Apr 17, 2014, at 3:11 PM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:

> On 4/17/2014 5:34 PM, Steve Wise wrote:
> 
> <SNIP>
>> You could use a small array combined with a loop and a budget count.  So the code would
>> grab, say, 4 at a time, and keep looping polling up to 4 until the CQ is empty or the
>> desired budget is reached...
> 
> Bingo... couldn't agree more.
> 
> Poll Arrays are a nice optimization,

Typically, a provider's poll_cq implementation takes the CQ lock
using spin_lock_irqsave().  My goal of using a poll array is to
reduce the number of times the completion handler invokes
spin_lock_irqsave / spin_unlock_irqsave pairs when draining a
large queue.

> but large arrays will just burden the stack (and might even make things worse in high workloads...)


My prototype moves the poll array off the stack and into allocated
storage.  Making that array as large as a single page would be
sufficient for 50 or more ib_wc structures on a platform with 4KB
pages and 64-bit addresses.

The xprtrdma completion handler polls twice:

  1.  Drain the CQ completely

  2.  Re-arm

  3.  Drain the CQ completely again

So between steps 1. and 3. a single notification could handle over
100 WCs, if we were to budget by draining just a single array's worth
during each step. (Btw, I'm not opposed to looping while polling
arrays. This is just an example for discussion).

As for budgeting itself, I wonder if there is a possibility of losing
notifications.  The purpose of re-arming and then draining again is to
ensure that any items queued after step 1. and before step 2. are
captured, as by themselves they would never generate an upcall
notification, IIUC.

When the handler hits its budget and returns, xprtrdma needs to be
invoked again to finish draining the completion queue. How is that
guaranteed?

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)
From: Chuck Lever <chuck.lever@oracle.com>
To: Sagi Grimberg <sagig@dev.mellanox.co.il>
Cc: Steve Wise <swise@opengridcomputing.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-rdma@vger.kernel.org
Subject: Re: [PATCH 7/8] xprtrdma: Split the completion queue
Date: Sat, 19 Apr 2014 12:31:29 -0400	[thread overview]
Message-ID: <593D9BFA-714E-417F-ACA0-05594290C4D1@oracle.com> (raw)
In-Reply-To: <5350277C.20608@dev.mellanox.co.il>

Hi Sagi-

On Apr 17, 2014, at 3:11 PM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:

> On 4/17/2014 5:34 PM, Steve Wise wrote:
> 
> <SNIP>
>> You could use a small array combined with a loop and a budget count.  So the code would
>> grab, say, 4 at a time, and keep looping polling up to 4 until the CQ is empty or the
>> desired budget is reached...
> 
> Bingo... couldn't agree more.
> 
> Poll Arrays are a nice optimization,

Typically, a provider's poll_cq implementation takes the CQ lock
using spin_lock_irqsave().  My goal of using a poll array is to
reduce the number of times the completion handler invokes
spin_lock_irqsave / spin_unlock_irqsave pairs when draining a
large queue.

> but large arrays will just burden the stack (and might even make things worse in high workloads...)


My prototype moves the poll array off the stack and into allocated
storage.  Making that array as large as a single page would be
sufficient for 50 or more ib_wc structures on a platform with 4KB
pages and 64-bit addresses.

The xprtrdma completion handler polls twice:

  1.  Drain the CQ completely

  2.  Re-arm

  3.  Drain the CQ completely again

So between steps 1. and 3. a single notification could handle over
100 WCs, if we were to budget by draining just a single array's worth
during each step. (Btw, I'm not opposed to looping while polling
arrays. This is just an example for discussion).

As for budgeting itself, I wonder if there is a possibility of losing
notifications.  The purpose of re-arming and then draining again is to
ensure that any items queued after step 1. and before step 2. are
captured, as by themselves they would never generate an upcall
notification, IIUC.

When the handler hits its budget and returns, xprtrdma needs to be
invoked again to finish draining the completion queue. How is that
guaranteed?

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





  parent reply	other threads:[~2014-04-19 16:31 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-14 22:22 [PATCH 0/8] NFS/RDMA patches for review Chuck Lever
2014-04-14 22:22 ` Chuck Lever
     [not found] ` <20140414220041.20646.63991.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-04-14 22:22   ` [PATCH 1/8] xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context Chuck Lever
2014-04-14 22:22     ` Chuck Lever
2014-04-14 22:22   ` [PATCH 2/8] xprtrdma: Remove BOUNCEBUFFERS memory registration mode Chuck Lever
2014-04-14 22:22     ` Chuck Lever
2014-04-14 22:22   ` [PATCH 3/8] xprtrdma: Disable ALLPHYSICAL mode by default Chuck Lever
2014-04-14 22:22     ` Chuck Lever
2014-04-14 22:22   ` [PATCH 4/8] xprtrdma: Remove support for MEMWINDOWS registration mode Chuck Lever
2014-04-14 22:22     ` Chuck Lever
2014-04-14 22:23   ` [PATCH 5/8] xprtrdma: Simplify rpcrdma_deregister_external() synopsis Chuck Lever
2014-04-14 22:23     ` Chuck Lever
2014-04-14 22:23   ` [PATCH 6/8] xprtrdma: Make rpcrdma_ep_destroy() return void Chuck Lever
2014-04-14 22:23     ` Chuck Lever
2014-04-14 22:23   ` [PATCH 7/8] xprtrdma: Split the completion queue Chuck Lever
2014-04-14 22:23     ` Chuck Lever
     [not found]     ` <20140414222323.20646.66946.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-04-16 12:48       ` Sagi Grimberg
2014-04-16 12:48         ` Sagi Grimberg
     [not found]         ` <534E7C1C.5070407-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-04-16 13:30           ` Steve Wise
2014-04-16 13:30             ` Steve Wise
     [not found]             ` <534E8608.8030801-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2014-04-16 14:12               ` Sagi Grimberg
2014-04-16 14:12                 ` Sagi Grimberg
     [not found]                 ` <534E8FCE.909-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-04-16 14:25                   ` Steve Wise
2014-04-16 14:25                     ` Steve Wise
2014-04-16 14:35                     ` Sagi Grimberg
2014-04-16 14:35                       ` Sagi Grimberg
     [not found]                       ` <534E9534.9020004-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-04-16 14:43                         ` Steve Wise
2014-04-16 14:43                           ` Steve Wise
2014-04-16 15:18                           ` Sagi Grimberg
2014-04-16 15:18                             ` Sagi Grimberg
     [not found]                             ` <534E9F40.8000905-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-04-16 15:46                               ` Steve Wise
2014-04-16 15:46                                 ` Steve Wise
2014-04-16 15:08               ` Chuck Lever
2014-04-16 15:08                 ` Chuck Lever
     [not found]                 ` <E9B601B5-1984-40D8-914A-DDD1380E8183-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-04-16 15:23                   ` Sagi Grimberg
2014-04-16 15:23                     ` Sagi Grimberg
     [not found]                     ` <534EA06A.7090200-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-04-16 18:21                       ` Chuck Lever
2014-04-16 18:21                         ` Chuck Lever
     [not found]                         ` <FC61F219-ACA2-4CE8-BF02-1B8EE2464639-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-04-17  7:06                           ` Sagi Grimberg
2014-04-17  7:06                             ` Sagi Grimberg
     [not found]                             ` <534F7D5F.1090908-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-04-17 13:55                               ` Chuck Lever
2014-04-17 13:55                                 ` Chuck Lever
     [not found]                                 ` <A7E4B101-BAF0-480C-95AE-26BB845EE2C3-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-04-17 14:34                                   ` Steve Wise
2014-04-17 14:34                                     ` Steve Wise
2014-04-17 19:11                                     ` Sagi Grimberg
2014-04-17 19:11                                       ` Sagi Grimberg
     [not found]                                       ` <5350277C.20608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-04-19 16:31                                         ` Chuck Lever [this message]
2014-04-19 16:31                                           ` Chuck Lever
     [not found]                                           ` <593D9BFA-714E-417F-ACA0-05594290C4D1-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-04-20 12:42                                             ` Sagi Grimberg
2014-04-20 12:42                                               ` Sagi Grimberg
2014-04-17 19:08                                   ` Sagi Grimberg
2014-04-17 19:08                                     ` Sagi Grimberg
2014-04-14 22:23   ` [PATCH 8/8] xprtrdma: Reduce the number of hardway buffer allocations Chuck Lever
2014-04-14 22:23     ` Chuck Lever
2014-04-15 20:15   ` [PATCH 0/8] NFS/RDMA patches for review Steve Wise
2014-04-15 20:15     ` Steve Wise
     [not found]     ` <534D9373.3050406-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2014-04-15 20:20       ` Steve Wise
2014-04-15 20:20         ` Steve Wise

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=593D9BFA-714E-417F-ACA0-05594290C4D1@oracle.com \
    --to=chuck.lever-qhclzuegtsvqt0dzr+alfa@public.gmane.org \
    --cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org \
    --cc=swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.