nvdimm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: "matsuda-daisuke@fujitsu.com" <matsuda-daisuke@fujitsu.com>
To: 'Bob Pearson' <rpearsonhpe@gmail.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"leonro@nvidia.com" <leonro@nvidia.com>,
	"jgg@nvidia.com" <jgg@nvidia.com>,
	"zyjzyj2000@gmail.com" <zyjzyj2000@gmail.com>,
	"Ziemba, Ian" <ian.ziemba@hpe.com>,
	Frank Zago <frank.zago@hpe.com>
Cc: "nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"yangx.jy@fujitsu.com" <yangx.jy@fujitsu.com>,
	"lizhijian@fujitsu.com" <lizhijian@fujitsu.com>,
	"y-goto@fujitsu.com" <y-goto@fujitsu.com>
Subject: RE: [RFC PATCH 2/7] RDMA/rxe: Convert the triple tasklets to workqueues
Date: Mon, 12 Sep 2022 08:27:02 +0000	[thread overview]
Message-ID: <TYCPR01MB845576AC0D07947D560F77DCE5449@TYCPR01MB8455.jpnprd01.prod.outlook.com> (raw)
In-Reply-To: <faf616cf-ea40-153e-bc42-a50b418e9ed9@gmail.com>

On Sat, Sep 10, 2022 4:39 AM Bob Pearson wrote:
> On 9/6/22 21:43, Daisuke Matsuda wrote:
> > In order to implement On-Demand Paging on the rxe driver, triple tasklets
> > (requester, responder, and completer) must be allowed to sleep so that they
> > can trigger page fault when pages being accessed are not present.
> >
> > This patch replaces the tasklets with workqueues, but still allows direct-
> > call of works from softirq context if it is obvious that MRs are not going
> > to be accessed and there is no work being processed in the workqueue.
> >
> > As counterparts to tasklet_disable() and tasklet_enable() are missing for
> > workqueues, an atomic value is introduced to get works suspended while qp
> > reset is in progress.
> >
> > As a reference, performance change was measured using ib_send_bw and
> > ib_send_lat commands over RC connection. Both the client and the server
> > were placed on the same KVM host. An option "-n 100000" was given to the
> > respective commands to iterate over 100000 times.
> >
> > Before applying this patch:
> > [ib_send_bw]
> > ---------------------------------------------------------------------------------------
> >  #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
> >  65536      100000           0.00               203.13             0.003250
> > ---------------------------------------------------------------------------------------
> > [ib_send_lat]
> > ---------------------------------------------------------------------------------------
> >  #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99%
> percentile[usec]   99.9% percentile[usec]
> >  2       100000          30.91          1519.05      34.23             36.06            5.15            48.49
> 82.37
> > ---------------------------------------------------------------------------------------
> >
> > After applying this patch:
> > [ib_send_bw]
> > ---------------------------------------------------------------------------------------
> >  #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
> >  65536      100000           0.00               240.88             0.003854
> > ---------------------------------------------------------------------------------------
> > [ib_send_lat]
> > ---------------------------------------------------------------------------------------
> >  #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99%
> percentile[usec]   99.9% percentile[usec]
> >  2       100000          40.88          2994.82      47.80             50.25            13.70           76.42
> 185.04
> > ---------------------------------------------------------------------------------------
> >
> > The test was conducted 3 times for each kernel, and the results with median
> > "BW average" and "t_typical" are shown above. It shows the conversion
> > improves the bandwidth while causing higher latency.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > ---
> >  drivers/infiniband/sw/rxe/Makefile    |   2 +-
> >  drivers/infiniband/sw/rxe/rxe_comp.c  |  42 ++++---
> >  drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +-
> >  drivers/infiniband/sw/rxe/rxe_net.c   |   4 +-
> >  drivers/infiniband/sw/rxe/rxe_param.h |   2 +-
> >  drivers/infiniband/sw/rxe/rxe_qp.c    |  68 +++++------
> >  drivers/infiniband/sw/rxe/rxe_recv.c  |   2 +-
> >  drivers/infiniband/sw/rxe/rxe_req.c   |  14 +--
> >  drivers/infiniband/sw/rxe/rxe_resp.c  |  16 +--
> >  drivers/infiniband/sw/rxe/rxe_task.c  | 152 ------------------------
> >  drivers/infiniband/sw/rxe/rxe_task.h  |  69 -----------
> >  drivers/infiniband/sw/rxe/rxe_verbs.c |   8 +-
> >  drivers/infiniband/sw/rxe/rxe_verbs.h |   8 +-
> >  drivers/infiniband/sw/rxe/rxe_wq.c    | 161 ++++++++++++++++++++++++++
> >  drivers/infiniband/sw/rxe/rxe_wq.h    |  71 ++++++++++++
> >  15 files changed, 322 insertions(+), 299 deletions(-)
> >  delete mode 100644 drivers/infiniband/sw/rxe/rxe_task.c
> >  delete mode 100644 drivers/infiniband/sw/rxe/rxe_task.h
> >  create mode 100644 drivers/infiniband/sw/rxe/rxe_wq.c
> >  create mode 100644 drivers/infiniband/sw/rxe/rxe_wq.h

<...>

> 
> Daiskuke,
> 
> We (hpe) have an internal patch set that also allows using work queues instead of tasklets.
> There is a certain amount of work to allow control over cpu assignment from ulp or application
> level. This is critical for high performance in multithreaded IO applications. It would be in
> both of our interests if we could find a common implementation that works for everyone.

I agree with you, and I am interested in your work.

Would you post the patch set? It may be possible to rebase my work on your implementation.
I just simply replaced tasklets mechanically and modified the conditions when dispatching works
to responder and completer workqueues. The changes are not so complicated. I'll be away between
9/15-9/20. After that, I can take the time to look into it.

By the way, I would also like you to join in the discussion about whether to preserve tasklets or not.
Cf. https://lore.kernel.org/all/TYCPR01MB8455D739FC9FB034E3485C87E5449@TYCPR01MB8455.jpnprd01.prod.outlook.com/

> 
> Perhaps we could arrange for a call to discuss?

This is an important change that may affect all of rxe users,
so I think the discussion should be open to anybody at least at this stage.

Thanks,
Daisuke


  reply	other threads:[~2022-09-12  8:28 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-07  2:42 [RFC PATCH 0/7] RDMA/rxe: On-Demand Paging on SoftRoCE Daisuke Matsuda
2022-09-07  2:42 ` [RFC PATCH 1/7] IB/mlx5: Change ib_umem_odp_map_dma_single_page() to retain umem_mutex Daisuke Matsuda
2022-09-07  2:43 ` [RFC PATCH 2/7] RDMA/rxe: Convert the triple tasklets to workqueues Daisuke Matsuda
2022-09-09 19:39   ` Bob Pearson
2022-09-12  8:27     ` matsuda-daisuke [this message]
2022-09-11  7:10   ` Yanjun Zhu
2022-09-11 15:08     ` Bart Van Assche
2022-09-12  7:58       ` matsuda-daisuke
2022-09-12  8:29         ` Yanjun Zhu
2022-09-12 19:52         ` Bob Pearson
2022-09-28  6:40           ` matsuda-daisuke
2022-09-12  8:25       ` Yanjun Zhu
2022-09-07  2:43 ` [RFC PATCH 3/7] RDMA/rxe: Cleanup code for responder Atomic operations Daisuke Matsuda
2022-09-07  2:43 ` [RFC PATCH 4/7] RDMA/rxe: Add page invalidation support Daisuke Matsuda
2022-09-07  2:43 ` [RFC PATCH 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging Daisuke Matsuda
2022-09-08 16:57   ` Haris Iqbal
2022-09-09  0:55     ` matsuda-daisuke
2022-09-07  2:43 ` [RFC PATCH 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read operations with ODP Daisuke Matsuda
2022-09-08  8:29   ` Leon Romanovsky
2022-09-09  2:45     ` matsuda-daisuke
2022-09-07  2:43 ` [RFC PATCH 7/7] RDMA/rxe: Add support for the traditional Atomic " Daisuke Matsuda
2022-09-08  8:40 ` [RFC PATCH 0/7] RDMA/rxe: On-Demand Paging on SoftRoCE Zhu Yanjun
2022-09-08 10:25   ` matsuda-daisuke
2022-09-09  3:07 ` Li Zhijian
2022-09-12  9:21   ` matsuda-daisuke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=TYCPR01MB845576AC0D07947D560F77DCE5449@TYCPR01MB8455.jpnprd01.prod.outlook.com \
    --to=matsuda-daisuke@fujitsu.com \
    --cc=frank.zago@hpe.com \
    --cc=ian.ziemba@hpe.com \
    --cc=jgg@nvidia.com \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=lizhijian@fujitsu.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=rpearsonhpe@gmail.com \
    --cc=y-goto@fujitsu.com \
    --cc=yangx.jy@fujitsu.com \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).