From: "matsuda-daisuke@fujitsu.com" <matsuda-daisuke@fujitsu.com>
To: 'Bob Pearson' <rpearsonhpe@gmail.com>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"leonro@nvidia.com" <leonro@nvidia.com>,
"jgg@nvidia.com" <jgg@nvidia.com>,
"zyjzyj2000@gmail.com" <zyjzyj2000@gmail.com>,
"Ziemba, Ian" <ian.ziemba@hpe.com>,
Frank Zago <frank.zago@hpe.com>
Cc: "nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"yangx.jy@fujitsu.com" <yangx.jy@fujitsu.com>,
"lizhijian@fujitsu.com" <lizhijian@fujitsu.com>,
"y-goto@fujitsu.com" <y-goto@fujitsu.com>
Subject: RE: [RFC PATCH 2/7] RDMA/rxe: Convert the triple tasklets to workqueues
Date: Mon, 12 Sep 2022 08:27:02 +0000 [thread overview]
Message-ID: <TYCPR01MB845576AC0D07947D560F77DCE5449@TYCPR01MB8455.jpnprd01.prod.outlook.com> (raw)
In-Reply-To: <faf616cf-ea40-153e-bc42-a50b418e9ed9@gmail.com>
On Sat, Sep 10, 2022 4:39 AM Bob Pearson wrote:
> On 9/6/22 21:43, Daisuke Matsuda wrote:
> > In order to implement On-Demand Paging on the rxe driver, triple tasklets
> > (requester, responder, and completer) must be allowed to sleep so that they
> > can trigger page fault when pages being accessed are not present.
> >
> > This patch replaces the tasklets with workqueues, but still allows direct-
> > call of works from softirq context if it is obvious that MRs are not going
> > to be accessed and there is no work being processed in the workqueue.
> >
> > As counterparts to tasklet_disable() and tasklet_enable() are missing for
> > workqueues, an atomic value is introduced to get works suspended while qp
> > reset is in progress.
> >
> > As a reference, performance change was measured using ib_send_bw and
> > ib_send_lat commands over RC connection. Both the client and the server
> > were placed on the same KVM host. An option "-n 100000" was given to the
> > respective commands to iterate over 100000 times.
> >
> > Before applying this patch:
> > [ib_send_bw]
> > ---------------------------------------------------------------------------------------
> > #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
> > 65536 100000 0.00 203.13 0.003250
> > ---------------------------------------------------------------------------------------
> > [ib_send_lat]
> > ---------------------------------------------------------------------------------------
> > #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99%
> percentile[usec] 99.9% percentile[usec]
> > 2 100000 30.91 1519.05 34.23 36.06 5.15 48.49
> 82.37
> > ---------------------------------------------------------------------------------------
> >
> > After applying this patch:
> > [ib_send_bw]
> > ---------------------------------------------------------------------------------------
> > #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
> > 65536 100000 0.00 240.88 0.003854
> > ---------------------------------------------------------------------------------------
> > [ib_send_lat]
> > ---------------------------------------------------------------------------------------
> > #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99%
> percentile[usec] 99.9% percentile[usec]
> > 2 100000 40.88 2994.82 47.80 50.25 13.70 76.42
> 185.04
> > ---------------------------------------------------------------------------------------
> >
> > The test was conducted 3 times for each kernel, and the results with median
> > "BW average" and "t_typical" are shown above. It shows the conversion
> > improves the bandwidth while causing higher latency.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > ---
> > drivers/infiniband/sw/rxe/Makefile | 2 +-
> > drivers/infiniband/sw/rxe/rxe_comp.c | 42 ++++---
> > drivers/infiniband/sw/rxe/rxe_loc.h | 2 +-
> > drivers/infiniband/sw/rxe/rxe_net.c | 4 +-
> > drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
> > drivers/infiniband/sw/rxe/rxe_qp.c | 68 +++++------
> > drivers/infiniband/sw/rxe/rxe_recv.c | 2 +-
> > drivers/infiniband/sw/rxe/rxe_req.c | 14 +--
> > drivers/infiniband/sw/rxe/rxe_resp.c | 16 +--
> > drivers/infiniband/sw/rxe/rxe_task.c | 152 ------------------------
> > drivers/infiniband/sw/rxe/rxe_task.h | 69 -----------
> > drivers/infiniband/sw/rxe/rxe_verbs.c | 8 +-
> > drivers/infiniband/sw/rxe/rxe_verbs.h | 8 +-
> > drivers/infiniband/sw/rxe/rxe_wq.c | 161 ++++++++++++++++++++++++++
> > drivers/infiniband/sw/rxe/rxe_wq.h | 71 ++++++++++++
> > 15 files changed, 322 insertions(+), 299 deletions(-)
> > delete mode 100644 drivers/infiniband/sw/rxe/rxe_task.c
> > delete mode 100644 drivers/infiniband/sw/rxe/rxe_task.h
> > create mode 100644 drivers/infiniband/sw/rxe/rxe_wq.c
> > create mode 100644 drivers/infiniband/sw/rxe/rxe_wq.h
<...>
>
> Daiskuke,
>
> We (hpe) have an internal patch set that also allows using work queues instead of tasklets.
> There is a certain amount of work to allow control over cpu assignment from ulp or application
> level. This is critical for high performance in multithreaded IO applications. It would be in
> both of our interests if we could find a common implementation that works for everyone.
I agree with you, and I am interested in your work.
Would you post the patch set? It may be possible to rebase my work on your implementation.
I just simply replaced tasklets mechanically and modified the conditions when dispatching works
to responder and completer workqueues. The changes are not so complicated. I'll be away between
9/15-9/20. After that, I can take the time to look into it.
By the way, I would also like you to join in the discussion about whether to preserve tasklets or not.
Cf. https://lore.kernel.org/all/TYCPR01MB8455D739FC9FB034E3485C87E5449@TYCPR01MB8455.jpnprd01.prod.outlook.com/
>
> Perhaps we could arrange for a call to discuss?
This is an important change that may affect all of rxe users,
so I think the discussion should be open to anybody at least at this stage.
Thanks,
Daisuke
next prev parent reply other threads:[~2022-09-12 8:28 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-07 2:42 [RFC PATCH 0/7] RDMA/rxe: On-Demand Paging on SoftRoCE Daisuke Matsuda
2022-09-07 2:42 ` [RFC PATCH 1/7] IB/mlx5: Change ib_umem_odp_map_dma_single_page() to retain umem_mutex Daisuke Matsuda
2022-09-07 2:43 ` [RFC PATCH 2/7] RDMA/rxe: Convert the triple tasklets to workqueues Daisuke Matsuda
2022-09-09 19:39 ` Bob Pearson
2022-09-12 8:27 ` matsuda-daisuke [this message]
2022-09-11 7:10 ` Yanjun Zhu
2022-09-11 15:08 ` Bart Van Assche
2022-09-12 7:58 ` matsuda-daisuke
2022-09-12 8:29 ` Yanjun Zhu
2022-09-12 19:52 ` Bob Pearson
2022-09-28 6:40 ` matsuda-daisuke
2022-09-12 8:25 ` Yanjun Zhu
2022-09-07 2:43 ` [RFC PATCH 3/7] RDMA/rxe: Cleanup code for responder Atomic operations Daisuke Matsuda
2022-09-07 2:43 ` [RFC PATCH 4/7] RDMA/rxe: Add page invalidation support Daisuke Matsuda
2022-09-07 2:43 ` [RFC PATCH 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging Daisuke Matsuda
2022-09-08 16:57 ` Haris Iqbal
2022-09-09 0:55 ` matsuda-daisuke
2022-09-07 2:43 ` [RFC PATCH 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read operations with ODP Daisuke Matsuda
2022-09-08 8:29 ` Leon Romanovsky
2022-09-09 2:45 ` matsuda-daisuke
2022-09-07 2:43 ` [RFC PATCH 7/7] RDMA/rxe: Add support for the traditional Atomic " Daisuke Matsuda
2022-09-08 8:40 ` [RFC PATCH 0/7] RDMA/rxe: On-Demand Paging on SoftRoCE Zhu Yanjun
2022-09-08 10:25 ` matsuda-daisuke
2022-09-09 3:07 ` Li Zhijian
2022-09-12 9:21 ` matsuda-daisuke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=TYCPR01MB845576AC0D07947D560F77DCE5449@TYCPR01MB8455.jpnprd01.prod.outlook.com \
--to=matsuda-daisuke@fujitsu.com \
--cc=frank.zago@hpe.com \
--cc=ian.ziemba@hpe.com \
--cc=jgg@nvidia.com \
--cc=leonro@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=lizhijian@fujitsu.com \
--cc=nvdimm@lists.linux.dev \
--cc=rpearsonhpe@gmail.com \
--cc=y-goto@fujitsu.com \
--cc=yangx.jy@fujitsu.com \
--cc=zyjzyj2000@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).