From: Jinpu Wang <jinpuwang@gmail.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Danil Kipnis <danil.kipnis@cloud.ionos.com>,
linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
Jens Axboe <axboe@kernel.dk>,
Christoph Hellwig <hch@infradead.org>,
bvanassche@acm.org, jgg@mellanox.com, dledford@redhat.com,
Roman Pen <r.peniaev@gmail.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
Date: Fri, 12 Jul 2019 09:57:54 +0200
Message-ID: <CAD9gYJKcJ47ogKL4S_KMtxpS1gPHHhqqG7-GTi-2c0cOJ-LJtw@mail.gmail.com>
In-Reply-To: <aef765ed-4bb9-2211-05d0-b320cc3ac275@grimberg.me>
Hi Sagi,
> >> Another question, from what I understand from the code, the client
> >> always rdma_writes data on writes (with imm) from a remote pool of
> >> server buffers dedicated to it. Essentially all writes are immediate (no
> >> rdma reads ever). How is that different than using send wrs to a set of
> >> pre-posted recv buffers (like all others are doing)? Is it faster?
> > At the very beginning of the project we did some measurements and saw,
> > that it is faster. I'm not sure if this is still true
>
> Its not significantly faster (can't imagine why it would be).
> What could make a difference is probably the fact that you never
> do rdma reads for I/O writes which might be better. Also perhaps the
> fact that you normally don't wait for send completions before completing
> I/O (which is broken), and the fact that you batch recv operations.
I don't know how you came to the conclusion that we don't wait for send
completion before completing IO.
We do chain WRs on a successful read request from the server; see the
function rdma_write_sg:
static int rdma_write_sg(struct ibtrs_srv_op *id)
{
	struct ibtrs_srv_sess *sess = to_srv_sess(id->con->c.sess);
	dma_addr_t dma_addr = sess->dma_addr[id->msg_id];
	struct ibtrs_srv *srv = sess->srv;
	struct ib_send_wr inv_wr, imm_wr;
	struct ib_rdma_wr *wr = NULL;
snip
	need_inval = le16_to_cpu(id->rd_msg->flags) & IBTRS_MSG_NEED_INVAL_F;
snip
		wr->wr.wr_cqe = &io_comp_cqe;
		wr->wr.sg_list = list;
		wr->wr.num_sge = 1;
		wr->remote_addr = le64_to_cpu(id->rd_msg->desc[i].addr);
		wr->rkey = le32_to_cpu(id->rd_msg->desc[i].key);
snip
		if (i < (sg_cnt - 1))
			wr->wr.next = &id->tx_wr[i + 1].wr;
		else if (need_inval)
			wr->wr.next = &inv_wr;
		else
			wr->wr.next = &imm_wr;

		wr->wr.opcode = IB_WR_RDMA_WRITE;
		wr->wr.ex.imm_data = 0;
		wr->wr.send_flags = 0;
snip
	if (need_inval) {
		inv_wr.next = &imm_wr;
		inv_wr.wr_cqe = &io_comp_cqe;
		inv_wr.sg_list = NULL;
		inv_wr.num_sge = 0;
		inv_wr.opcode = IB_WR_SEND_WITH_INV;
		inv_wr.send_flags = 0;
		inv_wr.ex.invalidate_rkey = rkey;
	}
	imm_wr.next = NULL;
	imm_wr.wr_cqe = &io_comp_cqe;
	imm_wr.sg_list = NULL;
	imm_wr.num_sge = 0;
	imm_wr.opcode = IB_WR_RDMA_WRITE_WITH_IMM;
	imm_wr.send_flags = flags;
	imm_wr.ex.imm_data = cpu_to_be32(ibtrs_to_io_rsp_imm(id->msg_id,
							     0, need_inval));

When we need to invalidate remote memory, the WRs are chained together,
and the last two are inv_wr and imm_wr.
imm_wr is the last one, and this is important: because RC QPs are
ordered, we know that by the time we receive
IB_WC_RECV_RDMA_WITH_IMM with w_inval set to true, the hardware should
already have finished its job of invalidating the MR.
If the server fails to invalidate, we do a local invalidation and wait
for its completion.
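
To make the ordering argument easier to follow, here is a minimal
user-space sketch of the chaining only. It is not the IBTRS code: the
model_* types and functions are hypothetical stand-ins for ib_send_wr
and the opcodes shown above, and nothing is posted to a real QP. It
only shows that, with or without invalidation, the immediate write
always ends up as the tail of the chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the IB_WR_* opcodes above. */
enum model_opcode {
	MODEL_RDMA_WRITE,
	MODEL_SEND_WITH_INV,
	MODEL_RDMA_WRITE_WITH_IMM,
};

/* Hypothetical stand-in for struct ib_send_wr: opcode and link only. */
struct model_wr {
	enum model_opcode opcode;
	struct model_wr *next;
};

/*
 * Link sg_cnt data writes, then an optional invalidate, then the
 * immediate write. Returns the head of the chain.
 */
static struct model_wr *model_build_chain(struct model_wr *data, size_t sg_cnt,
					  struct model_wr *inv_wr,
					  struct model_wr *imm_wr,
					  bool need_inval)
{
	size_t i;

	assert(imm_wr != NULL);
	assert(!need_inval || inv_wr != NULL);

	/* The immediate write always terminates the chain. */
	imm_wr->opcode = MODEL_RDMA_WRITE_WITH_IMM;
	imm_wr->next = NULL;

	if (need_inval) {
		inv_wr->opcode = MODEL_SEND_WITH_INV;
		inv_wr->next = imm_wr;
	}

	for (i = 0; i < sg_cnt; i++) {
		data[i].opcode = MODEL_RDMA_WRITE;
		if (i + 1 < sg_cnt)
			data[i].next = &data[i + 1];
		else if (need_inval)
			data[i].next = inv_wr;
		else
			data[i].next = imm_wr;
	}

	if (sg_cnt)
		return &data[0];
	return need_inval ? inv_wr : imm_wr;
}

/* Walk to the last WR of the chain. */
static struct model_wr *model_chain_tail(struct model_wr *head)
{
	while (head->next)
		head = head->next;
	return head;
}

/* Count the WRs in the chain. */
static size_t model_chain_len(const struct model_wr *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

With three data WRs and need_inval set, the model yields a chain of
five WRs in which the third data write links to inv_wr and inv_wr links
to imm_wr.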
On the client side:
static void complete_rdma_req(struct ibtrs_clt_io_req *req, int errno,
			      bool notify, bool can_wait)
{
	struct ibtrs_clt_con *con = req->con;
	struct ibtrs_clt_sess *sess;
	struct ibtrs_clt *clt;
	int err;

	if (WARN_ON(!req->in_use))
		return;
	if (WARN_ON(!req->con))
		return;
	sess = to_clt_sess(con->c.sess);
	clt = sess->clt;

	if (req->sg_cnt) {
		if (unlikely(req->dir == DMA_FROM_DEVICE && req->need_inv)) {
			/*
			 * We are here to invalidate RDMA read requests
			 * ourselves. In normal scenario server should
			 * send INV for all requested RDMA reads, but
			 * we are here, thus two things could happen:
			 *
			 * 1. this is failover, when errno != 0
			 *    and can_wait == 1,
			 *
			 * 2. something totally bad happened and
			 *    server forgot to send INV, so we
			 *    should do that ourselves.
			 */

			if (likely(can_wait)) {
				req->need_inv_comp = true;
			} else {
				/* This should be IO path, so always notify */
				WARN_ON(!notify);
				/* Save errno for INV callback */
				req->inv_errno = errno;
			}

			err = ibtrs_inv_rkey(req);
			if (unlikely(err)) {
				ibtrs_err(sess, "Send INV WR key=%#x: %d\n",
					  req->mr->rkey, err);
			} else if (likely(can_wait)) {
				wait_for_completion(&req->inv_comp);
			} else {
				/*
				 * Something went wrong, so request will be
				 * completed from INV callback.
				 */
				WARN_ON_ONCE(1);

				return;
			}
		}
		ib_dma_unmap_sg(sess->s.dev->ib_dev, req->sglist,
				req->sg_cnt, req->dir);
	}
	if (sess->stats.enable_rdma_lat)
		ibtrs_clt_update_rdma_lat(&sess->stats,
					  req->dir == DMA_FROM_DEVICE,
					  jiffies_to_msecs(jiffies - req->start_jiffies));
	ibtrs_clt_decrease_inflight(&sess->stats);

	req->in_use = false;
	req->con = NULL;

	if (notify)
		req->conf(req->priv, errno);
}

static void process_io_rsp(struct ibtrs_clt_sess *sess, u32 msg_id,
			   s16 errno, bool w_inval)
{
	struct ibtrs_clt_io_req *req;

	if (WARN_ON(msg_id >= sess->queue_depth))
		return;

	req = &sess->reqs[msg_id];
	/* Drop need_inv if server responsed with invalidation */
	req->need_inv &= !w_inval;
	complete_rdma_req(req, errno, true, false);
}
Hope this clears the doubt.
Regards,
Jack