Linux-RDMA Archive on lore.kernel.org
From: Jinpu Wang <jinpuwang@gmail.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Danil Kipnis <danil.kipnis@cloud.ionos.com>,
	linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	bvanassche@acm.org, jgg@mellanox.com, dledford@redhat.com,
	Roman Pen <r.peniaev@gmail.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
Date: Fri, 12 Jul 2019 09:57:54 +0200
Message-ID: <CAD9gYJKcJ47ogKL4S_KMtxpS1gPHHhqqG7-GTi-2c0cOJ-LJtw@mail.gmail.com>
In-Reply-To: <aef765ed-4bb9-2211-05d0-b320cc3ac275@grimberg.me>

Hi Sagi,

> >> Another question, from what I understand from the code, the client
> >> always rdma_writes data on writes (with imm) from a remote pool of
> >> server buffers dedicated to it. Essentially all writes are immediate (no
> >> rdma reads ever). How is that different than using send wrs to a set of
> >> pre-posted recv buffers (like all others are doing)? Is it faster?
> > At the very beginning of the project we did some measurements and saw,
> > that it is faster. I'm not sure if this is still true
>
> Its not significantly faster (can't imagine why it would be).
> What could make a difference is probably the fact that you never
> do rdma reads for I/O writes which might be better. Also perhaps the
> fact that you normally don't wait for send completions before completing
> I/O (which is broken), and the fact that you batch recv operations.

I don't know how you came to the conclusion that we don't wait for
send completion before completing IO.

We do chain WRs on a successful read request from the server; see the
function rdma_write_sg:

static int rdma_write_sg(struct ibtrs_srv_op *id)
{
        struct ibtrs_srv_sess *sess = to_srv_sess(id->con->c.sess);
        dma_addr_t dma_addr = sess->dma_addr[id->msg_id];
        struct ibtrs_srv *srv = sess->srv;
        struct ib_send_wr inv_wr, imm_wr;
        struct ib_rdma_wr *wr = NULL;
[snip]
        need_inval = le16_to_cpu(id->rd_msg->flags) & IBTRS_MSG_NEED_INVAL_F;
[snip]
                wr->wr.wr_cqe   = &io_comp_cqe;
                wr->wr.sg_list  = list;
                wr->wr.num_sge  = 1;
                wr->remote_addr = le64_to_cpu(id->rd_msg->desc[i].addr);
                wr->rkey        = le32_to_cpu(id->rd_msg->desc[i].key);
[snip]
                if (i < (sg_cnt - 1))
                        wr->wr.next = &id->tx_wr[i + 1].wr;
                else if (need_inval)
                        wr->wr.next = &inv_wr;
                else
                        wr->wr.next = &imm_wr;

                wr->wr.opcode = IB_WR_RDMA_WRITE;
                wr->wr.ex.imm_data = 0;
                wr->wr.send_flags  = 0;
[snip]
        if (need_inval) {
                inv_wr.next = &imm_wr;
                inv_wr.wr_cqe = &io_comp_cqe;
                inv_wr.sg_list = NULL;
                inv_wr.num_sge = 0;
                inv_wr.opcode = IB_WR_SEND_WITH_INV;
                inv_wr.send_flags = 0;
                inv_wr.ex.invalidate_rkey = rkey;
        }
        imm_wr.next = NULL;
        imm_wr.wr_cqe = &io_comp_cqe;
        imm_wr.sg_list = NULL;
        imm_wr.num_sge = 0;
        imm_wr.opcode = IB_WR_RDMA_WRITE_WITH_IMM;
        imm_wr.send_flags = flags;
        imm_wr.ex.imm_data = cpu_to_be32(ibtrs_to_io_rsp_imm(id->msg_id,
                                                             0,
                                                             need_inval));


When we need to invalidate remote memory, the WRs are chained
together, and the last two are inv_wr and imm_wr.
imm_wr is the last one. This is important: because RC QPs are
ordered, we know that when we receive IB_WC_RECV_RDMA_WITH_IMM and
w_inval is true, the hardware should already have finished its job of
invalidating the MR.
If the server fails to invalidate, we do a local invalidation and wait
for its completion.
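
To make the chaining order easier to follow, here is a minimal
userspace sketch of it. This is not the IBTRS code: struct model_wr,
the MODEL_* opcodes, and the model_* helpers are hypothetical stand-ins
for struct ib_send_wr and the IB_WR_* opcodes, and nothing is posted to
a QP. It only shows that the data WRs come first, the optional
invalidate WR comes next, and the IMM WR always ends the chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's IB_WR_* opcodes. */
enum model_opcode {
	MODEL_RDMA_WRITE,
	MODEL_SEND_WITH_INV,
	MODEL_RDMA_WRITE_WITH_IMM,
};

/* Hypothetical, heavily reduced stand-in for struct ib_send_wr. */
struct model_wr {
	struct model_wr *next;
	enum model_opcode opcode;
};

/*
 * Link sg_cnt data WRs, then an optional invalidate WR, then the IMM
 * WR. Returns the head of the chain.
 */
static struct model_wr *model_build_chain(struct model_wr *data,
					  size_t sg_cnt,
					  struct model_wr *inv_wr,
					  struct model_wr *imm_wr,
					  bool need_inval)
{
	/* Whatever follows the last data WR. */
	struct model_wr *tail = need_inval ? inv_wr : imm_wr;
	size_t i;

	for (i = 0; i < sg_cnt; i++) {
		data[i].opcode = MODEL_RDMA_WRITE;
		data[i].next = (i + 1 < sg_cnt) ? &data[i + 1] : tail;
	}
	if (need_inval) {
		inv_wr->opcode = MODEL_SEND_WITH_INV;
		inv_wr->next = imm_wr;
	}
	/* The IMM WR terminates the chain in both cases. */
	imm_wr->opcode = MODEL_RDMA_WRITE_WITH_IMM;
	imm_wr->next = NULL;

	return sg_cnt ? &data[0] : tail;
}

/* Walk the chain and return its last WR. */
static struct model_wr *model_last_wr(struct model_wr *head)
{
	assert(head != NULL);
	while (head->next)
		head = head->next;
	return head;
}
```

With three data WRs and need_inval set, model_last_wr() returns the IMM
WR and the invalidate WR sits directly in front of it, which is the
property the ordering argument above relies on.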

On the client side:
static void complete_rdma_req(struct ibtrs_clt_io_req *req, int errno,
                              bool notify, bool can_wait)
{
        struct ibtrs_clt_con *con = req->con;
        struct ibtrs_clt_sess *sess;
        struct ibtrs_clt *clt;
        int err;

        if (WARN_ON(!req->in_use))
                return;
        if (WARN_ON(!req->con))
                return;
        sess = to_clt_sess(con->c.sess);
        clt = sess->clt;

        if (req->sg_cnt) {
                if (unlikely(req->dir == DMA_FROM_DEVICE && req->need_inv)) {
                        /*
                         * We are here to invalidate RDMA read requests
                         * ourselves.  In normal scenario server should
                         * send INV for all requested RDMA reads, but
                         * we are here, thus two things could happen:
                         *
                         *    1.  this is failover, when errno != 0
                         *        and can_wait == 1,
                         *
                         *    2.  something totally bad happened and
                         *        server forgot to send INV, so we
                         *        should do that ourselves.
                         */

                        if (likely(can_wait)) {
                                req->need_inv_comp = true;
                        } else {
                                /* This should be IO path, so always notify */
                                WARN_ON(!notify);
                                /* Save errno for INV callback */
                                req->inv_errno = errno;
                        }

                        err = ibtrs_inv_rkey(req);
                        if (unlikely(err)) {
                                ibtrs_err(sess, "Send INV WR key=%#x: %d\n",
                                          req->mr->rkey, err);
                        } else if (likely(can_wait)) {
                                wait_for_completion(&req->inv_comp);
                        } else {
                                /*
                                 * Something went wrong, so request will be
                                 * completed from INV callback.
                                 */
                                WARN_ON_ONCE(1);

                                return;
                        }
                }
                ib_dma_unmap_sg(sess->s.dev->ib_dev, req->sglist,
                                req->sg_cnt, req->dir);
        }
        if (sess->stats.enable_rdma_lat)
                ibtrs_clt_update_rdma_lat(&sess->stats,
                                req->dir == DMA_FROM_DEVICE,
                                jiffies_to_msecs(jiffies - req->start_jiffies));
        ibtrs_clt_decrease_inflight(&sess->stats);

        req->in_use = false;
        req->con = NULL;

        if (notify)
                req->conf(req->priv, errno);
}

static void process_io_rsp(struct ibtrs_clt_sess *sess, u32 msg_id,
                           s16 errno, bool w_inval)
{
        struct ibtrs_clt_io_req *req;

        if (WARN_ON(msg_id >= sess->queue_depth))
                return;

        req = &sess->reqs[msg_id];
        /* Drop need_inv if server responsed with invalidation */
        req->need_inv &= !w_inval;
        complete_rdma_req(req, errno, true, false);
}
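
The client-side decision can be condensed into a small standalone
model. Again, this is only a sketch and not the driver code: struct
model_req, enum model_action, and the model_* functions are
hypothetical names. It also assumes that posting the local invalidate
(ibtrs_inv_rkey in the real code) succeeds, and it leaves out the DMA
unmap, statistics, and notification steps.

```c
#include <assert.h>
#include <stdbool.h>

/* What the completion path does for a request in this reduced model. */
enum model_action {
	MODEL_COMPLETE_NOW,	/* no local INV needed, complete right away */
	MODEL_LOCAL_INV_WAIT,	/* post local INV and block until it is done */
	MODEL_LOCAL_INV_ASYNC,	/* post local INV, complete from its callback */
};

/* Hypothetical, reduced stand-in for struct ibtrs_clt_io_req. */
struct model_req {
	bool is_read;	/* stands in for req->dir == DMA_FROM_DEVICE */
	bool need_inv;	/* the MR still has to be invalidated */
	int sg_cnt;	/* number of mapped scatter-gather entries */
};

/* Mirrors "req->need_inv &= !w_inval" from process_io_rsp(). */
static void model_process_io_rsp(struct model_req *req, bool w_inval)
{
	req->need_inv &= !w_inval;
}

/*
 * Mirrors the branch structure of complete_rdma_req(), assuming the
 * local invalidate WR is posted successfully.
 */
static enum model_action model_complete(const struct model_req *req,
					bool can_wait)
{
	if (req->sg_cnt && req->is_read && req->need_inv)
		return can_wait ? MODEL_LOCAL_INV_WAIT
				: MODEL_LOCAL_INV_ASYNC;
	return MODEL_COMPLETE_NOW;
}
```

When the server responds with invalidation (w_inval is true), need_inv
is cleared and the request completes right away. Otherwise the client
invalidates the rkey itself, and the request completes either after the
wait or from the INV callback.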

Hope this clears the doubt.

Regards,
Jack

