linux-rdma.vger.kernel.org archive mirror
From: Yi Zhang <yi.zhang@redhat.com>
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	Zhu Yanjun <zyjzyj2000@gmail.com>,
	RDMA mailing list <linux-rdma@vger.kernel.org>,
	Bart Van Assche <bvanassche@acm.org>,
	mie@igel.co.jp, rao.shoaib@oracle.com,
	Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [PATCH for-rc v4 0/5] RDMA/rxe: Various bug fixes.
Date: Sat, 18 Sep 2021 10:13:19 +0800
Message-ID: <CAHj4cs_nO40bY0rDo8KB52QRCi4Qz6nVAQCSBJmgm84FtvM-BA@mail.gmail.com>
In-Reply-To: <20210914164206.19768-1-rpearsonhpe@gmail.com>

Hi Bob,
With this patch series applied, the blktests nvme-rdma tests still fail
on rxe with the error below. The same tests pass with siw.

[ 1702.140090] loop0: detected capacity change from 0 to 2097152
[ 1702.150729] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 1702.151425] nvmet_rdma: enabling port 0 (10.16.221.116:4420)
[ 1702.158810] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0035-4b10-8044-b9c04f463333.
[ 1702.159037] nvme nvme0: creating 32 I/O queues.
[ 1702.171671] nvme nvme0: failed to initialize MR pool sized 128 for QID 32
[ 1702.178482] nvme nvme0: rdma connection establishment failed (-12)
[ 1702.292261] eno2 speed is unknown, defaulting to 1000
[ 1702.297325] eno3 speed is unknown, defaulting to 1000
[ 1702.302389] eno4 speed is unknown, defaulting to 1000
[ 1702.317991] rdma_rxe: unloaded
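
For reading the log: kernel functions report failure as a negative errno
value, so the (-12) above should correspond to ENOMEM on Linux. A minimal
user-space sketch of that mapping follows; the helper name
kernel_ret_to_text() is hypothetical and not part of any kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: turn a kernel-style return value (0 on success,
 * -errno on failure) into the C library's text for that errno.
 */
const char *kernel_ret_to_text(int ret)
{
	assert(ret <= 0);	/* positive values are not -errno codes */
	return ret == 0 ? "success" : strerror(-ret);
}
```

Calling kernel_ret_to_text(-12) returns the same string as
strerror(ENOMEM) on a Linux system, where ENOMEM is defined as 12.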

The failure comes from the following code:
        /*
         * Currently we don't use SG_GAPS MR's so if the first entry is
         * misaligned we'll end up using two entries for a single data page,
         * so one additional entry is required.
         */
        pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev, queue->pi_support) + 1;
        ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs,
                              queue->queue_size,
                              IB_MR_TYPE_MEM_REG,
                              pages_per_mr, 0);
        if (ret) {
                dev_err(queue->ctrl->ctrl.device,
                        "failed to initialize MR pool sized %d for QID %d\n",
                        queue->queue_size, nvme_rdma_queue_idx(queue));
                goto out_destroy_ring;
        }
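
As a rough back-of-envelope check, the sketch below counts how many MRs
the host side has asked for by the time QID 32 fails. It assumes that the
I/O queues are numbered from 1, are set up in order, and each get a pool
of queue_size (128 here) MRs; it ignores the admin queue and anything the
target side allocates. The function names are hypothetical, and this says
nothing about where the actual limit in rxe sits.

```c
#include <assert.h>

/* MRs requested by I/O queues 1 .. failing_qid-1, which all succeeded. */
unsigned long mrs_requested_before(unsigned int failing_qid,
				   unsigned int queue_size)
{
	assert(failing_qid >= 1);	/* assumption: I/O QIDs start at 1 */
	return (unsigned long)(failing_qid - 1) * queue_size;
}

/* MRs that would be held if the pool for failing_qid had succeeded too. */
unsigned long mrs_requested_through(unsigned int failing_qid,
				    unsigned int queue_size)
{
	return mrs_requested_before(failing_qid, queue_size) + queue_size;
}
```

With the values from the log, mrs_requested_before(32, 128) is 3968 and
mrs_requested_through(32, 128) is 4096.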


On Wed, Sep 15, 2021 at 12:43 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> This series of patches implements several bug fixes and minor
> cleanups of the rxe driver. Specifically these fix a bug exposed
> by blktest.
>
> They apply cleanly to both
> commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc)
> commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next)
>
> The first patch is a rewrite of an earlier patch.
> It adds memory barriers to kernel to kernel queues. The logic for this
> is the same as an earlier patch that only treated user to kernel queues.
> Without this patch kernel to kernel queues are expected to intermittently
> fail at low frequency as was seen for the other queues.
>
> The second patch cleans up the state and type enums used by MRs.
>
> The third patch separates the keys in rxe_mr and ib_mr. This allows
> the following sequence seen in the srp driver to work correctly.
>
>         do {
>                 ib_post_send( IB_WR_LOCAL_INV )
>                 ib_update_fast_reg_key()
>                 ib_map_mr_sg()
>                 ib_post_send( IB_WR_REG_MR )
>         } while ( !done )
>
> The fourth patch creates duplicate mapping tables for fast MRs. This
> prevents rkeys referencing fast MRs from accessing data from an updated
> map after the call to ib_map_mr_sg() by keeping the new and old
> mappings separate and atomically swapping them when a reg mr WR is
> executed.
>
> The fifth patch checks the type of MRs which receive local or remote
> invalidate operations to prevent invalidating user MRs.
>
> v3->v4:
> Two of the patches in v3 were accepted in v5.15 so have been dropped
> here.
>
> The first patch was rewritten to correctly deal with queue operations
> in rxe_verbs.c where the code is the client and not the server.
>
> v2->v3:
> The v2 version had a typo which broke clean application to for-next.
> Additionally in v3 the order of the patches was changed to make
> it a little cleaner.
>
> Bob Pearson (5):
>   RDMA/rxe: Add memory barriers to kernel queues
>   RDMA/rxe: Cleanup MR status and type enums
>   RDMA/rxe: Separate HW and SW l/rkeys
>   RDMA/rxe: Create duplicate mapping tables for FMRs
>   RDMA/rxe: Only allow invalidate for appropriate MRs
>
>  drivers/infiniband/sw/rxe/rxe_comp.c  |  12 +-
>  drivers/infiniband/sw/rxe/rxe_cq.c    |  25 +--
>  drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +
>  drivers/infiniband/sw/rxe/rxe_mr.c    | 267 ++++++++++++++++-------
>  drivers/infiniband/sw/rxe/rxe_mw.c    |  36 ++--
>  drivers/infiniband/sw/rxe/rxe_qp.c    |  12 +-
>  drivers/infiniband/sw/rxe/rxe_queue.c |  30 ++-
>  drivers/infiniband/sw/rxe/rxe_queue.h | 292 +++++++++++---------------
>  drivers/infiniband/sw/rxe/rxe_req.c   |  51 ++---
>  drivers/infiniband/sw/rxe/rxe_resp.c  |  40 +---
>  drivers/infiniband/sw/rxe/rxe_srq.c   |   2 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.c |  92 ++------
>  drivers/infiniband/sw/rxe/rxe_verbs.h |  48 ++---
>  13 files changed, 438 insertions(+), 471 deletions(-)
>
> --
> 2.30.2
>
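
For reference, the key update step in the quoted srp sequence can be
modelled in user space as below. This is only a sketch: struct model_mr
stands in for the lkey and rkey fields of struct ib_mr, and
model_update_fast_reg_key() mirrors what ib_update_fast_reg_key() does as
far as I understand it, which is to replace the low 8 bits of both keys
and leave the upper 24 bits alone. Both names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the key fields of struct ib_mr. */
struct model_mr {
	uint32_t lkey;
	uint32_t rkey;
};

/* Replace the low 8 bits of both keys, keep the upper 24 bits. */
void model_update_fast_reg_key(struct model_mr *mr, uint8_t newkey)
{
	assert(mr != NULL);
	mr->lkey = (mr->lkey & 0xffffff00u) | newkey;
	mr->rkey = (mr->rkey & 0xffffff00u) | newkey;
}
```

Because only the low byte changes between iterations of the loop, a
driver that keeps a single copy of the key cannot distinguish the key the
ULP has prepared from the one currently valid in the "hardware", which is
the situation the third patch addresses by separating the two.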


-- 
Best Regards,
  Yi Zhang



Thread overview: 16+ messages
2021-09-14 16:42 Bob Pearson
2021-09-14 16:42 ` [PATCH for-rc v4 1/5] RDMA/rxe: Add memory barriers to kernel queues Bob Pearson
2021-09-14 16:42 ` [PATCH for-rc v4 2/5] RDMA/rxe: Cleanup MR status and type enums Bob Pearson
2021-09-14 16:42 ` [PATCH for-rc v4 3/5] RDMA/rxe: Separate HW and SW l/rkeys Bob Pearson
2021-09-14 16:42 ` [PATCH for-rc v4 4/5] RDMA/rxe: Create duplicate mapping tables for FMRs Bob Pearson
2021-09-14 16:42 ` [PATCH for-rc v4 5/5] RDMA/rxe: Only allow invalidate for appropriate MRs Bob Pearson
2021-09-15  0:07 ` [PATCH for-rc v4 0/5] RDMA/rxe: Various bug fixes Shoaib Rao
2021-09-15  0:58   ` Bob Pearson
2021-09-18  2:13 ` Yi Zhang [this message]
2021-09-25  7:55   ` Yi Zhang
2021-09-25 20:32     ` Pearson, Robert B
2021-09-23 18:49 ` Olga Kornievskaia
2021-09-23 19:56   ` Jason Gunthorpe
2021-09-24  1:25     ` Bob Pearson
2021-09-24 13:54 ` Jason Gunthorpe
2021-09-24 15:44   ` Shoaib Rao
