linux-rdma.vger.kernel.org archive mirror
From: "Pearson, Robert B" <robert.pearson2@hpe.com>
To: Yi Zhang <yi.zhang@redhat.com>,
	Bob Pearson <rpearsonhpe@gmail.com>,
	"Sagi Grimberg" <sagi@grimberg.me>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	Zhu Yanjun <zyjzyj2000@gmail.com>,
	"RDMA mailing list" <linux-rdma@vger.kernel.org>,
	Bart Van Assche <bvanassche@acm.org>,
	"mie@igel.co.jp" <mie@igel.co.jp>,
	"rao.shoaib@oracle.com" <rao.shoaib@oracle.com>
Subject: RE: [PATCH for-rc v4 0/5] RDMA/rxe: Various bug fixes.
Date: Sat, 25 Sep 2021 20:32:12 +0000	[thread overview]
Message-ID: <CS1PR8401MB10777D25DDD14D3F00E4C257BCA59@CS1PR8401MB1077.NAMPRD84.PROD.OUTLOOK.COM> (raw)
In-Reply-To: <CAHj4cs9FL-RNR3NqEQ=mRNokU0D0+0xFHTghmzOp13J6MCZQ2A@mail.gmail.com>

Sorry, that was my fault. 256K seemed way too big, but 4K is too small. This whole area has been problematic lately. There are several use cases for rxe, and one is to be an easy test target, which is hard if you can't reach the limits. On the other hand, if you want to really use it, you just want the limits to get out of the way.

One way to fix this would be to add a bunch of module parameters, but there are too many attributes for that to be practical. Another would be to have 2 or 3 attribute sets (large, medium, small) that you could pick with one module parameter. Or you could have a configuration file somewhere that the driver could read, but that seems messy. Or you could pass strings to one module parameter that could be parsed to set the attributes. If anyone has a preference I'd be happy to write something.

Bob

-----Original Message-----
From: Yi Zhang <yi.zhang@redhat.com> 
Sent: Saturday, September 25, 2021 2:56 AM
To: Bob Pearson <rpearsonhpe@gmail.com>; Sagi Grimberg <sagi@grimberg.me>
Cc: Jason Gunthorpe <jgg@nvidia.com>; Zhu Yanjun <zyjzyj2000@gmail.com>; RDMA mailing list <linux-rdma@vger.kernel.org>; Bart Van Assche <bvanassche@acm.org>; mie@igel.co.jp; rao.shoaib@oracle.com
Subject: Re: [PATCH for-rc v4 0/5] RDMA/rxe: Various bug fixes.

Hi Bob/Sagi

Regarding the blktests nvme-rdma failure with rdma_rxe, I found that patch [3] reduced the number of MRs and MWs from 256K to 4K, which caused the nvme connect to fail [1]. If I change the "number of io queues to use" to 31, it works [2]. The default io queue count is the CPU core count (mine is 48), which is why it failed in my environment.

I have also tried reverting patch [3], and blktests runs successfully on my server. Could we change back to 256K again?

[1]
# nvme connect -t rdma -a 10.16.221.74 -s 4420 -n testnqn -i 32
Failed to write to /dev/nvme-fabrics: Cannot allocate memory
# dmesg
[  857.546927] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-4310-8058-b9c04f325732.
[  857.547640] nvme nvme0: creating 32 I/O queues.
[  857.595285] nvme nvme0: failed to initialize MR pool sized 128 for QID 32
[  857.602141] nvme nvme0: rdma connection establishment failed (-12)

[2]
# nvme connect -t rdma -a 10.16.221.74 -s 4420 -n testnqn -i 31
# dmesg
[  863.792874] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-4310-8058-b9c04f325732.
[  863.793638] nvme nvme0: creating 31 I/O queues.
[  863.845594] nvme nvme0: mapped 31/0/0 default/read/poll queues.
[  863.853633] nvme nvme0: new ctrl: NQN "testnqn", addr 10.16.221.74:4420

[3]
commit af732adfacb2c6d886713624af2ff8e555c32aa4
Author: Bob Pearson <rpearsonhpe@gmail.com>
Date:   Mon Jun 7 23:25:46 2021 -0500

    RDMA/rxe: Enable MW object pool

    Currently the rxe driver has a rxe_mw struct object but nothing about
    memory windows is enabled. This patch turns on memory windows and some
    minor cleanup.

    Set device attribute in rxe.c so max_mw = MAX_MW.  Change parameters in
    rxe_param.h so that MAX_MW is the same as MAX_MR.  Reduce the number of
    MRs and MWs to 4K from 256K.  Add device capability bits for 2a and 2b
    memory windows.  Removed RXE_MR_TYPE_MW from the rxe_mr_type enum.



On Sat, Sep 18, 2021 at 10:13 AM Yi Zhang <yi.zhang@redhat.com> wrote:
>
> Hi Bob
> With this patch series, the blktests nvme-rdma test can still fail
> with the below error, and the test passes with siw.
>
> [ 1702.140090] loop0: detected capacity change from 0 to 2097152
> [ 1702.150729] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [ 1702.151425] nvmet_rdma: enabling port 0 (10.16.221.116:4420)
> [ 1702.158810] nvmet: creating controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0035-4b10-8044-b9c04f463333.
> [ 1702.159037] nvme nvme0: creating 32 I/O queues.
> [ 1702.171671] nvme nvme0: failed to initialize MR pool sized 128 for QID 32
> [ 1702.178482] nvme nvme0: rdma connection establishment failed (-12)
> [ 1702.292261] eno2 speed is unknown, defaulting to 1000
> [ 1702.297325] eno3 speed is unknown, defaulting to 1000
> [ 1702.302389] eno4 speed is unknown, defaulting to 1000
> [ 1702.317991] rdma_rxe: unloaded
>
> Failure from:
>         /*
>          * Currently we don't use SG_GAPS MR's so if the first entry is
>          * misaligned we'll end up using two entries for a single data page,
>          * so one additional entry is required.
>          */
>         pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev, queue->pi_support) + 1;
>         ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs,
>                               queue->queue_size,
>                               IB_MR_TYPE_MEM_REG,
>                               pages_per_mr, 0);
>         if (ret) {
>                 dev_err(queue->ctrl->ctrl.device,
>                         "failed to initialize MR pool sized %d for QID %d\n",
>                         queue->queue_size, nvme_rdma_queue_idx(queue));
>                 goto out_destroy_ring;
>         }
>
>
> On Wed, Sep 15, 2021 at 12:43 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> >
> > This series of patches implements several bug fixes and minor 
> > cleanups of the rxe driver. Specifically these fix a bug exposed by 
> > blktests.
> >
> > They apply cleanly to both
> > commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc) 
> > commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next)
> >
> > The first patch is a rewrite of an earlier patch.
> > It adds memory barriers to kernel to kernel queues. The logic for 
> > this is the same as an earlier patch that only treated user to kernel queues.
> > Without this patch kernel to kernel queues are expected to 
> > intermittently fail at low frequency as was seen for the other queues.
> >
> > The second patch cleans up the state and type enums used by MRs.
> >
> > The third patch separates the keys in rxe_mr and ib_mr. This allows 
> > the following sequence seen in the srp driver to work correctly.
> >
> >         do {
> >                 ib_post_send( IB_WR_LOCAL_INV )
> >                 ib_update_fast_reg_key()
> >                 ib_map_mr_sg()
> >                 ib_post_send( IB_WR_REG_MR )
> >         } while ( !done )
> >
> > The fourth patch creates duplicate mapping tables for fast MRs. This 
> > prevents rkeys referencing fast MRs from accessing data from an 
> > updated map after the call to ib_map_mr_sg() call by keeping the new 
> > and old mappings separate and atomically swapping them when a reg mr 
> > WR is executed.
> >
> > The fifth patch checks the type of MRs which receive local or remote 
> > invalidate operations to prevent invalidating user MRs.
> >
> > v3->v4:
> > Two of the patches in v3 were accepted in v5.15 so have been dropped 
> > here.
> >
> > The first patch was rewritten to correctly deal with queue 
> > operations in rxe_verbs.c where the code is the client and not the server.
> >
> > v2->v3:
> > The v2 version had a typo which broke clean application to for-next.
> > Additionally in v3 the order of the patches was changed to make it a 
> > little cleaner.
> >
> > Bob Pearson (5):
> >   RDMA/rxe: Add memory barriers to kernel queues
> >   RDMA/rxe: Cleanup MR status and type enums
> >   RDMA/rxe: Separate HW and SW l/rkeys
> >   RDMA/rxe: Create duplicate mapping tables for FMRs
> >   RDMA/rxe: Only allow invalidate for appropriate MRs
> >
> >  drivers/infiniband/sw/rxe/rxe_comp.c  |  12 +-
> >  drivers/infiniband/sw/rxe/rxe_cq.c    |  25 +--
> >  drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +
> >  drivers/infiniband/sw/rxe/rxe_mr.c    | 267 ++++++++++++++++-------
> >  drivers/infiniband/sw/rxe/rxe_mw.c    |  36 ++--
> >  drivers/infiniband/sw/rxe/rxe_qp.c    |  12 +-
> >  drivers/infiniband/sw/rxe/rxe_queue.c |  30 ++-
> >  drivers/infiniband/sw/rxe/rxe_queue.h | 292 +++++++++++---------------
> >  drivers/infiniband/sw/rxe/rxe_req.c   |  51 ++---
> >  drivers/infiniband/sw/rxe/rxe_resp.c  |  40 +---
> >  drivers/infiniband/sw/rxe/rxe_srq.c   |   2 +-
> >  drivers/infiniband/sw/rxe/rxe_verbs.c |  92 ++------
> >  drivers/infiniband/sw/rxe/rxe_verbs.h |  48 ++---
> >  13 files changed, 438 insertions(+), 471 deletions(-)
> >
> > --
> > 2.30.2
> >
>
>
> --
> Best Regards,
>   Yi Zhang



--
Best Regards,
  Yi Zhang


Thread overview: 16+ messages
2021-09-14 16:42 Bob Pearson
2021-09-14 16:42 ` [PATCH for-rc v4 1/5] RDMA/rxe: Add memory barriers to kernel queues Bob Pearson
2021-09-14 16:42 ` [PATCH for-rc v4 2/5] RDMA/rxe: Cleanup MR status and type enums Bob Pearson
2021-09-14 16:42 ` [PATCH for-rc v4 3/5] RDMA/rxe: Separate HW and SW l/rkeys Bob Pearson
2021-09-14 16:42 ` [PATCH for-rc v4 4/5] RDMA/rxe: Create duplicate mapping tables for FMRs Bob Pearson
2021-09-14 16:42 ` [PATCH for-rc v4 5/5] RDMA/rxe: Only allow invalidate for appropriate MRs Bob Pearson
2021-09-15  0:07 ` [PATCH for-rc v4 0/5] RDMA/rxe: Various bug fixes Shoaib Rao
2021-09-15  0:58   ` Bob Pearson
2021-09-18  2:13 ` Yi Zhang
2021-09-25  7:55   ` Yi Zhang
2021-09-25 20:32     ` Pearson, Robert B [this message]
2021-09-23 18:49 ` Olga Kornievskaia
2021-09-23 19:56   ` Jason Gunthorpe
2021-09-24  1:25     ` Bob Pearson
2021-09-24 13:54 ` Jason Gunthorpe
2021-09-24 15:44   ` Shoaib Rao
