From: Zhu Yanjun <zyjzyj2000@gmail.com>
To: Olga Kornievskaia <aglo@umich.edu>
Cc: Leon Romanovsky <leon@kernel.org>,
	Bob Pearson <rpearsonhpe@gmail.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	linux-rdma <linux-rdma@vger.kernel.org>
Subject: Re: RXE status in the upstream rping using rxe
Date: Tue, 17 Aug 2021 10:28:16 +0800
Message-ID: <CAD=hENdqho3mRy=gUSE-vuXzLvZPkwJ7kEFrjRN-AxLwvQP18Q@mail.gmail.com>
In-Reply-To: <CAN-5tyG4kBYBEaCDPGr=gUTNGkcoznMUy8e4BwCzWZkSPG-=+Q@mail.gmail.com>

On Fri, Aug 6, 2021 at 10:37 AM Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Wed, Aug 4, 2021 at 5:05 AM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> >
> > On Wed, Aug 4, 2021 at 1:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Wed, Aug 04, 2021 at 09:09:41AM +0800, Zhu Yanjun wrote:
> > > > On Wed, Aug 4, 2021 at 9:01 AM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> > > > >
> > > > > On Wed, Aug 4, 2021 at 2:07 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Can you please help me to understand the RXE status in the upstream?
> > > > > >
> > > > > > Do we still have crashes, interop issues, etc.?
> > > > >
> > > > > I have done some development on RXE upstream. From my use of the
> > > > > latest RXE, I found the following:
> > > > >
> > > > > 1. rdma-core does not work well with the latest RDMA git;
> > > >
> > > > The latest RDMA git is
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
> > >
> > > "Latest" is a relative term, what SHA did you test?
> > > Let's focus on fixing RXE before we continue with new features.
> >
> > Thanks a lot. I agree with you.
>
> I believe simple rping still doesn't work linux-to-linux. The last
> working version (of rping in rxe) was 5.13 I think. I have posted a
> number of crashes rping encounters (gotta get that working before I
> can even try NFSoRDMA).

The following are my tests:

1. modprobe rdma_rxe
2. modprobe -v -r rdma_rxe
3. rdma link add rxe
4. rdma link del rxe
5. Latest rdma-core && latest upstream kernel;
6. Latest kernel <------rping------> 5.10.y stable
7. Latest kernel <------rping------> 5.11.y stable
8. Latest kernel <------rping------> 5.12.y stable
9. Latest kernel <------rping------> 5.13.y stable

It seems that the latest upstream kernel (5.14-rc6) can rping the other
stable kernels.
Can you run your tests again?
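
For anyone who wants to repeat the sequence above, here is a minimal dry-run
sketch. It is not the exact script used for these tests. The device name
rxe0, the interface enp0s8 (taken from the quoted log), the peer address
192.0.2.10 (a documentation-range placeholder) and the rping count are all
assumptions to adjust for your setup. The peer is assumed to be running
something like "rping -s -a <addr> -v -C 10" already.

```shell
#!/bin/sh
# Dry-run sketch of the smoke test listed above. By default the commands
# are only printed. Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
NETDEV="${NETDEV:-enp0s8}"     # assumption: interface name from the quoted log
PEER="${PEER:-192.0.2.10}"     # assumption: placeholder address of the rping server

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rxe_smoke_test() {
    run modprobe rdma_rxe                               # step 1: load
    run modprobe -v -r rdma_rxe                         # step 2: unload
    run modprobe rdma_rxe                               # reload for the link tests
    run rdma link add rxe0 type rxe netdev "$NETDEV"    # step 3: add link
    run rdma link delete rxe0                           # step 4: delete link
    run rdma link add rxe0 type rxe netdev "$NETDEV"    # re-add for rping
    run rping -c -a "$PEER" -v -C 10                    # steps 6-9: client side
}

rxe_smoke_test
```

Repeating the last step against a server on each stable kernel (5.10.y to
5.13.y) would cover items 6 to 9.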

Zhu Yanjun
>
> Thank you for working on the code.
>
> We (NFS community) do test NFSoRDMA every git pull using rxe and siw
> but lately have been encountering problems.
>
> > rdma-core:
> > 313509f8 (HEAD -> master, origin/master, origin/HEAD) Merge pull
> > request #1038 from selvintxavier/master
> > 2d3dc48b Merge pull request #1039 from amzn/pyverbs-mac-fix-pr
> > 327d45e0 tests: Add missing MAC element to args list
> > 66aba73d bnxt_re/lib: Move hardware queue to 16B aligned indices
> > 8754fb51 bnxt_re/lib: Use separate indices for shadow queue
> > be4d8abf bnxt_re/lib: add a function to initialize software queue
> >
> > kernel rdma:
> > 0050a57638ca (HEAD -> for-next, origin/for-next, origin/HEAD)
> > RDMA/qedr: Improve error logs for rdma_alloc_tid error return
> > 090473004b02 RDMA/qed: Use accurate error num in qed_cxt_dynamic_ilt_alloc
> > 991c4274dc17 RDMA/hfi1: Fix typo in comments
> > 8d7e415d5561 docs: Fix infiniband uverbs minor number
> > bbafcbc2b1c9 RDMA/iwpm: Rely on the rdma_nl_[un]register() to ensure
> > that requests are valid
> > bdb0e4e3ff19 RDMA/iwpm: Remove not-needed reference counting
> > e677b72a0647 RDMA/iwcm: Release resources if iw_cm module initialization fails
> > a0293eb24936 RDMA/hfi1: Convert from atomic_t to refcount_t on
> > hfi1_devdata->user_refcount
> >
> > With the above kernel and rdma-core, the following messages appear.
> > "
> > [   54.214608] rdma_rxe: loaded
> > [   54.217089] infiniband rxe0: set active
> > [   54.217101] infiniband rxe0: added enp0s8
> > [  167.623200] rdma_rxe: cqe(32768) > max_cqe(32767)
> > [  167.645590] rdma_rxe: cqe(1) < current # elements in queue (6)
> > [  167.733297] rdma_rxe: cqe(32768) > max_cqe(32767)
> > [  169.074755] rdma_rxe: check_rkey: no MW matches rkey 0x1000247
> > [  169.074796] rdma_rxe: qp#27 moved to error state
> > [  169.138851] rdma_rxe: check_rkey: no MW matches rkey 0x10005de
> > [  169.138889] rdma_rxe: qp#30 moved to error state
> > [  169.160565] rdma_rxe: check_rkey: no MW matches rkey 0x10006f7
> > [  169.160601] rdma_rxe: qp#31 moved to error state
> > [  169.182132] rdma_rxe: check_rkey: no MW matches rkey 0x1000782
> > [  169.182170] rdma_rxe: qp#32 moved to error state
> > [  169.667803] rdma_rxe: check_rkey: no MR matches rkey 0x18d8
> > [  169.667850] rdma_rxe: qp#39 moved to error state
> > [  198.872649] rdma_rxe: cqe(32768) > max_cqe(32767)
> > [  198.894829] rdma_rxe: cqe(1) < current # elements in queue (6)
> > [  198.981839] rdma_rxe: cqe(32768) > max_cqe(32767)
> > [  200.332031] rdma_rxe: check_rkey: no MW matches rkey 0x1000887
> > [  200.332086] rdma_rxe: qp#58 moved to error state
> > [  200.396476] rdma_rxe: check_rkey: no MW matches rkey 0x1000b0d
> > [  200.396514] rdma_rxe: qp#61 moved to error state
> > [  200.417919] rdma_rxe: check_rkey: no MW matches rkey 0x1000c40
> > [  200.417956] rdma_rxe: qp#62 moved to error state
> > [  200.439616] rdma_rxe: check_rkey: no MW matches rkey 0x1000d24
> > [  200.439654] rdma_rxe: qp#63 moved to error state
> > [  200.933104] rdma_rxe: check_rkey: no MR matches rkey 0x37d8
> > [  200.933153] rdma_rxe: qp#70 moved to error state
> > [  206.880305] rdma_rxe: cqe(32768) > max_cqe(32767)
> > [  206.904030] rdma_rxe: cqe(1) < current # elements in queue (6)
> > [  206.991494] rdma_rxe: cqe(32768) > max_cqe(32767)
> > [  208.359987] rdma_rxe: check_rkey: no MW matches rkey 0x1000e4d
> > [  208.360028] rdma_rxe: qp#89 moved to error state
> > [  208.425637] rdma_rxe: check_rkey: no MW matches rkey 0x1001136
> > [  208.425675] rdma_rxe: qp#92 moved to error state
> > [  208.447333] rdma_rxe: check_rkey: no MW matches rkey 0x10012d8
> > [  208.447370] rdma_rxe: qp#93 moved to error state
> > [  208.469511] rdma_rxe: check_rkey: no MW matches rkey 0x100137a
> > [  208.469550] rdma_rxe: qp#94 moved to error state
> > [  208.956691] rdma_rxe: check_rkey: no MR matches rkey 0x5670
> > [  208.956731] rdma_rxe: qp#100 moved to error state
> > [  216.879703] rdma_rxe: cqe(32768) > max_cqe(32767)
> > [  216.902199] rdma_rxe: cqe(1) < current # elements in queue (6)
> > [  216.989264] rdma_rxe: cqe(32768) > max_cqe(32767)
> > [  218.363765] rdma_rxe: check_rkey: no MW matches rkey 0x10014d6
> > [  218.363808] rdma_rxe: qp#119 moved to error state
> > [  218.429474] rdma_rxe: check_rkey: no MW matches rkey 0x10017e4
> > [  218.429513] rdma_rxe: qp#122 moved to error state
> > [  218.451443] rdma_rxe: check_rkey: no MW matches rkey 0x1001895
> > [  218.451481] rdma_rxe: qp#123 moved to error state
> > [  218.473869] rdma_rxe: check_rkey: no MW matches rkey 0x1001910
> > [  218.473908] rdma_rxe: qp#124 moved to error state
> > [  218.963602] rdma_rxe: check_rkey: no MR matches rkey 0x757b
> > [  218.963641] rdma_rxe: qp#130 moved to error state
> > [  233.855140] rdma_rxe: cqe(32768) > max_cqe(32767)
> > [  233.877202] rdma_rxe: cqe(1) < current # elements in queue (6)
> > [  233.963952] rdma_rxe: cqe(32768) > max_cqe(32767)
> > [  235.305274] rdma_rxe: check_rkey: no MW matches rkey 0x1001ac2
> > [  235.305319] rdma_rxe: qp#149 moved to error state
> > [  235.368800] rdma_rxe: check_rkey: no MW matches rkey 0x1001db8
> > [  235.368838] rdma_rxe: qp#152 moved to error state
> > [  235.390155] rdma_rxe: check_rkey: no MW matches rkey 0x1001e4d
> > [  235.390192] rdma_rxe: qp#153 moved to error state
> > [  235.411336] rdma_rxe: check_rkey: no MW matches rkey 0x1001f4c
> > [  235.411374] rdma_rxe: qp#154 moved to error state
> > [  235.895784] rdma_rxe: check_rkey: no MR matches rkey 0x9482
> > [  235.895828] rdma_rxe: qp#161 moved to error state
> > "
> > I am not sure whether these are problems.
> > IMO, we should investigate further.
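
As a possible starting point for that investigation, the repeated messages
in the log can be grouped by kind. This is only a sketch: the helper name
rxe_log_summary is hypothetical, and it assumes dmesg-style text on stdin.
It replaces the varying rkey values and QP numbers with placeholders so
that identical message kinds are counted together.

```shell
# Tally rdma_rxe kernel messages by kind. Reads dmesg-style text on stdin.
# rxe_log_summary is a hypothetical helper name, not part of any tool.
rxe_log_summary() {
    grep 'rdma_rxe: ' |
        sed -e 's/^.*rdma_rxe: //' \
            -e 's/0x[0-9a-fA-F]*/<rkey>/' \
            -e 's/qp#[0-9]*/qp#<n>/' |
        sort | uniq -c | sort -rn
}
```

It could be used as "dmesg | rxe_log_summary", or fed a saved copy of the
log above.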
> >
> > Thanks
> > Zhu Yanjun
> > >
> > > Thanks
