From: Zhu Yanjun <zyjzyj2000@gmail.com>
To: Guoqing Jiang <guoqing.jiang@linux.dev>
Cc: Jason Gunthorpe <jgg@ziepe.ca>,
	Bob Pearson <rpearsonhpe@gmail.com>,
	RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Re: bug report for rxe
Date: Tue, 22 Feb 2022 18:04:20 +0800	[thread overview]
Message-ID: <CAD=hENeU=cf4_AZPYBDke-kv3Lv3+AUkkEjZm4Drkc6YLJOeLQ@mail.gmail.com> (raw)
In-Reply-To: <473a53b6-9ab2-0d48-a9cf-c84b8dc4c3f3@linux.dev>

On Tue, Feb 22, 2022 at 5:50 PM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
>
>
> On 2/10/22 3:36 PM, Guoqing Jiang wrote:
> > However, it seems rnbd/rtrs over rxe still doesn't work with the 5.17-rc3
> > kernel; dmesg reports the following.
> >
> > 1. server side
> >
> > [  440.723182] rdma_rxe: qp#17 moved to error state
> > [  440.725300] rtrs_server L1205: <bla>: remote access error (wr_cqe: 000000003b14397c, type: 0, vendor_err: 0x0, len: 0)
> > [  440.845926] rnbd_server L256: RTRS Session bla disconnected
> >
> > 2. client side
> >
> > [  997.817536] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
> > [  998.968810] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
> > [  999.017988] rtrs_client L610: <bla>: RDMA failed: remote access error
> > [ 1029.836943] rtrs_client L353: <bla>: Failed IB_WR_LOCAL_INV: WR flushe
> >
> > Then I tried the 5.16 and 5.15 kernels; 5.15 seems to work, as shown below.
> >
> > 1. server side
> >
> > [  333.076482] rnbd_server L800: </dev/loop1@bla>: Opened device 'loop1'
> >
> > 2. client side
> >
> > [ 1584.325825] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
> > [ 1585.268291] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
> > [ 1585.349300] rnbd_client L1607: </dev/loop1@bla> map_device: Device mapped as rnbd0 (nsectors: 0, logical_block_size: 512, physical_block_size: 512, max_write_same_sectors: 0, max_discard_sectors: 0, discard_granularity: 0, discard_alignment: 0, secure_discard: 0, max_segments: 128, max_hw_sectors: 248, rotational: 1, wc: 0, fua: 0)
> >
> > I would appreciate it if someone could shed light on why it doesn't work
> > after 5.15, and I am happy to test a potential patch for it.
>
> After investigation, it seems the culprit is commit 647bf13ce944 ("RDMA/rxe:
> Create duplicate mapping tables for FMRs"). The problem is that mr_check_range
> returns -EFAULT because it finds the iova and length are not valid, so the
> connection between the two VMs can't be established.
>
> After reverting the commit manually, or applying the temporary change below,
> rxe works again with rnbd/rtrs, though I don't think the change is the right
> thing to do. Could the experts provide a proper solution? Thanks.
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 453ef3c9d535..4a2fc4d5809d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -652,7 +652,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
>          mr->state = RXE_MR_STATE_VALID;
>
>          set = mr->cur_map_set;
> -       mr->cur_map_set = mr->next_map_set;
> +       //mr->cur_map_set = mr->next_map_set;
>          mr->cur_map_set->iova = wqe->wr.wr.reg.mr->iova;
>          mr->next_map_set = set;
>
> @@ -662,7 +662,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
>   int rxe_mr_set_page(struct ib_mr *ibmr, u64 addr)
>   {
>          struct rxe_mr *mr = to_rmr(ibmr);
> -       struct rxe_map_set *set = mr->next_map_set;
> +       struct rxe_map_set *set = mr->cur_map_set;
>          struct rxe_map *map;
>          struct rxe_phys_buf *buf;
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 80df9a8f71a1..e41d2c8612d8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -992,7 +992,7 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
>                           int sg_nents, unsigned int *sg_offset)
>   {
>          struct rxe_mr *mr = to_rmr(ibmr);
> -       struct rxe_map_set *set = mr->next_map_set;
> +       struct rxe_map_set *set = mr->cur_map_set;

Thanks a lot. Please file a patch for the above changes.

Zhu Yanjun

>
> And the test is pretty simple.
>
> 1.  VM (server)
>
> modprobe rdma_rxe
> rdma link add rxe0 type rxe netdev ens3
> modprobe rnbd-server
>
> 2.  VM (client)
>
> modprobe rdma_rxe
> rdma link add rxe0 type rxe netdev ens3
> modprobe rnbd-client
> echo "sessname=bla path=ip:$serverip device_path=$block_device_in_server" > /sys/devices/virtual/rnbd-client/ctl/map_device
>
> BTW, I tested the wip/jgg-for-next branch at commit 3810c1a1cbe8f.
>
> Guoqing


Thread overview: 17+ messages
2022-02-10  7:36 [PATCH 0/3] patches and bug report for rxe Guoqing Jiang
2022-02-10  7:36 ` [PATCH 1/3] RDMA/rxe: Replace write_lock_bh with write_lock_irqsave in __rxe_add_index Guoqing Jiang
2022-02-10 13:29   ` Zhu Yanjun
2022-02-10  7:36 ` [PATCH 2/3] RDMA/rxe: Replace write_lock_bh with write_lock_irqsave in __rxe_drop_index Guoqing Jiang
2022-02-10 14:16   ` Zhu Yanjun
2022-02-10 15:49     ` Bob Pearson
2022-02-11 10:09       ` Guoqing Jiang
2022-02-11 17:37         ` Bob Pearson
2022-02-12  0:59           ` Guoqing Jiang
2022-02-10  7:36 ` [PATCH 3/3] RDMA/rxe: Replace spin_lock_bh with spin_lock_irqsave in post_one_send Guoqing Jiang
2022-02-10 14:18   ` Zhu Yanjun
2022-02-22  9:50 ` bug report for rxe Guoqing Jiang
2022-02-22 10:04   ` Zhu Yanjun [this message]
2022-02-22 16:58     ` Pearson, Robert B
2022-02-23  4:43       ` Guoqing Jiang
2022-02-23  5:01         ` Bob Pearson
2022-02-23  5:50           ` Guoqing Jiang
