From: Jason Gunthorpe <jgg@nvidia.com>
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: Yanjun Zhu <yanjun.zhu@linux.dev>,
	Zhu Yanjun <zyjzyj2000@gmail.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: bug report for rdma_rxe
Date: Tue, 26 Apr 2022 08:42:31 -0300
Message-ID: <20220426114231.GI2125828@nvidia.com>
In-Reply-To: <2f84097e-b31c-52b4-80b3-9e275a3b83bc@gmail.com>

On Mon, Apr 25, 2022 at 08:40:30PM -0500, Bob Pearson wrote:
> On 4/25/22 17:58, Jason Gunthorpe wrote:

> Imagine a very long RDMA read operation that times out several times before finally
> getting all the data returned to the requester. Now imagine it is followed by some
> small RDMA ops to a different node that use fast reg MRs and are executed by the
> other node after receiving a small control message. E.g.
> 
> 	node1					node2					node3
> 
> 1:	Send: RDMA_READ(mr1 to node2)
> 						RDMA_READ_REPLY(mr1@node1, 1of2)
> 	ib_map_mr_sg(mr2a local)
> 	Send: IB_WR_REG_MR(mr2a local)
> 	Send: Control msg (mr2a to node3)
> 											Send: RDMA_WRITE(mr2a@node1)
> 	Send: IB_WR_LOCAL_INV(mr2a local)
> 	ib_update_fast_reg_key(mr2a->mr2b)
> 	ib_map_mr_sg(mr2b local)
> 	Send: Control msg (mr2b to node3)
> 											Send: RDMA_WRITE(mr2b@node1)
> 	Timeout: replay from 1 (w/o local ops)
> 	Send: RDMA_READ(mr1 to node2)
> 						RDMA_READ_REPLY(mr1@node1, 2of2)
> 	Send: Control msg (mr2a to node3)
> 											Send: RDMA_WRITE(mr2a@node1)
> 											FAILS because mr2a has been
> 											replaced by mr2b.
> On the other hand if we replay the REG_MR local command that won't work either
> because we didn't know to rerun the ib_map_mr_sg() call.

How did you get two destination nodes into an RC send queue? We have
SRQ not SSQ.

In any event, the above is a buggy ULP. The IB_WR_LOCAL_INV cannot be
posted until the CQE for the Send with mr2a is received (or possibly a
strong fence is used).

It follows the general rule that the ULP cannot alter the data memory
under a WQE until it sees the CQE for that WQE, which tells it the NIC
has finished with the memory.

Jason


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220426114231.GI2125828@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=rpearsonhpe@gmail.com \
    --cc=yanjun.zhu@linux.dev \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.