linux-rdma.vger.kernel.org archive mirror
* [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation
@ 2022-09-27  5:53 Li Zhijian
  2022-09-27  5:53 ` [for-next PATCH v5 01/11] RDMA/rxe: make sure requested access is a subset of {mr,mw}->access Li Zhijian
                   ` (12 more replies)
  0 siblings, 13 replies; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

Hey folks,

First of all, I want to say thank you to all of you, especially Bob, who over
the past month and more gave me a lot of ideas and inspiration.

With your help, several changes were made in this 5th version, such as:
- new names and a new patch split scheme, suggested by Bob
- bugfix: set is_pmem true only if the whole MR is pmem; it's possible for
  one MR to contain both PMEM and DRAM
- introduce a feth structure instead of a raw u32
- new bugfix to rxe_lookup_mw() and lookup_mr(), see (RDMA/rxe: make sure requested access is a subset of {mr,mw}->access);
  with this fix, check_placement_type() can be removed, since lookup_mr() already does this check
- enable the flushable QP attribute
Each change is also noted in the changelog of the patch it belongs to.

These patches implement a *NEW* RDMA opcode, "RDMA FLUSH".
In IB SPEC 1.5[1], two new opcodes, ATOMIC WRITE and RDMA FLUSH, were
added in the MEMORY PLACEMENT EXTENSIONS section.

This patchset makes SoftRoCE (rxe) support the new RDMA FLUSH operation on the RC service.

You can verify the patchset by building and running the rdma_flush example[2].
server:
$ ./rdma_flush_server -s [server_address] -p [port_number]
client:
$ ./rdma_flush_client -s [server_address] -p [port_number]

Corresponding pyverbs and tests (tests.test_qpex.QpExTestCase.test_qp_ex_rc_rdma_flush)
are also added to rdma-core.
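
For reference, the rdma-core test above can be run roughly like this (the
exact run_tests.py invocation is from memory and may differ in your tree):
$ ./build.sh
$ cd build && bin/run_tests.py tests.test_qpex.QpExTestCase.test_qp_ex_rc_rdma_flush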

[1]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
[2]: https://github.com/zhijianli88/rdma-core/tree/rdma-flush-v5

CC: Xiao Yang <yangx.jy@fujitsu.com>
CC: "Gotou, Yasunori" <y-goto@fujitsu.com>
CC: Jason Gunthorpe <jgg@ziepe.ca>
CC: Zhu Yanjun <zyjzyj2000@gmail.com>
CC: Leon Romanovsky <leon@kernel.org>
CC: Bob Pearson <rpearsonhpe@gmail.com>
CC: Mark Bloch <mbloch@nvidia.com>
CC: Wenpeng Liang <liangwenpeng@huawei.com>
CC: Tom Talpey <tom@talpey.com>
CC: "Gromadzki, Tomasz" <tomasz.gromadzki@intel.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: linux-rdma@vger.kernel.org
CC: linux-kernel@vger.kernel.org

The kernel source is also available at:
https://github.com/zhijianli88/linux/tree/rdma-flush-v5

Changelog:
V4:
- rework responder process
- rebase to v5.19+
- remove [7/7]: RDMA/rxe: Add RD FLUSH service support since RD is not really supported

V3:
- Just a rebase plus commit log and comment updates
- delete patch-1: "RDMA: mr: Introduce is_pmem", which will be combined into "Allow registering persistent flag for pmem MR only"
- delete patch-7

V2:
RDMA: mr: Introduce is_pmem
   check 1st byte to avoid crossing page boundary
   new scheme to check is_pmem # Dan

RDMA: Allow registering MR with flush access flags
   combine with [03/10] RDMA/rxe: Allow registering FLUSH flags for supported device only to this patch # Jason
   split RDMA_FLUSH to 2 capabilities

RDMA/rxe: Allow registering persistent flag for pmem MR only
   update commit message, get rid of confusing ib_check_flush_access_flags() # Tom

RDMA/rxe: Implement RC RDMA FLUSH service in requester side
   extend flush to include length field. # Tom and Tomasz

RDMA/rxe: Implement flush execution in responder side
   adjust start for WHOLE MR level # Tom
   don't support DMA mr for flush # Tom
   check flush return value

RDMA/rxe: Enable RDMA FLUSH capability for rxe device
   adjust patch's order. move it here from [04/10]

Li Zhijian (11):
  RDMA/rxe: make sure requested access is a subset of {mr,mw}->access
  RDMA: Extend RDMA user ABI to support flush
  RDMA: Extend RDMA kernel verbs ABI to support flush
  RDMA/rxe: Extend rxe user ABI to support flush
  RDMA/rxe: Allow registering persistent flag for pmem MR only
  RDMA/rxe: Extend rxe packet format to support flush
  RDMA/rxe: Implement RC RDMA FLUSH service in requester side
  RDMA/rxe: Implement flush execution in responder side
  RDMA/rxe: Implement flush completion
  RDMA/cm: Make QP FLUSHABLE
  RDMA/rxe: Enable RDMA FLUSH capability for rxe device

 drivers/infiniband/core/cm.c            |   3 +-
 drivers/infiniband/sw/rxe/rxe_comp.c    |   4 +-
 drivers/infiniband/sw/rxe/rxe_hdr.h     |  47 +++++++
 drivers/infiniband/sw/rxe/rxe_loc.h     |   1 +
 drivers/infiniband/sw/rxe/rxe_mr.c      |  81 ++++++++++-
 drivers/infiniband/sw/rxe/rxe_mw.c      |   3 +-
 drivers/infiniband/sw/rxe/rxe_opcode.c  |  17 +++
 drivers/infiniband/sw/rxe/rxe_opcode.h  |  16 ++-
 drivers/infiniband/sw/rxe/rxe_param.h   |   4 +-
 drivers/infiniband/sw/rxe/rxe_req.c     |  15 +-
 drivers/infiniband/sw/rxe/rxe_resp.c    | 180 +++++++++++++++++++++---
 drivers/infiniband/sw/rxe/rxe_verbs.h   |   6 +
 include/rdma/ib_pack.h                  |   3 +
 include/rdma/ib_verbs.h                 |  20 ++-
 include/uapi/rdma/ib_user_ioctl_verbs.h |   2 +
 include/uapi/rdma/ib_user_verbs.h       |  16 +++
 include/uapi/rdma/rdma_user_rxe.h       |   7 +
 17 files changed, 389 insertions(+), 36 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 01/11] RDMA/rxe: make sure requested access is a subset of {mr,mw}->access
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-10-28 17:45   ` Jason Gunthorpe
  2022-09-27  5:53 ` [for-next PATCH v5 02/11] RDMA: Extend RDMA user ABI to support flush Li Zhijian
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

We should reject requests whose access flags are not a subset of those
registered for the MR/MW. For example, lookup_mr() should return NULL when the
requested access is 0x03 and mr->access is 0x01.
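
To illustrate the fix (a standalone sketch, not part of the patch):

	int requested  = 0x03;	/* two access bits requested */
	int registered = 0x01;	/* only one of them was registered on the MR */

	/* old check: does NOT reject, since (requested & registered) != 0 */
	bool old_reject = requested && !(requested & registered);	/* false */

	/* new check: rejects, since (requested & registered) != requested */
	bool new_reject = (requested & registered) != requested;	/* true */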

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_mr.c | 2 +-
 drivers/infiniband/sw/rxe/rxe_mw.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 502e9ada99b3..74a38d06332f 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -511,7 +511,7 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key,
 
 	if (unlikely((type == RXE_LOOKUP_LOCAL && mr->lkey != key) ||
 		     (type == RXE_LOOKUP_REMOTE && mr->rkey != key) ||
-		     mr_pd(mr) != pd || (access && !(access & mr->access)) ||
+		     mr_pd(mr) != pd || ((access & mr->access) != access) ||
 		     mr->state != RXE_MR_STATE_VALID)) {
 		rxe_put(mr);
 		mr = NULL;
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index 902b7df7aaed..8df1c9066ed8 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -293,8 +293,7 @@ struct rxe_mw *rxe_lookup_mw(struct rxe_qp *qp, int access, u32 rkey)
 
 	if (unlikely((mw->rkey != rkey) || rxe_mw_pd(mw) != pd ||
 		     (mw->ibmw.type == IB_MW_TYPE_2 && mw->qp != qp) ||
-		     (mw->length == 0) ||
-		     (access && !(access & mw->access)) ||
+		     (mw->length == 0) || ((access & mw->access) != access) ||
 		     mw->state != RXE_MW_STATE_VALID)) {
 		rxe_put(mw);
 		return NULL;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 02/11] RDMA: Extend RDMA user ABI to support flush
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
  2022-09-27  5:53 ` [for-next PATCH v5 01/11] RDMA/rxe: make sure requested access is a subset of {mr,mw}->access Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-09-27  5:53 ` [for-next PATCH v5 03/11] RDMA: Extend RDMA kernel verbs " Li Zhijian
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

This commit extends the RDMA user ABI to support the flush
operation defined in IBA A19.4.1. These changes are
backwards compatible with the existing RDMA user ABI.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: new names and new patch split scheme, suggested by Bob
---
 include/uapi/rdma/ib_user_ioctl_verbs.h |  2 ++
 include/uapi/rdma/ib_user_verbs.h       | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 7dd56210226f..07b105e22f6f 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -57,6 +57,8 @@ enum ib_uverbs_access_flags {
 	IB_UVERBS_ACCESS_ZERO_BASED = 1 << 5,
 	IB_UVERBS_ACCESS_ON_DEMAND = 1 << 6,
 	IB_UVERBS_ACCESS_HUGETLB = 1 << 7,
+	IB_UVERBS_ACCESS_FLUSH_GLOBAL = 1 << 8,
+	IB_UVERBS_ACCESS_FLUSH_PERSISTENT = 1 << 9,
 
 	IB_UVERBS_ACCESS_RELAXED_ORDERING = IB_UVERBS_ACCESS_OPTIONAL_FIRST,
 	IB_UVERBS_ACCESS_OPTIONAL_RANGE =
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 43672cb1fd57..2d5f32d9d0d9 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -105,6 +105,18 @@ enum {
 	IB_USER_VERBS_EX_CMD_MODIFY_CQ
 };
 
+/* see IBA A19.4.1.1 Placement Types */
+enum ib_placement_type {
+	IB_FLUSH_GLOBAL = 1U << 0,
+	IB_FLUSH_PERSISTENT = 1U << 1,
+};
+
+/* see IBA A19.4.1.2 Selectivity Level */
+enum ib_selectivity_level {
+	IB_FLUSH_RANGE = 0,
+	IB_FLUSH_MR,
+};
+
 /*
  * Make sure that all structs defined in this file remain laid out so
  * that they pack the same way on 32-bit and 64-bit architectures (to
@@ -466,6 +478,7 @@ enum ib_uverbs_wc_opcode {
 	IB_UVERBS_WC_BIND_MW = 5,
 	IB_UVERBS_WC_LOCAL_INV = 6,
 	IB_UVERBS_WC_TSO = 7,
+	IB_UVERBS_WC_FLUSH = 8,
 };
 
 struct ib_uverbs_wc {
@@ -784,6 +797,7 @@ enum ib_uverbs_wr_opcode {
 	IB_UVERBS_WR_RDMA_READ_WITH_INV = 11,
 	IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP = 12,
 	IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD = 13,
+	IB_UVERBS_WR_FLUSH = 14,
 	/* Review enum ib_wr_opcode before modifying this */
 };
 
@@ -1331,6 +1345,8 @@ enum ib_uverbs_device_cap_flags {
 	/* Deprecated. Please use IB_UVERBS_RAW_PACKET_CAP_SCATTER_FCS. */
 	IB_UVERBS_DEVICE_RAW_SCATTER_FCS = 1ULL << 34,
 	IB_UVERBS_DEVICE_PCI_WRITE_END_PADDING = 1ULL << 36,
+	IB_UVERBS_DEVICE_FLUSH_GLOBAL = 1ULL << 38,
+	IB_UVERBS_DEVICE_FLUSH_PERSISTENT = 1ULL << 39,
 };
 
 enum ib_uverbs_raw_packet_caps {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 03/11] RDMA: Extend RDMA kernel verbs ABI to support flush
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
  2022-09-27  5:53 ` [for-next PATCH v5 01/11] RDMA/rxe: make sure requested access is a subset of {mr,mw}->access Li Zhijian
  2022-09-27  5:53 ` [for-next PATCH v5 02/11] RDMA: Extend RDMA user ABI to support flush Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-09-29  6:21   ` Li Zhijian
  2022-10-28 17:44   ` Jason Gunthorpe
  2022-09-27  5:53 ` [for-next PATCH v5 04/11] RDMA/rxe: Extend rxe user " Li Zhijian
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

This commit extends the RDMA kernel verbs ABI to support the flush
operation defined in IBA A19.4.1. These changes are
backwards compatible with the existing RDMA kernel verbs ABI.

It lets a device/HCA advertise the new FLUSH attributes/capabilities, and it
also lets a memory region be registered with the new FLUSH access flags.

Users can use ibv_reg_mr(3) to register the flush access flags. Only access
flags that are also supported by the device's capabilities can be registered
successfully.

Once registered successfully, the MR is flushable. Like the device/HCA, a
flushable MR carries one or both of the GLOBAL_VISIBILITY and PERSISTENT
attributes/capabilities.
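
As a rough usage sketch (the IBV_ACCESS_FLUSH_* names below come from the
companion rdma-core series and are assumptions here, not part of this kernel
patch):

	/* register a flushable MR; registration fails if the device does not
	 * advertise the matching IB_DEVICE_FLUSH_* capability, or (patch 05)
	 * if IBV_ACCESS_FLUSH_PERSISTENT is requested for non-pmem memory
	 */
	struct ibv_mr *mr = ibv_reg_mr(pd, pmem_buf, len,
				       IBV_ACCESS_LOCAL_WRITE |
				       IBV_ACCESS_REMOTE_WRITE |
				       IBV_ACCESS_FLUSH_GLOBAL |
				       IBV_ACCESS_FLUSH_PERSISTENT);
	if (!mr)
		perror("ibv_reg_mr");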

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: new names and new patch split scheme, suggested by Bob
---
 include/rdma/ib_pack.h  |  3 +++
 include/rdma/ib_verbs.h | 20 +++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index a9162f25beaf..56211d1cc9f9 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -84,6 +84,7 @@ enum {
 	/* opcode 0x15 is reserved */
 	IB_OPCODE_SEND_LAST_WITH_INVALIDATE         = 0x16,
 	IB_OPCODE_SEND_ONLY_WITH_INVALIDATE         = 0x17,
+	IB_OPCODE_FLUSH                             = 0x1C,
 
 	/* real constants follow -- see comment about above IB_OPCODE()
 	   macro for more details */
@@ -112,6 +113,7 @@ enum {
 	IB_OPCODE(RC, FETCH_ADD),
 	IB_OPCODE(RC, SEND_LAST_WITH_INVALIDATE),
 	IB_OPCODE(RC, SEND_ONLY_WITH_INVALIDATE),
+	IB_OPCODE(RC, FLUSH),
 
 	/* UC */
 	IB_OPCODE(UC, SEND_FIRST),
@@ -149,6 +151,7 @@ enum {
 	IB_OPCODE(RD, ATOMIC_ACKNOWLEDGE),
 	IB_OPCODE(RD, COMPARE_SWAP),
 	IB_OPCODE(RD, FETCH_ADD),
+	IB_OPCODE(RD, FLUSH),
 
 	/* UD */
 	IB_OPCODE(UD, SEND_ONLY),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 975d6e9efbcb..571838dd06eb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -270,6 +270,9 @@ enum ib_device_cap_flags {
 	/* The device supports padding incoming writes to cacheline. */
 	IB_DEVICE_PCI_WRITE_END_PADDING =
 		IB_UVERBS_DEVICE_PCI_WRITE_END_PADDING,
+	/* Placement type attributes */
+	IB_DEVICE_FLUSH_GLOBAL = IB_UVERBS_DEVICE_FLUSH_GLOBAL,
+	IB_DEVICE_FLUSH_PERSISTENT = IB_UVERBS_DEVICE_FLUSH_PERSISTENT,
 };
 
 enum ib_kernel_cap_flags {
@@ -985,6 +988,7 @@ enum ib_wc_opcode {
 	IB_WC_REG_MR,
 	IB_WC_MASKED_COMP_SWAP,
 	IB_WC_MASKED_FETCH_ADD,
+	IB_WC_FLUSH = IB_UVERBS_WC_FLUSH,
 /*
  * Set value of IB_WC_RECV so consumers can test if a completion is a
  * receive by testing (opcode & IB_WC_RECV).
@@ -1325,6 +1329,7 @@ enum ib_wr_opcode {
 		IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP,
 	IB_WR_MASKED_ATOMIC_FETCH_AND_ADD =
 		IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD,
+	IB_WR_FLUSH = IB_UVERBS_WR_FLUSH,
 
 	/* These are kernel only and can not be issued by userspace */
 	IB_WR_REG_MR = 0x20,
@@ -1458,10 +1463,14 @@ enum ib_access_flags {
 	IB_ACCESS_ON_DEMAND = IB_UVERBS_ACCESS_ON_DEMAND,
 	IB_ACCESS_HUGETLB = IB_UVERBS_ACCESS_HUGETLB,
 	IB_ACCESS_RELAXED_ORDERING = IB_UVERBS_ACCESS_RELAXED_ORDERING,
+	IB_ACCESS_FLUSH_GLOBAL = IB_UVERBS_ACCESS_FLUSH_GLOBAL,
+	IB_ACCESS_FLUSH_PERSISTENT = IB_UVERBS_ACCESS_FLUSH_PERSISTENT,
+	IB_ACCESS_FLUSHABLE = IB_ACCESS_FLUSH_GLOBAL |
+			      IB_ACCESS_FLUSH_PERSISTENT,
 
 	IB_ACCESS_OPTIONAL = IB_UVERBS_ACCESS_OPTIONAL_RANGE,
 	IB_ACCESS_SUPPORTED =
-		((IB_ACCESS_HUGETLB << 1) - 1) | IB_ACCESS_OPTIONAL,
+		((IB_ACCESS_FLUSH_PERSISTENT << 1) - 1) | IB_ACCESS_OPTIONAL,
 };
 
 /*
@@ -4321,6 +4330,8 @@ int ib_dealloc_xrcd_user(struct ib_xrcd *xrcd, struct ib_udata *udata);
 static inline int ib_check_mr_access(struct ib_device *ib_dev,
 				     unsigned int flags)
 {
+	u64 device_cap = ib_dev->attrs.device_cap_flags;
+
 	/*
 	 * Local write permission is required if remote write or
 	 * remote atomic permission is also requested.
@@ -4335,6 +4346,13 @@ static inline int ib_check_mr_access(struct ib_device *ib_dev,
 	if (flags & IB_ACCESS_ON_DEMAND &&
 	    !(ib_dev->attrs.kernel_cap_flags & IBK_ON_DEMAND_PAGING))
 		return -EINVAL;
+
+	if ((flags & IB_ACCESS_FLUSH_GLOBAL &&
+	    !(device_cap & IB_DEVICE_FLUSH_GLOBAL)) ||
+	    (flags & IB_ACCESS_FLUSH_PERSISTENT &&
+	    !(device_cap & IB_DEVICE_FLUSH_PERSISTENT)))
+		return -EINVAL;
+
 	return 0;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 04/11] RDMA/rxe: Extend rxe user ABI to support flush
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (2 preceding siblings ...)
  2022-09-27  5:53 ` [for-next PATCH v5 03/11] RDMA: Extend RDMA kernel verbs " Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-09-27  5:53 ` [for-next PATCH v5 05/11] RDMA/rxe: Allow registering persistent flag for pmem MR only Li Zhijian
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

This commit extends the rxe user ABI to support the flush
operation defined in IBA A19.4.1. These changes are
backwards compatible with the existing rxe user ABI.

The user API requests a flush by filling in this structure.
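
For example, a flush over a byte range could be requested like this (an
illustrative sketch only; the IB_FLUSH_* values are the ones added to
ib_user_verbs.h in patch 02):

	struct rxe_send_wr wr = {};

	wr.opcode = IB_UVERBS_WR_FLUSH;
	wr.wr.flush.remote_addr = remote_addr;	/* start VA inside the remote MR */
	wr.wr.flush.length      = length;	/* ignored when level is IB_FLUSH_MR */
	wr.wr.flush.rkey        = rkey;
	wr.wr.flush.type        = IB_FLUSH_PERSISTENT;	/* placement type(s) */
	wr.wr.flush.level       = IB_FLUSH_RANGE;	/* selectivity level */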

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: new patch split scheme, suggested by Bob
---
 include/uapi/rdma/rdma_user_rxe.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h
index 73f679dfd2df..e2b93df94590 100644
--- a/include/uapi/rdma/rdma_user_rxe.h
+++ b/include/uapi/rdma/rdma_user_rxe.h
@@ -82,6 +82,13 @@ struct rxe_send_wr {
 		__u32		invalidate_rkey;
 	} ex;
 	union {
+		struct {
+			__aligned_u64 remote_addr;
+			__u32	length;
+			__u32	rkey;
+			__u8	type;
+			__u8	level;
+		} flush;
 		struct {
 			__aligned_u64 remote_addr;
 			__u32	rkey;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 05/11] RDMA/rxe: Allow registering persistent flag for pmem MR only
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (3 preceding siblings ...)
  2022-09-27  5:53 ` [for-next PATCH v5 04/11] RDMA/rxe: Extend rxe user " Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-10-28 17:53   ` Jason Gunthorpe
  2022-09-27  5:53 ` [for-next PATCH v5 06/11] RDMA/rxe: Extend rxe packet format to support flush Li Zhijian
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

A memory region can support at most 2 flush access flags:
IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL.

But we only allow users to register the persistent flush flag on a pmem MR,
since pmem has the ability to persist data across power cycles.

So registering the persistent access flag on a non-pmem MR will be rejected.

CC: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: make sure the whole MR is pmem
V4: set is_pmem more simple
V2: new scheme check is_pmem # Dan
    update commit message, get rid of confusing ib_check_flush_access_flags() # Tom
---
 drivers/infiniband/sw/rxe/rxe_mr.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 74a38d06332f..1da3ad5eba64 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -112,6 +112,13 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr)
 	mr->type = IB_MR_TYPE_DMA;
 }
 
+static bool vaddr_in_pmem(char *vaddr)
+{
+	return REGION_INTERSECTS ==
+	       region_intersects(virt_to_phys(vaddr), 1, IORESOURCE_MEM,
+				 IORES_DESC_PERSISTENT_MEMORY);
+}
+
 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		     int access, struct rxe_mr *mr)
 {
@@ -122,6 +129,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 	int			num_buf;
 	void			*vaddr;
 	int err;
+	bool is_pmem = false;
 	int i;
 
 	umem = ib_umem_get(&rxe->ib_dev, start, length, access);
@@ -149,6 +157,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 	num_buf			= 0;
 	map = mr->map;
 	if (length > 0) {
+		is_pmem = true;
 		buf = map[0]->buf;
 
 		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
@@ -166,6 +175,10 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 				goto err_cleanup_map;
 			}
 
+			/* True only if the *whole* MR is pmem */
+			if (is_pmem)
+				is_pmem = vaddr_in_pmem(vaddr);
+
 			buf->addr = (uintptr_t)vaddr;
 			buf->size = PAGE_SIZE;
 			num_buf++;
@@ -174,6 +187,12 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		}
 	}
 
+	if (!is_pmem && access & IB_ACCESS_FLUSH_PERSISTENT) {
+		pr_warn("Cannot register IB_ACCESS_FLUSH_PERSISTENT for non-pmem memory\n");
+		err = -EINVAL;
+		goto err_release_umem;
+	}
+
 	mr->umem = umem;
 	mr->access = access;
 	mr->offset = ib_umem_offset(umem);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 06/11] RDMA/rxe: Extend rxe packet format to support flush
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (4 preceding siblings ...)
  2022-09-27  5:53 ` [for-next PATCH v5 05/11] RDMA/rxe: Allow registering persistent flag for pmem MR only Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-11-11  8:43   ` Yanjun Zhu
  2022-09-27  5:53 ` [for-next PATCH v5 07/11] RDMA/rxe: Implement RC RDMA FLUSH service in requester side Li Zhijian
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

Extend the rxe opcode tables, headers, helpers and constants to support
flush operations.

Refer to IBA A19.4.1 for more details of the FETH definition.
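
As a quick illustration of the FETH layout added below (computed from the
masks in rxe_hdr.h; example values only):

	/* 32-bit FETH: bits 3-0 = placement type (PLT), bits 5-4 = selectivity */
	feth_init(pkt, IB_FLUSH_PERSISTENT, IB_FLUSH_RANGE);
	/* bits = (0 << 4) | 0x2 = 0x00000002, stored big-endian on the wire */

	feth_init(pkt, IB_FLUSH_GLOBAL, IB_FLUSH_MR);
	/* bits = (1 << 4) | 0x1 = 0x00000011 */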

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: new FETH structure and simplify header helper
    new names and new patch split scheme, suggested by Bob.
---
 drivers/infiniband/sw/rxe/rxe_hdr.h    | 47 ++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_opcode.c | 17 ++++++++++
 drivers/infiniband/sw/rxe/rxe_opcode.h | 16 +++++----
 3 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
index e432f9e37795..e995a97c54fd 100644
--- a/drivers/infiniband/sw/rxe/rxe_hdr.h
+++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
@@ -607,6 +607,52 @@ static inline void reth_set_len(struct rxe_pkt_info *pkt, u32 len)
 		rxe_opcode[pkt->opcode].offset[RXE_RETH], len);
 }
 
+/******************************************************************************
+ * FLUSH Extended Transport Header
+ ******************************************************************************/
+
+struct rxe_feth {
+	__be32 bits;
+};
+
+#define FETH_PLT_MASK		(0x0000000f) /* bits 3-0 */
+#define FETH_SEL_MASK		(0x00000030) /* bits 5-4 */
+#define FETH_SEL_SHIFT		(4U)
+
+static inline u32 __feth_plt(void *arg)
+{
+	struct rxe_feth *feth = arg;
+
+	return be32_to_cpu(feth->bits) & FETH_PLT_MASK;
+}
+
+static inline u32 __feth_sel(void *arg)
+{
+	struct rxe_feth *feth = arg;
+
+	return (be32_to_cpu(feth->bits) & FETH_SEL_MASK) >> FETH_SEL_SHIFT;
+}
+
+static inline u32 feth_plt(struct rxe_pkt_info *pkt)
+{
+	return __feth_plt(pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+}
+
+static inline u32 feth_sel(struct rxe_pkt_info *pkt)
+{
+	return __feth_sel(pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+}
+
+static inline void feth_init(struct rxe_pkt_info *pkt, u8 type, u8 level)
+{
+	struct rxe_feth *feth = (struct rxe_feth *)
+		    (pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+	u32 bits = ((level << FETH_SEL_SHIFT) & FETH_SEL_MASK) |
+		   (type & FETH_PLT_MASK);
+
+	feth->bits = cpu_to_be32(bits);
+}
+
 /******************************************************************************
  * Atomic Extended Transport Header
  ******************************************************************************/
@@ -910,6 +956,7 @@ enum rxe_hdr_length {
 	RXE_ATMETH_BYTES	= sizeof(struct rxe_atmeth),
 	RXE_IETH_BYTES		= sizeof(struct rxe_ieth),
 	RXE_RDETH_BYTES		= sizeof(struct rxe_rdeth),
+	RXE_FETH_BYTES		= sizeof(struct rxe_feth),
 };
 
 static inline size_t header_size(struct rxe_pkt_info *pkt)
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
index d4ba4d506f17..55aad13e57bb 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.c
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
@@ -101,6 +101,12 @@ struct rxe_wr_opcode_info rxe_wr_opcode_info[] = {
 			[IB_QPT_UC]	= WR_LOCAL_OP_MASK,
 		},
 	},
+	[IB_WR_FLUSH]					= {
+		.name   = "IB_WR_FLUSH",
+		.mask   = {
+			[IB_QPT_RC]	= WR_FLUSH_MASK,
+		},
+	},
 };
 
 struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
@@ -378,6 +384,17 @@ struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
 					  RXE_IETH_BYTES,
 		}
 	},
+	[IB_OPCODE_RC_FLUSH]					= {
+		.name	= "IB_OPCODE_RC_FLUSH",
+		.mask	= RXE_FETH_MASK | RXE_RETH_MASK | RXE_FLUSH_MASK |
+			  RXE_START_MASK | RXE_END_MASK | RXE_REQ_MASK,
+		.length = RXE_BTH_BYTES + RXE_FETH_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_FETH]	= RXE_BTH_BYTES,
+			[RXE_RETH]	= RXE_BTH_BYTES + RXE_FETH_BYTES,
+		}
+	},
 
 	/* UC */
 	[IB_OPCODE_UC_SEND_FIRST]			= {
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h
index 8f9aaaf260f2..02d256745793 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.h
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.h
@@ -19,7 +19,8 @@ enum rxe_wr_mask {
 	WR_SEND_MASK			= BIT(2),
 	WR_READ_MASK			= BIT(3),
 	WR_WRITE_MASK			= BIT(4),
-	WR_LOCAL_OP_MASK		= BIT(5),
+	WR_FLUSH_MASK			= BIT(5),
+	WR_LOCAL_OP_MASK		= BIT(6),
 
 	WR_READ_OR_WRITE_MASK		= WR_READ_MASK | WR_WRITE_MASK,
 	WR_WRITE_OR_SEND_MASK		= WR_WRITE_MASK | WR_SEND_MASK,
@@ -47,6 +48,7 @@ enum rxe_hdr_type {
 	RXE_RDETH,
 	RXE_DETH,
 	RXE_IMMDT,
+	RXE_FETH,
 	RXE_PAYLOAD,
 	NUM_HDR_TYPES
 };
@@ -63,6 +65,7 @@ enum rxe_hdr_mask {
 	RXE_IETH_MASK		= BIT(RXE_IETH),
 	RXE_RDETH_MASK		= BIT(RXE_RDETH),
 	RXE_DETH_MASK		= BIT(RXE_DETH),
+	RXE_FETH_MASK		= BIT(RXE_FETH),
 	RXE_PAYLOAD_MASK	= BIT(RXE_PAYLOAD),
 
 	RXE_REQ_MASK		= BIT(NUM_HDR_TYPES + 0),
@@ -71,13 +74,14 @@ enum rxe_hdr_mask {
 	RXE_WRITE_MASK		= BIT(NUM_HDR_TYPES + 3),
 	RXE_READ_MASK		= BIT(NUM_HDR_TYPES + 4),
 	RXE_ATOMIC_MASK		= BIT(NUM_HDR_TYPES + 5),
+	RXE_FLUSH_MASK		= BIT(NUM_HDR_TYPES + 6),
 
-	RXE_RWR_MASK		= BIT(NUM_HDR_TYPES + 6),
-	RXE_COMP_MASK		= BIT(NUM_HDR_TYPES + 7),
+	RXE_RWR_MASK		= BIT(NUM_HDR_TYPES + 7),
+	RXE_COMP_MASK		= BIT(NUM_HDR_TYPES + 8),
 
-	RXE_START_MASK		= BIT(NUM_HDR_TYPES + 8),
-	RXE_MIDDLE_MASK		= BIT(NUM_HDR_TYPES + 9),
-	RXE_END_MASK		= BIT(NUM_HDR_TYPES + 10),
+	RXE_START_MASK		= BIT(NUM_HDR_TYPES + 9),
+	RXE_MIDDLE_MASK		= BIT(NUM_HDR_TYPES + 10),
+	RXE_END_MASK		= BIT(NUM_HDR_TYPES + 11),
 
 	RXE_LOOPBACK_MASK	= BIT(NUM_HDR_TYPES + 12),
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 07/11] RDMA/rxe: Implement RC RDMA FLUSH service in requester side
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (5 preceding siblings ...)
  2022-09-27  5:53 ` [for-next PATCH v5 06/11] RDMA/rxe: Extend rxe packet format to support flush Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-09-27  5:53 ` [for-next PATCH v5 08/11] RDMA/rxe: Implement flush execution in responder side Li Zhijian
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

Implement FLUSH request operation in the requester.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V4: Remove flush union for legacy API, add WR_FLUSH_MASK
V3: Fix sparse: incorrect type in assignment; Reported-by: kernel test robot <lkp@intel.com>
V2: extend flush to include length field.
---
 drivers/infiniband/sw/rxe/rxe_req.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index f63771207970..5996b0e3177a 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -241,6 +241,9 @@ static int next_opcode_rc(struct rxe_qp *qp, u32 opcode, int fits)
 				IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE :
 				IB_OPCODE_RC_SEND_FIRST;
 
+	case IB_WR_FLUSH:
+		return IB_OPCODE_RC_FLUSH;
+
 	case IB_WR_RDMA_READ:
 		return IB_OPCODE_RC_RDMA_READ_REQUEST;
 
@@ -421,11 +424,18 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 
 	/* init optional headers */
 	if (pkt->mask & RXE_RETH_MASK) {
-		reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
+		if (pkt->mask & RXE_FETH_MASK)
+			reth_set_rkey(pkt, ibwr->wr.flush.rkey);
+		else
+			reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
 		reth_set_va(pkt, wqe->iova);
 		reth_set_len(pkt, wqe->dma.resid);
 	}
 
+	/* Fill Flush Extension Transport Header */
+	if (pkt->mask & RXE_FETH_MASK)
+		feth_init(pkt, ibwr->wr.flush.type, ibwr->wr.flush.level);
+
 	if (pkt->mask & RXE_IMMDT_MASK)
 		immdt_set_imm(pkt, ibwr->ex.imm_data);
 
@@ -484,6 +494,9 @@ static int finish_packet(struct rxe_qp *qp, struct rxe_av *av,
 
 			memset(pad, 0, bth_pad(pkt));
 		}
+	} else if (pkt->mask & RXE_FLUSH_MASK) {
+		/* oA19-2: shall have no payload. */
+		wqe->dma.resid = 0;
 	}
 
 	return 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 08/11] RDMA/rxe: Implement flush execution in responder side
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (6 preceding siblings ...)
  2022-09-27  5:53 ` [for-next PATCH v5 07/11] RDMA/rxe: Implement RC RDMA FLUSH service in requester side Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-09-27  5:53 ` [for-next PATCH v5 09/11] RDMA/rxe: Implement flush completion Li Zhijian
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

Only the requested placement types that are also registered in the destination
memory region are acceptable.
Otherwise, the responder replies with a NAK "Remote Access Error" when it
finds a placement type violation.

We persist data via arch_wb_cache_pmem(), which could be
architecture specific.

This commit also adds 2 helpers to update qp.resp from the incoming packet.
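
In rough outline (illustration only; the real code is in check_rkey() and
process_flush() below), the responder turns the FETH placement type into MR
access bits and then relies on the subset check from patch 01:

	u32 flush_type = feth_plt(pkt);
	int access = 0;

	if (flush_type & IB_FLUSH_GLOBAL)
		access |= IB_ACCESS_FLUSH_GLOBAL;
	if (flush_type & IB_FLUSH_PERSISTENT)
		access |= IB_ACCESS_FLUSH_PERSISTENT;

	/* lookup_mr()/rxe_lookup_mw() then require
	 * (access & mr->access) == access, otherwise the responder
	 * answers with NAK "Remote Access Error"
	 */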

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
v5: add QP attr check for flush access
    rename flush_nvdimm_iova -> rxe_flush_pmem_iova()
v4: add send_read_response_ack and flush resource
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |   1 +
 drivers/infiniband/sw/rxe/rxe_mr.c    |  60 +++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c  | 180 ++++++++++++++++++++++----
 drivers/infiniband/sw/rxe/rxe_verbs.h |   6 +
 4 files changed, 225 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index c2a5c8814a48..944d564a11cd 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -68,6 +68,7 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr);
 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		     int access, struct rxe_mr *mr);
 int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr);
+int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length);
 int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 		enum rxe_mr_copy_dir dir);
 int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 1da3ad5eba64..fa7e71074233 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -4,6 +4,8 @@
  * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
  */
 
+#include <linux/libnvdimm.h>
+
 #include "rxe.h"
 #include "rxe_loc.h"
 
@@ -305,6 +307,64 @@ void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length)
 	return addr;
 }
 
+int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length)
+{
+	int err;
+	int bytes;
+	u8 *va;
+	struct rxe_map **map;
+	struct rxe_phys_buf *buf;
+	int m;
+	int i;
+	size_t offset;
+
+	if (length == 0)
+		return 0;
+
+	if (mr->type == IB_MR_TYPE_DMA) {
+		err = -EFAULT;
+		goto err1;
+	}
+
+	err = mr_check_range(mr, iova, length);
+	if (err) {
+		err = -EFAULT;
+		goto err1;
+	}
+
+	lookup_iova(mr, iova, &m, &i, &offset);
+
+	map = mr->map + m;
+	buf = map[0]->buf + i;
+
+	while (length > 0) {
+		va = (u8 *)(uintptr_t)buf->addr + offset;
+		bytes = buf->size - offset;
+
+		if (bytes > length)
+			bytes = length;
+
+		arch_wb_cache_pmem(va, bytes);
+
+		length -= bytes;
+
+		offset = 0;
+		buf++;
+		i++;
+
+		if (i == RXE_BUF_PER_MAP) {
+			i = 0;
+			map++;
+			buf = map[0]->buf;
+		}
+	}
+
+	return 0;
+
+err1:
+	return err;
+}
+
 /* copy data from a range (vaddr, vaddr+length-1) to or from
  * a mr object starting at iova.
  */
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index ed5a09e86417..0b68e5d8e1d2 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -22,6 +22,7 @@ enum resp_states {
 	RESPST_EXECUTE,
 	RESPST_READ_REPLY,
 	RESPST_ATOMIC_REPLY,
+	RESPST_PROCESS_FLUSH,
 	RESPST_COMPLETE,
 	RESPST_ACKNOWLEDGE,
 	RESPST_CLEANUP,
@@ -57,6 +58,7 @@ static char *resp_state_name[] = {
 	[RESPST_EXECUTE]			= "EXECUTE",
 	[RESPST_READ_REPLY]			= "READ_REPLY",
 	[RESPST_ATOMIC_REPLY]			= "ATOMIC_REPLY",
+	[RESPST_PROCESS_FLUSH]			= "PROCESS_FLUSH",
 	[RESPST_COMPLETE]			= "COMPLETE",
 	[RESPST_ACKNOWLEDGE]			= "ACKNOWLEDGE",
 	[RESPST_CLEANUP]			= "CLEANUP",
@@ -253,19 +255,38 @@ static enum resp_states check_op_seq(struct rxe_qp *qp,
 	}
 }
 
+static bool check_qp_attr_access(struct rxe_qp *qp,
+				 struct rxe_pkt_info *pkt)
+{
+	if (((pkt->mask & RXE_READ_MASK) &&
+	     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
+	    ((pkt->mask & RXE_WRITE_MASK) &&
+	     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
+	    ((pkt->mask & RXE_ATOMIC_MASK) &&
+	     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
+		return false;
+	}
+
+	if (pkt->mask & RXE_FLUSH_MASK) {
+		u32 flush_type = feth_plt(pkt);
+
+		if ((flush_type & IB_FLUSH_GLOBAL &&
+		     !(qp->attr.qp_access_flags & IB_ACCESS_FLUSH_GLOBAL)) ||
+		    (flush_type & IB_FLUSH_PERSISTENT &&
+		     !(qp->attr.qp_access_flags & IB_ACCESS_FLUSH_PERSISTENT)))
+			return false;
+	}
+
+	return true;
+}
+
 static enum resp_states check_op_valid(struct rxe_qp *qp,
 				       struct rxe_pkt_info *pkt)
 {
 	switch (qp_type(qp)) {
 	case IB_QPT_RC:
-		if (((pkt->mask & RXE_READ_MASK) &&
-		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
-		    ((pkt->mask & RXE_WRITE_MASK) &&
-		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
-		    ((pkt->mask & RXE_ATOMIC_MASK) &&
-		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
+		if (!check_qp_attr_access(qp, pkt))
 			return RESPST_ERR_UNSUPPORTED_OPCODE;
-		}
 
 		break;
 
@@ -402,6 +423,23 @@ static enum resp_states check_length(struct rxe_qp *qp,
 	}
 }
 
+static void qp_resp_from_reth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
+{
+	qp->resp.va = reth_va(pkt);
+	qp->resp.offset = 0;
+	qp->resp.rkey = reth_rkey(pkt);
+	qp->resp.resid = reth_len(pkt);
+	qp->resp.length = reth_len(pkt);
+}
+
+static void qp_resp_from_atmeth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
+{
+	qp->resp.va = atmeth_va(pkt);
+	qp->resp.offset = 0;
+	qp->resp.rkey = atmeth_rkey(pkt);
+	qp->resp.resid = sizeof(u64);
+}
+
 static enum resp_states check_rkey(struct rxe_qp *qp,
 				   struct rxe_pkt_info *pkt)
 {
@@ -413,23 +451,26 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 	u32 pktlen;
 	int mtu = qp->mtu;
 	enum resp_states state;
-	int access;
+	int access = 0;
 
 	if (pkt->mask & RXE_READ_OR_WRITE_MASK) {
-		if (pkt->mask & RXE_RETH_MASK) {
-			qp->resp.va = reth_va(pkt);
-			qp->resp.offset = 0;
-			qp->resp.rkey = reth_rkey(pkt);
-			qp->resp.resid = reth_len(pkt);
-			qp->resp.length = reth_len(pkt);
-		}
+		if (pkt->mask & RXE_RETH_MASK)
+			qp_resp_from_reth(qp, pkt);
+
 		access = (pkt->mask & RXE_READ_MASK) ? IB_ACCESS_REMOTE_READ
 						     : IB_ACCESS_REMOTE_WRITE;
+	} else if (pkt->mask & RXE_FLUSH_MASK) {
+		u32 flush_type = feth_plt(pkt);
+
+		if (pkt->mask & RXE_RETH_MASK)
+			qp_resp_from_reth(qp, pkt);
+
+		if (flush_type & IB_FLUSH_GLOBAL)
+			access |= IB_ACCESS_FLUSH_GLOBAL;
+		if (flush_type & IB_FLUSH_PERSISTENT)
+			access |= IB_ACCESS_FLUSH_PERSISTENT;
 	} else if (pkt->mask & RXE_ATOMIC_MASK) {
-		qp->resp.va = atmeth_va(pkt);
-		qp->resp.offset = 0;
-		qp->resp.rkey = atmeth_rkey(pkt);
-		qp->resp.resid = sizeof(u64);
+		qp_resp_from_atmeth(qp, pkt);
 		access = IB_ACCESS_REMOTE_ATOMIC;
 	} else {
 		return RESPST_EXECUTE;
@@ -450,7 +491,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 	if (rkey_is_mw(rkey)) {
 		mw = rxe_lookup_mw(qp, access, rkey);
 		if (!mw) {
-			pr_debug("%s: no MW matches rkey %#x\n",
+			pr_err("%s: no MW matches rkey %#x\n",
 					__func__, rkey);
 			state = RESPST_ERR_RKEY_VIOLATION;
 			goto err;
@@ -458,7 +499,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 
 		mr = mw->mr;
 		if (!mr) {
-			pr_err("%s: MW doesn't have an MR\n", __func__);
+			pr_debug("%s: MW doesn't have an MR\n", __func__);
 			state = RESPST_ERR_RKEY_VIOLATION;
 			goto err;
 		}
@@ -478,12 +519,21 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 		}
 	}
 
+	if (pkt->mask & RXE_FLUSH_MASK) {
+		/* FLUSH MR may not set va or resid
+		 * no need to check range since we will flush whole mr
+		 */
+		if (feth_sel(pkt) == IB_FLUSH_MR)
+			goto skip_check_range;
+	}
+
 	if (mr_check_range(mr, va + qp->resp.offset, resid)) {
 		state = RESPST_ERR_RKEY_VIOLATION;
 		goto err;
 	}
 
-	if (pkt->mask & RXE_WRITE_MASK)	 {
+skip_check_range:
+	if (pkt->mask & RXE_WRITE_MASK) {
 		if (resid > mtu) {
 			if (pktlen != mtu || bth_pad(pkt)) {
 				state = RESPST_ERR_LENGTH;
@@ -587,11 +637,61 @@ static struct resp_res *rxe_prepare_res(struct rxe_qp *qp,
 		res->last_psn = pkt->psn;
 		res->cur_psn = pkt->psn;
 		break;
+	case RXE_FLUSH_MASK:
+		res->flush.va = qp->resp.va + qp->resp.offset;
+		res->flush.length = qp->resp.length;
+		res->flush.type = feth_plt(pkt);
+		res->flush.level = feth_sel(pkt);
 	}
 
 	return res;
 }
 
+static enum resp_states process_flush(struct rxe_qp *qp,
+				       struct rxe_pkt_info *pkt)
+{
+	u64 length, start;
+	struct rxe_mr *mr = qp->resp.mr;
+	struct resp_res *res = qp->resp.res;
+
+	/* oA19-14, oA19-15 */
+	if (res && res->replay)
+		return RESPST_ACKNOWLEDGE;
+	else if (!res) {
+		res = rxe_prepare_res(qp, pkt, RXE_FLUSH_MASK);
+		qp->resp.res = res;
+	}
+
+	if (res->flush.level == IB_FLUSH_RANGE) {
+		start = res->flush.va;
+		length = res->flush.length;
+	} else { /* level == IB_FLUSH_MR */
+		start = mr->ibmr.iova;
+		length = mr->ibmr.length;
+	}
+
+	if (res->flush.type & IB_FLUSH_PERSISTENT) {
+		if (rxe_flush_pmem_iova(mr, start, length))
+			return RESPST_ERR_RKEY_VIOLATION;
+		/* Make data persistent. */
+		wmb();
+	} else if (res->flush.type & IB_FLUSH_GLOBAL) {
+		/* Make data global visibility. */
+		wmb();
+	}
+
+	qp->resp.msn++;
+
+	/* next expected psn, read handles this separately */
+	qp->resp.psn = (pkt->psn + 1) & BTH_PSN_MASK;
+	qp->resp.ack_psn = qp->resp.psn;
+
+	qp->resp.opcode = pkt->opcode;
+	qp->resp.status = IB_WC_SUCCESS;
+
+	return RESPST_ACKNOWLEDGE;
+}
+
 /* Guarantee atomicity of atomic operations at the machine level. */
 static DEFINE_SPINLOCK(atomic_ops_lock);
 
@@ -888,6 +988,8 @@ static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
 		return RESPST_READ_REPLY;
 	} else if (pkt->mask & RXE_ATOMIC_MASK) {
 		return RESPST_ATOMIC_REPLY;
+	} else if (pkt->mask & RXE_FLUSH_MASK) {
+		return RESPST_PROCESS_FLUSH;
 	} else {
 		/* Unreachable */
 		WARN_ON_ONCE(1);
@@ -1061,6 +1163,19 @@ static int send_atomic_ack(struct rxe_qp *qp, u8 syndrome, u32 psn)
 	return ret;
 }
 
+static int send_read_response_ack(struct rxe_qp *qp, u8 syndrome, u32 psn)
+{
+	int ret = send_common_ack(qp, syndrome, psn,
+			IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY,
+			"RDMA READ response of length zero ACK");
+
+	/* have to clear this since it is used to trigger
+	 * long read replies
+	 */
+	qp->resp.res = NULL;
+	return ret;
+}
+
 static enum resp_states acknowledge(struct rxe_qp *qp,
 				    struct rxe_pkt_info *pkt)
 {
@@ -1071,6 +1186,8 @@ static enum resp_states acknowledge(struct rxe_qp *qp,
 		send_ack(qp, qp->resp.aeth_syndrome, pkt->psn);
 	else if (pkt->mask & RXE_ATOMIC_MASK)
 		send_atomic_ack(qp, AETH_ACK_UNLIMITED, pkt->psn);
+	else if (pkt->mask & RXE_FLUSH_MASK)
+		send_read_response_ack(qp, AETH_ACK_UNLIMITED, pkt->psn);
 	else if (bth_ack(pkt))
 		send_ack(qp, AETH_ACK_UNLIMITED, pkt->psn);
 
@@ -1127,6 +1244,22 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
 		/* SEND. Ack again and cleanup. C9-105. */
 		send_ack(qp, AETH_ACK_UNLIMITED, prev_psn);
 		return RESPST_CLEANUP;
+	} else if (pkt->mask & RXE_FLUSH_MASK) {
+		struct resp_res *res;
+
+		/* Find the operation in our list of responder resources. */
+		res = find_resource(qp, pkt->psn);
+		if (res) {
+			res->replay = 1;
+			res->cur_psn = pkt->psn;
+			qp->resp.res = res;
+			rc = RESPST_PROCESS_FLUSH;
+			goto out;
+		}
+
+		/* Resource not found. Class D error. Drop the request. */
+		rc = RESPST_CLEANUP;
+		goto out;
 	} else if (pkt->mask & RXE_READ_MASK) {
 		struct resp_res *res;
 
@@ -1320,6 +1453,9 @@ int rxe_responder(void *arg)
 		case RESPST_ATOMIC_REPLY:
 			state = atomic_reply(qp, pkt);
 			break;
+		case RESPST_PROCESS_FLUSH:
+			state = process_flush(qp, pkt);
+			break;
 		case RESPST_ACKNOWLEDGE:
 			state = acknowledge(qp, pkt);
 			break;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 5f5cbfcb3569..4cfe4d8b0aaa 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -165,6 +165,12 @@ struct resp_res {
 			u64		va;
 			u32		resid;
 		} read;
+		struct {
+			u32		length;
+			u64		va;
+			u8		type;
+			u8		level;
+		} flush;
 	};
 };
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 09/11] RDMA/rxe: Implement flush completion
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (7 preceding siblings ...)
  2022-09-27  5:53 ` [for-next PATCH v5 08/11] RDMA/rxe: Implement flush execution in responder side Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-09-27  5:53 ` [for-next PATCH v5 10/11] RDMA/cm: Make QP FLUSHABLE Li Zhijian
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

Per the IBA spec, FLUSH is acknowledged with an RDMA READ response of zero length.

Use the IB_WC_FLUSH (aka IB_UVERBS_WC_FLUSH) opcode to report a FLUSH
completion to userspace.
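
A consumer would then see something like this when polling the send CQ
(sketch only; the IBV_WC_FLUSH name is assumed from the companion rdma-core
change):

	struct ibv_wc wc;

	if (ibv_poll_cq(cq, 1, &wc) == 1 &&
	    wc.status == IBV_WC_SUCCESS &&
	    wc.opcode == IBV_WC_FLUSH) {
		/* the responder has executed and acknowledged the flush */
	}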

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index fb0c008af78c..2dea786e20ad 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -104,6 +104,7 @@ static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode)
 	case IB_WR_LOCAL_INV:			return IB_WC_LOCAL_INV;
 	case IB_WR_REG_MR:			return IB_WC_REG_MR;
 	case IB_WR_BIND_MW:			return IB_WC_BIND_MW;
+	case IB_WR_FLUSH:			return IB_WC_FLUSH;
 
 	default:
 		return 0xff;
@@ -263,7 +264,8 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 		 */
 	case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
 		if (wqe->wr.opcode != IB_WR_RDMA_READ &&
-		    wqe->wr.opcode != IB_WR_RDMA_READ_WITH_INV) {
+		    wqe->wr.opcode != IB_WR_RDMA_READ_WITH_INV &&
+		    wqe->wr.opcode != IB_WR_FLUSH) {
 			wqe->status = IB_WC_FATAL_ERR;
 			return COMPST_ERROR;
 		}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 10/11] RDMA/cm: Make QP FLUSHABLE
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (8 preceding siblings ...)
  2022-09-27  5:53 ` [for-next PATCH v5 09/11] RDMA/rxe: Implement flush completion Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-09-27  5:53 ` [for-next PATCH v5 11/11] RDMA/rxe: Enable RDMA FLUSH capability for rxe device Li Zhijian
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

Enable the flushable access flags for the QP.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: new patch, inspired by Bob
---
 drivers/infiniband/core/cm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1f9938a2c475..58837aac980b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -4096,7 +4096,8 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
 		qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
 		if (cm_id_priv->responder_resources)
 			qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
-						    IB_ACCESS_REMOTE_ATOMIC;
+						    IB_ACCESS_REMOTE_ATOMIC |
+						    IB_ACCESS_FLUSHABLE;
 		qp_attr->pkey_index = cm_id_priv->av.pkey_index;
 		if (cm_id_priv->av.port)
 			qp_attr->port_num = cm_id_priv->av.port->port_num;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [for-next PATCH v5 11/11] RDMA/rxe: Enable RDMA FLUSH capability for rxe device
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (9 preceding siblings ...)
  2022-09-27  5:53 ` [for-next PATCH v5 10/11] RDMA/cm: Make QP FLUSHABLE Li Zhijian
@ 2022-09-27  5:53 ` Li Zhijian
  2022-10-28 17:44 ` [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Jason Gunthorpe
  2022-10-28 17:57 ` Jason Gunthorpe
  12 siblings, 0 replies; 31+ messages in thread
From: Li Zhijian @ 2022-09-27  5:53 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel, Li Zhijian

Now we are ready to enable the RDMA FLUSH capability for rxe.
It supports the Global Visibility and Persistence placement types.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_param.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index 86c7a8bf3cbb..c7a82823a041 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -51,7 +51,9 @@ enum rxe_device_param {
 					| IB_DEVICE_SRQ_RESIZE
 					| IB_DEVICE_MEM_MGT_EXTENSIONS
 					| IB_DEVICE_MEM_WINDOW
-					| IB_DEVICE_MEM_WINDOW_TYPE_2B,
+					| IB_DEVICE_MEM_WINDOW_TYPE_2B
+					| IB_DEVICE_FLUSH_GLOBAL
+					| IB_DEVICE_FLUSH_PERSISTENT,
 	RXE_MAX_SGE			= 32,
 	RXE_MAX_WQE_SIZE		= sizeof(struct rxe_send_wqe) +
 					  sizeof(struct ib_sge) * RXE_MAX_SGE,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 03/11] RDMA: Extend RDMA kernel verbs ABI to support flush
  2022-09-27  5:53 ` [for-next PATCH v5 03/11] RDMA: Extend RDMA kernel verbs " Li Zhijian
@ 2022-09-29  6:21   ` Li Zhijian
  2022-09-30 18:04     ` Jason Gunthorpe
  2022-10-28 17:44   ` Jason Gunthorpe
  1 sibling, 1 reply; 31+ messages in thread
From: Li Zhijian @ 2022-09-29  6:21 UTC (permalink / raw)
  To: Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel

Leon, Jason


On 27/09/2022 13:53, Li Zhijian wrote:
>   /*
> @@ -4321,6 +4330,8 @@ int ib_dealloc_xrcd_user(struct ib_xrcd *xrcd, struct ib_udata *udata);
>   static inline int ib_check_mr_access(struct ib_device *ib_dev,
>   				     unsigned int flags)
>   {
> +	u64 device_cap = ib_dev->attrs.device_cap_flags;
> +
>   	/*
>   	 * Local write permission is required if remote write or
>   	 * remote atomic permission is also requested.
> @@ -4335,6 +4346,13 @@ static inline int ib_check_mr_access(struct ib_device *ib_dev,
>   	if (flags & IB_ACCESS_ON_DEMAND &&
>   	    !(ib_dev->attrs.kernel_cap_flags & IBK_ON_DEMAND_PAGING))
>   		return -EINVAL;
> +
> +	if ((flags & IB_ACCESS_FLUSH_GLOBAL &&
> +	    !(device_cap & IB_DEVICE_FLUSH_GLOBAL)) ||
> +	    (flags & IB_ACCESS_FLUSH_PERSISTENT &&
> +	    !(device_cap & IB_DEVICE_FLUSH_PERSISTENT)))
> +		return -EINVAL;
> +
Regarding the return value of ib_check_mr_access(): while updating the man page of ibv_reg_mr(3) in rdma-core,
```
        IBV_ACCESS_REMOTE_READ Enable Remote Read Access
        IBV_ACCESS_REMOTE_ATOMIC Enable Remote Atomic Operation Access (if supported)
        IBV_ACCESS_MW_BIND Enable Memory Window Binding
        IBV_ACCESS_ZERO_BASED  Use  byte offset from beginning of MR to access this MR, instead of a pointer address
        IBV_ACCESS_ON_DEMAND Create an on-demand paging MR (if supported)
...
RETURN VALUE
        ibv_reg_mr() / ibv_reg_mr_iova() / ibv_reg_dmabuf_mr() returns a pointer to the registered MR, or NULL if the request fails.  The local key (L_Key) field lkey is used as the lkey field of struct  ibv_sge  when  posting
        buffers  with  ibv_post_* verbs, and the the remote key (R_Key) field rkey is used by remote processes to perform Atomic and RDMA operations.  The remote process places this rkey as the rkey field of struct ibv_send_wr
        passed to the ibv_post_send function.
```
we can see that IBV_ACCESS_REMOTE_ATOMIC and IBV_ACCESS_ON_DEMAND are tagged "if supported", but currently the kernel
just returns EINVAL when a user registers an MR with IB_ACCESS_ON_DEMAND on RXE.

I wonder whether we should return -EOPNOTSUPP if the device doesn't support the requested capabilities.
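
i.e. something like this (untested sketch) on top of the hunk above:

	if ((flags & IB_ACCESS_FLUSH_GLOBAL &&
	    !(device_cap & IB_DEVICE_FLUSH_GLOBAL)) ||
	    (flags & IB_ACCESS_FLUSH_PERSISTENT &&
	    !(device_cap & IB_DEVICE_FLUSH_PERSISTENT)))
		return -EOPNOTSUPP;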

Thanks
Li


>   	return 0;
>   }
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 03/11] RDMA: Extend RDMA kernel verbs ABI to support flush
  2022-09-29  6:21   ` Li Zhijian
@ 2022-09-30 18:04     ` Jason Gunthorpe
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2022-09-30 18:04 UTC (permalink / raw)
  To: Li Zhijian
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	y-goto, mbloch, liangwenpeng, tom, tomasz.gromadzki,
	dan.j.williams, linux-kernel

On Thu, Sep 29, 2022 at 02:21:24PM +0800, Li Zhijian wrote:

> we can see, IBV_ACCESS_REMOTE_ATOMIC and IBV_ACCESS_ON_DEMAND are
> tagged "if supported" . but currently kernel just returns EINVAL
> when user registers a MR with IB_ACCESS_ON_DEMAND to RXE.
>
> I wonder we should return -EOPNOTSUPP if the device doesn't support requested capabilities

Yes, unsupported combinations of access flags should trigger
EOPNOTSUPP

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (10 preceding siblings ...)
  2022-09-27  5:53 ` [for-next PATCH v5 11/11] RDMA/rxe: Enable RDMA FLUSH capability for rxe device Li Zhijian
@ 2022-10-28 17:44 ` Jason Gunthorpe
  2022-10-28 17:57 ` Jason Gunthorpe
  12 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2022-10-28 17:44 UTC (permalink / raw)
  To: Li Zhijian
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	y-goto, mbloch, liangwenpeng, tom, tomasz.gromadzki,
	dan.j.williams, linux-kernel

On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
> Hey folks,
> 
> Firstly i want to say thank you to all you guys, especially Bob, who in the
> past 1+ month, gave me a lots of idea and inspiration.
> 
> With the your help, some changes are make in 5th version, such as:
> - new names and new patch split schemem, suggested by Bob
> - bugfix: set is_pmem true only if the whole MR is pmem. it's possible the
>   one MR container both PMEM and DRAM.
> - introduce feth structure, instead of u32
> - new bugfix to rxe_lookup_mw() and lookup_mr(), see (RDMA/rxe: make sure requested access is a subset of {mr,mw}->access),
>   with this fix, we remove check_placement_type(), lookup_mr() has done the such check.
> - Enable QP attr flushable
> These change logs also appear in the patch it belongs to.
> 
> These patches are going to implement a *NEW* RDMA opcode "RDMA FLUSH".
> In IB SPEC 1.5[1], 2 new opcodes, ATOMIC WRITE and RDMA FLUSH were
> added in the MEMORY PLACEMENT EXTENSIONS section.

This doesn't apply anymore, I did try to fix it, but it ended up not
compiling, so it is better if you handle it and repost.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 03/11] RDMA: Extend RDMA kernel verbs ABI to support flush
  2022-09-27  5:53 ` [for-next PATCH v5 03/11] RDMA: Extend RDMA kernel verbs " Li Zhijian
  2022-09-29  6:21   ` Li Zhijian
@ 2022-10-28 17:44   ` Jason Gunthorpe
  2022-10-29  3:15     ` Li Zhijian
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2022-10-28 17:44 UTC (permalink / raw)
  To: Li Zhijian
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	y-goto, mbloch, liangwenpeng, tom, tomasz.gromadzki,
	dan.j.williams, linux-kernel

On Tue, Sep 27, 2022 at 01:53:29PM +0800, Li Zhijian wrote:
> @@ -4321,6 +4330,8 @@ int ib_dealloc_xrcd_user(struct ib_xrcd *xrcd, struct ib_udata *udata);
>  static inline int ib_check_mr_access(struct ib_device *ib_dev,
>  				     unsigned int flags)
>  {
> +	u64 device_cap = ib_dev->attrs.device_cap_flags;
> +
>  	/*
>  	 * Local write permission is required if remote write or
>  	 * remote atomic permission is also requested.
> @@ -4335,6 +4346,13 @@ static inline int ib_check_mr_access(struct ib_device *ib_dev,
>  	if (flags & IB_ACCESS_ON_DEMAND &&
>  	    !(ib_dev->attrs.kernel_cap_flags & IBK_ON_DEMAND_PAGING))
>  		return -EINVAL;
> +
> +	if ((flags & IB_ACCESS_FLUSH_GLOBAL &&
> +	    !(device_cap & IB_DEVICE_FLUSH_GLOBAL)) ||
> +	    (flags & IB_ACCESS_FLUSH_PERSISTENT &&
> +	    !(device_cap & IB_DEVICE_FLUSH_PERSISTENT)))
> +		return -EINVAL;

This should be -EOPNOTSUPP as the above is changed to in for-next

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 01/11] RDMA/rxe: make sure requested access is a subset of {mr,mw}->access
  2022-09-27  5:53 ` [for-next PATCH v5 01/11] RDMA/rxe: make sure requested access is a subset of {mr,mw}->access Li Zhijian
@ 2022-10-28 17:45   ` Jason Gunthorpe
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2022-10-28 17:45 UTC (permalink / raw)
  To: Li Zhijian
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	y-goto, mbloch, liangwenpeng, tom, tomasz.gromadzki,
	dan.j.williams, linux-kernel

On Tue, Sep 27, 2022 at 01:53:27PM +0800, Li Zhijian wrote:
> We should reject the requests with access flags that is not registered
> by MR/MW. For example, lookup_mr() should return NULL when requested access
> is 0x03 and mr->access is 0x01.
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_mr.c | 2 +-
>  drivers/infiniband/sw/rxe/rxe_mw.c | 3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)

I'm going to apply this little bug fix to for-next

Thanks,
Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread
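
For context, the subset rule described in the commit message above amounts
to the predicate sketched below; the helper name is purely illustrative.
With it, a requested access of 0x03 against a registered access of 0x01
fails the test and is rejected, as the commit message intends.

/* Sketch of the rule from the commit message: every requested access
 * bit must already be present in the MR/MW's registered access.
 */
static inline bool access_is_subset(u32 requested, u32 registered)
{
        return (requested & registered) == requested;
}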

* Re: [for-next PATCH v5 05/11] RDMA/rxe: Allow registering persistent flag for pmem MR only
  2022-09-27  5:53 ` [for-next PATCH v5 05/11] RDMA/rxe: Allow registering persistent flag for pmem MR only Li Zhijian
@ 2022-10-28 17:53   ` Jason Gunthorpe
  2022-10-30  3:33     ` Li Zhijian
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2022-10-28 17:53 UTC (permalink / raw)
  To: Li Zhijian
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	y-goto, mbloch, liangwenpeng, tom, tomasz.gromadzki,
	dan.j.williams, linux-kernel

On Tue, Sep 27, 2022 at 01:53:31PM +0800, Li Zhijian wrote:
> @@ -122,6 +129,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>  	int			num_buf;
>  	void			*vaddr;
>  	int err;
> +	bool is_pmem = false;
>  	int i;
>  
>  	umem = ib_umem_get(&rxe->ib_dev, start, length, access);
> @@ -149,6 +157,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>  	num_buf			= 0;
>  	map = mr->map;
>  	if (length > 0) {
> +		is_pmem = true;
>  		buf = map[0]->buf;
>  
>  		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
> @@ -166,6 +175,10 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>  				goto err_cleanup_map;
>  			}
>  
> +			/* True only if the *whole* MR is pmem */
> +			if (is_pmem)
> +				is_pmem = vaddr_in_pmem(vaddr);
> +

I'm not so keen on this use of resources, but this should be written more
like

phys = page_to_phys(sg_page_iter_page(&sg_iter))
region_intersects(phys + sg_iter->offset, sg_iter->length,.. )

And you understand this will make memory registration of every RXE
user a bit slower? And actual pmem will be painfully slow.

It seems like we are doing something wrong here..

> @@ -174,6 +187,12 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>  		}
>  	}
>  
> +	if (!is_pmem && access & IB_ACCESS_FLUSH_PERSISTENT) {
> +		pr_warn("Cannot register IB_ACCESS_FLUSH_PERSISTENT for non-pmem memory\n");
> +		err = -EINVAL;
> +		goto err_release_umem;
> +	}

Do not pr_warn on syscall paths

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread
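
For reference, a minimal sketch of the shape Jason suggests: decide
pmem-ness from the physical resource map rather than from the kernel
virtual address. The helper name and the whole-page granularity are
assumptions for illustration only, not the code in the series.

/* Illustration only: ask the resource tree whether this page of the
 * umem is persistent memory, i.e. it intersects only
 * IORES_DESC_PERSISTENT_MEMORY resources.
 */
static bool page_in_pmem(struct page *pg)
{
        return region_intersects(page_to_phys(pg), PAGE_SIZE,
                                 IORESOURCE_MEM,
                                 IORES_DESC_PERSISTENT_MEMORY) ==
               REGION_INTERSECTS;
}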

* Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation
  2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (11 preceding siblings ...)
  2022-10-28 17:44 ` [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Jason Gunthorpe
@ 2022-10-28 17:57 ` Jason Gunthorpe
  2022-11-11  2:49   ` Yanjun Zhu
  12 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2022-10-28 17:57 UTC (permalink / raw)
  To: Li Zhijian
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	y-goto, mbloch, liangwenpeng, tom, tomasz.gromadzki,
	dan.j.williams, linux-kernel

On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
> Hey folks,
> 
> Firstly i want to say thank you to all you guys, especially Bob, who in the
> past 1+ month, gave me a lots of idea and inspiration.

I would like it if someone familiar with rxe could give a Reviewed-by on
the protocol parts.

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 03/11] RDMA: Extend RDMA kernel verbs ABI to support flush
  2022-10-28 17:44   ` Jason Gunthorpe
@ 2022-10-29  3:15     ` Li Zhijian
  0 siblings, 0 replies; 31+ messages in thread
From: Li Zhijian @ 2022-10-29  3:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	y-goto, mbloch, liangwenpeng, tom, tomasz.gromadzki,
	dan.j.williams, linux-kernel



On 29/10/2022 01:44, Jason Gunthorpe wrote:
> On Tue, Sep 27, 2022 at 01:53:29PM +0800, Li Zhijian wrote:
>> @@ -4321,6 +4330,8 @@ int ib_dealloc_xrcd_user(struct ib_xrcd *xrcd, struct ib_udata *udata);
>>   static inline int ib_check_mr_access(struct ib_device *ib_dev,
>>   				     unsigned int flags)
>>   {
>> +	u64 device_cap = ib_dev->attrs.device_cap_flags;
>> +
>>   	/*
>>   	 * Local write permission is required if remote write or
>>   	 * remote atomic permission is also requested.
>> @@ -4335,6 +4346,13 @@ static inline int ib_check_mr_access(struct ib_device *ib_dev,
>>   	if (flags & IB_ACCESS_ON_DEMAND &&
>>   	    !(ib_dev->attrs.kernel_cap_flags & IBK_ON_DEMAND_PAGING))
>>   		return -EINVAL;
>> +
>> +	if ((flags & IB_ACCESS_FLUSH_GLOBAL &&
>> +	    !(device_cap & IB_DEVICE_FLUSH_GLOBAL)) ||
>> +	    (flags & IB_ACCESS_FLUSH_PERSISTENT &&
>> +	    !(device_cap & IB_DEVICE_FLUSH_PERSISTENT)))
>> +		return -EINVAL;
> This should be -EOPNOTSUPP as the above is changed to in for-next
Yes, my local tree (v6) has already updated this; I will repost it later.



>
> Jason


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 05/11] RDMA/rxe: Allow registering persistent flag for pmem MR only
  2022-10-28 17:53   ` Jason Gunthorpe
@ 2022-10-30  3:33     ` Li Zhijian
  0 siblings, 0 replies; 31+ messages in thread
From: Li Zhijian @ 2022-10-30  3:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	y-goto, mbloch, liangwenpeng, tom, tomasz.gromadzki,
	dan.j.williams, linux-kernel



On 29/10/2022 01:53, Jason Gunthorpe wrote:
> On Tue, Sep 27, 2022 at 01:53:31PM +0800, Li Zhijian wrote:
>> @@ -122,6 +129,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>>   	int			num_buf;
>>   	void			*vaddr;
>>   	int err;
>> +	bool is_pmem = false;
>>   	int i;
>>   
>>   	umem = ib_umem_get(&rxe->ib_dev, start, length, access);
>> @@ -149,6 +157,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>>   	num_buf			= 0;
>>   	map = mr->map;
>>   	if (length > 0) {
>> +		is_pmem = true;
>>   		buf = map[0]->buf;
>>   
>>   		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
>> @@ -166,6 +175,10 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>>   				goto err_cleanup_map;
>>   			}
>>   
>> +			/* True only if the *whole* MR is pmem */
>> +			if (is_pmem)
>> +				is_pmem = vaddr_in_pmem(vaddr);
>> +
> I'm not so keen on this use of resources, but this should be written more
> like
>
> phys = page_to_phys(sg_page_iter_page(&sg_iter))
> region_intersects(phys + sg_iter->offset, sg_iter->length,.. )
>
> And you understand this will make memory registration of every RXE
> user a bit slower?
Good catch, I missed that before.
I tested it in a qemu guest in which the pmem is backed by a regular file on the host.
In this case the check adds roughly 9% overhead (1.2s -> 1.3s) for a 1G MR; most of the time is spent in gup.

A real pmem environment will be tested later.

To minimize the side effect, I updated the code to do the pmem check only when require_pmem is true.

> region_intersects(phys + sg_iter->offset, sg_iter->length,.. )
I haven't fully applied this suggestion, since I believe my assumption holds that a page can only belong to a single/same
memory zone. So I only check 1 byte of each page.



index 5d014cef916e..e4e7c180fa0d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -112,6 +112,13 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr)
         mr->ibmr.type = IB_MR_TYPE_DMA;
  }

+static bool paddr_in_pmem(unsigned long paddr)
+{
+       return REGION_INTERSECTS ==
+              region_intersects(paddr, 1, IORESOURCE_MEM,
+                                IORES_DESC_PERSISTENT_MEMORY);
+}
+
  int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
                      int access, struct rxe_mr *mr)
  {
@@ -122,6 +129,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
         int                     num_buf;
         void                    *vaddr;
         int err;
+       bool require_pmem = access & IB_ACCESS_FLUSH_PERSISTENT;

         umem = ib_umem_get(&rxe->ib_dev, start, length, access);
         if (IS_ERR(umem)) {
@@ -149,6 +157,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
         num_buf                 = 0;
         map = mr->map;
         if (length > 0) {
+               struct page *pg;
                 buf = map[0]->buf;

                 for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
@@ -158,13 +167,20 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
                                 num_buf = 0;
                         }

-                       vaddr = page_address(sg_page_iter_page(&sg_iter));
+                       pg = sg_page_iter_page(&sg_iter);
+                       vaddr = page_address(pg);
                         if (!vaddr) {
                                 pr_warn("%s: Unable to get virtual address\n",
                                                 __func__);
                                 err = -ENOMEM;
                                 goto err_release_umem;
                         }
+
+                       if (require_pmem && !paddr_in_pmem(page_to_phys(pg))) {
+                               err = -EINVAL;
+                               goto err_release_umem;
+                       }
+
                         buf->addr = (uintptr_t)vaddr;
                         num_buf++;
                         buf++;


> And actual pmem will be painfully slow.
>
> It seems like we are doing something wrong here..
>

Do you think we don't need this patch?


>> @@ -174,6 +187,12 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>>   		}
>>   	}
>>   
>> +	if (!is_pmem && access & IB_ACCESS_FLUSH_PERSISTENT) {
>> +		pr_warn("Cannot register IB_ACCESS_FLUSH_PERSISTENT for non-pmem memory\n");
>> +		err = -EINVAL;
>> +		goto err_release_umem;
>> +	}
> Do not pr_warn on syscall paths

Got it, will remove it.

Thanks
Zhijian


>
> Jason


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation
  2022-10-28 17:57 ` Jason Gunthorpe
@ 2022-11-11  2:49   ` Yanjun Zhu
  2022-11-11  5:10     ` lizhijian
  0 siblings, 1 reply; 31+ messages in thread
From: Yanjun Zhu @ 2022-11-11  2:49 UTC (permalink / raw)
  To: Jason Gunthorpe, Li Zhijian
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	y-goto, mbloch, liangwenpeng, tom, tomasz.gromadzki,
	dan.j.williams, linux-kernel

On 2022/10/29 1:57, Jason Gunthorpe wrote:
> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>> Hey folks,
>>
>> Firstly i want to say thank you to all you guys, especially Bob, who in the
>> past 1+ month, gave me a lots of idea and inspiration.
> 
> I would like it if someone familiar with rxe could reviewed-by the
> protocol parts.

Hi, Jason

I reviewed these patches and I am fine with them.

Hi, Zhijian

I noticed the following:
"
$ ./rdma_flush_server -s [server_address] -p [port_number]
client:
$ ./rdma_flush_client -s [server_address] -p [port_number]
"
Can you merge the server and the client into rdma-core?

Thanks,
Zhu Yanjun

> 
> Jason


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation
  2022-11-11  2:49   ` Yanjun Zhu
@ 2022-11-11  5:10     ` lizhijian
  2022-11-11  5:52       ` Yanjun Zhu
  0 siblings, 1 reply; 31+ messages in thread
From: lizhijian @ 2022-11-11  5:10 UTC (permalink / raw)
  To: Yanjun Zhu, Jason Gunthorpe
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	Yasunori Gotou (Fujitsu),
	mbloch, liangwenpeng, tom, tomasz.gromadzki, dan.j.williams,
	linux-kernel



On 11/11/2022 10:49, Yanjun Zhu wrote:
> 在 2022/10/29 1:57, Jason Gunthorpe 写道:
>> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>>> Hey folks,
>>>
>>> Firstly i want to say thank you to all you guys, especially Bob, who 
>>> in the
>>> past 1+ month, gave me a lots of idea and inspiration.
>>
>> I would like it if someone familiar with rxe could reviewed-by the
>> protocol parts.
> 
> Hi, Jason
> 
> I reviewed these patches. I am fine with these patches.
> 
> Hi, Zhijian
> 
> I noticed the followings:
> "
> $ ./rdma_flush_server -s [server_address] -p [port_number]
> client:
> $ ./rdma_flush_client -s [server_address] -p [port_number]
> "
> Can you merge the server and the client to rdma-core?

Yanjun,

Yes, there is already a draft PR here:
https://github.com/linux-rdma/rdma-core/pull/1181, but it cannot go
ahead until the kernel patches are merged.

And I will post a new version in the coming days; would you mind if I add
your "Reviewed-by" to the next version?



> 
> Thanks,
> Zhu Yanjun
> 
>>
>> Jason
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation
  2022-11-11  5:10     ` lizhijian
@ 2022-11-11  5:52       ` Yanjun Zhu
  2022-11-11  6:10         ` lizhijian
  0 siblings, 1 reply; 31+ messages in thread
From: Yanjun Zhu @ 2022-11-11  5:52 UTC (permalink / raw)
  To: lizhijian, Yanjun Zhu, Jason Gunthorpe
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	Yasunori Gotou (Fujitsu),
	mbloch, liangwenpeng, tom, tomasz.gromadzki, dan.j.williams,
	linux-kernel

On 2022/11/11 13:10, lizhijian@fujitsu.com wrote:
> 
> 
> On 11/11/2022 10:49, Yanjun Zhu wrote:
>> 在 2022/10/29 1:57, Jason Gunthorpe 写道:
>>> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>>>> Hey folks,
>>>>
>>>> Firstly i want to say thank you to all you guys, especially Bob, who
>>>> in the
>>>> past 1+ month, gave me a lots of idea and inspiration.
>>>
>>> I would like it if someone familiar with rxe could reviewed-by the
>>> protocol parts.
>>
>> Hi, Jason
>>
>> I reviewed these patches. I am fine with these patches.
>>
>> Hi, Zhijian
>>
>> I noticed the followings:
>> "
>> $ ./rdma_flush_server -s [server_address] -p [port_number]
>> client:
>> $ ./rdma_flush_client -s [server_address] -p [port_number]
>> "
>> Can you merge the server and the client to rdma-core?
> 
> Yanjun,
> 
> Yes, there was already a draft PR here
> https://github.com/linux-rdma/rdma-core/pull/1181, but it cannot go
> ahead until the kernel's patches are merged.
> 
> and i will post a new version these days, would you mind if i add your
> "Reviewed-by" in next version ?

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Thanks.

Another question: normally rxe should also interoperate with physical IB
devices, such as an mlx IB device. That is, one host runs rxe, the other
host uses an mlx IB device, and the RDMA connection is created between
the 2 hosts.

Did you connect to an mlx IB device with this RDMA FLUSH operation?
And what was the test result?

Thanks a lot.
Zhu Yanjun

> 
> 
> 
>>
>> Thanks,
>> Zhu Yanjun
>>
>>>
>>> Jason


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation
  2022-11-11  5:52       ` Yanjun Zhu
@ 2022-11-11  6:10         ` lizhijian
  2022-11-11  6:30           ` Yanjun Zhu
  0 siblings, 1 reply; 31+ messages in thread
From: lizhijian @ 2022-11-11  6:10 UTC (permalink / raw)
  To: Yanjun Zhu, Jason Gunthorpe
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	Yasunori Gotou (Fujitsu),
	mbloch, liangwenpeng, tom, tomasz.gromadzki, dan.j.williams,
	linux-kernel



On 11/11/2022 13:52, Yanjun Zhu wrote:
> 在 2022/11/11 13:10, lizhijian@fujitsu.com 写道:
>>
>>
>> On 11/11/2022 10:49, Yanjun Zhu wrote:
>>> 在 2022/10/29 1:57, Jason Gunthorpe 写道:
>>>> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>>>>> Hey folks,
>>>>>
>>>>> Firstly i want to say thank you to all you guys, especially Bob, who
>>>>> in the
>>>>> past 1+ month, gave me a lots of idea and inspiration.
>>>>
>>>> I would like it if someone familiar with rxe could reviewed-by the
>>>> protocol parts.
>>>
>>> Hi, Jason
>>>
>>> I reviewed these patches. I am fine with these patches.
>>>
>>> Hi, Zhijian
>>>
>>> I noticed the followings:
>>> "
>>> $ ./rdma_flush_server -s [server_address] -p [port_number]
>>> client:
>>> $ ./rdma_flush_client -s [server_address] -p [port_number]
>>> "
>>> Can you merge the server and the client to rdma-core?
>>
>> Yanjun,
>>
>> Yes, there was already a draft PR here
>> https://github.com/linux-rdma/rdma-core/pull/1181, but it cannot go
>> ahead until the kernel's patches are merged.
>>
>> and i will post a new version these days, would you mind if i add your
>> "Reviewed-by" in next version ?
> 
> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> Thanks.
> 
> Another problem, normally rxe should connect to physical ib devices, 
> such as mlx ib device. That is, one host is rxe, the other host is mlx 
> ib device. The rdma connection should be created between the 2 hosts.

It's fully backward-compatible with existing operations.


> 
> Do you connect to mlx ib device with this RDMA FLUSH operation?
> And what is the test result?

Yes, I tested it.

After these patches, only the RXE device can register *FLUSHABLE* MRs
successfully. If mlx tries that, EOPNOTSUPP will be returned.

Similarly, since other hardware (mlx, for example) does not yet support
the FLUSH operation, EOPNOTSUPP will be returned if users try to issue it.

In short, for an RXE requester, an mlx responder will return an error for
the request, and an mlx requester is not able to issue a FLUSH operation.

Thanks
Zhijian


> 
> Thanks a lot.
> Zhu Yanjun
> 
>>
>>
>>
>>>
>>> Thanks,
>>> Zhu Yanjun
>>>
>>>>
>>>> Jason
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation
  2022-11-11  6:10         ` lizhijian
@ 2022-11-11  6:30           ` Yanjun Zhu
  2022-11-11  6:38             ` lizhijian
  0 siblings, 1 reply; 31+ messages in thread
From: Yanjun Zhu @ 2022-11-11  6:30 UTC (permalink / raw)
  To: lizhijian, Jason Gunthorpe
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	Yasunori Gotou (Fujitsu),
	mbloch, liangwenpeng, tom, tomasz.gromadzki, dan.j.williams,
	linux-kernel


On 2022/11/11 14:10, lizhijian@fujitsu.com wrote:
>
> On 11/11/2022 13:52, Yanjun Zhu wrote:
>> 在 2022/11/11 13:10, lizhijian@fujitsu.com 写道:
>>>
>>> On 11/11/2022 10:49, Yanjun Zhu wrote:
>>>> 在 2022/10/29 1:57, Jason Gunthorpe 写道:
>>>>> On Tue, Sep 27, 2022 at 01:53:26PM +0800, Li Zhijian wrote:
>>>>>> Hey folks,
>>>>>>
>>>>>> Firstly i want to say thank you to all you guys, especially Bob, who
>>>>>> in the
>>>>>> past 1+ month, gave me a lots of idea and inspiration.
>>>>> I would like it if someone familiar with rxe could reviewed-by the
>>>>> protocol parts.
>>>> Hi, Jason
>>>>
>>>> I reviewed these patches. I am fine with these patches.
>>>>
>>>> Hi, Zhijian
>>>>
>>>> I noticed the followings:
>>>> "
>>>> $ ./rdma_flush_server -s [server_address] -p [port_number]
>>>> client:
>>>> $ ./rdma_flush_client -s [server_address] -p [port_number]
>>>> "
>>>> Can you merge the server and the client to rdma-core?
>>> Yanjun,
>>>
>>> Yes, there was already a draft PR here
>>> https://github.com/linux-rdma/rdma-core/pull/1181, but it cannot go
>>> ahead until the kernel's patches are merged.
>>>
>>> and i will post a new version these days, would you mind if i add your
>>> "Reviewed-by" in next version ?
>> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>> Thanks.
>>
>> Another problem, normally rxe should connect to physical ib devices,
>> such as mlx ib device. That is, one host is rxe, the other host is mlx
>> ib device. The rdma connection should be created between the 2 hosts.
> it's fully compatible with old operation.
>
>
>> Do you connect to mlx ib device with this RDMA FLUSH operation?
>> And what is the test result?
> Yes, i tested it.
>
> After these patches, only RXE device can register *FLUSHABLE* MRs
> successfully. If mlx try that, EOPNOSUPP will be returned.
>
> Similarly, Since other hardwares(MLX for example) have not supported
> FLUSH operation, EOPNOSUPP will be returned if users try to to that.
>
> In short, for RXE requester, MLX responder will return error for the
> request. MLX requester is not able to request a FLUSH operation.

Thanks. Do you mean that the FLUSH operation is only supported by RXE? ^_^

And that mlx does not support the FLUSH operation currently?

Zhu Yanjun

>
> Thanks
> Zhijian
>
>
>> Thanks a lot.
>> Zhu Yanjun
>>
>>>
>>>
>>>> Thanks,
>>>> Zhu Yanjun
>>>>
>>>>> Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation
  2022-11-11  6:30           ` Yanjun Zhu
@ 2022-11-11  6:38             ` lizhijian
  2022-11-11  7:08               ` Yanjun Zhu
  0 siblings, 1 reply; 31+ messages in thread
From: lizhijian @ 2022-11-11  6:38 UTC (permalink / raw)
  To: Yanjun Zhu, Jason Gunthorpe
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	Yasunori Gotou (Fujitsu),
	mbloch, liangwenpeng, tom, tomasz.gromadzki, dan.j.williams,
	linux-kernel



On 11/11/2022 14:30, Yanjun Zhu wrote:
>>
>> After these patches, only RXE device can register *FLUSHABLE* MRs
>> successfully. If mlx try that, EOPNOSUPP will be returned.
>>
>> Similarly, Since other hardwares(MLX for example) have not supported
>> FLUSH operation, EOPNOSUPP will be returned if users try to to that.
>>
>> In short, for RXE requester, MLX responder will return error for the
>> request. MLX requester is not able to request a FLUSH operation.
> 
> Thanks. Do you mean that FLUSH operation is only supported in RXE? ^_^
> 
> And MLX does not support FLUSH operation currently?

IMO, FLUSH and ATOMIC WRITE were newly introduced by IBA spec 1.5,
published in 2021, so hardware and drivers (mlx included) need to do some
work to support them.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation
  2022-11-11  6:38             ` lizhijian
@ 2022-11-11  7:08               ` Yanjun Zhu
  0 siblings, 0 replies; 31+ messages in thread
From: Yanjun Zhu @ 2022-11-11  7:08 UTC (permalink / raw)
  To: lizhijian, Jason Gunthorpe
  Cc: Bob Pearson, Leon Romanovsky, linux-rdma, Zhu Yanjun, yangx.jy,
	Yasunori Gotou (Fujitsu),
	mbloch, liangwenpeng, tom, tomasz.gromadzki, dan.j.williams,
	linux-kernel


On 2022/11/11 14:38, lizhijian@fujitsu.com wrote:
>
> On 11/11/2022 14:30, Yanjun Zhu wrote:
>>> After these patches, only RXE device can register *FLUSHABLE* MRs
>>> successfully. If mlx try that, EOPNOSUPP will be returned.
>>>
>>> Similarly, Since other hardwares(MLX for example) have not supported
>>> FLUSH operation, EOPNOSUPP will be returned if users try to to that.
>>>
>>> In short, for RXE requester, MLX responder will return error for the
>>> request. MLX requester is not able to request a FLUSH operation.
>> Thanks. Do you mean that FLUSH operation is only supported in RXE? ^_^
>>
>> And MLX does not support FLUSH operation currently?
> IMO, FLUSH and Atomic Write are newly introduced by IBA spec 1.5
> published in 2021. So hardware/drivers(MLX) should do something to
> support it.

Thanks.

If I understand you correctly, FLUSH and ATOMIC WRITE are new features,
and from the test results they are not currently supported by the mlx
driver.

We will wait for the mlx engineers for updates about FLUSH and ATOMIC WRITE.

IMO, it would be better if rxe could successfully use FLUSH and ATOMIC
WRITE against one physical IB device, such as mlx or others.

Zhu Yanjun


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [for-next PATCH v5 06/11] RDMA/rxe: Extend rxe packet format to support flush
  2022-09-27  5:53 ` [for-next PATCH v5 06/11] RDMA/rxe: Extend rxe packet format to support flush Li Zhijian
@ 2022-11-11  8:43   ` Yanjun Zhu
  2022-11-11  8:55     ` lizhijian
  0 siblings, 1 reply; 31+ messages in thread
From: Yanjun Zhu @ 2022-11-11  8:43 UTC (permalink / raw)
  To: Li Zhijian, Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, y-goto, mbloch, liangwenpeng, tom,
	tomasz.gromadzki, dan.j.williams, linux-kernel

On 2022/9/27 13:53, Li Zhijian wrote:
> Extend rxe opcode tables, headers, helper and constants to support
> flush operations.
> 
> Refer to the IBA A19.4.1 for more FETH definition details
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
> V5: new FETH structure and simplify header helper
>      new names and new patch split scheme, suggested by Bob.
> ---
>   drivers/infiniband/sw/rxe/rxe_hdr.h    | 47 ++++++++++++++++++++++++++
>   drivers/infiniband/sw/rxe/rxe_opcode.c | 17 ++++++++++
>   drivers/infiniband/sw/rxe/rxe_opcode.h | 16 +++++----
>   3 files changed, 74 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
> index e432f9e37795..e995a97c54fd 100644
> --- a/drivers/infiniband/sw/rxe/rxe_hdr.h
> +++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
> @@ -607,6 +607,52 @@ static inline void reth_set_len(struct rxe_pkt_info *pkt, u32 len)
>   		rxe_opcode[pkt->opcode].offset[RXE_RETH], len);
>   }
>   
> +/******************************************************************************
> + * FLUSH Extended Transport Header
> + ******************************************************************************/
> +
> +struct rxe_feth {
> +	__be32 bits;
> +};
> +
> +#define FETH_PLT_MASK		(0x0000000f) /* bits 3-0 */
> +#define FETH_SEL_MASK		(0x00000030) /* bits 5-4 */
> +#define FETH_SEL_SHIFT		(4U)
> +
> +static inline u32 __feth_plt(void *arg)
> +{
> +	struct rxe_feth *feth = arg;
> +
> +	return be32_to_cpu(feth->bits) & FETH_PLT_MASK;
> +}
> +
> +static inline u32 __feth_sel(void *arg)
> +{
> +	struct rxe_feth *feth = arg;
> +
> +	return (be32_to_cpu(feth->bits) & FETH_SEL_MASK) >> FETH_SEL_SHIFT;
> +}
> +
> +static inline u32 feth_plt(struct rxe_pkt_info *pkt)
> +{
> +	return __feth_plt(pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
> +}
> +
> +static inline u32 feth_sel(struct rxe_pkt_info *pkt)
> +{
> +	return __feth_sel(pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
> +}
> +
> +static inline void feth_init(struct rxe_pkt_info *pkt, u8 type, u8 level)
> +{
> +	struct rxe_feth *feth = (struct rxe_feth *)
> +		    (pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
> +	u32 bits = ((level << FETH_SEL_SHIFT) & FETH_SEL_MASK) |
> +		   (type & FETH_PLT_MASK);
> +
> +	feth->bits = cpu_to_be32(bits);
> +}
> +
>   /******************************************************************************
>    * Atomic Extended Transport Header
>    ******************************************************************************/
> @@ -910,6 +956,7 @@ enum rxe_hdr_length {
>   	RXE_ATMETH_BYTES	= sizeof(struct rxe_atmeth),
>   	RXE_IETH_BYTES		= sizeof(struct rxe_ieth),
>   	RXE_RDETH_BYTES		= sizeof(struct rxe_rdeth),
> +	RXE_FETH_BYTES		= sizeof(struct rxe_feth),
>   };
>   
>   static inline size_t header_size(struct rxe_pkt_info *pkt)
> diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
> index d4ba4d506f17..55aad13e57bb 100644
> --- a/drivers/infiniband/sw/rxe/rxe_opcode.c
> +++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
> @@ -101,6 +101,12 @@ struct rxe_wr_opcode_info rxe_wr_opcode_info[] = {
>   			[IB_QPT_UC]	= WR_LOCAL_OP_MASK,
>   		},
>   	},
> +	[IB_WR_FLUSH]					= {
> +		.name   = "IB_WR_FLUSH",
> +		.mask   = {
> +			[IB_QPT_RC]	= WR_FLUSH_MASK,
> +		},
> +	},
>   };

Hi, Zhijian

I am running tests with it. Besides RC, are other modes supported, such as
RD or XRC?

Zhu Yanjun

>   
>   struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
> @@ -378,6 +384,17 @@ struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
>   					  RXE_IETH_BYTES,
>   		}
>   	},
> +	[IB_OPCODE_RC_FLUSH]					= {
> +		.name	= "IB_OPCODE_RC_FLUSH",
> +		.mask	= RXE_FETH_MASK | RXE_RETH_MASK | RXE_FLUSH_MASK |
> +			  RXE_START_MASK | RXE_END_MASK | RXE_REQ_MASK,
> +		.length = RXE_BTH_BYTES + RXE_FETH_BYTES + RXE_RETH_BYTES,
> +		.offset = {
> +			[RXE_BTH]	= 0,
> +			[RXE_FETH]	= RXE_BTH_BYTES,
> +			[RXE_RETH]	= RXE_BTH_BYTES + RXE_FETH_BYTES,
> +		}
> +	},
>   
>   	/* UC */
>   	[IB_OPCODE_UC_SEND_FIRST]			= {
> diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h
> index 8f9aaaf260f2..02d256745793 100644
> --- a/drivers/infiniband/sw/rxe/rxe_opcode.h
> +++ b/drivers/infiniband/sw/rxe/rxe_opcode.h
> @@ -19,7 +19,8 @@ enum rxe_wr_mask {
>   	WR_SEND_MASK			= BIT(2),
>   	WR_READ_MASK			= BIT(3),
>   	WR_WRITE_MASK			= BIT(4),
> -	WR_LOCAL_OP_MASK		= BIT(5),
> +	WR_FLUSH_MASK			= BIT(5),
> +	WR_LOCAL_OP_MASK		= BIT(6),
>   
>   	WR_READ_OR_WRITE_MASK		= WR_READ_MASK | WR_WRITE_MASK,
>   	WR_WRITE_OR_SEND_MASK		= WR_WRITE_MASK | WR_SEND_MASK,
> @@ -47,6 +48,7 @@ enum rxe_hdr_type {
>   	RXE_RDETH,
>   	RXE_DETH,
>   	RXE_IMMDT,
> +	RXE_FETH,
>   	RXE_PAYLOAD,
>   	NUM_HDR_TYPES
>   };
> @@ -63,6 +65,7 @@ enum rxe_hdr_mask {
>   	RXE_IETH_MASK		= BIT(RXE_IETH),
>   	RXE_RDETH_MASK		= BIT(RXE_RDETH),
>   	RXE_DETH_MASK		= BIT(RXE_DETH),
> +	RXE_FETH_MASK		= BIT(RXE_FETH),
>   	RXE_PAYLOAD_MASK	= BIT(RXE_PAYLOAD),
>   
>   	RXE_REQ_MASK		= BIT(NUM_HDR_TYPES + 0),
> @@ -71,13 +74,14 @@ enum rxe_hdr_mask {
>   	RXE_WRITE_MASK		= BIT(NUM_HDR_TYPES + 3),
>   	RXE_READ_MASK		= BIT(NUM_HDR_TYPES + 4),
>   	RXE_ATOMIC_MASK		= BIT(NUM_HDR_TYPES + 5),
> +	RXE_FLUSH_MASK		= BIT(NUM_HDR_TYPES + 6),
>   
> -	RXE_RWR_MASK		= BIT(NUM_HDR_TYPES + 6),
> -	RXE_COMP_MASK		= BIT(NUM_HDR_TYPES + 7),
> +	RXE_RWR_MASK		= BIT(NUM_HDR_TYPES + 7),
> +	RXE_COMP_MASK		= BIT(NUM_HDR_TYPES + 8),
>   
> -	RXE_START_MASK		= BIT(NUM_HDR_TYPES + 8),
> -	RXE_MIDDLE_MASK		= BIT(NUM_HDR_TYPES + 9),
> -	RXE_END_MASK		= BIT(NUM_HDR_TYPES + 10),
> +	RXE_START_MASK		= BIT(NUM_HDR_TYPES + 9),
> +	RXE_MIDDLE_MASK		= BIT(NUM_HDR_TYPES + 10),
> +	RXE_END_MASK		= BIT(NUM_HDR_TYPES + 11),
>   
>   	RXE_LOOPBACK_MASK	= BIT(NUM_HDR_TYPES + 12),
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread
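
To make the FETH layout quoted above concrete, here is a small usage
sketch assuming the IBA A19.4.1 encoding, where PLT bit 0 requests global
visibility, PLT bit 1 requests persistence, and SEL selects the flush
scope. The function name and the WARN_ON checks are illustrative only.

/* Illustration based on the helpers quoted above: for an already-built
 * IB_OPCODE_RC_FLUSH packet, encode a FETH that requests persistence
 * (PLT bit 1) at selectivity level 0 and read the fields back.
 */
static void feth_example(struct rxe_pkt_info *pkt)
{
        feth_init(pkt, /* PLT: persistence */ 0x2, /* SEL */ 0x0);

        WARN_ON(feth_plt(pkt) != 0x2);  /* bits 3-0 of the FETH */
        WARN_ON(feth_sel(pkt) != 0x0);  /* bits 5-4 of the FETH */
}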

* Re: [for-next PATCH v5 06/11] RDMA/rxe: Extend rxe packet format to support flush
  2022-11-11  8:43   ` Yanjun Zhu
@ 2022-11-11  8:55     ` lizhijian
  2022-11-11  9:28       ` Yanjun Zhu
  0 siblings, 1 reply; 31+ messages in thread
From: lizhijian @ 2022-11-11  8:55 UTC (permalink / raw)
  To: Yanjun Zhu, Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, Yasunori Gotou (Fujitsu),
	mbloch, liangwenpeng, tom, tomasz.gromadzki, dan.j.williams,
	linux-kernel



On 11/11/2022 16:43, Yanjun Zhu wrote:
>> /******************************************************************************
>>    * Atomic Extended Transport Header
>>    
>> ******************************************************************************/
>> @@ -910,6 +956,7 @@ enum rxe_hdr_length {
>>       RXE_ATMETH_BYTES    = sizeof(struct rxe_atmeth),
>>       RXE_IETH_BYTES        = sizeof(struct rxe_ieth),
>>       RXE_RDETH_BYTES        = sizeof(struct rxe_rdeth),
>> +    RXE_FETH_BYTES        = sizeof(struct rxe_feth),
>>   };
>>   static inline size_t header_size(struct rxe_pkt_info *pkt)
>> diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c 
>> b/drivers/infiniband/sw/rxe/rxe_opcode.c
>> index d4ba4d506f17..55aad13e57bb 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_opcode.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
>> @@ -101,6 +101,12 @@ struct rxe_wr_opcode_info rxe_wr_opcode_info[] = {
>>               [IB_QPT_UC]    = WR_LOCAL_OP_MASK,
>>           },
>>       },
>> +    [IB_WR_FLUSH]                    = {
>> +        .name   = "IB_WR_FLUSH",
>> +        .mask   = {
>> +            [IB_QPT_RC]    = WR_FLUSH_MASK,
>> +        },
>> +    },
>>   };
> 
> Hi, Zhijian
> 
> I am making tests with it. Except rc, other modes are supported? such as 
> rd, xrc?
> 

Only RC is implemented for FLUSH; the current RXE only supports the RC service[1].
BTW, XRC support is on the way in Bob's patches, IIRC.

https://lore.kernel.org/r/cce0f07d-25fc-5880-69e7-001d951750b7@gmail.com



> Zhu Yanjun

^ permalink raw reply	[flat|nested] 31+ messages in thread
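
For reference, the per-QP-type gating Zhijian describes comes from the
rxe_wr_opcode_info table quoted earlier: since IB_WR_FLUSH only carries
WR_FLUSH_MASK under IB_QPT_RC, the mask lookup is zero for every other QP
type and the work request is rejected. A minimal sketch of that check,
with an illustrative helper name:

/* Sketch only: a WR opcode is usable on a QP type only if the opcode
 * table sets a mask bit for that type, so IB_WR_FLUSH is RC-only here.
 */
static bool wr_opcode_allowed(enum ib_qp_type qpt, enum ib_wr_opcode op)
{
        return rxe_wr_opcode_info[op].mask[qpt] != 0;
}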

* Re: [for-next PATCH v5 06/11] RDMA/rxe: Extend rxe packet format to support flush
  2022-11-11  8:55     ` lizhijian
@ 2022-11-11  9:28       ` Yanjun Zhu
  0 siblings, 0 replies; 31+ messages in thread
From: Yanjun Zhu @ 2022-11-11  9:28 UTC (permalink / raw)
  To: lizhijian, Bob Pearson, Leon Romanovsky, Jason Gunthorpe, linux-rdma
  Cc: Zhu Yanjun, yangx.jy, Yasunori Gotou (Fujitsu),
	mbloch, liangwenpeng, tom, tomasz.gromadzki, dan.j.williams,
	linux-kernel


On 2022/11/11 16:55, lizhijian@fujitsu.com wrote:
>
> On 11/11/2022 16:43, Yanjun Zhu wrote:
>>> /******************************************************************************
>>>     * Atomic Extended Transport Header
>>>     
>>> ******************************************************************************/
>>> @@ -910,6 +956,7 @@ enum rxe_hdr_length {
>>>        RXE_ATMETH_BYTES    = sizeof(struct rxe_atmeth),
>>>        RXE_IETH_BYTES        = sizeof(struct rxe_ieth),
>>>        RXE_RDETH_BYTES        = sizeof(struct rxe_rdeth),
>>> +    RXE_FETH_BYTES        = sizeof(struct rxe_feth),
>>>    };
>>>    static inline size_t header_size(struct rxe_pkt_info *pkt)
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c
>>> b/drivers/infiniband/sw/rxe/rxe_opcode.c
>>> index d4ba4d506f17..55aad13e57bb 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_opcode.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
>>> @@ -101,6 +101,12 @@ struct rxe_wr_opcode_info rxe_wr_opcode_info[] = {
>>>                [IB_QPT_UC]    = WR_LOCAL_OP_MASK,
>>>            },
>>>        },
>>> +    [IB_WR_FLUSH]                    = {
>>> +        .name   = "IB_WR_FLUSH",
>>> +        .mask   = {
>>> +            [IB_QPT_RC]    = WR_FLUSH_MASK,
>>> +        },
>>> +    },
>>>    };
>> Hi, Zhijian
>>
>> I am making tests with it. Except rc, other modes are supported? such as
>> rd, xrc?
>>
> Only RC is implemented for FLUSH, current RXE only supports RC service[1].
> BTW, XRC is on the way in Bob's patch IIRC.
>
> https://lore.kernel.org/r/cce0f07d-25fc-5880-69e7-001d951750b7@gmail.com
 * IBA header types and methods
 *
 * Some of these are for reference and completeness only since
 * rxe does not currently support RD transport
 * most of this could be moved into IB core. ib_pack.h has
 * part of this but is incomplete

Zhu Yanjun

>
>
>> Zhu Yanjun

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-11-11  9:29 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-27  5:53 [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
2022-09-27  5:53 ` [for-next PATCH v5 01/11] RDMA/rxe: make sure requested access is a subset of {mr,mw}->access Li Zhijian
2022-10-28 17:45   ` Jason Gunthorpe
2022-09-27  5:53 ` [for-next PATCH v5 02/11] RDMA: Extend RDMA user ABI to support flush Li Zhijian
2022-09-27  5:53 ` [for-next PATCH v5 03/11] RDMA: Extend RDMA kernel verbs " Li Zhijian
2022-09-29  6:21   ` Li Zhijian
2022-09-30 18:04     ` Jason Gunthorpe
2022-10-28 17:44   ` Jason Gunthorpe
2022-10-29  3:15     ` Li Zhijian
2022-09-27  5:53 ` [for-next PATCH v5 04/11] RDMA/rxe: Extend rxe user " Li Zhijian
2022-09-27  5:53 ` [for-next PATCH v5 05/11] RDMA/rxe: Allow registering persistent flag for pmem MR only Li Zhijian
2022-10-28 17:53   ` Jason Gunthorpe
2022-10-30  3:33     ` Li Zhijian
2022-09-27  5:53 ` [for-next PATCH v5 06/11] RDMA/rxe: Extend rxe packet format to support flush Li Zhijian
2022-11-11  8:43   ` Yanjun Zhu
2022-11-11  8:55     ` lizhijian
2022-11-11  9:28       ` Yanjun Zhu
2022-09-27  5:53 ` [for-next PATCH v5 07/11] RDMA/rxe: Implement RC RDMA FLUSH service in requester side Li Zhijian
2022-09-27  5:53 ` [for-next PATCH v5 08/11] RDMA/rxe: Implement flush execution in responder side Li Zhijian
2022-09-27  5:53 ` [for-next PATCH v5 09/11] RDMA/rxe: Implement flush completion Li Zhijian
2022-09-27  5:53 ` [for-next PATCH v5 10/11] RDMA/cm: Make QP FLUSHABLE Li Zhijian
2022-09-27  5:53 ` [for-next PATCH v5 11/11] RDMA/rxe: Enable RDMA FLUSH capability for rxe device Li Zhijian
2022-10-28 17:44 ` [for-next PATCH v5 00/11] RDMA/rxe: Add RDMA FLUSH operation Jason Gunthorpe
2022-10-28 17:57 ` Jason Gunthorpe
2022-11-11  2:49   ` Yanjun Zhu
2022-11-11  5:10     ` lizhijian
2022-11-11  5:52       ` Yanjun Zhu
2022-11-11  6:10         ` lizhijian
2022-11-11  6:30           ` Yanjun Zhu
2022-11-11  6:38             ` lizhijian
2022-11-11  7:08               ` Yanjun Zhu
