All of lore.kernel.org
 help / color / mirror / Atom feed
* [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation
@ 2022-11-16  8:19 Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 01/10] RDMA: Extend RDMA user ABI to support flush Li Zhijian
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

Hey folks,

Changes in V6:
- rebase to for-next (v6.1-rc1)
- add Yanjun's Reviewed-by to all patches except "Allow registering persistent flag for pmem MR only"
- minimize the side effects of the pmem check # Jason
- return EOPNOTSUPP if the HCA doesn't support the flush operation

These patches implement a *NEW* RDMA opcode, RDMA FLUSH. In IB SPEC
1.5 [1], two new opcodes, ATOMIC WRITE and RDMA FLUSH, were added in
the MEMORY PLACEMENT EXTENSIONS section.

This patchset makes SoftRoCE (rxe) support the new RDMA FLUSH
operation on the RC service.

You can verify the patchset by building and running the rdma_flush example[2].
server:
$ ./rdma_flush_server -s [server_address] -p [port_number]
client:
$ ./rdma_flush_client -s [server_address] -p [port_number]

Corresponding pyverbs support and tests
(tests.test_qpex.QpExTestCase.test_qp_ex_rc_rdma_flush) have also been
added to rdma-core.

[1]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
[2]: https://github.com/zhijianli88/rdma-core/tree/rdma-flush-v5

CC: Xiao Yang <yangx.jy@fujitsu.com>
CC: "Gotou, Yasunori" <y-goto@fujitsu.com>
CC: Jason Gunthorpe <jgg@ziepe.ca>
CC: Zhu Yanjun <zyjzyj2000@gmail.com>
CC: Leon Romanovsky <leon@kernel.org>
CC: Bob Pearson <rpearsonhpe@gmail.com>
CC: Mark Bloch <mbloch@nvidia.com>
CC: Tom Talpey <tom@talpey.com>
CC: "Gromadzki, Tomasz" <tomasz.gromadzki@intel.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: linux-rdma@vger.kernel.org
CC: linux-kernel@vger.kernel.org

The kernel source is also available at:
https://github.com/zhijianli88/linux/tree/rdma-flush-v6

Change log:
V5: https://lore.kernel.org/lkml/20220927055337.22630-12-lizhijian@fujitsu.com/t/
V4:
- rework responder process
- rebase to v5.19+
- remove [7/7]: RDMA/rxe: Add RD FLUSH service support since RD is not really supported

V3:
- Rebase; commit log and comment updates only
- delete patch-1: "RDMA: mr: Introduce is_pmem", which is combined into "Allow registering persistent flag for pmem MR only"
- delete patch-7

V2:
RDMA: mr: Introduce is_pmem
   check 1st byte to avoid crossing page boundary
   new scheme to check is_pmem # Dan

RDMA: Allow registering MR with flush access flags
   combine [03/10] "RDMA/rxe: Allow registering FLUSH flags for supported device only" into this patch # Jason
   split RDMA_FLUSH to 2 capabilities

RDMA/rxe: Allow registering persistent flag for pmem MR only
   update commit message, get rid of confusing ib_check_flush_access_flags() # Tom

RDMA/rxe: Implement RC RDMA FLUSH service in requester side
   extend flush to include length field. # Tom and Tomasz

RDMA/rxe: Implement flush execution in responder side
   adjust start for WHOLE MR level # Tom
   don't support DMA mr for flush # Tom
   check flush return value

RDMA/rxe: Enable RDMA FLUSH capability for rxe device
   adjust patch's order. move it here from [04/10]

Li Zhijian (10):
  RDMA: Extend RDMA user ABI to support flush
  RDMA: Extend RDMA kernel verbs ABI to support flush
  RDMA/rxe: Extend rxe user ABI to support flush
  RDMA/rxe: Allow registering persistent flag for pmem MR only
  RDMA/rxe: Extend rxe packet format to support flush
  RDMA/rxe: Implement RC RDMA FLUSH service in requester side
  RDMA/rxe: Implement flush execution in responder side
  RDMA/rxe: Implement flush completion
  RDMA/cm: Make QP FLUSHABLE
  RDMA/rxe: Enable RDMA FLUSH capability for rxe device

 drivers/infiniband/core/cm.c            |   3 +-
 drivers/infiniband/sw/rxe/rxe_comp.c    |   4 +-
 drivers/infiniband/sw/rxe/rxe_hdr.h     |  47 +++++++
 drivers/infiniband/sw/rxe/rxe_loc.h     |   1 +
 drivers/infiniband/sw/rxe/rxe_mr.c      |  58 +++++++-
 drivers/infiniband/sw/rxe/rxe_opcode.c  |  17 +++
 drivers/infiniband/sw/rxe/rxe_opcode.h  |  16 ++-
 drivers/infiniband/sw/rxe/rxe_param.h   |   4 +-
 drivers/infiniband/sw/rxe/rxe_req.c     |  15 +-
 drivers/infiniband/sw/rxe/rxe_resp.c    | 176 +++++++++++++++++++++---
 drivers/infiniband/sw/rxe/rxe_verbs.h   |   6 +
 include/rdma/ib_pack.h                  |   3 +
 include/rdma/ib_verbs.h                 |  20 ++-
 include/uapi/rdma/ib_user_ioctl_verbs.h |   2 +
 include/uapi/rdma/ib_user_verbs.h       |  16 +++
 include/uapi/rdma/rdma_user_rxe.h       |   7 +
 16 files changed, 362 insertions(+), 33 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [for-next PATCH v6 01/10] RDMA: Extend RDMA user ABI to support flush
  2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
@ 2022-11-16  8:19 ` Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 02/10] RDMA: Extend RDMA kernel verbs " Li Zhijian
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

This commit extends the RDMA user ABI to support the flush
operation defined in IBA A19.4.1. These changes are
backwards compatible with the existing RDMA user ABI.
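
For illustration only (not part of this patch), the two placement
types form a bitmask and may be combined, while the selectivity level
selects exactly one scope:

    __u8 type  = IB_FLUSH_GLOBAL | IB_FLUSH_PERSISTENT; /* bitmask */
    __u8 level = IB_FLUSH_RANGE; /* or IB_FLUSH_MR for the whole MR */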

Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: new names and new patch split scheme, suggested by Bob
---
 include/uapi/rdma/ib_user_ioctl_verbs.h |  2 ++
 include/uapi/rdma/ib_user_verbs.h       | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 7dd56210226f..07b105e22f6f 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -57,6 +57,8 @@ enum ib_uverbs_access_flags {
 	IB_UVERBS_ACCESS_ZERO_BASED = 1 << 5,
 	IB_UVERBS_ACCESS_ON_DEMAND = 1 << 6,
 	IB_UVERBS_ACCESS_HUGETLB = 1 << 7,
+	IB_UVERBS_ACCESS_FLUSH_GLOBAL = 1 << 8,
+	IB_UVERBS_ACCESS_FLUSH_PERSISTENT = 1 << 9,
 
 	IB_UVERBS_ACCESS_RELAXED_ORDERING = IB_UVERBS_ACCESS_OPTIONAL_FIRST,
 	IB_UVERBS_ACCESS_OPTIONAL_RANGE =
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 43672cb1fd57..2d5f32d9d0d9 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -105,6 +105,18 @@ enum {
 	IB_USER_VERBS_EX_CMD_MODIFY_CQ
 };
 
+/* see IBA A19.4.1.1 Placement Types */
+enum ib_placement_type {
+	IB_FLUSH_GLOBAL = 1U << 0,
+	IB_FLUSH_PERSISTENT = 1U << 1,
+};
+
+/* see IBA A19.4.1.2 Selectivity Level */
+enum ib_selectivity_level {
+	IB_FLUSH_RANGE = 0,
+	IB_FLUSH_MR,
+};
+
 /*
  * Make sure that all structs defined in this file remain laid out so
  * that they pack the same way on 32-bit and 64-bit architectures (to
@@ -466,6 +478,7 @@ enum ib_uverbs_wc_opcode {
 	IB_UVERBS_WC_BIND_MW = 5,
 	IB_UVERBS_WC_LOCAL_INV = 6,
 	IB_UVERBS_WC_TSO = 7,
+	IB_UVERBS_WC_FLUSH = 8,
 };
 
 struct ib_uverbs_wc {
@@ -784,6 +797,7 @@ enum ib_uverbs_wr_opcode {
 	IB_UVERBS_WR_RDMA_READ_WITH_INV = 11,
 	IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP = 12,
 	IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD = 13,
+	IB_UVERBS_WR_FLUSH = 14,
 	/* Review enum ib_wr_opcode before modifying this */
 };
 
@@ -1331,6 +1345,8 @@ enum ib_uverbs_device_cap_flags {
 	/* Deprecated. Please use IB_UVERBS_RAW_PACKET_CAP_SCATTER_FCS. */
 	IB_UVERBS_DEVICE_RAW_SCATTER_FCS = 1ULL << 34,
 	IB_UVERBS_DEVICE_PCI_WRITE_END_PADDING = 1ULL << 36,
+	IB_UVERBS_DEVICE_FLUSH_GLOBAL = 1ULL << 38,
+	IB_UVERBS_DEVICE_FLUSH_PERSISTENT = 1ULL << 39,
 };
 
 enum ib_uverbs_raw_packet_caps {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [for-next PATCH v6 02/10] RDMA: Extend RDMA kernel verbs ABI to support flush
  2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 01/10] RDMA: Extend RDMA user ABI to support flush Li Zhijian
@ 2022-11-16  8:19 ` Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 03/10] RDMA/rxe: Extend rxe user " Li Zhijian
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

This commit extends the RDMA kernel verbs ABI to support the flush
operation defined in IBA A19.4.1. These changes are
backwards compatible with the existing RDMA kernel verbs ABI.

It adds new FLUSH attributes/capabilities to the device/HCA and new
FLUSH access flags to the memory region.

Users can request flush access flags with ibv_reg_mr(3). Only access
flags that are also supported by the device's capabilities can be
registered successfully.

Once registered successfully, the MR is flushable. Like the
device/HCA, a flushable MR has one or both of the GLOBAL_VISIBILITY
and PERSISTENT attributes/capabilities.
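
A minimal sketch (illustrative only; "dev" is an assumed device
pointer) of how a kernel ULP can rely on the extended
ib_check_mr_access() to gate flush registration on device capability:

    /* Returns 0 when the HCA advertises IB_DEVICE_FLUSH_PERSISTENT,
     * otherwise -EOPNOTSUPP via the new capability check. */
    static int can_reg_persistent_mr(struct ib_device *dev)
    {
            return ib_check_mr_access(dev, IB_ACCESS_LOCAL_WRITE |
                                           IB_ACCESS_REMOTE_WRITE |
                                           IB_ACCESS_FLUSH_PERSISTENT);
    }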

Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: new names and new patch split scheme, suggested by Bob
---
 include/rdma/ib_pack.h  |  3 +++
 include/rdma/ib_verbs.h | 20 +++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index a9162f25beaf..56211d1cc9f9 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -84,6 +84,7 @@ enum {
 	/* opcode 0x15 is reserved */
 	IB_OPCODE_SEND_LAST_WITH_INVALIDATE         = 0x16,
 	IB_OPCODE_SEND_ONLY_WITH_INVALIDATE         = 0x17,
+	IB_OPCODE_FLUSH                             = 0x1C,
 
 	/* real constants follow -- see comment about above IB_OPCODE()
 	   macro for more details */
@@ -112,6 +113,7 @@ enum {
 	IB_OPCODE(RC, FETCH_ADD),
 	IB_OPCODE(RC, SEND_LAST_WITH_INVALIDATE),
 	IB_OPCODE(RC, SEND_ONLY_WITH_INVALIDATE),
+	IB_OPCODE(RC, FLUSH),
 
 	/* UC */
 	IB_OPCODE(UC, SEND_FIRST),
@@ -149,6 +151,7 @@ enum {
 	IB_OPCODE(RD, ATOMIC_ACKNOWLEDGE),
 	IB_OPCODE(RD, COMPARE_SWAP),
 	IB_OPCODE(RD, FETCH_ADD),
+	IB_OPCODE(RD, FLUSH),
 
 	/* UD */
 	IB_OPCODE(UD, SEND_ONLY),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index a1f4d53a4bb6..bd436e0135ba 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -270,6 +270,9 @@ enum ib_device_cap_flags {
 	/* The device supports padding incoming writes to cacheline. */
 	IB_DEVICE_PCI_WRITE_END_PADDING =
 		IB_UVERBS_DEVICE_PCI_WRITE_END_PADDING,
+	/* Placement type attributes */
+	IB_DEVICE_FLUSH_GLOBAL = IB_UVERBS_DEVICE_FLUSH_GLOBAL,
+	IB_DEVICE_FLUSH_PERSISTENT = IB_UVERBS_DEVICE_FLUSH_PERSISTENT,
 };
 
 enum ib_kernel_cap_flags {
@@ -985,6 +988,7 @@ enum ib_wc_opcode {
 	IB_WC_REG_MR,
 	IB_WC_MASKED_COMP_SWAP,
 	IB_WC_MASKED_FETCH_ADD,
+	IB_WC_FLUSH = IB_UVERBS_WC_FLUSH,
 /*
  * Set value of IB_WC_RECV so consumers can test if a completion is a
  * receive by testing (opcode & IB_WC_RECV).
@@ -1325,6 +1329,7 @@ enum ib_wr_opcode {
 		IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP,
 	IB_WR_MASKED_ATOMIC_FETCH_AND_ADD =
 		IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD,
+	IB_WR_FLUSH = IB_UVERBS_WR_FLUSH,
 
 	/* These are kernel only and can not be issued by userspace */
 	IB_WR_REG_MR = 0x20,
@@ -1458,10 +1463,14 @@ enum ib_access_flags {
 	IB_ACCESS_ON_DEMAND = IB_UVERBS_ACCESS_ON_DEMAND,
 	IB_ACCESS_HUGETLB = IB_UVERBS_ACCESS_HUGETLB,
 	IB_ACCESS_RELAXED_ORDERING = IB_UVERBS_ACCESS_RELAXED_ORDERING,
+	IB_ACCESS_FLUSH_GLOBAL = IB_UVERBS_ACCESS_FLUSH_GLOBAL,
+	IB_ACCESS_FLUSH_PERSISTENT = IB_UVERBS_ACCESS_FLUSH_PERSISTENT,
+	IB_ACCESS_FLUSHABLE = IB_ACCESS_FLUSH_GLOBAL |
+			      IB_ACCESS_FLUSH_PERSISTENT,
 
 	IB_ACCESS_OPTIONAL = IB_UVERBS_ACCESS_OPTIONAL_RANGE,
 	IB_ACCESS_SUPPORTED =
-		((IB_ACCESS_HUGETLB << 1) - 1) | IB_ACCESS_OPTIONAL,
+		((IB_ACCESS_FLUSH_PERSISTENT << 1) - 1) | IB_ACCESS_OPTIONAL,
 };
 
 /*
@@ -4321,6 +4330,8 @@ int ib_dealloc_xrcd_user(struct ib_xrcd *xrcd, struct ib_udata *udata);
 static inline int ib_check_mr_access(struct ib_device *ib_dev,
 				     unsigned int flags)
 {
+	u64 device_cap = ib_dev->attrs.device_cap_flags;
+
 	/*
 	 * Local write permission is required if remote write or
 	 * remote atomic permission is also requested.
@@ -4335,6 +4346,13 @@ static inline int ib_check_mr_access(struct ib_device *ib_dev,
 	if (flags & IB_ACCESS_ON_DEMAND &&
 	    !(ib_dev->attrs.kernel_cap_flags & IBK_ON_DEMAND_PAGING))
 		return -EOPNOTSUPP;
+
+	if ((flags & IB_ACCESS_FLUSH_GLOBAL &&
+	    !(device_cap & IB_DEVICE_FLUSH_GLOBAL)) ||
+	    (flags & IB_ACCESS_FLUSH_PERSISTENT &&
+	    !(device_cap & IB_DEVICE_FLUSH_PERSISTENT)))
+		return -EOPNOTSUPP;
+
 	return 0;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [for-next PATCH v6 03/10] RDMA/rxe: Extend rxe user ABI to support flush
  2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 01/10] RDMA: Extend RDMA user ABI to support flush Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 02/10] RDMA: Extend RDMA kernel verbs " Li Zhijian
@ 2022-11-16  8:19 ` Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 04/10] RDMA/rxe: Allow registering persistent flag for pmem MR only Li Zhijian
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

This commit extends the rxe user ABI to support the flush
operation defined in IBA A19.4.1. These changes are
backwards compatible with the existing rxe user ABI.

The user API requests a flush by filling in this structure.
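
For illustration (remote_addr, length and rkey are hypothetical
values), a flush request fills the new union member like this:

    struct rxe_send_wr wr = {};

    wr.opcode = IB_UVERBS_WR_FLUSH;
    wr.wr.flush.remote_addr = remote_addr;  /* target iova */
    wr.wr.flush.length = length;            /* bytes to flush */
    wr.wr.flush.rkey = rkey;                /* rkey of the remote MR */
    wr.wr.flush.type = IB_FLUSH_PERSISTENT; /* placement type */
    wr.wr.flush.level = IB_FLUSH_RANGE;     /* selectivity level */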

Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: new patch split scheme, suggested by Bob
---
 include/uapi/rdma/rdma_user_rxe.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h
index 73f679dfd2df..e2b93df94590 100644
--- a/include/uapi/rdma/rdma_user_rxe.h
+++ b/include/uapi/rdma/rdma_user_rxe.h
@@ -82,6 +82,13 @@ struct rxe_send_wr {
 		__u32		invalidate_rkey;
 	} ex;
 	union {
+		struct {
+			__aligned_u64 remote_addr;
+			__u32	length;
+			__u32	rkey;
+			__u8	type;
+			__u8	level;
+		} flush;
 		struct {
 			__aligned_u64 remote_addr;
 			__u32	rkey;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [for-next PATCH v6 04/10] RDMA/rxe: Allow registering persistent flag for pmem MR only
  2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (2 preceding siblings ...)
  2022-11-16  8:19 ` [for-next PATCH v6 03/10] RDMA/rxe: Extend rxe user " Li Zhijian
@ 2022-11-16  8:19 ` Li Zhijian
  2022-11-22 14:46   ` Jason Gunthorpe
  2022-11-16  8:19 ` [for-next PATCH v6 05/10] RDMA/rxe: Extend rxe packet format to support flush Li Zhijian
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

A memory region can support at most two flush access flags:
IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL.

However, we only allow users to register the persistent flush flag on
a pmem MR, since only pmem has the ability to persist data across
power cycles.

Registering a persistent access flag on a non-pmem MR will therefore
be rejected.
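
For example (a sketch assuming the companion rdma-core branch [2],
which exposes the matching IBV_ACCESS_FLUSH_PERSISTENT flag), only a
pmem-backed buffer can carry the persistent flag:

    /* succeeds only when every page of pmem_buf maps to pmem */
    mr = ibv_reg_mr(pd, pmem_buf, len,
                    IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE |
                    IBV_ACCESS_FLUSH_PERSISTENT);

    /* rejected with errno == EINVAL for an ordinary DRAM buffer */
    mr = ibv_reg_mr(pd, dram_buf, len, IBV_ACCESS_FLUSH_PERSISTENT);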

CC: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V6: Minimize pmem checking side effect # Jason
V5: make sure the whole MR is pmem # Bob
V4: set is_pmem more simple
V2: new scheme check is_pmem # Dan
    update commit message, get rid of confusing ib_check_flush_access_flags() # Tom
---
 drivers/infiniband/sw/rxe/rxe_mr.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index bc081002bddc..fd423c015be0 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -111,6 +111,15 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr)
 	mr->ibmr.type = IB_MR_TYPE_DMA;
 }
 
+static bool is_pmem_page(struct page *pg)
+{
+	unsigned long paddr = page_to_phys(pg);
+
+	return REGION_INTERSECTS ==
+	       region_intersects(paddr, PAGE_SIZE, IORESOURCE_MEM,
+				 IORES_DESC_PERSISTENT_MEMORY);
+}
+
 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		     int access, struct rxe_mr *mr)
 {
@@ -148,16 +157,25 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 	num_buf			= 0;
 	map = mr->map;
 	if (length > 0) {
-		buf = map[0]->buf;
+		bool persistent_access = access & IB_ACCESS_FLUSH_PERSISTENT;
 
+		buf = map[0]->buf;
 		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
+			struct page *pg = sg_page_iter_page(&sg_iter);
+
+			if (persistent_access && !is_pmem_page(pg)) {
+				pr_debug("Unable to register persistent access to non-pmem device\n");
+				err = -EINVAL;
+				goto err_release_umem;
+			}
+
 			if (num_buf >= RXE_BUF_PER_MAP) {
 				map++;
 				buf = map[0]->buf;
 				num_buf = 0;
 			}
 
-			vaddr = page_address(sg_page_iter_page(&sg_iter));
+			vaddr = page_address(pg);
 			if (!vaddr) {
 				pr_warn("%s: Unable to get virtual address\n",
 						__func__);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [for-next PATCH v6 05/10] RDMA/rxe: Extend rxe packet format to support flush
  2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (3 preceding siblings ...)
  2022-11-16  8:19 ` [for-next PATCH v6 04/10] RDMA/rxe: Allow registering persistent flag for pmem MR only Li Zhijian
@ 2022-11-16  8:19 ` Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 06/10] RDMA/rxe: Implement RC RDMA FLUSH service in requester side Li Zhijian
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

Extend the rxe opcode tables, headers, helpers and constants to
support flush operations.

Refer to IBA A19.4.1 for the full FETH definition.
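
For example, with the masks introduced here, a flush with
type = IB_FLUSH_PERSISTENT (0x2) and level = IB_FLUSH_RANGE (0) is
encoded as:

    ((0 << FETH_SEL_SHIFT) & FETH_SEL_MASK) | (0x2 & FETH_PLT_MASK)
            = 0x00000002

while type = IB_FLUSH_GLOBAL (0x1) with level = IB_FLUSH_MR (1) yields
0x00000011.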

Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: new FETH structure and simplify header helper
    new names and new patch split scheme, suggested by Bob.
---
 drivers/infiniband/sw/rxe/rxe_hdr.h    | 47 ++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_opcode.c | 17 ++++++++++
 drivers/infiniband/sw/rxe/rxe_opcode.h | 16 +++++----
 3 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
index e432f9e37795..e995a97c54fd 100644
--- a/drivers/infiniband/sw/rxe/rxe_hdr.h
+++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
@@ -607,6 +607,52 @@ static inline void reth_set_len(struct rxe_pkt_info *pkt, u32 len)
 		rxe_opcode[pkt->opcode].offset[RXE_RETH], len);
 }
 
+/******************************************************************************
+ * FLUSH Extended Transport Header
+ ******************************************************************************/
+
+struct rxe_feth {
+	__be32 bits;
+};
+
+#define FETH_PLT_MASK		(0x0000000f) /* bits 3-0 */
+#define FETH_SEL_MASK		(0x00000030) /* bits 5-4 */
+#define FETH_SEL_SHIFT		(4U)
+
+static inline u32 __feth_plt(void *arg)
+{
+	struct rxe_feth *feth = arg;
+
+	return be32_to_cpu(feth->bits) & FETH_PLT_MASK;
+}
+
+static inline u32 __feth_sel(void *arg)
+{
+	struct rxe_feth *feth = arg;
+
+	return (be32_to_cpu(feth->bits) & FETH_SEL_MASK) >> FETH_SEL_SHIFT;
+}
+
+static inline u32 feth_plt(struct rxe_pkt_info *pkt)
+{
+	return __feth_plt(pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+}
+
+static inline u32 feth_sel(struct rxe_pkt_info *pkt)
+{
+	return __feth_sel(pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+}
+
+static inline void feth_init(struct rxe_pkt_info *pkt, u8 type, u8 level)
+{
+	struct rxe_feth *feth = (struct rxe_feth *)
+		    (pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+	u32 bits = ((level << FETH_SEL_SHIFT) & FETH_SEL_MASK) |
+		   (type & FETH_PLT_MASK);
+
+	feth->bits = cpu_to_be32(bits);
+}
+
 /******************************************************************************
  * Atomic Extended Transport Header
  ******************************************************************************/
@@ -910,6 +956,7 @@ enum rxe_hdr_length {
 	RXE_ATMETH_BYTES	= sizeof(struct rxe_atmeth),
 	RXE_IETH_BYTES		= sizeof(struct rxe_ieth),
 	RXE_RDETH_BYTES		= sizeof(struct rxe_rdeth),
+	RXE_FETH_BYTES		= sizeof(struct rxe_feth),
 };
 
 static inline size_t header_size(struct rxe_pkt_info *pkt)
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
index d4ba4d506f17..55aad13e57bb 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.c
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
@@ -101,6 +101,12 @@ struct rxe_wr_opcode_info rxe_wr_opcode_info[] = {
 			[IB_QPT_UC]	= WR_LOCAL_OP_MASK,
 		},
 	},
+	[IB_WR_FLUSH]					= {
+		.name   = "IB_WR_FLUSH",
+		.mask   = {
+			[IB_QPT_RC]	= WR_FLUSH_MASK,
+		},
+	},
 };
 
 struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
@@ -378,6 +384,17 @@ struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
 					  RXE_IETH_BYTES,
 		}
 	},
+	[IB_OPCODE_RC_FLUSH]					= {
+		.name	= "IB_OPCODE_RC_FLUSH",
+		.mask	= RXE_FETH_MASK | RXE_RETH_MASK | RXE_FLUSH_MASK |
+			  RXE_START_MASK | RXE_END_MASK | RXE_REQ_MASK,
+		.length = RXE_BTH_BYTES + RXE_FETH_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_FETH]	= RXE_BTH_BYTES,
+			[RXE_RETH]	= RXE_BTH_BYTES + RXE_FETH_BYTES,
+		}
+	},
 
 	/* UC */
 	[IB_OPCODE_UC_SEND_FIRST]			= {
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h
index 8f9aaaf260f2..02d256745793 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.h
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.h
@@ -19,7 +19,8 @@ enum rxe_wr_mask {
 	WR_SEND_MASK			= BIT(2),
 	WR_READ_MASK			= BIT(3),
 	WR_WRITE_MASK			= BIT(4),
-	WR_LOCAL_OP_MASK		= BIT(5),
+	WR_FLUSH_MASK			= BIT(5),
+	WR_LOCAL_OP_MASK		= BIT(6),
 
 	WR_READ_OR_WRITE_MASK		= WR_READ_MASK | WR_WRITE_MASK,
 	WR_WRITE_OR_SEND_MASK		= WR_WRITE_MASK | WR_SEND_MASK,
@@ -47,6 +48,7 @@ enum rxe_hdr_type {
 	RXE_RDETH,
 	RXE_DETH,
 	RXE_IMMDT,
+	RXE_FETH,
 	RXE_PAYLOAD,
 	NUM_HDR_TYPES
 };
@@ -63,6 +65,7 @@ enum rxe_hdr_mask {
 	RXE_IETH_MASK		= BIT(RXE_IETH),
 	RXE_RDETH_MASK		= BIT(RXE_RDETH),
 	RXE_DETH_MASK		= BIT(RXE_DETH),
+	RXE_FETH_MASK		= BIT(RXE_FETH),
 	RXE_PAYLOAD_MASK	= BIT(RXE_PAYLOAD),
 
 	RXE_REQ_MASK		= BIT(NUM_HDR_TYPES + 0),
@@ -71,13 +74,14 @@ enum rxe_hdr_mask {
 	RXE_WRITE_MASK		= BIT(NUM_HDR_TYPES + 3),
 	RXE_READ_MASK		= BIT(NUM_HDR_TYPES + 4),
 	RXE_ATOMIC_MASK		= BIT(NUM_HDR_TYPES + 5),
+	RXE_FLUSH_MASK		= BIT(NUM_HDR_TYPES + 6),
 
-	RXE_RWR_MASK		= BIT(NUM_HDR_TYPES + 6),
-	RXE_COMP_MASK		= BIT(NUM_HDR_TYPES + 7),
+	RXE_RWR_MASK		= BIT(NUM_HDR_TYPES + 7),
+	RXE_COMP_MASK		= BIT(NUM_HDR_TYPES + 8),
 
-	RXE_START_MASK		= BIT(NUM_HDR_TYPES + 8),
-	RXE_MIDDLE_MASK		= BIT(NUM_HDR_TYPES + 9),
-	RXE_END_MASK		= BIT(NUM_HDR_TYPES + 10),
+	RXE_START_MASK		= BIT(NUM_HDR_TYPES + 9),
+	RXE_MIDDLE_MASK		= BIT(NUM_HDR_TYPES + 10),
+	RXE_END_MASK		= BIT(NUM_HDR_TYPES + 11),
 
 	RXE_LOOPBACK_MASK	= BIT(NUM_HDR_TYPES + 12),
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [for-next PATCH v6 06/10] RDMA/rxe: Implement RC RDMA FLUSH service in requester side
  2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (4 preceding siblings ...)
  2022-11-16  8:19 ` [for-next PATCH v6 05/10] RDMA/rxe: Extend rxe packet format to support flush Li Zhijian
@ 2022-11-16  8:19 ` Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 07/10] RDMA/rxe: Implement flush execution in responder side Li Zhijian
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

Implement the FLUSH request operation on the requester side.
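
A hedged userspace sketch (assuming the ibv_wr_flush() extended-verbs
helper from the companion rdma-core branch [2]; qpx, rkey, addr and
len are placeholders):

    ibv_wr_start(qpx);
    qpx->wr_id = my_wr_id;
    ibv_wr_flush(qpx, rkey, addr, len,
                 IBV_FLUSH_PERSISTENT, IBV_FLUSH_RANGE);
    ret = ibv_wr_complete(qpx);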

Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V4: Remove flush union for legacy API, add WR_FLUSH_MASK
V3: Fix sparse: incorrect type in assignment; Reported-by: kernel test robot <lkp@intel.com>
V2: extend flush to include length field.
---
 drivers/infiniband/sw/rxe/rxe_req.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 41f1d84f0acb..b4e098298f25 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -241,6 +241,9 @@ static int next_opcode_rc(struct rxe_qp *qp, u32 opcode, int fits)
 				IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE :
 				IB_OPCODE_RC_SEND_FIRST;
 
+	case IB_WR_FLUSH:
+		return IB_OPCODE_RC_FLUSH;
+
 	case IB_WR_RDMA_READ:
 		return IB_OPCODE_RC_RDMA_READ_REQUEST;
 
@@ -421,11 +424,18 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 
 	/* init optional headers */
 	if (pkt->mask & RXE_RETH_MASK) {
-		reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
+		if (pkt->mask & RXE_FETH_MASK)
+			reth_set_rkey(pkt, ibwr->wr.flush.rkey);
+		else
+			reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
 		reth_set_va(pkt, wqe->iova);
 		reth_set_len(pkt, wqe->dma.resid);
 	}
 
+	/* Fill Flush Extension Transport Header */
+	if (pkt->mask & RXE_FETH_MASK)
+		feth_init(pkt, ibwr->wr.flush.type, ibwr->wr.flush.level);
+
 	if (pkt->mask & RXE_IMMDT_MASK)
 		immdt_set_imm(pkt, ibwr->ex.imm_data);
 
@@ -484,6 +494,9 @@ static int finish_packet(struct rxe_qp *qp, struct rxe_av *av,
 
 			memset(pad, 0, bth_pad(pkt));
 		}
+	} else if (pkt->mask & RXE_FLUSH_MASK) {
+		/* oA19-2: shall have no payload. */
+		wqe->dma.resid = 0;
 	}
 
 	return 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [for-next PATCH v6 07/10] RDMA/rxe: Implement flush execution in responder side
  2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (5 preceding siblings ...)
  2022-11-16  8:19 ` [for-next PATCH v6 06/10] RDMA/rxe: Implement RC RDMA FLUSH service in requester side Li Zhijian
@ 2022-11-16  8:19 ` Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 08/10] RDMA/rxe: Implement flush completion Li Zhijian
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

Only the requested placement types that are also registered in the
destination memory region are acceptable.
Otherwise, the responder replies with a NAK "Remote Access Error" when
it finds a placement type violation.

Data is persisted via arch_wb_cache_pmem(), whose implementation is
architecture specific.

This commit also adds two helpers to update qp.resp from the incoming
packet.
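
The persistence path in the responder boils down to (a sketch,
assuming a kernel-mapped pmem virtual address "va" of "len" bytes):

    arch_wb_cache_pmem(va, len); /* write back the CPU caches */
    wmb();                       /* order the flush before the ACK */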

Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
v6: call iova_to_vaddr to transform iova
v5: add QP attr check for flush access
    rename flush_nvdimm_iova -> rxe_flush_pmem_iova()
v4: add send_read_response_ack and flush resource
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |   1 +
 drivers/infiniband/sw/rxe/rxe_mr.c    |  36 ++++++
 drivers/infiniband/sw/rxe/rxe_resp.c  | 176 +++++++++++++++++++++++---
 drivers/infiniband/sw/rxe/rxe_verbs.h |   6 +
 4 files changed, 199 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index c2a5c8814a48..944d564a11cd 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -68,6 +68,7 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr);
 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		     int access, struct rxe_mr *mr);
 int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr);
+int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length);
 int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 		enum rxe_mr_copy_dir dir);
 int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index fd423c015be0..592965ee89fa 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -4,6 +4,8 @@
  * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
  */
 
+#include <linux/libnvdimm.h>
+
 #include "rxe.h"
 #include "rxe_loc.h"
 
@@ -196,6 +198,7 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 	mr->offset = ib_umem_offset(umem);
 	mr->state = RXE_MR_STATE_VALID;
 	mr->ibmr.type = IB_MR_TYPE_USER;
+	mr->ibmr.page_size = PAGE_SIZE;
 
 	return 0;
 
@@ -303,6 +306,39 @@ void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length)
 	return addr;
 }
 
+int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length)
+{
+	size_t offset;
+
+	if (length == 0)
+		return 0;
+
+	if (mr->ibmr.type == IB_MR_TYPE_DMA)
+		return -EFAULT;
+
+	offset = (iova - mr->ibmr.iova + mr->offset) & mr->page_mask;
+	while (length > 0) {
+		u8 *va;
+		int bytes;
+
+		bytes = mr->ibmr.page_size - offset;
+		if (bytes > length)
+			bytes = length;
+
+		va = iova_to_vaddr(mr, iova, length);
+		if (!va)
+			return -EFAULT;
+
+		arch_wb_cache_pmem(va, bytes);
+
+		length -= bytes;
+		iova += bytes;
+		offset = 0;
+	}
+
+	return 0;
+}
+
 /* copy data from a range (vaddr, vaddr+length-1) to or from
  * a mr object starting at iova.
  */
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 8caa9941e70e..43cf3fb04674 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -22,6 +22,7 @@ enum resp_states {
 	RESPST_EXECUTE,
 	RESPST_READ_REPLY,
 	RESPST_ATOMIC_REPLY,
+	RESPST_PROCESS_FLUSH,
 	RESPST_COMPLETE,
 	RESPST_ACKNOWLEDGE,
 	RESPST_CLEANUP,
@@ -57,6 +58,7 @@ static char *resp_state_name[] = {
 	[RESPST_EXECUTE]			= "EXECUTE",
 	[RESPST_READ_REPLY]			= "READ_REPLY",
 	[RESPST_ATOMIC_REPLY]			= "ATOMIC_REPLY",
+	[RESPST_PROCESS_FLUSH]			= "PROCESS_FLUSH",
 	[RESPST_COMPLETE]			= "COMPLETE",
 	[RESPST_ACKNOWLEDGE]			= "ACKNOWLEDGE",
 	[RESPST_CLEANUP]			= "CLEANUP",
@@ -256,19 +258,38 @@ static enum resp_states check_op_seq(struct rxe_qp *qp,
 	}
 }
 
+static bool check_qp_attr_access(struct rxe_qp *qp,
+				 struct rxe_pkt_info *pkt)
+{
+	if (((pkt->mask & RXE_READ_MASK) &&
+	     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
+	    ((pkt->mask & RXE_WRITE_MASK) &&
+	     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
+	    ((pkt->mask & RXE_ATOMIC_MASK) &&
+	     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
+		return false;
+	}
+
+	if (pkt->mask & RXE_FLUSH_MASK) {
+		u32 flush_type = feth_plt(pkt);
+
+		if ((flush_type & IB_FLUSH_GLOBAL &&
+		     !(qp->attr.qp_access_flags & IB_ACCESS_FLUSH_GLOBAL)) ||
+		    (flush_type & IB_FLUSH_PERSISTENT &&
+		     !(qp->attr.qp_access_flags & IB_ACCESS_FLUSH_PERSISTENT)))
+			return false;
+	}
+
+	return true;
+}
+
 static enum resp_states check_op_valid(struct rxe_qp *qp,
 				       struct rxe_pkt_info *pkt)
 {
 	switch (qp_type(qp)) {
 	case IB_QPT_RC:
-		if (((pkt->mask & RXE_READ_MASK) &&
-		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
-		    ((pkt->mask & RXE_WRITE_MASK) &&
-		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
-		    ((pkt->mask & RXE_ATOMIC_MASK) &&
-		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
+		if (!check_qp_attr_access(qp, pkt))
 			return RESPST_ERR_UNSUPPORTED_OPCODE;
-		}
 
 		break;
 
@@ -425,6 +446,23 @@ static enum resp_states check_length(struct rxe_qp *qp,
 	return RESPST_CHK_RKEY;
 }
 
+static void qp_resp_from_reth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
+{
+	qp->resp.va = reth_va(pkt);
+	qp->resp.offset = 0;
+	qp->resp.rkey = reth_rkey(pkt);
+	qp->resp.resid = reth_len(pkt);
+	qp->resp.length = reth_len(pkt);
+}
+
+static void qp_resp_from_atmeth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
+{
+	qp->resp.va = atmeth_va(pkt);
+	qp->resp.offset = 0;
+	qp->resp.rkey = atmeth_rkey(pkt);
+	qp->resp.resid = sizeof(u64);
+}
+
 static enum resp_states check_rkey(struct rxe_qp *qp,
 				   struct rxe_pkt_info *pkt)
 {
@@ -436,23 +474,26 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 	u32 pktlen;
 	int mtu = qp->mtu;
 	enum resp_states state;
-	int access;
+	int access = 0;
 
 	if (pkt->mask & RXE_READ_OR_WRITE_MASK) {
-		if (pkt->mask & RXE_RETH_MASK) {
-			qp->resp.va = reth_va(pkt);
-			qp->resp.offset = 0;
-			qp->resp.rkey = reth_rkey(pkt);
-			qp->resp.resid = reth_len(pkt);
-			qp->resp.length = reth_len(pkt);
-		}
+		if (pkt->mask & RXE_RETH_MASK)
+			qp_resp_from_reth(qp, pkt);
+
 		access = (pkt->mask & RXE_READ_MASK) ? IB_ACCESS_REMOTE_READ
 						     : IB_ACCESS_REMOTE_WRITE;
+	} else if (pkt->mask & RXE_FLUSH_MASK) {
+		u32 flush_type = feth_plt(pkt);
+
+		if (pkt->mask & RXE_RETH_MASK)
+			qp_resp_from_reth(qp, pkt);
+
+		if (flush_type & IB_FLUSH_GLOBAL)
+			access |= IB_ACCESS_FLUSH_GLOBAL;
+		if (flush_type & IB_FLUSH_PERSISTENT)
+			access |= IB_ACCESS_FLUSH_PERSISTENT;
 	} else if (pkt->mask & RXE_ATOMIC_MASK) {
-		qp->resp.va = atmeth_va(pkt);
-		qp->resp.offset = 0;
-		qp->resp.rkey = atmeth_rkey(pkt);
-		qp->resp.resid = sizeof(u64);
+		qp_resp_from_atmeth(qp, pkt);
 		access = IB_ACCESS_REMOTE_ATOMIC;
 	} else {
 		return RESPST_EXECUTE;
@@ -501,12 +542,21 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 		}
 	}
 
+	if (pkt->mask & RXE_FLUSH_MASK) {
+		/* FLUSH MR may not set va or resid
+		 * no need to check range since we will flush whole mr
+		 */
+		if (feth_sel(pkt) == IB_FLUSH_MR)
+			goto skip_check_range;
+	}
+
 	if (mr_check_range(mr, va + qp->resp.offset, resid)) {
 		state = RESPST_ERR_RKEY_VIOLATION;
 		goto err;
 	}
 
-	if (pkt->mask & RXE_WRITE_MASK)	 {
+skip_check_range:
+	if (pkt->mask & RXE_WRITE_MASK) {
 		if (resid > mtu) {
 			if (pktlen != mtu || bth_pad(pkt)) {
 				state = RESPST_ERR_LENGTH;
@@ -610,11 +660,61 @@ static struct resp_res *rxe_prepare_res(struct rxe_qp *qp,
 		res->last_psn = pkt->psn;
 		res->cur_psn = pkt->psn;
 		break;
+	case RXE_FLUSH_MASK:
+		res->flush.va = qp->resp.va + qp->resp.offset;
+		res->flush.length = qp->resp.length;
+		res->flush.type = feth_plt(pkt);
+		res->flush.level = feth_sel(pkt);
 	}
 
 	return res;
 }
 
+static enum resp_states process_flush(struct rxe_qp *qp,
+				       struct rxe_pkt_info *pkt)
+{
+	u64 length, start;
+	struct rxe_mr *mr = qp->resp.mr;
+	struct resp_res *res = qp->resp.res;
+
+	/* oA19-14, oA19-15 */
+	if (res && res->replay)
+		return RESPST_ACKNOWLEDGE;
+	else if (!res) {
+		res = rxe_prepare_res(qp, pkt, RXE_FLUSH_MASK);
+		qp->resp.res = res;
+	}
+
+	if (res->flush.level == IB_FLUSH_RANGE) {
+		start = res->flush.va;
+		length = res->flush.length;
+	} else { /* level == IB_FLUSH_MR */
+		start = mr->ibmr.iova;
+		length = mr->ibmr.length;
+	}
+
+	if (res->flush.type & IB_FLUSH_PERSISTENT) {
+		if (rxe_flush_pmem_iova(mr, start, length))
+			return RESPST_ERR_RKEY_VIOLATION;
+		/* Make data persistent. */
+		wmb();
+	} else if (res->flush.type & IB_FLUSH_GLOBAL) {
+		/* Make data globally visible. */
+		wmb();
+	}
+
+	qp->resp.msn++;
+
+	/* next expected psn, read handles this separately */
+	qp->resp.psn = (pkt->psn + 1) & BTH_PSN_MASK;
+	qp->resp.ack_psn = qp->resp.psn;
+
+	qp->resp.opcode = pkt->opcode;
+	qp->resp.status = IB_WC_SUCCESS;
+
+	return RESPST_ACKNOWLEDGE;
+}
+
 /* Guarantee atomicity of atomic operations at the machine level. */
 static DEFINE_SPINLOCK(atomic_ops_lock);
 
@@ -916,6 +1016,8 @@ static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
 		return RESPST_READ_REPLY;
 	} else if (pkt->mask & RXE_ATOMIC_MASK) {
 		return RESPST_ATOMIC_REPLY;
+	} else if (pkt->mask & RXE_FLUSH_MASK) {
+		return RESPST_PROCESS_FLUSH;
 	} else {
 		/* Unreachable */
 		WARN_ON_ONCE(1);
@@ -1089,6 +1191,19 @@ static int send_atomic_ack(struct rxe_qp *qp, u8 syndrome, u32 psn)
 	return ret;
 }
 
+static int send_read_response_ack(struct rxe_qp *qp, u8 syndrome, u32 psn)
+{
+	int ret = send_common_ack(qp, syndrome, psn,
+			IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY,
+			"RDMA READ response of length zero ACK");
+
+	/* have to clear this since it is used to trigger
+	 * long read replies
+	 */
+	qp->resp.res = NULL;
+	return ret;
+}
+
 static enum resp_states acknowledge(struct rxe_qp *qp,
 				    struct rxe_pkt_info *pkt)
 {
@@ -1099,6 +1214,8 @@ static enum resp_states acknowledge(struct rxe_qp *qp,
 		send_ack(qp, qp->resp.aeth_syndrome, pkt->psn);
 	else if (pkt->mask & RXE_ATOMIC_MASK)
 		send_atomic_ack(qp, AETH_ACK_UNLIMITED, pkt->psn);
+	else if (pkt->mask & RXE_FLUSH_MASK)
+		send_read_response_ack(qp, AETH_ACK_UNLIMITED, pkt->psn);
 	else if (bth_ack(pkt))
 		send_ack(qp, AETH_ACK_UNLIMITED, pkt->psn);
 
@@ -1155,6 +1272,22 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
 		/* SEND. Ack again and cleanup. C9-105. */
 		send_ack(qp, AETH_ACK_UNLIMITED, prev_psn);
 		return RESPST_CLEANUP;
+	} else if (pkt->mask & RXE_FLUSH_MASK) {
+		struct resp_res *res;
+
+		/* Find the operation in our list of responder resources. */
+		res = find_resource(qp, pkt->psn);
+		if (res) {
+			res->replay = 1;
+			res->cur_psn = pkt->psn;
+			qp->resp.res = res;
+			rc = RESPST_PROCESS_FLUSH;
+			goto out;
+		}
+
+		/* Resource not found. Class D error. Drop the request. */
+		rc = RESPST_CLEANUP;
+		goto out;
 	} else if (pkt->mask & RXE_READ_MASK) {
 		struct resp_res *res;
 
@@ -1348,6 +1481,9 @@ int rxe_responder(void *arg)
 		case RESPST_ATOMIC_REPLY:
 			state = atomic_reply(qp, pkt);
 			break;
+		case RESPST_PROCESS_FLUSH:
+			state = process_flush(qp, pkt);
+			break;
 		case RESPST_ACKNOWLEDGE:
 			state = acknowledge(qp, pkt);
 			break;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 22a299b0a9f0..19ddfa890480 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -165,6 +165,12 @@ struct resp_res {
 			u64		va;
 			u32		resid;
 		} read;
+		struct {
+			u32		length;
+			u64		va;
+			u8		type;
+			u8		level;
+		} flush;
 	};
 };
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [for-next PATCH v6 08/10] RDMA/rxe: Implement flush completion
  2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (6 preceding siblings ...)
  2022-11-16  8:19 ` [for-next PATCH v6 07/10] RDMA/rxe: Implement flush execution in responder side Li Zhijian
@ 2022-11-16  8:19 ` Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE Li Zhijian
  2022-11-16  8:19 ` [for-next PATCH v6 10/10] RDMA/rxe: Enable RDMA FLUSH capability for rxe device Li Zhijian
  9 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

Per the IBA spec, a FLUSH is acknowledged with an RDMA READ response
of zero length.

Use the IB_WC_FLUSH (aka IB_UVERBS_WC_FLUSH) opcode to report a FLUSH
completion to userspace.
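
A kernel consumer can then detect the completion like this (a sketch;
"cq" is an assumed completion queue):

    struct ib_wc wc;

    if (ib_poll_cq(cq, 1, &wc) == 1 &&
        wc.opcode == IB_WC_FLUSH && wc.status == IB_WC_SUCCESS)
            /* the requested placement guarantee now holds remotely */;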

Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 66f392810c86..12252ac6b5c2 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -104,6 +104,7 @@ static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode)
 	case IB_WR_LOCAL_INV:			return IB_WC_LOCAL_INV;
 	case IB_WR_REG_MR:			return IB_WC_REG_MR;
 	case IB_WR_BIND_MW:			return IB_WC_BIND_MW;
+	case IB_WR_FLUSH:			return IB_WC_FLUSH;
 
 	default:
 		return 0xff;
@@ -274,7 +275,8 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 		 */
 	case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
 		if (wqe->wr.opcode != IB_WR_RDMA_READ &&
-		    wqe->wr.opcode != IB_WR_RDMA_READ_WITH_INV) {
+		    wqe->wr.opcode != IB_WR_RDMA_READ_WITH_INV &&
+		    wqe->wr.opcode != IB_WR_FLUSH) {
 			wqe->status = IB_WC_FATAL_ERR;
 			return COMPST_ERROR;
 		}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE
  2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (7 preceding siblings ...)
  2022-11-16  8:19 ` [for-next PATCH v6 08/10] RDMA/rxe: Implement flush completion Li Zhijian
@ 2022-11-16  8:19 ` Li Zhijian
  2022-11-22 14:52   ` Jason Gunthorpe
  2022-11-16  8:19 ` [for-next PATCH v6 10/10] RDMA/rxe: Enable RDMA FLUSH capability for rxe device Li Zhijian
  9 siblings, 1 reply; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

Enable the flushable access flags for the QP.

Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: new patch, inspired by Bob
---
 drivers/infiniband/core/cm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1f9938a2c475..58837aac980b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -4096,7 +4096,8 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
 		qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
 		if (cm_id_priv->responder_resources)
 			qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
-						    IB_ACCESS_REMOTE_ATOMIC;
+						    IB_ACCESS_REMOTE_ATOMIC |
+						    IB_ACCESS_FLUSHABLE;
 		qp_attr->pkey_index = cm_id_priv->av.pkey_index;
 		if (cm_id_priv->av.port)
 			qp_attr->port_num = cm_id_priv->av.port->port_num;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [for-next PATCH v6 10/10] RDMA/rxe: Enable RDMA FLUSH capability for rxe device
  2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
                   ` (8 preceding siblings ...)
  2022-11-16  8:19 ` [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE Li Zhijian
@ 2022-11-16  8:19 ` Li Zhijian
  9 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2022-11-16  8:19 UTC (permalink / raw)
  To: zyjzyj2000, jgg, leon, Bob Pearson, linux-rdma
  Cc: Mark Bloch, Tom Talpey, tomasz.gromadzki, Dan Williams,
	Xiao Yang, y-goto, linux-kernel, Li Zhijian

Now we are ready to enable the RDMA FLUSH capability for the rxe
device. It supports the Global Visibility and Persistence placement
types.

Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_param.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index 86c7a8bf3cbb..c7a82823a041 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -51,7 +51,9 @@ enum rxe_device_param {
 					| IB_DEVICE_SRQ_RESIZE
 					| IB_DEVICE_MEM_MGT_EXTENSIONS
 					| IB_DEVICE_MEM_WINDOW
-					| IB_DEVICE_MEM_WINDOW_TYPE_2B,
+					| IB_DEVICE_MEM_WINDOW_TYPE_2B
+					| IB_DEVICE_FLUSH_GLOBAL
+					| IB_DEVICE_FLUSH_PERSISTENT,
 	RXE_MAX_SGE			= 32,
 	RXE_MAX_WQE_SIZE		= sizeof(struct rxe_send_wqe) +
 					  sizeof(struct ib_sge) * RXE_MAX_SGE,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 04/10] RDMA/rxe: Allow registering persistent flag for pmem MR only
  2022-11-16  8:19 ` [for-next PATCH v6 04/10] RDMA/rxe: Allow registering persistent flag for pmem MR only Li Zhijian
@ 2022-11-22 14:46   ` Jason Gunthorpe
  2022-11-23  6:12     ` lizhijian
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2022-11-22 14:46 UTC (permalink / raw)
  To: Li Zhijian
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, Xiao Yang, y-goto,
	linux-kernel

On Wed, Nov 16, 2022 at 04:19:45PM +0800, Li Zhijian wrote:
>  int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>  		     int access, struct rxe_mr *mr)
>  {
> @@ -148,16 +157,25 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>  	num_buf			= 0;
>  	map = mr->map;
>  	if (length > 0) {
> -		buf = map[0]->buf;
> +		bool persistent_access = access & IB_ACCESS_FLUSH_PERSISTENT;
>  
> +		buf = map[0]->buf;
>  		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
> +			struct page *pg = sg_page_iter_page(&sg_iter);
> +
> +			if (persistent_access && !is_pmem_page(pg)) {
> +				pr_debug("Unable to register persistent access to non-pmem device\n");
> +				err = -EINVAL;
> +				goto err_release_umem;

This should use rxe_dbg_mr()

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE
  2022-11-16  8:19 ` [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE Li Zhijian
@ 2022-11-22 14:52   ` Jason Gunthorpe
  2022-11-23  6:07     ` lizhijian
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2022-11-22 14:52 UTC (permalink / raw)
  To: Li Zhijian
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, Xiao Yang, y-goto,
	linux-kernel

On Wed, Nov 16, 2022 at 04:19:50PM +0800, Li Zhijian wrote:
> Enable the flushable access flags for the QP.
> 
> Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
> V5: new patch, inspired by Bob
> ---
>  drivers/infiniband/core/cm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 1f9938a2c475..58837aac980b 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -4096,7 +4096,8 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
>  		qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
>  		if (cm_id_priv->responder_resources)
>  			qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
> -						    IB_ACCESS_REMOTE_ATOMIC;
> +						    IB_ACCESS_REMOTE_ATOMIC |
> +						    IB_ACCESS_FLUSHABLE;

What is the point of this? Nothing checks IB_ACCESS_FLUSHABLE ?

Do flush ops require a responder resource?

Why should CM set it unconditionally?

Explain in the commit message

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE
  2022-11-22 14:52   ` Jason Gunthorpe
@ 2022-11-23  6:07     ` lizhijian
  2022-11-24 17:39       ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: lizhijian @ 2022-11-23  6:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, yangx.jy,
	Yasunori Gotou (Fujitsu),
	linux-kernel



On 22/11/2022 22:52, Jason Gunthorpe wrote:
> On Wed, Nov 16, 2022 at 04:19:50PM +0800, Li Zhijian wrote:
>> Enable the flushable access flags for the QP.
>>
>> Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>> V5: new patch, inspired by Bob
>> ---
>>   drivers/infiniband/core/cm.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>> index 1f9938a2c475..58837aac980b 100644
>> --- a/drivers/infiniband/core/cm.c
>> +++ b/drivers/infiniband/core/cm.c
>> @@ -4096,7 +4096,8 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
>>   		qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
>>   		if (cm_id_priv->responder_resources)
>>   			qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
>> -						    IB_ACCESS_REMOTE_ATOMIC;
>> +						    IB_ACCESS_REMOTE_ATOMIC |
>> +						    IB_ACCESS_FLUSHABLE;
> 
> What is the point of this? Nothing checks IB_ACCESS_FLUSHABLE ?

Previously, the rxe responder checked qp_access_flags in check_op_valid():

  256 static enum resp_states check_op_valid(struct rxe_qp *qp,
  257                                        struct rxe_pkt_info *pkt)
  258 {
  259         switch (qp_type(qp)) {
  260         case IB_QPT_RC:
  261                 if (((pkt->mask & RXE_READ_MASK) &&
  262                      !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
  263                     ((pkt->mask & RXE_WRITE_MASK) &&
  264                      !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
  265                     ((pkt->mask & RXE_ATOMIC_MASK) &&
  266                      !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
  267                         return RESPST_ERR_UNSUPPORTED_OPCODE;
  268                 }

Based on this, additional IB_ACCESS_FLUSH_PERSISTENT and
IB_ACCESS_FLUSH_GLOBAL checks were added in patch 7 since V5, as
suggested by Bob.
Because of this change, the QP should become FLUSHABLE correspondingly.

> 
> Do flush ops require a responder resource?

Yes, I think so. See the IBA spec, oA19-9: FLUSH shall consume a
single responder...


> 
> Why should CM set it unconditionally?
> 

I have checked the git history of qp->qp_access_flags; the existing
flags have always been set this way. So I also think qp_access_flags
should accept all new IBA capabilities unconditionally.

What do you think, @Jason?


Thanks
Zhijian
> Explain in the commit message
> 
> Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 04/10] RDMA/rxe: Allow registering persistent flag for pmem MR only
  2022-11-22 14:46   ` Jason Gunthorpe
@ 2022-11-23  6:12     ` lizhijian
  0 siblings, 0 replies; 22+ messages in thread
From: lizhijian @ 2022-11-23  6:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, yangx.jy,
	Yasunori Gotou (Fujitsu),
	linux-kernel



On 22/11/2022 22:46, Jason Gunthorpe wrote:
> On Wed, Nov 16, 2022 at 04:19:45PM +0800, Li Zhijian wrote:
>>   int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>>   		     int access, struct rxe_mr *mr)
>>   {
>> @@ -148,16 +157,25 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>>   	num_buf			= 0;
>>   	map = mr->map;
>>   	if (length > 0) {
>> -		buf = map[0]->buf;
>> +		bool persistent_access = access & IB_ACCESS_FLUSH_PERSISTENT;
>>   
>> +		buf = map[0]->buf;
>>   		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
>> +			struct page *pg = sg_page_iter_page(&sg_iter);
>> +
>> +			if (persistent_access && !is_pmem_page(pg)) {
>> +				pr_debug("Unable to register persistent access to non-pmem device\n");
>> +				err = -EINVAL;
>> +				goto err_release_umem;
> 
> This should use rxe_dbg_mr()
> 

Good catch, thanks

Zhijian

> Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE
  2022-11-23  6:07     ` lizhijian
@ 2022-11-24 17:39       ` Jason Gunthorpe
  2022-11-25  2:22         ` lizhijian
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2022-11-24 17:39 UTC (permalink / raw)
  To: lizhijian
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, yangx.jy,
	Yasunori Gotou (Fujitsu),
	linux-kernel

On Wed, Nov 23, 2022 at 06:07:37AM +0000, lizhijian@fujitsu.com wrote:
> 
> 
> On 22/11/2022 22:52, Jason Gunthorpe wrote:
> > On Wed, Nov 16, 2022 at 04:19:50PM +0800, Li Zhijian wrote:
> >> Enable the flushable access flags for the QP.
> >>
> >> Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
> >> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> >> ---
> >> V5: new patch, inspired by Bob
> >> ---
> >>   drivers/infiniband/core/cm.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> >> index 1f9938a2c475..58837aac980b 100644
> >> --- a/drivers/infiniband/core/cm.c
> >> +++ b/drivers/infiniband/core/cm.c
> >> @@ -4096,7 +4096,8 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
> >>   		qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
> >>   		if (cm_id_priv->responder_resources)
> >>   			qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
> >> -						    IB_ACCESS_REMOTE_ATOMIC;
> >> +						    IB_ACCESS_REMOTE_ATOMIC |
> >> +						    IB_ACCESS_FLUSHABLE;
> > 
> > What is the point of this? Nothing checks IB_ACCESS_FLUSHABLE ?
> 
> Previously, the rxe responder checked qp_access_flags in check_op_valid():
> 
>   256 static enum resp_states check_op_valid(struct rxe_qp *qp,
>   257                                        struct rxe_pkt_info *pkt)
>   258 {
>   259         switch (qp_type(qp)) {
>   260         case IB_QPT_RC:
>   261                 if (((pkt->mask & RXE_READ_MASK) &&
>   262                      !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
>   263                     ((pkt->mask & RXE_WRITE_MASK) &&
>   264                      !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
>   265                     ((pkt->mask & RXE_ATOMIC_MASK) &&
>   266                      !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
>   267                         return RESPST_ERR_UNSUPPORTED_OPCODE;
>   268                 }
> 
> Based on this, additional IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL
> checks were added in patch 7 since V5, as suggested by Bob.
> Because of this change, the QP should become FLUSHABLE correspondingly.

But nothing ever reads IB_ACCESS_FLUSHABLE, so why is it added?

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE
  2022-11-24 17:39       ` Jason Gunthorpe
@ 2022-11-25  2:22         ` lizhijian
  2022-11-25 14:16           ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: lizhijian @ 2022-11-25  2:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, yangx.jy,
	Yasunori Gotou (Fujitsu),
	linux-kernel



On 25/11/2022 01:39, Jason Gunthorpe wrote:
> On Wed, Nov 23, 2022 at 06:07:37AM +0000, lizhijian@fujitsu.com wrote:
>>
>>
>> On 22/11/2022 22:52, Jason Gunthorpe wrote:
>>> On Wed, Nov 16, 2022 at 04:19:50PM +0800, Li Zhijian wrote:
>>>> Enable the flushable access flags for the QP.
>>>>
>>>> Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
>>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>>> ---
>>>> V5: new patch, inspired by Bob
>>>> ---
>>>>    drivers/infiniband/core/cm.c | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>> index 1f9938a2c475..58837aac980b 100644
>>>> --- a/drivers/infiniband/core/cm.c
>>>> +++ b/drivers/infiniband/core/cm.c
>>>> @@ -4096,7 +4096,8 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
>>>>    		qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
>>>>    		if (cm_id_priv->responder_resources)
>>>>    			qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
>>>> -						    IB_ACCESS_REMOTE_ATOMIC;
>>>> +						    IB_ACCESS_REMOTE_ATOMIC |
>>>> +						    IB_ACCESS_FLUSHABLE;
>>>
>>> What is the point of this? Nothing checks IB_ACCESS_FLUSHABLE ?
>>
>> Previously, the responder side of RXE checked qp_access_flags in check_op_valid():
>>
>>    static enum resp_states check_op_valid(struct rxe_qp *qp,
>>                                           struct rxe_pkt_info *pkt)
>>    {
>>            switch (qp_type(qp)) {
>>            case IB_QPT_RC:
>>                    if (((pkt->mask & RXE_READ_MASK) &&
>>                         !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
>>                        ((pkt->mask & RXE_WRITE_MASK) &&
>>                         !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
>>                        ((pkt->mask & RXE_ATOMIC_MASK) &&
>>                         !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
>>                            return RESPST_ERR_UNSUPPORTED_OPCODE;
>>                    }
>>
>> Based on this, additional IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL
>> checks were added in patch 7 since V5, as suggested by Bob.
>> Because of that change, the QP must become FLUSHABLE correspondingly.
> 
> But nothing ever reads IB_ACCESS_FLUSHABLE, so why is it added?

include/rdma/ib_verbs.h:
+	IB_ACCESS_FLUSH_GLOBAL = IB_UVERBS_ACCESS_FLUSH_GLOBAL,
+	IB_ACCESS_FLUSH_PERSISTENT = IB_UVERBS_ACCESS_FLUSH_PERSISTENT,
+	IB_ACCESS_FLUSHABLE = IB_ACCESS_FLUSH_GLOBAL |
+			      IB_ACCESS_FLUSH_PERSISTENT,

IB_ACCESS_FLUSHABLE is shorthand for IB_ACCESS_FLUSH_GLOBAL |
IB_ACCESS_FLUSH_PERSISTENT. With this wrapper, I write one less
line of code :)

I'm fine with expanding it in the next version.
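
For illustration (not part of the posted patch), a minimal sketch of the
RC-case test that the patch-7 responder check boils down to; the name
RXE_FLUSH_MASK follows the existing RXE_*_MASK convention and is an
assumption here, not a confirmed identifier:

        /* Sketch: reject an incoming FLUSH unless the QP was granted at
         * least one flush access flag; IB_ACCESS_FLUSHABLE expands to
         * IB_ACCESS_FLUSH_GLOBAL | IB_ACCESS_FLUSH_PERSISTENT.
         */
        if ((pkt->mask & RXE_FLUSH_MASK) &&
            !(qp->attr.qp_access_flags & IB_ACCESS_FLUSHABLE))
                return RESPST_ERR_UNSUPPORTED_OPCODE;

This is also where the one-line saving from the wrapper shows up: without
it, both flush flags would have to be spelled out in the condition.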


> 
> Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE
  2022-11-25  2:22         ` lizhijian
@ 2022-11-25 14:16           ` Jason Gunthorpe
  2022-11-28 10:23             ` lizhijian
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2022-11-25 14:16 UTC (permalink / raw)
  To: lizhijian
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, yangx.jy,
	Yasunori Gotou (Fujitsu),
	linux-kernel

On Fri, Nov 25, 2022 at 02:22:24AM +0000, lizhijian@fujitsu.com wrote:
> 
> 
> On 25/11/2022 01:39, Jason Gunthorpe wrote:
> > On Wed, Nov 23, 2022 at 06:07:37AM +0000, lizhijian@fujitsu.com wrote:
> >>
> >>
> >> On 22/11/2022 22:52, Jason Gunthorpe wrote:
> >>> On Wed, Nov 16, 2022 at 04:19:50PM +0800, Li Zhijian wrote:
> >>>> It enables flushable access flag for qp
> >>>>
> >>>> Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
> >>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> >>>> ---
> >>>> V5: new patch, inspired by Bob
> >>>> ---
> >>>>    drivers/infiniband/core/cm.c | 3 ++-
> >>>>    1 file changed, 2 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> >>>> index 1f9938a2c475..58837aac980b 100644
> >>>> --- a/drivers/infiniband/core/cm.c
> >>>> +++ b/drivers/infiniband/core/cm.c
> >>>> @@ -4096,7 +4096,8 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
> >>>>    		qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
> >>>>    		if (cm_id_priv->responder_resources)
> >>>>    			qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
> >>>> -						    IB_ACCESS_REMOTE_ATOMIC;
> >>>> +						    IB_ACCESS_REMOTE_ATOMIC |
> >>>> +						    IB_ACCESS_FLUSHABLE;
> >>>
> >>> What is the point of this? Nothing checks IB_ACCESS_FLUSHABLE ?
> >>
> >> Previously, the responder side of RXE checked qp_access_flags in check_op_valid():
> >>
> >>    static enum resp_states check_op_valid(struct rxe_qp *qp,
> >>                                           struct rxe_pkt_info *pkt)
> >>    {
> >>            switch (qp_type(qp)) {
> >>            case IB_QPT_RC:
> >>                    if (((pkt->mask & RXE_READ_MASK) &&
> >>                         !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
> >>                        ((pkt->mask & RXE_WRITE_MASK) &&
> >>                         !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
> >>                        ((pkt->mask & RXE_ATOMIC_MASK) &&
> >>                         !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
> >>                            return RESPST_ERR_UNSUPPORTED_OPCODE;
> >>                    }
> >>
> >> Based on this, additional IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL
> >> checks were added in patch 7 since V5, as suggested by Bob.
> >> Because of that change, the QP must become FLUSHABLE correspondingly.
> > 
> > But nothing ever reads IB_ACCESS_FLUSHABLE, so why is it added?
> 
> include/rdma/ib_verbs.h:
> +	IB_ACCESS_FLUSH_GLOBAL = IB_UVERBS_ACCESS_FLUSH_GLOBAL,
> +	IB_ACCESS_FLUSH_PERSISTENT = IB_UVERBS_ACCESS_FLUSH_PERSISTENT,
> +	IB_ACCESS_FLUSHABLE = IB_ACCESS_FLUSH_GLOBAL |
> +			      IB_ACCESS_FLUSH_PERSISTENT,
> 
> IB_ACCESS_FLUSHABLE is shorthand for IB_ACCESS_FLUSH_GLOBAL |
> IB_ACCESS_FLUSH_PERSISTENT. With this wrapper, I write one less
> line of code :)
> 
> I'm fine with expanding it in the next version.

OIC, that is why it escaped grep

But this is back to my original question - why is it OK to do this
here in CMA? Shouldn't this cause other drivers to refuse to create
the QP because they don't support the flag?

Jason

> 
> > 
> > Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE
  2022-11-25 14:16           ` Jason Gunthorpe
@ 2022-11-28 10:23             ` lizhijian
  2022-12-05 10:07               ` lizhijian
  0 siblings, 1 reply; 22+ messages in thread
From: lizhijian @ 2022-11-28 10:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, yangx.jy,
	Yasunori Gotou (Fujitsu),
	linux-kernel



On 25/11/2022 22:16, Jason Gunthorpe wrote:
>>>>>> ---
>>>>>>     drivers/infiniband/core/cm.c | 3 ++-
>>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>>>> index 1f9938a2c475..58837aac980b 100644
>>>>>> --- a/drivers/infiniband/core/cm.c
>>>>>> +++ b/drivers/infiniband/core/cm.c
>>>>>> @@ -4096,7 +4096,8 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
>>>>>>     		qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
>>>>>>     		if (cm_id_priv->responder_resources)
>>>>>>     			qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
>>>>>> -						    IB_ACCESS_REMOTE_ATOMIC;
>>>>>> +						    IB_ACCESS_REMOTE_ATOMIC |
>>>>>> +						    IB_ACCESS_FLUSHABLE;
>>>>> What is the point of this? Nothing checks IB_ACCESS_FLUSHABLE ?

>> I'm fine with expanding it in the next version.
> OIC, that is why it escaped grep
> 
> But this is back to my original question - why is it OK to do this
> here in CMA? Shouldn't this cause other drivers to refuse to create
> the QP because they don't support the flag?
> 

Jason,

My flush example broke since V5, where the responder side checks
qp_access_flags, so I added this patch.

I also agree it's a good idea to only add this flush flag for supported
drivers, but I haven't figured out how to achieve that in current RDMA.

After digging more into rdma-core, I found that this flag can also be
set from userspace by calling ibv_modify_qp(). On the server side
(responder), ibv_modify_qp() must be called after rdma_accept(), because
rdma_accept() internally modifies qp_access_flags again
(rdma_get_request() is the first place qp_access_flags is modified).

Back to the original question: IIUC, current rdma-core has no API to set
qp_access_flags during QP creation.

The FLUSH operation on the responder side checks both mr->access_flags
and qp_access_flags, so an unsupported device/driver cannot do flush at
all, even though the qp_access_flags change applies to all drivers.


------------------------------------------
(gdb) bt
#0  __ibv_modify_qp_1_1 (qp=0x40fbf0, attr=0x7fffffffd920, attr_mask=57)
    at /home/lizhijian/rdma-core/libibverbs/verbs.c:715
#1  0x00007ffff7faa1db in ucma_init_conn_qp (id_priv=0x40f6d0, qp=0x40fbf0)
    at /home/lizhijian/rdma-core/librdmacm/cma.c:1380
#2  0x00007ffff7faada2 in rdma_create_qp_ex (id=0x40f6d0, attr=0x7fffffffda30)
    at /home/lizhijian/rdma-core/librdmacm/cma.c:1676
#3  0x00007ffff7faae94 in rdma_create_qp (id=0x40f6d0, pd=0x407710, qp_init_attr=0x7fffffffdae0)
    at /home/lizhijian/rdma-core/librdmacm/cma.c:1702
#4  0x00007ffff7fab5d3 in rdma_get_request (listen=0x40ede0, id=0x4051a8 <id>)
    at /home/lizhijian/rdma-core/librdmacm/cma.c:1883
#5  0x0000000000401af9 in run ()
    at /home/lizhijian/rdma-core/librdmacm/examples/rdma_flush_server.c:91
#6  0x00000000004021ea in main (argc=7, argv=0x7fffffffe228)
    at /home/lizhijian/rdma-core/librdmacm/examples/rdma_flush_server.c:282

(gdb) bt
#0  __ibv_modify_qp_1_1 (qp=0x40fbf0, attr=0x7fffffffd930, attr_mask=1216897)
    at /home/lizhijian/rdma-core/libibverbs/verbs.c:715
#1  0x00007ffff7fa9f16 in ucma_modify_qp_rtr (id=0x40f6d0, resp_res=128 '\200')
    at /home/lizhijian/rdma-core/librdmacm/cma.c:1304
#2  0x00007ffff7fab80a in rdma_accept (id=0x40f6d0, conn_param=0x0)
    at /home/lizhijian/rdma-core/librdmacm/cma.c:1932
#3  0x0000000000401c8a in run ()
    at /home/lizhijian/rdma-core/librdmacm/examples/rdma_flush_server.c:132
#4  0x00000000004021ea in main (argc=7, argv=0x7fffffffe228)
    at /home/lizhijian/rdma-core/librdmacm/examples/rdma_flush_server.c:282
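
For illustration, a minimal sketch of the server-side ordering this
implies, assuming the IBV_ACCESS_FLUSH_GLOBAL/IBV_ACCESS_FLUSH_PERSISTENT
flags from the companion rdma-core series and the usual
<rdma/rdma_cma.h> / <infiniband/verbs.h> headers:

        struct ibv_qp_attr attr = {
                .qp_access_flags = IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_FLUSH_GLOBAL |
                                   IBV_ACCESS_FLUSH_PERSISTENT,
        };

        /* rdma_accept() rewrites qp_access_flags internally (second
         * backtrace above), so an adjustment only sticks if it is made
         * after rdma_accept() returns.
         */
        if (rdma_accept(id, NULL))
                return -1;
        if (ibv_modify_qp(id->qp, &attr, IBV_QP_ACCESS_FLAGS))
                return -1;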


> Jason
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE
  2022-11-28 10:23             ` lizhijian
@ 2022-12-05 10:07               ` lizhijian
  2022-12-05 17:12                 ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: lizhijian @ 2022-12-05 10:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, yangx.jy,
	Yasunori Gotou (Fujitsu),
	linux-kernel

Jason

Could you take a look at this update:
- Make QP FLUSHABLE for supported device only
- add more explanation

commit 1a95f94125b59183d5fc643a917e0a2e7bb07981
Author: Li Zhijian <lizhijian@fujitsu.com>
Date:   Mon Sep 26 17:53:06 2022 +0800

     RDMA/cm: Make QP FLUSHABLE for supported device
     
     Similar to the RDMA and Atomic qp attributes enabled by default in
     CM, enable the FLUSH attribute for supported devices. This gives
     applications built with the rdma_create_ep and rdma_accept APIs the
     FLUSH qp attribute natively, so users can request FLUSH operations
     more simply.
     
     Note that a FLUSH operation requires FLUSH support from the device
     (HCA), the memory region (MR), and the QP at the same time, so it is
     safe to enable the FLUSH qp attribute by default here.
     
     The FLUSH attribute can be disabled via the modify_qp() interface.
     
     Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
     ---
     V6: enable flush for supported device only #Jason
     V5: new patch, inspired by Bob
     Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1f9938a2c475..603c0aecc361 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -4094,9 +4094,18 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
                 *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS |
                                 IB_QP_PKEY_INDEX | IB_QP_PORT;
                 qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
-               if (cm_id_priv->responder_resources)
+               if (cm_id_priv->responder_resources) {
+                       struct ib_device *ib_dev = cm_id_priv->id.device;
+                       u64 support_flush = ib_dev->attrs.device_cap_flags &
+                         (IB_DEVICE_FLUSH_GLOBAL | IB_DEVICE_FLUSH_PERSISTENT);
+                       u32 flushable = support_flush ?
+                                       (IB_ACCESS_FLUSH_GLOBAL |
+                                        IB_ACCESS_FLUSH_PERSISTENT) : 0;
+
                         qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
-                                                   IB_ACCESS_REMOTE_ATOMIC;
+                                                   IB_ACCESS_REMOTE_ATOMIC |
+                                                   flushable;
+               }
                 qp_attr->pkey_index = cm_id_priv->av.pkey_index;
                 if (cm_id_priv->av.port)
                         qp_attr->port_num = cm_id_priv->av.port->port_num;
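
Since the flush access bits are now granted only when the device
advertises IB_DEVICE_FLUSH_GLOBAL/IB_DEVICE_FLUSH_PERSISTENT, a userspace
application can read back what CM actually set; a sketch (again assuming
the rdma-core-side IBV_ACCESS_FLUSH_* flags; error handling elided):

        struct ibv_qp_attr attr;
        struct ibv_qp_init_attr init;

        /* Query the connected QP and test the flush access bits. */
        if (!ibv_query_qp(id->qp, &attr, IBV_QP_ACCESS_FLAGS, &init) &&
            (attr.qp_access_flags & (IBV_ACCESS_FLUSH_GLOBAL |
                                     IBV_ACCESS_FLUSH_PERSISTENT)))
                printf("flush access enabled on this QP\n");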

Thanks
Zhijian


On 28/11/2022 18:23, lizhijian@fujitsu.com wrote:
> 
> 
> On 25/11/2022 22:16, Jason Gunthorpe wrote:
>>>>>>> ---
>>>>>>>      drivers/infiniband/core/cm.c | 3 ++-
>>>>>>>      1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>>>>> index 1f9938a2c475..58837aac980b 100644
>>>>>>> --- a/drivers/infiniband/core/cm.c
>>>>>>> +++ b/drivers/infiniband/core/cm.c
>>>>>>> @@ -4096,7 +4096,8 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
>>>>>>>      		qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
>>>>>>>      		if (cm_id_priv->responder_resources)
>>>>>>>      			qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
>>>>>>> -						    IB_ACCESS_REMOTE_ATOMIC;
>>>>>>> +						    IB_ACCESS_REMOTE_ATOMIC |
>>>>>>> +						    IB_ACCESS_FLUSHABLE;
>>>>>> What is the point of this? Nothing checks IB_ACCESS_FLUSHABLE ?
> 
>>> I'm fine with expanding it in the next version.
>> OIC, that is why it escaped grep
>>
>> But this is back to my original question - why is it OK to do this
>> here in CMA? Shouldn't this cause other drivers to refuse to create
>> the QP because they don't support the flag?
>>
> 
> Jason,
> 
> My flush example broke since V5, where the responder side checks
> qp_access_flags, so I added this patch.
> 
> I also agree it's a good idea to only add this flush flag for
> supported drivers, but I haven't figured out how to achieve that in
> current RDMA.
> 
> After digging more into rdma-core, I found that this flag can also be
> set from userspace by calling ibv_modify_qp(). On the server side
> (responder), ibv_modify_qp() must be called after rdma_accept(),
> because rdma_accept() internally modifies qp_access_flags again
> (rdma_get_request() is the first place qp_access_flags is modified).
> 
> Back to the original question: IIUC, current rdma-core has no API to
> set qp_access_flags during QP creation.
> 
> The FLUSH operation on the responder side checks both mr->access_flags
> and qp_access_flags, so an unsupported device/driver cannot do flush
> at all, even though the qp_access_flags change applies to all drivers.
> 
> 
> ------------------------------------------
> (gdb) bt
> #0  __ibv_modify_qp_1_1 (qp=0x40fbf0, attr=0x7fffffffd920, attr_mask=57)
>     at /home/lizhijian/rdma-core/libibverbs/verbs.c:715
> #1  0x00007ffff7faa1db in ucma_init_conn_qp (id_priv=0x40f6d0, qp=0x40fbf0)
>     at /home/lizhijian/rdma-core/librdmacm/cma.c:1380
> #2  0x00007ffff7faada2 in rdma_create_qp_ex (id=0x40f6d0, attr=0x7fffffffda30)
>     at /home/lizhijian/rdma-core/librdmacm/cma.c:1676
> #3  0x00007ffff7faae94 in rdma_create_qp (id=0x40f6d0, pd=0x407710, qp_init_attr=0x7fffffffdae0)
>     at /home/lizhijian/rdma-core/librdmacm/cma.c:1702
> #4  0x00007ffff7fab5d3 in rdma_get_request (listen=0x40ede0, id=0x4051a8 <id>)
>     at /home/lizhijian/rdma-core/librdmacm/cma.c:1883
> #5  0x0000000000401af9 in run ()
>     at /home/lizhijian/rdma-core/librdmacm/examples/rdma_flush_server.c:91
> #6  0x00000000004021ea in main (argc=7, argv=0x7fffffffe228)
>     at /home/lizhijian/rdma-core/librdmacm/examples/rdma_flush_server.c:282
> 
> (gdb) bt
> #0  __ibv_modify_qp_1_1 (qp=0x40fbf0, attr=0x7fffffffd930, attr_mask=1216897)
>     at /home/lizhijian/rdma-core/libibverbs/verbs.c:715
> #1  0x00007ffff7fa9f16 in ucma_modify_qp_rtr (id=0x40f6d0, resp_res=128 '\200')
>     at /home/lizhijian/rdma-core/librdmacm/cma.c:1304
> #2  0x00007ffff7fab80a in rdma_accept (id=0x40f6d0, conn_param=0x0)
>     at /home/lizhijian/rdma-core/librdmacm/cma.c:1932
> #3  0x0000000000401c8a in run ()
>     at /home/lizhijian/rdma-core/librdmacm/examples/rdma_flush_server.c:132
> #4  0x00000000004021ea in main (argc=7, argv=0x7fffffffe228)
>     at /home/lizhijian/rdma-core/librdmacm/examples/rdma_flush_server.c:282
> 
> 
>> Jason

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE
  2022-12-05 10:07               ` lizhijian
@ 2022-12-05 17:12                 ` Jason Gunthorpe
  2022-12-07  1:25                   ` lizhijian
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2022-12-05 17:12 UTC (permalink / raw)
  To: lizhijian
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, yangx.jy,
	Yasunori Gotou (Fujitsu),
	linux-kernel

On Mon, Dec 05, 2022 at 10:07:11AM +0000, lizhijian@fujitsu.com wrote:
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 1f9938a2c475..603c0aecc361 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -4094,9 +4094,18 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
>                  *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS |
>                                  IB_QP_PKEY_INDEX | IB_QP_PORT;
>                  qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
> -               if (cm_id_priv->responder_resources)
> +               if (cm_id_priv->responder_resources) {
> +                       struct ib_device *ib_dev = cm_id_priv->id.device;
> +                       u64 support_flush = ib_dev->attrs.device_cap_flags &
> +                         (IB_DEVICE_FLUSH_GLOBAL | IB_DEVICE_FLUSH_PERSISTENT);
> +                       u32 flushable = support_flush ?
> +                                       (IB_ACCESS_FLUSH_GLOBAL |
> +                                        IB_ACCESS_FLUSH_PERSISTENT) : 0;
> +
>                          qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
> -                                                   IB_ACCESS_REMOTE_ATOMIC;
> +                                                   IB_ACCESS_REMOTE_ATOMIC |
> +                                                   flushable;
> +               }

This makes more sense

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE
  2022-12-05 17:12                 ` Jason Gunthorpe
@ 2022-12-07  1:25                   ` lizhijian
  0 siblings, 0 replies; 22+ messages in thread
From: lizhijian @ 2022-12-07  1:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: zyjzyj2000, leon, Bob Pearson, linux-rdma, Mark Bloch,
	Tom Talpey, tomasz.gromadzki, Dan Williams, yangx.jy,
	Yasunori Gotou (Fujitsu),
	linux-kernel

  

On 06/12/2022 01:12, Jason Gunthorpe wrote:
> On Mon, Dec 05, 2022 at 10:07:11AM +0000, lizhijian@fujitsu.com wrote:
>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>> index 1f9938a2c475..603c0aecc361 100644
>> --- a/drivers/infiniband/core/cm.c
>> +++ b/drivers/infiniband/core/cm.c
>> @@ -4094,9 +4094,18 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
>>                   *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS |
>>                                   IB_QP_PKEY_INDEX | IB_QP_PORT;
>>                   qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE;
>> -               if (cm_id_priv->responder_resources)
>> +               if (cm_id_priv->responder_resources) {
>> +                       struct ib_device *ib_dev = cm_id_priv->id.device;
>> +                       u64 support_flush = ib_dev->attrs.device_cap_flags &
>> +                         (IB_DEVICE_FLUSH_GLOBAL | IB_DEVICE_FLUSH_PERSISTENT);
>> +                       u32 flushable = support_flush ?
>> +                                       (IB_ACCESS_FLUSH_GLOBAL |
>> +                                        IB_ACCESS_FLUSH_PERSISTENT) : 0;
>> +
>>                           qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ |
>> -                                                   IB_ACCESS_REMOTE_ATOMIC;
>> +                                                   IB_ACCESS_REMOTE_ATOMIC |
>> +                                                   flushable;
>> +               }
> 
> This makes more sense

Thanks for your help, I have posted the V7 revision:
https://lore.kernel.org/lkml/20221206130201.30986-1-lizhijian@fujitsu.com/T/#t

Thanks
Zhijian

> 
> Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-12-07  1:26 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-16  8:19 [for-next PATCH v6 00/10] RDMA/rxe: Add RDMA FLUSH operation Li Zhijian
2022-11-16  8:19 ` [for-next PATCH v6 01/10] RDMA: Extend RDMA user ABI to support flush Li Zhijian
2022-11-16  8:19 ` [for-next PATCH v6 02/10] RDMA: Extend RDMA kernel verbs " Li Zhijian
2022-11-16  8:19 ` [for-next PATCH v6 03/10] RDMA/rxe: Extend rxe user " Li Zhijian
2022-11-16  8:19 ` [for-next PATCH v6 04/10] RDMA/rxe: Allow registering persistent flag for pmem MR only Li Zhijian
2022-11-22 14:46   ` Jason Gunthorpe
2022-11-23  6:12     ` lizhijian
2022-11-16  8:19 ` [for-next PATCH v6 05/10] RDMA/rxe: Extend rxe packet format to support flush Li Zhijian
2022-11-16  8:19 ` [for-next PATCH v6 06/10] RDMA/rxe: Implement RC RDMA FLUSH service in requester side Li Zhijian
2022-11-16  8:19 ` [for-next PATCH v6 07/10] RDMA/rxe: Implement flush execution in responder side Li Zhijian
2022-11-16  8:19 ` [for-next PATCH v6 08/10] RDMA/rxe: Implement flush completion Li Zhijian
2022-11-16  8:19 ` [for-next PATCH v6 09/10] RDMA/cm: Make QP FLUSHABLE Li Zhijian
2022-11-22 14:52   ` Jason Gunthorpe
2022-11-23  6:07     ` lizhijian
2022-11-24 17:39       ` Jason Gunthorpe
2022-11-25  2:22         ` lizhijian
2022-11-25 14:16           ` Jason Gunthorpe
2022-11-28 10:23             ` lizhijian
2022-12-05 10:07               ` lizhijian
2022-12-05 17:12                 ` Jason Gunthorpe
2022-12-07  1:25                   ` lizhijian
2022-11-16  8:19 ` [for-next PATCH v6 10/10] RDMA/rxe: Enable RDMA FLUSH capability for rxe device Li Zhijian
