* [RFC PATCH v3 0/7] RDMA/rxe: Add RDMA FLUSH operation
From: Li Zhijian @ 2022-03-15 10:18 UTC (permalink / raw)
  To: linux-rdma, zyjzyj2000, jgg, aharonl, leon, tom, tomasz.gromadzki
  Cc: linux-kernel, mbloch, liangwenpeng, yangx.jy, y-goto,
	rpearsonhpe, dan.j.williams, Li Zhijian

Hey folks,

These patches implement a *NEW* RDMA opcode, "RDMA FLUSH".
In the IB specification 1.5[1], two new opcodes, ATOMIC WRITE and RDMA
FLUSH, were added in the MEMORY PLACEMENT EXTENSIONS section.

This patchset makes SoftRoCE (rxe) support the new RDMA FLUSH operation
on the RC and RD services.

You can verify the patchset by building and running the rdma_flush example[2].
server:
$ ./rdma_flush_server -s [server_address] -p [port_number]
client:
$ ./rdma_flush_client -s [server_address] -p [port_number]

- We introduce a new packet format for the FLUSH request.
- We introduce FLUSH placement type attributes for the HCA.
- We introduce FLUSH access flags that users can register MRs with.

[1]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
[2]: https://github.com/zhijianli88/rdma-core/tree/rdma-flush
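
For kernel consumers, posting a FLUSH work request would look roughly
like the sketch below (based on the ib_flush_wr structure from patch 3/7
and the IB_EXT_* constants from patch 4/7; qp, cqe and the remote_*
variables are placeholders):

    struct ib_flush_wr wr = {};

    wr.wr.opcode   = IB_WR_RDMA_FLUSH;
    wr.wr.wr_cqe   = cqe;                  /* caller's completion context */
    wr.remote_addr = remote_addr;          /* VA inside the remote MR */
    wr.length      = remote_len;
    wr.rkey        = remote_rkey;
    wr.type        = IB_EXT_PLT_PERSIST;   /* flush to persistence */
    wr.level       = IB_EXT_SEL_MR_RANGE;  /* flush only the given range */

    ret = ib_post_send(qp, &wr.wr, NULL);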

CC: Xiao Yang <yangx.jy@fujitsu.com>
CC: y-goto@fujitsu.com
CC: Jason Gunthorpe <jgg@ziepe.ca>
CC: Zhu Yanjun <zyjzyj2000@gmail.com>
CC: Leon Romanovsky <leon@kernel.org>
CC: Bob Pearson <rpearsonhpe@gmail.com>
CC: Mark Bloch <mbloch@nvidia.com>
CC: Wenpeng Liang <liangwenpeng@huawei.com>
CC: Aharon Landau <aharonl@nvidia.com>
CC: Tom Talpey <tom@talpey.com>
CC: "Gromadzki, Tomasz" <tomasz.gromadzki@intel.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: linux-rdma@vger.kernel.org
CC: linux-kernel@vger.kernel.org

The kernel source can also be accessed at:
https://github.com/zhijianli88/linux/tree/rdma-flush

Changelog:
V3:
- Rebase, plus commit log and comment updates
- Delete patch-1 ("RDMA: mr: Introduce is_pmem"); it is now folded into
  "Allow registering persistent flag for pmem MR only"
- Delete patch-7

V2:
RDMA: mr: Introduce is_pmem
   check 1st byte to avoid crossing page boundary
   new scheme to check is_pmem # Dan

RDMA: Allow registering MR with flush access flags
   combine [03/10] "RDMA/rxe: Allow registering FLUSH flags for supported device only" into this patch # Jason
   split RDMA_FLUSH to 2 capabilities

RDMA/rxe: Allow registering persistent flag for pmem MR only
   update commit message, get rid of confusing ib_check_flush_access_flags() # Tom

RDMA/rxe: Implement RC RDMA FLUSH service in requester side
   extend flush to include length field. # Tom and Tomasz

RDMA/rxe: Implement flush execution in responder side
   adjust start for WHOLE MR level # Tom
   don't support DMA mr for flush # Tom
   check flush return value

RDMA/rxe: Enable RDMA FLUSH capability for rxe device
   adjust patch's order. move it here from [04/10]

Li Zhijian (7):
  RDMA: Allow registering MR with flush access flags
  RDMA/rxe: Allow registering persistent flag for pmem MR only
  RDMA/rxe: Implement RC RDMA FLUSH service in requester side
  RDMA/rxe: Implement flush execution in responder side
  RDMA/rxe: Implement flush completion
  RDMA/rxe: Enable RDMA FLUSH capability for rxe device
  RDMA/rxe: Add RD FLUSH service support

 drivers/infiniband/core/uverbs_cmd.c    |  17 +++
 drivers/infiniband/sw/rxe/rxe_comp.c    |   4 +-
 drivers/infiniband/sw/rxe/rxe_hdr.h     |  48 +++++++++
 drivers/infiniband/sw/rxe/rxe_loc.h     |   2 +
 drivers/infiniband/sw/rxe/rxe_mr.c      |  36 ++++++-
 drivers/infiniband/sw/rxe/rxe_opcode.c  |  35 ++++++
 drivers/infiniband/sw/rxe/rxe_opcode.h  |   3 +
 drivers/infiniband/sw/rxe/rxe_param.h   |   4 +-
 drivers/infiniband/sw/rxe/rxe_req.c     |  15 ++-
 drivers/infiniband/sw/rxe/rxe_resp.c    | 135 ++++++++++++++++++++++--
 include/rdma/ib_pack.h                  |   3 +
 include/rdma/ib_verbs.h                 |  29 ++++-
 include/uapi/rdma/ib_user_ioctl_verbs.h |   2 +
 include/uapi/rdma/ib_user_verbs.h       |  19 ++++
 include/uapi/rdma/rdma_user_rxe.h       |   7 ++
 15 files changed, 346 insertions(+), 13 deletions(-)

-- 
2.31.1

* [RFC PATCH v3 1/7] RDMA: Allow registering MR with flush access flags
From: Li Zhijian @ 2022-03-15 10:18 UTC (permalink / raw)
  To: linux-rdma, zyjzyj2000, jgg, aharonl, leon, tom, tomasz.gromadzki
  Cc: linux-kernel, mbloch, liangwenpeng, yangx.jy, y-goto,
	rpearsonhpe, dan.j.williams, Li Zhijian

This introduces new device attributes/capabilities.

Users can use ibv_reg_mr(3) to register MRs with flush access flags.
Only access flags that are also supported by the device can be
registered successfully.
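
As an illustration (a sketch; ib_dev and the flag combination are
placeholders), a registration carrying a flush access flag now fails
unless the device advertises the matching capability:

    int access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |
                 IB_ACCESS_FLUSH_PERSISTENT;

    /* returns -EINVAL unless ib_dev set IB_DEVICE_PLT_PERSISTENT */
    ret = ib_check_mr_access(ib_dev, access);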

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V2: combine [03/10] "RDMA/rxe: Allow registering FLUSH flags for supported device only" into this patch # Jason
    split RDMA_FLUSH to 2 capabilities
    Fix typo
---
 include/rdma/ib_verbs.h                 | 18 ++++++++++++++++--
 include/uapi/rdma/ib_user_ioctl_verbs.h |  2 ++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 69d883f7fb41..465de3bab1e9 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -291,6 +291,9 @@ enum ib_device_cap_flags {
 	/* The device supports padding incoming writes to cacheline. */
 	IB_DEVICE_PCI_WRITE_END_PADDING		= (1ULL << 36),
 	IB_DEVICE_ALLOW_USER_UNREG		= (1ULL << 37),
+	/* Placement type attributes */
+	IB_DEVICE_PLT_GLOBAL_VISIBILITY		= (1ULL << 38),
+	IB_DEVICE_PLT_PERSISTENT		= (1ULL << 39),
 };
 
 enum ib_atomic_cap {
@@ -1444,10 +1447,14 @@ enum ib_access_flags {
 	IB_ACCESS_ON_DEMAND = IB_UVERBS_ACCESS_ON_DEMAND,
 	IB_ACCESS_HUGETLB = IB_UVERBS_ACCESS_HUGETLB,
 	IB_ACCESS_RELAXED_ORDERING = IB_UVERBS_ACCESS_RELAXED_ORDERING,
+	IB_ACCESS_FLUSH_GLOBAL_VISIBILITY = IB_UVERBS_ACCESS_FLUSH_GLOBAL_VISIBILITY,
+	IB_ACCESS_FLUSH_PERSISTENT = IB_UVERBS_ACCESS_FLUSH_PERSISTENT,
 
+	IB_ACCESS_FLUSHABLE = IB_ACCESS_FLUSH_GLOBAL_VISIBILITY |
+			      IB_ACCESS_FLUSH_PERSISTENT,
 	IB_ACCESS_OPTIONAL = IB_UVERBS_ACCESS_OPTIONAL_RANGE,
 	IB_ACCESS_SUPPORTED =
-		((IB_ACCESS_HUGETLB << 1) - 1) | IB_ACCESS_OPTIONAL,
+		((IB_ACCESS_FLUSH_PERSISTENT << 1) - 1) | IB_ACCESS_OPTIONAL,
 };
 
 /*
@@ -4300,6 +4307,7 @@ int ib_dealloc_xrcd_user(struct ib_xrcd *xrcd, struct ib_udata *udata);
 static inline int ib_check_mr_access(struct ib_device *ib_dev,
 				     unsigned int flags)
 {
+	u64 device_cap = ib_dev->attrs.device_cap_flags;
 	/*
 	 * Local write permission is required if remote write or
 	 * remote atomic permission is also requested.
@@ -4312,7 +4320,13 @@ static inline int ib_check_mr_access(struct ib_device *ib_dev,
 		return -EINVAL;
 
 	if (flags & IB_ACCESS_ON_DEMAND &&
-	    !(ib_dev->attrs.device_cap_flags & IB_DEVICE_ON_DEMAND_PAGING))
+	    !(device_cap & IB_DEVICE_ON_DEMAND_PAGING))
+		return -EINVAL;
+
+	if ((flags & IB_ACCESS_FLUSH_GLOBAL_VISIBILITY &&
+	    !(device_cap & IB_DEVICE_PLT_GLOBAL_VISIBILITY)) ||
+	    (flags & IB_ACCESS_FLUSH_PERSISTENT &&
+	    !(device_cap & IB_DEVICE_PLT_PERSISTENT)))
 		return -EINVAL;
 	return 0;
 }
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 3072e5d6b692..2c28f90ec54c 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -57,6 +57,8 @@ enum ib_uverbs_access_flags {
 	IB_UVERBS_ACCESS_ZERO_BASED = 1 << 5,
 	IB_UVERBS_ACCESS_ON_DEMAND = 1 << 6,
 	IB_UVERBS_ACCESS_HUGETLB = 1 << 7,
+	IB_UVERBS_ACCESS_FLUSH_GLOBAL_VISIBILITY = 1 << 8,
+	IB_UVERBS_ACCESS_FLUSH_PERSISTENT = 1 << 9,
 
 	IB_UVERBS_ACCESS_RELAXED_ORDERING = IB_UVERBS_ACCESS_OPTIONAL_FIRST,
 	IB_UVERBS_ACCESS_OPTIONAL_RANGE =
-- 
2.31.1

* [RFC PATCH v3 2/7] RDMA/rxe: Allow registering persistent flag for pmem MR only
From: Li Zhijian @ 2022-03-15 10:18 UTC (permalink / raw)
  To: linux-rdma, zyjzyj2000, jgg, aharonl, leon, tom, tomasz.gromadzki
  Cc: linux-kernel, mbloch, liangwenpeng, yangx.jy, y-goto,
	rpearsonhpe, dan.j.williams, Li Zhijian

A memory region can support at most two placement types:
IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL_VISIBILITY.

However, we only allow users to register the persistent flush flag for
pmem MRs; registering it for a non-pmem MR is rejected.
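
For example (assuming the companion rdma-core branch in [2] exposes a
matching IBV_ACCESS_FLUSH_PERSISTENT flag; that name is an assumption
here), registering ordinary DRAM with the persistent flag is rejected:

    /* DRAM-backed buffer: ibv_reg_mr() fails with EINVAL */
    buf = malloc(len);
    mr = ibv_reg_mr(pd, buf, len,
                    IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE |
                    IBV_ACCESS_FLUSH_PERSISTENT);  /* assumed flag name */

    /* the same call succeeds when buf maps a pmem (DAX) region */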

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3: combine [RFC PATCH v2 1/9] RDMA: mr: Introduce is_pmem
V2: update commit message, get rid of confusing ib_check_flush_access_flags() # Tom
---
 drivers/infiniband/sw/rxe/rxe_mr.c | 32 ++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 453ef3c9d535..4f5c4af19fe0 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -161,6 +161,28 @@ void rxe_mr_init_dma(struct rxe_pd *pd, int access, struct rxe_mr *mr)
 	mr->type = IB_MR_TYPE_DMA;
 }
 
+static bool iova_in_pmem(struct rxe_mr *mr, u64 iova, int length)
+{
+	char *vaddr;
+	int is_pmem;
+
+	/* XXX: Should we allow length == 0? */
+	if (length == 0) {
+		return false;
+	}
+	/* check the 1st byte only to avoid crossing page boundary */
+	vaddr = iova_to_vaddr(mr, iova, 1);
+	if (!vaddr) {
+		pr_warn("not a valid iova 0x%llx\n", iova);
+		return false;
+	}
+
+	is_pmem = region_intersects(virt_to_phys(vaddr), 1, IORESOURCE_MEM,
+				    IORES_DESC_PERSISTENT_MEMORY);
+
+	return is_pmem == REGION_INTERSECTS;
+}
+
 int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
 		     int access, struct rxe_mr *mr)
 {
@@ -235,6 +257,16 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
 	set->va = start;
 	set->offset = ib_umem_offset(umem);
 
+	/* iova_in_pmem() must be called after the map set is updated */
+	if (!iova_in_pmem(mr, iova, length) &&
+	    access & IB_ACCESS_FLUSH_PERSISTENT) {
+		pr_warn("Cannot register IB_ACCESS_FLUSH_PERSISTENT for non-pmem memory\n");
+		mr->state = RXE_MR_STATE_INVALID;
+		mr->umem = NULL;
+		err = -EINVAL;
+		goto err_release_umem;
+	}
+
 	return 0;
 
 err_release_umem:
-- 
2.31.1

* [RFC PATCH v3 3/7] RDMA/rxe: Implement RC RDMA FLUSH service in requester side
From: Li Zhijian @ 2022-03-15 10:18 UTC (permalink / raw)
  To: linux-rdma, zyjzyj2000, jgg, aharonl, leon, tom, tomasz.gromadzki
  Cc: linux-kernel, mbloch, liangwenpeng, yangx.jy, y-goto,
	rpearsonhpe, dan.j.williams, Li Zhijian

This introduces a new packet format for the FLUSH request: a FLUSH
Extended Transport Header (FETH) carrying the placement type and
selectivity level, followed by an RETH naming the target range.
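
As a worked example of the FETH encoding added below (the placement type
and selectivity values are defined in patch 4/7): flushing a whole MR to
persistence uses level = IB_EXT_SEL_MR_WHOLE (1) and type =
IB_EXT_PLT_PERSIST (2), so feth_init() emits:

    feth = (1 << FETH_SEL_SHIFT) | (2 << FETH_PLT_SHIFT); /* = 0x12 */
    *p = cpu_to_be32(feth);             /* big-endian on the wire */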

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3: Fix sparse: incorrect type in assignment; Reported-by: kernel test robot <lkp@intel.com>
V2: extend flush to include length field.
---
 drivers/infiniband/core/uverbs_cmd.c   | 17 +++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_hdr.h    | 20 ++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_opcode.c | 15 +++++++++++++++
 drivers/infiniband/sw/rxe/rxe_opcode.h |  3 +++
 drivers/infiniband/sw/rxe/rxe_req.c    | 15 ++++++++++++++-
 include/rdma/ib_pack.h                 |  2 ++
 include/rdma/ib_verbs.h                | 10 ++++++++++
 include/uapi/rdma/ib_user_verbs.h      |  8 ++++++++
 include/uapi/rdma/rdma_user_rxe.h      |  7 +++++++
 9 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 6b6393176b3c..632e1747fb60 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2080,6 +2080,23 @@ static int ib_uverbs_post_send(struct uverbs_attr_bundle *attrs)
 			rdma->rkey = user_wr->wr.rdma.rkey;
 
 			next = &rdma->wr;
+		} else if (user_wr->opcode == IB_WR_RDMA_FLUSH) {
+			struct ib_flush_wr *flush;
+
+			next_size = sizeof(*flush);
+			flush = alloc_wr(next_size, user_wr->num_sge);
+			if (!flush) {
+				ret = -ENOMEM;
+				goto out_put;
+			}
+
+			flush->remote_addr = user_wr->wr.flush.remote_addr;
+			flush->length = user_wr->wr.flush.length;
+			flush->rkey = user_wr->wr.flush.rkey;
+			flush->type = user_wr->wr.flush.type;
+			flush->level = user_wr->wr.flush.level;
+
+			next = &flush->wr;
 		} else if (user_wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
 			   user_wr->opcode == IB_WR_ATOMIC_FETCH_AND_ADD) {
 			struct ib_atomic_wr *atomic;
diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
index e432f9e37795..8063b5018445 100644
--- a/drivers/infiniband/sw/rxe/rxe_hdr.h
+++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
@@ -607,6 +607,25 @@ static inline void reth_set_len(struct rxe_pkt_info *pkt, u32 len)
 		rxe_opcode[pkt->opcode].offset[RXE_RETH], len);
 }
 
+/*
+ * FLUSH Extended Transport Header
+ */
+#define FETH_PLT_SHIFT 0UL
+#define FETH_SEL_SHIFT 4UL
+#define FETH_RESERVED_SHIFT 6UL
+#define FETH_PLT_MASK ((1UL << FETH_SEL_SHIFT) - 1UL)
+#define FETH_SEL_MASK (~FETH_PLT_MASK & ((1UL << FETH_RESERVED_SHIFT) - 1UL))
+
+static inline void feth_init(struct rxe_pkt_info *pkt, u8 type, u8 level)
+{
+	__be32 *p = (__be32 *)
+		    (pkt->hdr + rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+	u32 feth = ((level << FETH_SEL_SHIFT) & FETH_SEL_MASK) |
+		   ((type << FETH_PLT_SHIFT) & FETH_PLT_MASK);
+
+	*p = cpu_to_be32(feth);
+}
+
 /******************************************************************************
  * Atomic Extended Transport Header
  ******************************************************************************/
@@ -910,6 +929,7 @@ enum rxe_hdr_length {
 	RXE_ATMETH_BYTES	= sizeof(struct rxe_atmeth),
 	RXE_IETH_BYTES		= sizeof(struct rxe_ieth),
 	RXE_RDETH_BYTES		= sizeof(struct rxe_rdeth),
+	RXE_FETH_BYTES		= sizeof(u32),
 };
 
 static inline size_t header_size(struct rxe_pkt_info *pkt)
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
index df596ba7527d..adea6c16dfb5 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.c
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
@@ -316,6 +316,21 @@ struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
 					  RXE_AETH_BYTES,
 		}
 	},
+	[IB_OPCODE_RC_RDMA_FLUSH]			= {
+		.name	= "IB_OPCODE_RC_RDMA_FLUSH",
+		.mask	= RXE_FETH_MASK | RXE_RETH_MASK | RXE_FLUSH_MASK |
+			  RXE_START_MASK | RXE_END_MASK | RXE_REQ_MASK,
+		.length = RXE_BTH_BYTES + RXE_FETH_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_FETH]	= RXE_BTH_BYTES,
+			[RXE_RETH]	= RXE_BTH_BYTES +
+					  RXE_FETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES +
+					  RXE_FETH_BYTES +
+					  RXE_RETH_BYTES,
+		}
+	},
 	[IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE]			= {
 		.name	= "IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE",
 		.mask	= RXE_AETH_MASK | RXE_ATMACK_MASK | RXE_ACK_MASK |
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h
index 8f9aaaf260f2..dbc2eca8a92c 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.h
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.h
@@ -48,6 +48,7 @@ enum rxe_hdr_type {
 	RXE_DETH,
 	RXE_IMMDT,
 	RXE_PAYLOAD,
+	RXE_FETH,
 	NUM_HDR_TYPES
 };
 
@@ -63,6 +64,7 @@ enum rxe_hdr_mask {
 	RXE_IETH_MASK		= BIT(RXE_IETH),
 	RXE_RDETH_MASK		= BIT(RXE_RDETH),
 	RXE_DETH_MASK		= BIT(RXE_DETH),
+	RXE_FETH_MASK		= BIT(RXE_FETH),
 	RXE_PAYLOAD_MASK	= BIT(RXE_PAYLOAD),
 
 	RXE_REQ_MASK		= BIT(NUM_HDR_TYPES + 0),
@@ -80,6 +82,7 @@ enum rxe_hdr_mask {
 	RXE_END_MASK		= BIT(NUM_HDR_TYPES + 10),
 
 	RXE_LOOPBACK_MASK	= BIT(NUM_HDR_TYPES + 12),
+	RXE_FLUSH_MASK		= BIT(NUM_HDR_TYPES + 13),
 
 	RXE_READ_OR_ATOMIC_MASK	= (RXE_READ_MASK | RXE_ATOMIC_MASK),
 	RXE_WRITE_OR_SEND_MASK	= (RXE_WRITE_MASK | RXE_SEND_MASK),
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 5eb89052dd66..708138117136 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -220,6 +220,9 @@ static int next_opcode_rc(struct rxe_qp *qp, u32 opcode, int fits)
 				IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE :
 				IB_OPCODE_RC_SEND_FIRST;
 
+	case IB_WR_RDMA_FLUSH:
+		return IB_OPCODE_RC_RDMA_FLUSH;
+
 	case IB_WR_RDMA_READ:
 		return IB_OPCODE_RC_RDMA_READ_REQUEST;
 
@@ -413,11 +416,18 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 
 	/* init optional headers */
 	if (pkt->mask & RXE_RETH_MASK) {
-		reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
+		if (pkt->mask & RXE_FETH_MASK)
+			reth_set_rkey(pkt, ibwr->wr.flush.rkey);
+		else
+			reth_set_rkey(pkt, ibwr->wr.rdma.rkey);
 		reth_set_va(pkt, wqe->iova);
 		reth_set_len(pkt, wqe->dma.resid);
 	}
 
+	/* Fill Flush Extension Transport Header */
+	if (pkt->mask & RXE_FETH_MASK)
+		feth_init(pkt, ibwr->wr.flush.type, ibwr->wr.flush.level);
+
 	if (pkt->mask & RXE_IMMDT_MASK)
 		immdt_set_imm(pkt, ibwr->ex.imm_data);
 
@@ -477,6 +487,9 @@ static int finish_packet(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
 
 			memset(pad, 0, bth_pad(pkt));
 		}
+	} else if (pkt->mask & RXE_FLUSH_MASK) {
+		/* oA19-2: a FLUSH request shall have no payload. */
+		wqe->dma.resid = 0;
 	}
 
 	return 0;
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index a9162f25beaf..d19edb502de6 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -84,6 +84,7 @@ enum {
 	/* opcode 0x15 is reserved */
 	IB_OPCODE_SEND_LAST_WITH_INVALIDATE         = 0x16,
 	IB_OPCODE_SEND_ONLY_WITH_INVALIDATE         = 0x17,
+	IB_OPCODE_RDMA_FLUSH                        = 0x1C,
 
 	/* real constants follow -- see comment about above IB_OPCODE()
 	   macro for more details */
@@ -112,6 +113,7 @@ enum {
 	IB_OPCODE(RC, FETCH_ADD),
 	IB_OPCODE(RC, SEND_LAST_WITH_INVALIDATE),
 	IB_OPCODE(RC, SEND_ONLY_WITH_INVALIDATE),
+	IB_OPCODE(RC, RDMA_FLUSH),
 
 	/* UC */
 	IB_OPCODE(UC, SEND_FIRST),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 465de3bab1e9..8f04e45b56aa 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1299,6 +1299,7 @@ struct ib_qp_attr {
 enum ib_wr_opcode {
 	/* These are shared with userspace */
 	IB_WR_RDMA_WRITE = IB_UVERBS_WR_RDMA_WRITE,
+	IB_WR_RDMA_FLUSH = IB_UVERBS_WR_RDMA_FLUSH,
 	IB_WR_RDMA_WRITE_WITH_IMM = IB_UVERBS_WR_RDMA_WRITE_WITH_IMM,
 	IB_WR_SEND = IB_UVERBS_WR_SEND,
 	IB_WR_SEND_WITH_IMM = IB_UVERBS_WR_SEND_WITH_IMM,
@@ -1393,6 +1394,15 @@ struct ib_atomic_wr {
 	u32			rkey;
 };
 
+struct ib_flush_wr {
+	struct ib_send_wr	wr;
+	u64			remote_addr;
+	u32			length;
+	u32			rkey;
+	u8			type;
+	u8			level;
+};
+
 static inline const struct ib_atomic_wr *atomic_wr(const struct ib_send_wr *wr)
 {
 	return container_of(wr, struct ib_atomic_wr, wr);
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 7ee73a0652f1..c4131913ef6a 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -784,6 +784,7 @@ enum ib_uverbs_wr_opcode {
 	IB_UVERBS_WR_RDMA_READ_WITH_INV = 11,
 	IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP = 12,
 	IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD = 13,
+	IB_UVERBS_WR_RDMA_FLUSH = 14,
 	/* Review enum ib_wr_opcode before modifying this */
 };
 
@@ -797,6 +798,13 @@ struct ib_uverbs_send_wr {
 		__u32 invalidate_rkey;
 	} ex;
 	union {
+		struct {
+			__aligned_u64 remote_addr;
+			__u32 length;
+			__u32 rkey;
+			__u8 type;
+			__u8 level;
+		} flush;
 		struct {
 			__aligned_u64 remote_addr;
 			__u32 rkey;
diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h
index f09c5c9e3dd5..3de56ed5c24f 100644
--- a/include/uapi/rdma/rdma_user_rxe.h
+++ b/include/uapi/rdma/rdma_user_rxe.h
@@ -82,6 +82,13 @@ struct rxe_send_wr {
 		__u32		invalidate_rkey;
 	} ex;
 	union {
+		struct {
+			__aligned_u64 remote_addr;
+			__u32	length;
+			__u32	rkey;
+			__u8	type;
+			__u8	level;
+		} flush;
 		struct {
 			__aligned_u64 remote_addr;
 			__u32	rkey;
-- 
2.31.1

* [RFC PATCH v3 4/7] RDMA/rxe: Implement flush execution in responder side
From: Li Zhijian @ 2022-03-15 10:18 UTC (permalink / raw)
  To: linux-rdma, zyjzyj2000, jgg, aharonl, leon, tom, tomasz.gromadzki
  Cc: linux-kernel, mbloch, liangwenpeng, yangx.jy, y-goto,
	rpearsonhpe, dan.j.williams, Li Zhijian

In contrast to other opcodes, after a series of sanity checks, the FLUSH
opcode performs a placement type check before it actually executes the
FLUSH operation.

We persist data via arch_wb_cache_pmem(), which may be
architecture-specific.
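
For orientation, a FLUSH request now flows through the responder state
machine as follows (a simplified view of the rxe_responder() changes in
this patch):

    RESPST_CHK_RKEY    -- validate rkey/va/length against IB_ACCESS_FLUSHABLE
    RESPST_CHK_PLT     -- compare the FETH placement type with the MR access flags
    RESPST_EXECUTE     -- process_flush(): arch_wb_cache_pmem() and/or wmb()
    RESPST_READ_REPLY  -- ack the FLUSH with a zero-length RDMA READ response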

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3: Fix sparse: incorrect type in assignment; Reported-by: kernel test robot <lkp@intel.com>
V2:
 # from Tom
 - adjust start for WHOLE MR level
 - don't support DMA mr for flush
 - check flush return value
 - FLUSH only requires FLUSH access flags, not READ nor WRITE
---
 drivers/infiniband/sw/rxe/rxe_hdr.h  |  28 ++++++
 drivers/infiniband/sw/rxe/rxe_loc.h  |   2 +
 drivers/infiniband/sw/rxe/rxe_mr.c   |   4 +-
 drivers/infiniband/sw/rxe/rxe_resp.c | 135 +++++++++++++++++++++++++--
 include/uapi/rdma/ib_user_verbs.h    |  10 ++
 5 files changed, 171 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
index 8063b5018445..2fe98146130e 100644
--- a/drivers/infiniband/sw/rxe/rxe_hdr.h
+++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
@@ -626,6 +626,34 @@ static inline void feth_init(struct rxe_pkt_info *pkt, u8 type, u8 level)
 	*p = cpu_to_be32(feth);
 }
 
+static inline u32 __feth_plt(void *arg)
+{
+	__be32 *fethp = arg;
+	u32 feth = be32_to_cpu(*fethp);
+
+	return (feth & FETH_PLT_MASK) >> FETH_PLT_SHIFT;
+}
+
+static inline u32 __feth_sel(void *arg)
+{
+	__be32 *fethp = arg;
+	u32 feth = be32_to_cpu(*fethp);
+
+	return (feth & FETH_SEL_MASK) >> FETH_SEL_SHIFT;
+}
+
+static inline u32 feth_plt(struct rxe_pkt_info *pkt)
+{
+	return __feth_plt(pkt->hdr +
+		rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+}
+
+static inline u32 feth_sel(struct rxe_pkt_info *pkt)
+{
+	return __feth_sel(pkt->hdr +
+		rxe_opcode[pkt->opcode].offset[RXE_FETH]);
+}
+
 /******************************************************************************
  * Atomic Extended Transport Header
  ******************************************************************************/
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index b1e174afb1d4..73c39ff11e28 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -80,6 +80,8 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 		enum rxe_mr_copy_dir dir);
 int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
 	      void *addr, int length, enum rxe_mr_copy_dir dir);
+void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out,
+		 size_t *offset_out);
 void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length);
 struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key,
 			 enum rxe_mr_lookup_type type);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 4f5c4af19fe0..28bef8a39cd7 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -297,8 +297,8 @@ int rxe_mr_init_fast(struct rxe_pd *pd, int max_pages, struct rxe_mr *mr)
 	return err;
 }
 
-static void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out,
-			size_t *offset_out)
+void lookup_iova(struct rxe_mr *mr, u64 iova, int *m_out, int *n_out,
+		 size_t *offset_out)
 {
 	struct rxe_map_set *set = mr->cur_map_set;
 	size_t offset = iova - set->iova + set->offset;
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index e0093fad4e0f..8ad35667a476 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/skbuff.h>
+#include <linux/libnvdimm.h>
 
 #include "rxe.h"
 #include "rxe_loc.h"
@@ -19,6 +20,7 @@ enum resp_states {
 	RESPST_CHK_RESOURCE,
 	RESPST_CHK_LENGTH,
 	RESPST_CHK_RKEY,
+	RESPST_CHK_PLT,
 	RESPST_EXECUTE,
 	RESPST_READ_REPLY,
 	RESPST_COMPLETE,
@@ -35,6 +37,7 @@ enum resp_states {
 	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
 	RESPST_ERR_RNR,
 	RESPST_ERR_RKEY_VIOLATION,
+	RESPST_ERR_PLT_VIOLATION,
 	RESPST_ERR_INVALIDATE_RKEY,
 	RESPST_ERR_LENGTH,
 	RESPST_ERR_CQ_OVERFLOW,
@@ -53,6 +56,7 @@ static char *resp_state_name[] = {
 	[RESPST_CHK_RESOURCE]			= "CHK_RESOURCE",
 	[RESPST_CHK_LENGTH]			= "CHK_LENGTH",
 	[RESPST_CHK_RKEY]			= "CHK_RKEY",
+	[RESPST_CHK_PLT]			= "CHK_PLACEMENT_TYPE",
 	[RESPST_EXECUTE]			= "EXECUTE",
 	[RESPST_READ_REPLY]			= "READ_REPLY",
 	[RESPST_COMPLETE]			= "COMPLETE",
@@ -69,6 +73,7 @@ static char *resp_state_name[] = {
 	[RESPST_ERR_TOO_MANY_RDMA_ATM_REQ]	= "ERR_TOO_MANY_RDMA_ATM_REQ",
 	[RESPST_ERR_RNR]			= "ERR_RNR",
 	[RESPST_ERR_RKEY_VIOLATION]		= "ERR_RKEY_VIOLATION",
+	[RESPST_ERR_PLT_VIOLATION]		= "ERR_PLACEMENT_TYPE_VIOLATION",
 	[RESPST_ERR_INVALIDATE_RKEY]		= "ERR_INVALIDATE_RKEY_VIOLATION",
 	[RESPST_ERR_LENGTH]			= "ERR_LENGTH",
 	[RESPST_ERR_CQ_OVERFLOW]		= "ERR_CQ_OVERFLOW",
@@ -400,6 +405,24 @@ static enum resp_states check_length(struct rxe_qp *qp,
 	}
 }
 
+static enum resp_states check_placement_type(struct rxe_qp *qp,
+					     struct rxe_pkt_info *pkt)
+{
+	struct rxe_mr *mr = qp->resp.mr;
+	u32 plt = feth_plt(pkt);
+
+	if ((plt & IB_EXT_PLT_GLB_VIS &&
+	    !(mr->access & IB_ACCESS_FLUSH_GLOBAL_VISIBILITY)) ||
+	    (plt & IB_EXT_PLT_PERSIST &&
+	    !(mr->access & IB_ACCESS_FLUSH_PERSISTENT))) {
+		pr_info("Target MR doesn't support this placement type, registered flags: %x, requested flags: %x\n",
+			(mr->access & IB_ACCESS_FLUSHABLE) >> 8, plt);
+		return RESPST_ERR_PLT_VIOLATION;
+	}
+
+	return RESPST_EXECUTE;
+}
+
 static enum resp_states check_rkey(struct rxe_qp *qp,
 				   struct rxe_pkt_info *pkt)
 {
@@ -413,7 +436,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 	enum resp_states state;
 	int access;
 
-	if (pkt->mask & RXE_READ_OR_WRITE_MASK) {
+	if (pkt->mask & (RXE_READ_OR_WRITE_MASK | RXE_FLUSH_MASK)) {
 		if (pkt->mask & RXE_RETH_MASK) {
 			qp->resp.va = reth_va(pkt);
 			qp->resp.offset = 0;
@@ -421,8 +444,12 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 			qp->resp.resid = reth_len(pkt);
 			qp->resp.length = reth_len(pkt);
 		}
-		access = (pkt->mask & RXE_READ_MASK) ? IB_ACCESS_REMOTE_READ
-						     : IB_ACCESS_REMOTE_WRITE;
+		if (pkt->mask & RXE_FLUSH_MASK)
+			access = IB_ACCESS_FLUSHABLE;
+		else if (pkt->mask & RXE_READ_MASK)
+			access = IB_ACCESS_REMOTE_READ;
+		else
+			access = IB_ACCESS_REMOTE_WRITE;
 	} else if (pkt->mask & RXE_ATOMIC_MASK) {
 		qp->resp.va = atmeth_va(pkt);
 		qp->resp.offset = 0;
@@ -434,8 +461,10 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 	}
 
 	/* A zero-byte op is not required to set an addr or rkey. */
+	/* a FLUSH (RXE_FETH_MASK) carries a zero-byte payload */
 	if ((pkt->mask & RXE_READ_OR_WRITE_MASK) &&
 	    (pkt->mask & RXE_RETH_MASK) &&
+	    !(pkt->mask & RXE_FETH_MASK) &&
 	    reth_len(pkt) == 0) {
 		return RESPST_EXECUTE;
 	}
@@ -503,7 +532,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 	WARN_ON_ONCE(qp->resp.mr);
 
 	qp->resp.mr = mr;
-	return RESPST_EXECUTE;
+	return pkt->mask & RXE_FETH_MASK ? RESPST_CHK_PLT : RESPST_EXECUTE;
 
 err:
 	if (mr)
@@ -549,6 +578,93 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
 	return rc;
 }
 
+static int nvdimm_flush_iova(struct rxe_mr *mr, u64 iova, int length)
+{
+	int err;
+	int bytes;
+	u8 *va;
+	struct rxe_map **map;
+	struct rxe_phys_buf *buf;
+	int m;
+	int i;
+	size_t offset;
+
+	if (length == 0)
+		return 0;
+
+	if (mr->type == IB_MR_TYPE_DMA) {
+		err = -EFAULT;
+		goto err1;
+	}
+
+	err = mr_check_range(mr, iova, length);
+	if (err) {
+		err = -EFAULT;
+		goto err1;
+	}
+
+	lookup_iova(mr, iova, &m, &i, &offset);
+
+	map = mr->cur_map_set->map + m;
+	buf = map[0]->buf + i;
+
+	while (length > 0) {
+		va = (u8 *)(uintptr_t)buf->addr + offset;
+		bytes = buf->size - offset;
+
+		if (bytes > length)
+			bytes = length;
+
+		arch_wb_cache_pmem(va, bytes);
+
+		length -= bytes;
+
+		offset = 0;
+		buf++;
+		i++;
+
+		if (i == RXE_BUF_PER_MAP) {
+			i = 0;
+			map++;
+			buf = map[0]->buf;
+		}
+	}
+
+	return 0;
+
+err1:
+	return err;
+}
+
+static enum resp_states process_flush(struct rxe_qp *qp,
+				       struct rxe_pkt_info *pkt)
+{
+	u64 length, start;
+	u32 sel = feth_sel(pkt);
+	u32 plt = feth_plt(pkt);
+	struct rxe_mr *mr = qp->resp.mr;
+
+	if (sel == IB_EXT_SEL_MR_RANGE) {
+		start = qp->resp.va;
+		length = qp->resp.length;
+	} else { /* sel == IB_EXT_SEL_MR_WHOLE */
+		start = mr->cur_map_set->iova;
+		length = mr->cur_map_set->length;
+	}
+
+	if (plt & IB_EXT_PLT_PERSIST) {
+		if (nvdimm_flush_iova(mr, start, length))
+			return RESPST_ERR_RKEY_VIOLATION;
+		wmb();
+	} else if (plt & IB_EXT_PLT_GLB_VIS) {
+		wmb();
+	}
+
+	/* Prepare RDMA READ response of zero */
+	qp->resp.resid = 0;
+
+	return RESPST_READ_REPLY;
+}
+
 /* Guarantee atomicity of atomic operations at the machine level. */
 static DEFINE_SPINLOCK(atomic_ops_lock);
 
@@ -801,6 +917,8 @@ static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
 		err = process_atomic(qp, pkt);
 		if (err)
 			return err;
+	} else if (pkt->mask & RXE_FLUSH_MASK) {
+		return process_flush(qp, pkt);
 	} else {
 		/* Unreachable */
 		WARN_ON_ONCE(1);
@@ -1061,7 +1179,7 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
 		/* SEND. Ack again and cleanup. C9-105. */
 		send_ack(qp, pkt, AETH_ACK_UNLIMITED, prev_psn);
 		return RESPST_CLEANUP;
-	} else if (pkt->mask & RXE_READ_MASK) {
+	} else if (pkt->mask & RXE_READ_MASK || pkt->mask & RXE_FLUSH_MASK) {
 		struct resp_res *res;
 
 		res = find_resource(qp, pkt->psn);
@@ -1100,7 +1218,7 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
 			/* Reset the resource, except length. */
 			res->read.va_org = iova;
 			res->read.va = iova;
-			res->read.resid = resid;
+			res->read.resid = pkt->mask & RXE_FLUSH_MASK ? 0 : resid;
 
 			/* Replay the RDMA read reply. */
 			qp->resp.res = res;
@@ -1247,6 +1365,9 @@ int rxe_responder(void *arg)
 		case RESPST_CHK_RKEY:
 			state = check_rkey(qp, pkt);
 			break;
+		case RESPST_CHK_PLT:
+			state = check_placement_type(qp, pkt);
+			break;
 		case RESPST_EXECUTE:
 			state = execute(qp, pkt);
 			break;
@@ -1301,6 +1422,8 @@ int rxe_responder(void *arg)
 			break;
 
 		case RESPST_ERR_RKEY_VIOLATION:
+		/* oA19-13 8 */
+		case RESPST_ERR_PLT_VIOLATION:
 			if (qp_type(qp) == IB_QPT_RC) {
 				/* Class C */
 				do_class_ac_error(qp, AETH_NAK_REM_ACC_ERR,
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index c4131913ef6a..69a04bb828a0 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -105,6 +105,16 @@ enum {
 	IB_USER_VERBS_EX_CMD_MODIFY_CQ
 };
 
+enum ib_ext_placement_type {
+	IB_EXT_PLT_GLB_VIS = 1 << 0,
+	IB_EXT_PLT_PERSIST = 1 << 1,
+};
+
+enum ib_ext_selectivity_level {
+	IB_EXT_SEL_MR_RANGE = 0, /* select a MR range */
+	IB_EXT_SEL_MR_WHOLE, /* select the whole MR */
+};
+
 /*
  * Make sure that all structs defined in this file remain laid out so
  * that they pack the same way on 32-bit and 64-bit architectures (to
-- 
2.31.1

* [RFC PATCH v3 5/7] RDMA/rxe: Implement flush completion
From: Li Zhijian @ 2022-03-15 10:18 UTC (permalink / raw)
  To: linux-rdma, zyjzyj2000, jgg, aharonl, leon, tom, tomasz.gromadzki
  Cc: linux-kernel, mbloch, liangwenpeng, yangx.jy, y-goto,
	rpearsonhpe, dan.j.williams, Li Zhijian

Introduce a new IB_UVERBS_WC_FLUSH completion opcode so that a FLUSH
completion can be reported to userspace.
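
A consumer then sees the new opcode when polling the CQ, roughly like
this (a sketch; wc is assumed to come from ib_poll_cq()):

    if (wc->opcode == IB_WC_RDMA_FLUSH) {
        /* the placement guarantees requested by the FLUSH WR now hold */
    }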

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 4 +++-
 include/rdma/ib_verbs.h              | 1 +
 include/uapi/rdma/ib_user_verbs.h    | 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index f363fe3fa414..e5b9d07eba93 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -104,6 +104,7 @@ static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode)
 	case IB_WR_LOCAL_INV:			return IB_WC_LOCAL_INV;
 	case IB_WR_REG_MR:			return IB_WC_REG_MR;
 	case IB_WR_BIND_MW:			return IB_WC_BIND_MW;
+	case IB_WR_RDMA_FLUSH:			return IB_WC_RDMA_FLUSH;
 
 	default:
 		return 0xff;
@@ -261,7 +262,8 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 		 */
 	case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
 		if (wqe->wr.opcode != IB_WR_RDMA_READ &&
-		    wqe->wr.opcode != IB_WR_RDMA_READ_WITH_INV) {
+		    wqe->wr.opcode != IB_WR_RDMA_READ_WITH_INV &&
+		    wqe->wr.opcode != IB_WR_RDMA_FLUSH) {
 			wqe->status = IB_WC_FATAL_ERR;
 			return COMPST_ERROR;
 		}
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8f04e45b56aa..a73c746759b0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -965,6 +965,7 @@ const char *__attribute_const__ ib_wc_status_msg(enum ib_wc_status status);
 enum ib_wc_opcode {
 	IB_WC_SEND = IB_UVERBS_WC_SEND,
 	IB_WC_RDMA_WRITE = IB_UVERBS_WC_RDMA_WRITE,
+	IB_WC_RDMA_FLUSH = IB_UVERBS_WC_FLUSH,
 	IB_WC_RDMA_READ = IB_UVERBS_WC_RDMA_READ,
 	IB_WC_COMP_SWAP = IB_UVERBS_WC_COMP_SWAP,
 	IB_WC_FETCH_ADD = IB_UVERBS_WC_FETCH_ADD,
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 69a04bb828a0..a8ba12aa04e6 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -476,6 +476,7 @@ enum ib_uverbs_wc_opcode {
 	IB_UVERBS_WC_BIND_MW = 5,
 	IB_UVERBS_WC_LOCAL_INV = 6,
 	IB_UVERBS_WC_TSO = 7,
+	IB_UVERBS_WC_FLUSH = 8,
 };
 
 struct ib_uverbs_wc {
-- 
2.31.1

* [RFC PATCH v3 6/7] RDMA/rxe: Enable RDMA FLUSH capability for rxe device
From: Li Zhijian @ 2022-03-15 10:18 UTC (permalink / raw)
  To: linux-rdma, zyjzyj2000, jgg, aharonl, leon, tom, tomasz.gromadzki
  Cc: linux-kernel, mbloch, liangwenpeng, yangx.jy, y-goto,
	rpearsonhpe, dan.j.williams, Li Zhijian

Now we are ready to enable the RDMA FLUSH capability for the rxe device.
It supports both the Global Visibility and the Persistence placement
types.
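
A consumer can discover the capability from the queried device
attributes, e.g. (a sketch; attrs stands for a struct ib_device_attr
obtained from the device):

    bool can_persist = attrs->device_cap_flags & IB_DEVICE_PLT_PERSISTENT;
    bool can_glb_vis = attrs->device_cap_flags & IB_DEVICE_PLT_GLOBAL_VISIBILITY;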

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V2: adjust patch's order. move it here from [04/10]
---
 drivers/infiniband/sw/rxe/rxe_param.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index 918270e34a35..281e1977b147 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -53,7 +53,9 @@ enum rxe_device_param {
 					| IB_DEVICE_ALLOW_USER_UNREG
 					| IB_DEVICE_MEM_WINDOW
 					| IB_DEVICE_MEM_WINDOW_TYPE_2A
-					| IB_DEVICE_MEM_WINDOW_TYPE_2B,
+					| IB_DEVICE_MEM_WINDOW_TYPE_2B
+					| IB_DEVICE_PLT_GLOBAL_VISIBILITY
+					| IB_DEVICE_PLT_PERSISTENT,
 	RXE_MAX_SGE			= 32,
 	RXE_MAX_WQE_SIZE		= sizeof(struct rxe_send_wqe) +
 					  sizeof(struct ib_sge) * RXE_MAX_SGE,
-- 
2.31.1

* [RFC PATCH v3 7/7] RDMA/rxe: Add RD FLUSH service support
From: Li Zhijian @ 2022-03-15 10:18 UTC (permalink / raw)
  To: linux-rdma, zyjzyj2000, jgg, aharonl, leon, tom, tomasz.gromadzki
  Cc: linux-kernel, mbloch, liangwenpeng, yangx.jy, y-goto,
	rpearsonhpe, dan.j.williams, Li Zhijian

Since rxe does not support XRC yet, XRC FLUSH will not be supported
until rxe implements the XRC service.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
I have not set up an RD environment to test this protocol
---
 drivers/infiniband/sw/rxe/rxe_opcode.c | 20 ++++++++++++++++++++
 include/rdma/ib_pack.h                 |  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
index adea6c16dfb5..3d86129558f7 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.c
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
@@ -922,6 +922,26 @@ struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
 					  RXE_RDETH_BYTES,
 		}
 	},
+	[IB_OPCODE_RD_RDMA_FLUSH]			= {
+		.name	= "IB_OPCODE_RD_RDMA_FLUSH",
+		.mask	= RXE_RDETH_MASK | RXE_FETH_MASK | RXE_RETH_MASK |
+			  RXE_FLUSH_MASK | RXE_START_MASK |
+			  RXE_END_MASK | RXE_REQ_MASK,
+		.length = RXE_BTH_BYTES + RXE_FETH_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]	= 0,
+			[RXE_RDETH]	= RXE_BTH_BYTES,
+			[RXE_FETH]	= RXE_BTH_BYTES +
+					  RXE_RDETH_BYTES,
+			[RXE_RETH]	= RXE_BTH_BYTES +
+					  RXE_RDETH_BYTES +
+					  RXE_FETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES +
+					  RXE_RDETH_BYTES +
+					  RXE_FETH_BYTES +
+					  RXE_RETH_BYTES,
+		}
+	},
 
 	/* UD */
 	[IB_OPCODE_UD_SEND_ONLY]			= {
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index d19edb502de6..40568a33ead8 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -151,6 +151,7 @@ enum {
 	IB_OPCODE(RD, ATOMIC_ACKNOWLEDGE),
 	IB_OPCODE(RD, COMPARE_SWAP),
 	IB_OPCODE(RD, FETCH_ADD),
+	IB_OPCODE(RD, RDMA_FLUSH),
 
 	/* UD */
 	IB_OPCODE(UD, SEND_ONLY),
-- 
2.31.1

* Re: [RFC PATCH v3 0/7] RDMA/rxe: Add RDMA FLUSH operation
From: lizhijian @ 2022-03-25  7:34 UTC (permalink / raw)
  To: linux-rdma, zyjzyj2000, jgg, aharonl, leon, tom, tomasz.gromadzki
  Cc: linux-kernel, mbloch, liangwenpeng, yangx.jy, y-goto,
	rpearsonhpe, dan.j.williams

kindly ping


On 15/03/2022 18:18, Li Zhijian wrote:
> Hey folks,
>
> These patches implement a *NEW* RDMA opcode, "RDMA FLUSH".
> In the IB specification 1.5[1], two new opcodes, ATOMIC WRITE and RDMA
> FLUSH, were added in the MEMORY PLACEMENT EXTENSIONS section.
> [...]


* Re: [RFC PATCH v3 0/7] RDMA/rxe: Add RDMA FLUSH operation
From: Jason Gunthorpe @ 2022-07-04 13:26 UTC (permalink / raw)
  To: Li Zhijian
  Cc: linux-rdma, zyjzyj2000, aharonl, leon, tom, tomasz.gromadzki,
	linux-kernel, mbloch, liangwenpeng, yangx.jy, y-goto,
	rpearsonhpe, dan.j.williams

On Tue, Mar 15, 2022 at 06:18:38PM +0800, Li Zhijian wrote:
> Hey folks,
> 
> These patches implement a *NEW* RDMA opcode, "RDMA FLUSH".
> In the IB specification 1.5[1], two new opcodes, ATOMIC WRITE and RDMA
> FLUSH, were added in the MEMORY PLACEMENT EXTENSIONS section.
> [...]

So where are we on this? Are all the rxe regressions fixed now? It
doesn't apply so I'm dropping it from patchworks.

Thanks,
Jason

