* [RFC PATCH v2 0/2] RDMA/rxe: Add RDMA Atomic Write operation
@ 2022-01-13  3:03 Xiao Yang
  2022-01-13  3:03 ` [RFC PATCH v2 1/2] RDMA/rxe: Rename send_atomic_ack() and atomic member of struct resp_res Xiao Yang
  2022-01-13  3:03 ` [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation Xiao Yang
  0 siblings, 2 replies; 27+ messages in thread
From: Xiao Yang @ 2022-01-13  3:03 UTC (permalink / raw)
  To: linux-rdma, jgg, tom
  Cc: yanjun.zhu, rpearsonhpe, y-goto, lizhijian, tomasz.gromadzki, Xiao Yang

The IB Spec v1.5[1][2] defines two new RDMA operations (Atomic Write and Flush).
This patchset makes SoftRoCE support the new RDMA Atomic Write operation on the RC service.

I have added ibv_wr_rdma_atomic_write() and an rdma_atomic_write example to my
rdma-core repository[3].  You can verify this patchset by building and
running the rdma_atomic_write example.
server:
$ ./rdma_atomic_write_server -s [server_address] -p [port_number]
client:
$ ./rdma_atomic_write_client -s [server_address] -p [port_number]
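
For reference, a client posting an atomic write through the proposed verbs
extension might look like the sketch below. The name and signature of
ibv_wr_rdma_atomic_write() are assumptions modeled on the existing
ibv_wr_*() extended-WR helpers; see the rdma-core branch in [3] for the
actual interface.

```c
/* Hypothetical sketch only: ibv_wr_rdma_atomic_write() and its signature
 * are assumed (modeled on ibv_wr_rdma_write()); qpx, rkey, remote_addr
 * and value are placeholders supplied by the caller. */
#include <stdint.h>
#include <infiniband/verbs.h>

static int post_atomic_write(struct ibv_qp_ex *qpx, uint32_t rkey,
			     uint64_t remote_addr, uint64_t value)
{
	ibv_wr_start(qpx);
	qpx->wr_id = 1;
	qpx->wr_flags = IBV_SEND_SIGNALED;
	/* assumed helper: atomically writes the 8-byte value to remote_addr */
	ibv_wr_rdma_atomic_write(qpx, rkey, remote_addr, &value);
	return ibv_wr_complete(qpx);
}
```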

[1]: https://www.infinibandta.org/ibta-specification/ # login required
[2]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
[3]: https://github.com/yangx-jy/rdma-core/tree/new_api

Note: This patchset also depends on the following fix.
https://www.spinics.net/lists/linux-rdma/msg107838.html

V1->V2:
1) Set IB_OPCODE_RDMA_ATOMIC_WRITE to 0x1D
2) Add rdma.atomic_wr in struct rxe_send_wr and use it to pass the atomic write value
3) Use smp_store_release() to ensure that all prior operations have completed

Xiao Yang (2):
  RDMA/rxe: Rename send_atomic_ack() and atomic member of struct
    resp_res
  RDMA/rxe: Support RDMA Atomic Write operation

 drivers/infiniband/sw/rxe/rxe_comp.c   |  4 ++
 drivers/infiniband/sw/rxe/rxe_opcode.c | 18 +++++++++
 drivers/infiniband/sw/rxe/rxe_opcode.h |  3 ++
 drivers/infiniband/sw/rxe/rxe_qp.c     |  5 ++-
 drivers/infiniband/sw/rxe/rxe_req.c    | 11 +++++-
 drivers/infiniband/sw/rxe/rxe_resp.c   | 53 +++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_verbs.h  |  2 +-
 include/rdma/ib_pack.h                 |  2 +
 include/rdma/ib_verbs.h                |  2 +
 include/uapi/rdma/ib_user_verbs.h      |  2 +
 include/uapi/rdma/rdma_user_rxe.h      |  1 +
 11 files changed, 85 insertions(+), 18 deletions(-)

-- 
2.23.0





* [RFC PATCH v2 1/2] RDMA/rxe: Rename send_atomic_ack() and atomic member of struct resp_res
  2022-01-13  3:03 [RFC PATCH v2 0/2] RDMA/rxe: Add RDMA Atomic Write operation Xiao Yang
@ 2022-01-13  3:03 ` Xiao Yang
  2022-01-13  3:03 ` [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation Xiao Yang
  1 sibling, 0 replies; 27+ messages in thread
From: Xiao Yang @ 2022-01-13  3:03 UTC (permalink / raw)
  To: linux-rdma, jgg, tom
  Cc: yanjun.zhu, rpearsonhpe, y-goto, lizhijian, tomasz.gromadzki, Xiao Yang

send_atomic_ack() and the atomic member of struct resp_res will become
common code in the future, so rename them.

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c    |  2 +-
 drivers/infiniband/sw/rxe/rxe_resp.c  | 10 +++++-----
 drivers/infiniband/sw/rxe/rxe_verbs.h |  2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 54b8711321c1..f3eec350f95c 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -136,7 +136,7 @@ static void free_rd_atomic_resources(struct rxe_qp *qp)
 void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res)
 {
 	if (res->type == RXE_ATOMIC_MASK) {
-		kfree_skb(res->atomic.skb);
+		kfree_skb(res->resp.skb);
 	} else if (res->type == RXE_READ_MASK) {
 		if (res->read.mr)
 			rxe_drop_ref(res->read.mr);
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index e8f435fa6e4d..e015860e8c34 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -957,7 +957,7 @@ static int send_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	return err;
 }
 
-static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
+static int send_resp(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 			   u8 syndrome)
 {
 	int rc = 0;
@@ -979,7 +979,7 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 
 	skb_get(skb);
 	res->type = RXE_ATOMIC_MASK;
-	res->atomic.skb = skb;
+	res->resp.skb = skb;
 	res->first_psn = ack_pkt.psn;
 	res->last_psn  = ack_pkt.psn;
 	res->cur_psn   = ack_pkt.psn;
@@ -1002,7 +1002,7 @@ static enum resp_states acknowledge(struct rxe_qp *qp,
 	if (qp->resp.aeth_syndrome != AETH_ACK_UNLIMITED)
 		send_ack(qp, pkt, qp->resp.aeth_syndrome, pkt->psn);
 	else if (pkt->mask & RXE_ATOMIC_MASK)
-		send_atomic_ack(qp, pkt, AETH_ACK_UNLIMITED);
+		send_resp(qp, pkt, AETH_ACK_UNLIMITED);
 	else if (bth_ack(pkt))
 		send_ack(qp, pkt, AETH_ACK_UNLIMITED, pkt->psn);
 
@@ -1111,9 +1111,9 @@ static enum resp_states duplicate_request(struct rxe_qp *qp,
 		/* Find the operation in our list of responder resources. */
 		res = find_resource(qp, pkt->psn);
 		if (res) {
-			skb_get(res->atomic.skb);
+			skb_get(res->resp.skb);
 			/* Resend the result. */
-			rc = rxe_xmit_packet(qp, pkt, res->atomic.skb);
+			rc = rxe_xmit_packet(qp, pkt, res->resp.skb);
 			if (rc) {
 				pr_err("Failed resending result. This flow is not handled - skb ignored\n");
 				rc = RESPST_CLEANUP;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 35e041450090..e0606e5f9962 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -155,7 +155,7 @@ struct resp_res {
 	union {
 		struct {
 			struct sk_buff	*skb;
-		} atomic;
+		} resp;
 		struct {
 			struct rxe_mr	*mr;
 			u64		va_org;
-- 
2.23.0





* [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-13  3:03 [RFC PATCH v2 0/2] RDMA/rxe: Add RDMA Atomic Write operation Xiao Yang
  2022-01-13  3:03 ` [RFC PATCH v2 1/2] RDMA/rxe: Rename send_atomic_ack() and atomic member of struct resp_res Xiao Yang
@ 2022-01-13  3:03 ` Xiao Yang
  2022-01-13 11:35   ` kernel test robot
                     ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Xiao Yang @ 2022-01-13  3:03 UTC (permalink / raw)
  To: linux-rdma, jgg, tom
  Cc: yanjun.zhu, rpearsonhpe, y-goto, lizhijian, tomasz.gromadzki, Xiao Yang

This patch implements RDMA Atomic Write operation for RC service.

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c   |  4 +++
 drivers/infiniband/sw/rxe/rxe_opcode.c | 18 +++++++++++
 drivers/infiniband/sw/rxe/rxe_opcode.h |  3 ++
 drivers/infiniband/sw/rxe/rxe_qp.c     |  3 +-
 drivers/infiniband/sw/rxe/rxe_req.c    | 11 +++++--
 drivers/infiniband/sw/rxe/rxe_resp.c   | 43 +++++++++++++++++++++-----
 include/rdma/ib_pack.h                 |  2 ++
 include/rdma/ib_verbs.h                |  2 ++
 include/uapi/rdma/ib_user_verbs.h      |  2 ++
 include/uapi/rdma/rdma_user_rxe.h      |  1 +
 10 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index d771ba8449a1..5a5c0c3501de 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -104,6 +104,7 @@ static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode)
 	case IB_WR_LOCAL_INV:			return IB_WC_LOCAL_INV;
 	case IB_WR_REG_MR:			return IB_WC_REG_MR;
 	case IB_WR_BIND_MW:			return IB_WC_BIND_MW;
+	case IB_WR_RDMA_ATOMIC_WRITE:		return IB_WC_RDMA_ATOMIC_WRITE;
 
 	default:
 		return 0xff;
@@ -256,6 +257,9 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 		if ((syn & AETH_TYPE_MASK) != AETH_ACK)
 			return COMPST_ERROR;
 
+		if (wqe->wr.opcode == IB_WR_RDMA_ATOMIC_WRITE)
+			return COMPST_WRITE_SEND;
+
 		fallthrough;
 		/* (IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE doesn't have an AETH)
 		 */
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
index 3ef5a10a6efd..ec0f795adb5c 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.c
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
@@ -103,6 +103,12 @@ struct rxe_wr_opcode_info rxe_wr_opcode_info[] = {
 			[IB_QPT_UC]	= WR_LOCAL_OP_MASK,
 		},
 	},
+	[IB_WR_RDMA_ATOMIC_WRITE]                       = {
+		.name   = "IB_WR_RDMA_ATOMIC_WRITE",
+		.mask   = {
+			[IB_QPT_RC]     = WR_ATOMIC_WRITE_MASK,
+		},
+	},
 };
 
 struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
@@ -379,6 +385,18 @@ struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
 						+ RXE_IETH_BYTES,
 		}
 	},
+	[IB_OPCODE_RC_RDMA_ATOMIC_WRITE]                        = {
+		.name   = "IB_OPCODE_RC_RDMA_ATOMIC_WRITE",
+		.mask   = RXE_RETH_MASK | RXE_PAYLOAD_MASK | RXE_REQ_MASK
+				| RXE_ATOMIC_WRITE_MASK | RXE_START_MASK
+				| RXE_END_MASK,
+		.length = RXE_BTH_BYTES + RXE_RETH_BYTES,
+		.offset = {
+			[RXE_BTH]       = 0,
+			[RXE_RETH]      = RXE_BTH_BYTES,
+			[RXE_PAYLOAD]   = RXE_BTH_BYTES + RXE_RETH_BYTES,
+		}
+	},
 
 	/* UC */
 	[IB_OPCODE_UC_SEND_FIRST]			= {
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h
index 8f9aaaf260f2..a470e9b0b884 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.h
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.h
@@ -20,6 +20,7 @@ enum rxe_wr_mask {
 	WR_READ_MASK			= BIT(3),
 	WR_WRITE_MASK			= BIT(4),
 	WR_LOCAL_OP_MASK		= BIT(5),
+	WR_ATOMIC_WRITE_MASK		= BIT(7),
 
 	WR_READ_OR_WRITE_MASK		= WR_READ_MASK | WR_WRITE_MASK,
 	WR_WRITE_OR_SEND_MASK		= WR_WRITE_MASK | WR_SEND_MASK,
@@ -81,6 +82,8 @@ enum rxe_hdr_mask {
 
 	RXE_LOOPBACK_MASK	= BIT(NUM_HDR_TYPES + 12),
 
+	RXE_ATOMIC_WRITE_MASK   = BIT(NUM_HDR_TYPES + 14),
+
 	RXE_READ_OR_ATOMIC_MASK	= (RXE_READ_MASK | RXE_ATOMIC_MASK),
 	RXE_WRITE_OR_SEND_MASK	= (RXE_WRITE_MASK | RXE_SEND_MASK),
 	RXE_READ_OR_WRITE_MASK	= (RXE_READ_MASK | RXE_WRITE_MASK),
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index f3eec350f95c..22a1c4bcfa60 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -135,7 +135,8 @@ static void free_rd_atomic_resources(struct rxe_qp *qp)
 
 void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res)
 {
-	if (res->type == RXE_ATOMIC_MASK) {
+	if (res->type == RXE_ATOMIC_MASK ||
+			res->type == RXE_ATOMIC_WRITE_MASK) {
 		kfree_skb(res->resp.skb);
 	} else if (res->type == RXE_READ_MASK) {
 		if (res->read.mr)
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 0c9d2af15f3d..203d8d19f84a 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -240,6 +240,10 @@ static int next_opcode_rc(struct rxe_qp *qp, u32 opcode, int fits)
 		else
 			return fits ? IB_OPCODE_RC_SEND_ONLY_WITH_INVALIDATE :
 				IB_OPCODE_RC_SEND_FIRST;
+
+	case IB_WR_RDMA_ATOMIC_WRITE:
+		return IB_OPCODE_RC_RDMA_ATOMIC_WRITE;
+
 	case IB_WR_REG_MR:
 	case IB_WR_LOCAL_INV:
 		return opcode;
@@ -485,6 +489,9 @@ static int finish_packet(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
 		}
 	}
 
+	if (pkt->mask & RXE_ATOMIC_WRITE_MASK)
+		memcpy(payload_addr(pkt), &wqe->wr.wr.rdma.atomic_wr, paylen);
+
 	return 0;
 }
 
@@ -680,13 +687,13 @@ int rxe_requester(void *arg)
 	}
 
 	mask = rxe_opcode[opcode].mask;
-	if (unlikely(mask & RXE_READ_OR_ATOMIC_MASK)) {
+	if (unlikely(mask & (RXE_READ_OR_ATOMIC_MASK | RXE_ATOMIC_WRITE_MASK))) {
 		if (check_init_depth(qp, wqe))
 			goto exit;
 	}
 
 	mtu = get_mtu(qp);
-	payload = (mask & RXE_WRITE_OR_SEND_MASK) ? wqe->dma.resid : 0;
+	payload = (mask & (RXE_WRITE_OR_SEND_MASK | RXE_ATOMIC_WRITE_MASK)) ? wqe->dma.resid : 0;
 	if (payload > mtu) {
 		if (qp_type(qp) == IB_QPT_UD) {
 			/* C10-93.1.1: If the total sum of all the buffer lengths specified for a
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index e015860e8c34..cf678a3aaaa9 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -258,7 +258,7 @@ static enum resp_states check_op_valid(struct rxe_qp *qp,
 	case IB_QPT_RC:
 		if (((pkt->mask & RXE_READ_MASK) &&
 		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_READ)) ||
-		    ((pkt->mask & RXE_WRITE_MASK) &&
+		    ((pkt->mask & (RXE_WRITE_MASK | RXE_ATOMIC_WRITE_MASK)) &&
 		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_WRITE)) ||
 		    ((pkt->mask & RXE_ATOMIC_MASK) &&
 		     !(qp->attr.qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) {
@@ -362,7 +362,7 @@ static enum resp_states check_resource(struct rxe_qp *qp,
 		}
 	}
 
-	if (pkt->mask & RXE_READ_OR_ATOMIC_MASK) {
+	if (pkt->mask & (RXE_READ_OR_ATOMIC_MASK | RXE_ATOMIC_WRITE_MASK)) {
 		/* it is the requesters job to not send
 		 * too many read/atomic ops, we just
 		 * recycle the responder resource queue
@@ -413,7 +413,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 	enum resp_states state;
 	int access;
 
-	if (pkt->mask & RXE_READ_OR_WRITE_MASK) {
+	if (pkt->mask & (RXE_READ_OR_WRITE_MASK | RXE_ATOMIC_WRITE_MASK)) {
 		if (pkt->mask & RXE_RETH_MASK) {
 			qp->resp.va = reth_va(pkt);
 			qp->resp.offset = 0;
@@ -479,7 +479,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 		goto err;
 	}
 
-	if (pkt->mask & RXE_WRITE_MASK)	 {
+	if (pkt->mask & (RXE_WRITE_MASK | RXE_ATOMIC_WRITE_MASK)) {
 		if (resid > mtu) {
 			if (pktlen != mtu || bth_pad(pkt)) {
 				state = RESPST_ERR_LENGTH;
@@ -591,6 +591,26 @@ static enum resp_states process_atomic(struct rxe_qp *qp,
 	return ret;
 }
 
+static enum resp_states process_atomic_write(struct rxe_qp *qp,
+					     struct rxe_pkt_info *pkt)
+{
+	struct rxe_mr *mr = qp->resp.mr;
+
+	u64 *src = payload_addr(pkt);
+
+	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
+	if (!dst || (uintptr_t)dst & 7)
+		return RESPST_ERR_MISALIGNED_ATOMIC;
+
+	/* Do atomic write after all prior operations have completed */
+	smp_store_release(dst, *src);
+
+	/* decrease resp.resid to zero */
+	qp->resp.resid -= sizeof(u64);
+
+	return RESPST_NONE;
+}
+
 static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 					  struct rxe_pkt_info *pkt,
 					  struct rxe_pkt_info *ack,
@@ -801,6 +821,10 @@ static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
 		err = process_atomic(qp, pkt);
 		if (err)
 			return err;
+	} else if (pkt->mask & RXE_ATOMIC_WRITE_MASK) {
+		err = process_atomic_write(qp, pkt);
+		if (err)
+			return err;
 	} else {
 		/* Unreachable */
 		WARN_ON_ONCE(1);
@@ -965,9 +989,12 @@ static int send_resp(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	struct sk_buff *skb;
 	struct resp_res *res;
 
+	int opcode = pkt->mask & RXE_ATOMIC_MASK ?
+				IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE :
+				IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY;
+
 	skb = prepare_ack_packet(qp, pkt, &ack_pkt,
-				 IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE, 0, pkt->psn,
-				 syndrome);
+				 opcode, 0, pkt->psn, syndrome);
 	if (!skb) {
 		rc = -ENOMEM;
 		goto out;
@@ -978,7 +1005,7 @@ static int send_resp(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	rxe_advance_resp_resource(qp);
 
 	skb_get(skb);
-	res->type = RXE_ATOMIC_MASK;
+	res->type = pkt->mask & (RXE_ATOMIC_MASK | RXE_ATOMIC_WRITE_MASK);
 	res->resp.skb = skb;
 	res->first_psn = ack_pkt.psn;
 	res->last_psn  = ack_pkt.psn;
@@ -1001,7 +1028,7 @@ static enum resp_states acknowledge(struct rxe_qp *qp,
 
 	if (qp->resp.aeth_syndrome != AETH_ACK_UNLIMITED)
 		send_ack(qp, pkt, qp->resp.aeth_syndrome, pkt->psn);
-	else if (pkt->mask & RXE_ATOMIC_MASK)
+	else if (pkt->mask & (RXE_ATOMIC_MASK | RXE_ATOMIC_WRITE_MASK))
 		send_resp(qp, pkt, AETH_ACK_UNLIMITED);
 	else if (bth_ack(pkt))
 		send_ack(qp, pkt, AETH_ACK_UNLIMITED, pkt->psn);
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index a9162f25beaf..519ec6b841e7 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -84,6 +84,7 @@ enum {
 	/* opcode 0x15 is reserved */
 	IB_OPCODE_SEND_LAST_WITH_INVALIDATE         = 0x16,
 	IB_OPCODE_SEND_ONLY_WITH_INVALIDATE         = 0x17,
+	IB_OPCODE_RDMA_ATOMIC_WRITE                 = 0x1D,
 
 	/* real constants follow -- see comment about above IB_OPCODE()
 	   macro for more details */
@@ -112,6 +113,7 @@ enum {
 	IB_OPCODE(RC, FETCH_ADD),
 	IB_OPCODE(RC, SEND_LAST_WITH_INVALIDATE),
 	IB_OPCODE(RC, SEND_ONLY_WITH_INVALIDATE),
+	IB_OPCODE(RC, RDMA_ATOMIC_WRITE),
 
 	/* UC */
 	IB_OPCODE(UC, SEND_FIRST),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6e9ad656ecb7..949b3586d35b 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -971,6 +971,7 @@ enum ib_wc_opcode {
 	IB_WC_REG_MR,
 	IB_WC_MASKED_COMP_SWAP,
 	IB_WC_MASKED_FETCH_ADD,
+	IB_WC_RDMA_ATOMIC_WRITE = IB_UVERBS_WC_RDMA_ATOMIC_WRITE,
 /*
  * Set value of IB_WC_RECV so consumers can test if a completion is a
  * receive by testing (opcode & IB_WC_RECV).
@@ -1311,6 +1312,7 @@ enum ib_wr_opcode {
 		IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP,
 	IB_WR_MASKED_ATOMIC_FETCH_AND_ADD =
 		IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD,
+	IB_WR_RDMA_ATOMIC_WRITE = IB_UVERBS_WR_RDMA_ATOMIC_WRITE,
 
 	/* These are kernel only and can not be issued by userspace */
 	IB_WR_REG_MR = 0x20,
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 7ee73a0652f1..3b0b509fb96f 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -466,6 +466,7 @@ enum ib_uverbs_wc_opcode {
 	IB_UVERBS_WC_BIND_MW = 5,
 	IB_UVERBS_WC_LOCAL_INV = 6,
 	IB_UVERBS_WC_TSO = 7,
+	IB_UVERBS_WC_RDMA_ATOMIC_WRITE = 9,
 };
 
 struct ib_uverbs_wc {
@@ -784,6 +785,7 @@ enum ib_uverbs_wr_opcode {
 	IB_UVERBS_WR_RDMA_READ_WITH_INV = 11,
 	IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP = 12,
 	IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD = 13,
+	IB_UVERBS_WR_RDMA_ATOMIC_WRITE = 15,
 	/* Review enum ib_wr_opcode before modifying this */
 };
 
diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h
index f09c5c9e3dd5..7e02c614d826 100644
--- a/include/uapi/rdma/rdma_user_rxe.h
+++ b/include/uapi/rdma/rdma_user_rxe.h
@@ -86,6 +86,7 @@ struct rxe_send_wr {
 			__aligned_u64 remote_addr;
 			__u32	rkey;
 			__u32	reserved;
+			__aligned_u64 atomic_wr;
 		} rdma;
 		struct {
 			__aligned_u64 remote_addr;
-- 
2.23.0





* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-13  3:03 ` [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation Xiao Yang
@ 2022-01-13 11:35   ` kernel test robot
  2022-01-13 12:16     ` kernel test robot
  2022-01-17 13:16   ` Jason Gunthorpe
  2 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2022-01-13 11:35 UTC (permalink / raw)
  To: kbuild-all


Hi Xiao,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on v5.16]
[also build test ERROR on next-20220113]
[cannot apply to rdma/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xiao-Yang/RDMA-rxe-Add-RDMA-Atomic-Write-operation/20220113-110548
base:    df0cc57e057f18e44dac8e6c18aba47ab53202f9
config: arc-randconfig-r003-20220113 (https://download.01.org/0day-ci/archive/20220113/202201131838.Rbqf6UOl-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/4294d58609cfb4c296c7937eafd58e044a4d81c6
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xiao-Yang/RDMA-rxe-Add-RDMA-Atomic-Write-operation/20220113-110548
        git checkout 4294d58609cfb4c296c7937eafd58e044a4d81c6
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash drivers/infiniband/sw/rxe/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from <command-line>:
   In function 'process_atomic_write',
       inlined from 'execute' at drivers/infiniband/sw/rxe/rxe_resp.c:825:9:
>> include/linux/compiler_types.h:335:45: error: call to '__compiletime_assert_467' declared with attribute error: Need native word sized stores/loads for atomicity.
     335 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |                                             ^
   include/linux/compiler_types.h:316:25: note: in definition of macro '__compiletime_assert'
     316 |                         prefix ## suffix();                             \
         |                         ^~~~~~
   include/linux/compiler_types.h:335:9: note: in expansion of macro '_compiletime_assert'
     335 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/linux/compiler_types.h:338:9: note: in expansion of macro 'compiletime_assert'
     338 |         compiletime_assert(__native_word(t),                            \
         |         ^~~~~~~~~~~~~~~~~~
   include/asm-generic/barrier.h:162:9: note: in expansion of macro 'compiletime_assert_atomic_type'
     162 |         compiletime_assert_atomic_type(*p);                             \
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/infiniband/sw/rxe/rxe_resp.c:606:9: note: in expansion of macro 'smp_store_release'
     606 |         smp_store_release(dst, *src);
         |         ^~~~~~~~~~~~~~~~~


vim +/__compiletime_assert_467 +335 include/linux/compiler_types.h

eb5c2d4b45e3d2 Will Deacon 2020-07-21  321  
eb5c2d4b45e3d2 Will Deacon 2020-07-21  322  #define _compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21  323  	__compiletime_assert(condition, msg, prefix, suffix)
eb5c2d4b45e3d2 Will Deacon 2020-07-21  324  
eb5c2d4b45e3d2 Will Deacon 2020-07-21  325  /**
eb5c2d4b45e3d2 Will Deacon 2020-07-21  326   * compiletime_assert - break build and emit msg if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21  327   * @condition: a compile-time constant condition to check
eb5c2d4b45e3d2 Will Deacon 2020-07-21  328   * @msg:       a message to emit if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21  329   *
eb5c2d4b45e3d2 Will Deacon 2020-07-21  330   * In tradition of POSIX assert, this macro will break the build if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21  331   * supplied condition is *false*, emitting the supplied error message if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21  332   * compiler has support to do so.
eb5c2d4b45e3d2 Will Deacon 2020-07-21  333   */
eb5c2d4b45e3d2 Will Deacon 2020-07-21  334  #define compiletime_assert(condition, msg) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 @335  	_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2 Will Deacon 2020-07-21  336  
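
The assertion fires because smp_store_release() requires a native
word-sized access, and a u64 store is not single-copy atomic on 32-bit
targets such as arc. One possible direction, sketched here purely for
illustration (the CONFIG_64BIT gating and the RESPST_ERR_UNSUPPORTED_OPCODE
fallback are assumptions, not the eventual fix), would be:

```c
/* Hypothetical sketch, not the merged fix: only issue the 64-bit release
 * store where the architecture supports native 64-bit stores; otherwise
 * reject the opcode at runtime. */
static enum resp_states process_atomic_write(struct rxe_qp *qp,
					     struct rxe_pkt_info *pkt)
{
#ifdef CONFIG_64BIT
	struct rxe_mr *mr = qp->resp.mr;
	u64 *src = payload_addr(pkt);
	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset,
				 sizeof(u64));

	if (!dst || (uintptr_t)dst & 7)
		return RESPST_ERR_MISALIGNED_ATOMIC;

	/* do the atomic write after all prior operations have completed */
	smp_store_release(dst, *src);
	qp->resp.resid -= sizeof(u64);

	return RESPST_NONE;
#else
	/* assumed error code, for illustration only */
	return RESPST_ERR_UNSUPPORTED_OPCODE;
#endif
}
```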

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-13  3:03 ` [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation Xiao Yang
@ 2022-01-13 12:16     ` kernel test robot
  2022-01-13 12:16     ` kernel test robot
  2022-01-17 13:16   ` Jason Gunthorpe
  2 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2022-01-13 12:16 UTC (permalink / raw)
  To: Xiao Yang; +Cc: llvm, kbuild-all

Hi Xiao,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on v5.16]
[also build test ERROR on next-20220113]
[cannot apply to rdma/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xiao-Yang/RDMA-rxe-Add-RDMA-Atomic-Write-operation/20220113-110548
base:    df0cc57e057f18e44dac8e6c18aba47ab53202f9
config: riscv-randconfig-r035-20220113 (https://download.01.org/0day-ci/archive/20220113/202201132027.agYTyFRu-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project d1021978b8e7e35dcc30201ca1731d64b5a602a8)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/4294d58609cfb4c296c7937eafd58e044a4d81c6
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xiao-Yang/RDMA-rxe-Add-RDMA-Atomic-Write-operation/20220113-110548
        git checkout 4294d58609cfb4c296c7937eafd58e044a4d81c6
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash drivers/infiniband/sw/rxe/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/infiniband/sw/rxe/rxe_resp.c:606:2: error: call to __compiletime_assert_499 declared with 'error' attribute: Need native word sized stores/loads for atomicity.
           smp_store_release(dst, *src);
           ^
   include/asm-generic/barrier.h:162:2: note: expanded from macro 'smp_store_release'
           compiletime_assert_atomic_type(*p);                             \
           ^
   include/linux/compiler_types.h:338:2: note: expanded from macro 'compiletime_assert_atomic_type'
           compiletime_assert(__native_word(t),                            \
           ^
   include/linux/compiler_types.h:335:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:323:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:316:4: note: expanded from macro '__compiletime_assert'
                           prefix ## suffix();                             \
                           ^
   <scratch space>:165:1: note: expanded from here
   __compiletime_assert_499
   ^
   1 error generated.


vim +/error +606 drivers/infiniband/sw/rxe/rxe_resp.c

   593	
   594	static enum resp_states process_atomic_write(struct rxe_qp *qp,
   595						     struct rxe_pkt_info *pkt)
   596	{
   597		struct rxe_mr *mr = qp->resp.mr;
   598	
   599		u64 *src = payload_addr(pkt);
   600	
   601		u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
   602		if (!dst || (uintptr_t)dst & 7)
   603			return RESPST_ERR_MISALIGNED_ATOMIC;
   604	
   605		/* Do atomic write after all prior operations have completed */
 > 606		smp_store_release(dst, *src);
   607	
   608		/* decrease resp.resid to zero */
   609		qp->resp.resid -= sizeof(u64);
   610	
   611		return RESPST_NONE;
   612	}
   613	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org



* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-13  3:03 ` [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation Xiao Yang
  2022-01-13 11:35   ` kernel test robot
  2022-01-13 12:16     ` kernel test robot
@ 2022-01-17 13:16   ` Jason Gunthorpe
  2022-01-18  8:01     ` yangx.jy
                       ` (2 more replies)
  2 siblings, 3 replies; 27+ messages in thread
From: Jason Gunthorpe @ 2022-01-17 13:16 UTC (permalink / raw)
  To: Xiao Yang
  Cc: linux-rdma, tom, yanjun.zhu, rpearsonhpe, y-goto, lizhijian,
	tomasz.gromadzki

On Thu, Jan 13, 2022 at 11:03:50AM +0800, Xiao Yang wrote:
> +static enum resp_states process_atomic_write(struct rxe_qp *qp,
> +					     struct rxe_pkt_info *pkt)
> +{
> +	struct rxe_mr *mr = qp->resp.mr;
> +
> +	u64 *src = payload_addr(pkt);
> +
> +	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
> +	if (!dst || (uintptr_t)dst & 7)
> +		return RESPST_ERR_MISALIGNED_ATOMIC;

It looks to me like iova_to_vaddr is completely broken, where is the
kmap on that flow?

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-17 13:16   ` Jason Gunthorpe
@ 2022-01-18  8:01     ` yangx.jy
  2022-01-18 12:35       ` Jason Gunthorpe
  2022-01-18  8:02     ` yangx.jy
  2022-01-18  8:04     ` yangx.jy
  2 siblings, 1 reply; 27+ messages in thread
From: yangx.jy @ 2022-01-18  8:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, tom, yanjun.zhu, rpearsonhpe, y-goto, lizhijian,
	tomasz.gromadzki

On 2022/1/17 21:16, Jason Gunthorpe wrote:
> On Thu, Jan 13, 2022 at 11:03:50AM +0800, Xiao Yang wrote:
>> +static enum resp_states process_atomic_write(struct rxe_qp *qp,
>> +					     struct rxe_pkt_info *pkt)
>> +{
>> +	struct rxe_mr *mr = qp->resp.mr;
>> +
>> +	u64 *src = payload_addr(pkt);
>> +
>> +	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
>> +	if (!dst || (uintptr_t)dst & 7)
>> +		return RESPST_ERR_MISALIGNED_ATOMIC;
> It looks to me like iova_to_vaddr is completely broken, where is the
> kmap on that flow?
Hi Jason,

I think rxe_mr_init_user() maps the user addr space to the kernel addr 
space during memory region registration, the mapping records are saved 
into mr->cur_map_set->map[x].
Why do you think iova_to_vaddr() is completely broken?

Best Regards,
Xiao Yang
> Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-17 13:16   ` Jason Gunthorpe
  2022-01-18  8:01     ` yangx.jy
@ 2022-01-18  8:02     ` yangx.jy
  2022-01-18  8:04     ` yangx.jy
  2 siblings, 0 replies; 27+ messages in thread
From: yangx.jy @ 2022-01-18  8:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, tom, yanjun.zhu, rpearsonhpe, y-goto, lizhijian,
	tomasz.gromadzki

On 2022/1/17 21:16, Jason Gunthorpe wrote:
> On Thu, Jan 13, 2022 at 11:03:50AM +0800, Xiao Yang wrote:
>> +static enum resp_states process_atomic_write(struct rxe_qp *qp,
>> +					     struct rxe_pkt_info *pkt)
>> +{
>> +	struct rxe_mr *mr = qp->resp.mr;
>> +
>> +	u64 *src = payload_addr(pkt);
>> +
>> +	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
>> +	if (!dst || (uintptr_t)dst & 7)
>> +		return RESPST_ERR_MISALIGNED_ATOMIC;
> It looks to me like iova_to_vaddr is completely broken, where is the
> kmap on that flow?
Hi Jason,

I think rxe_mr_init_user() maps the user addr space to the kernel addr 
space during memory region registration, the mapping records are saved 
into mr->cur_map_set->map[x].
Why do you think iova_to_vaddr() is completely broken?

Best Regards,
Xiao Yang
> Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-17 13:16   ` Jason Gunthorpe
  2022-01-18  8:01     ` yangx.jy
  2022-01-18  8:02     ` yangx.jy
@ 2022-01-18  8:04     ` yangx.jy
  2022-01-18  9:03       ` yangx.jy
  2 siblings, 1 reply; 27+ messages in thread
From: yangx.jy @ 2022-01-18  8:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, tom, yanjun.zhu, rpearsonhpe, y-goto, lizhijian,
	tomasz.gromadzki

On 2022/1/17 21:16, Jason Gunthorpe wrote:
> On Thu, Jan 13, 2022 at 11:03:50AM +0800, Xiao Yang wrote:
>> +static enum resp_states process_atomic_write(struct rxe_qp *qp,
>> +					     struct rxe_pkt_info *pkt)
>> +{
>> +	struct rxe_mr *mr = qp->resp.mr;
>> +
>> +	u64 *src = payload_addr(pkt);
>> +
>> +	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
>> +	if (!dst || (uintptr_t)dst & 7)
>> +		return RESPST_ERR_MISALIGNED_ATOMIC;
> It looks to me like iova_to_vaddr is completely broken, where is the
> kmap on that flow?
Hi Jason,

I think rxe_mr_init_user() maps the user addr space to the kernel addr 
space during memory region registration, the mapping records are saved 
into mr->cur_map_set->map[x].
Why do you think iova_to_vaddr() is completely broken?

Best Regards,
Xiao Yang
> Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-18  8:04     ` yangx.jy
@ 2022-01-18  9:03       ` yangx.jy
  0 siblings, 0 replies; 27+ messages in thread
From: yangx.jy @ 2022-01-18  9:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, tom, yanjun.zhu, rpearsonhpe, y-goto, lizhijian,
	tomasz.gromadzki

Hi,

Sorry for the duplicate replies.

Best Regards,
Xiao Yang
On 2022/1/18 16:04, yangx.jy@fujitsu.com wrote:
> On 2022/1/17 21:16, Jason Gunthorpe wrote:
>> On Thu, Jan 13, 2022 at 11:03:50AM +0800, Xiao Yang wrote:
>>> +static enum resp_states process_atomic_write(struct rxe_qp *qp,
>>> +					     struct rxe_pkt_info *pkt)
>>> +{
>>> +	struct rxe_mr *mr = qp->resp.mr;
>>> +
>>> +	u64 *src = payload_addr(pkt);
>>> +
>>> +	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
>>> +	if (!dst || (uintptr_t)dst & 7)
>>> +		return RESPST_ERR_MISALIGNED_ATOMIC;
>> It looks to me like iova_to_vaddr is completely broken, where is the
>> kmap on that flow?
> Hi Jason,
>
> I think rxe_mr_init_user() maps the user addr space to the kernel addr 
> space during memory region registration, the mapping records are saved 
> into mr->cur_map_set->map[x].
> Why do you think iova_to_vaddr() is completely broken?
>
> Best Regards,
> Xiao Yang
>> Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-18  8:01     ` yangx.jy
@ 2022-01-18 12:35       ` Jason Gunthorpe
  2022-01-19  1:54         ` lizhijian
  0 siblings, 1 reply; 27+ messages in thread
From: Jason Gunthorpe @ 2022-01-18 12:35 UTC (permalink / raw)
  To: yangx.jy
  Cc: linux-rdma, tom, yanjun.zhu, rpearsonhpe, y-goto, lizhijian,
	tomasz.gromadzki

On Tue, Jan 18, 2022 at 08:01:59AM +0000, yangx.jy@fujitsu.com wrote:
> On 2022/1/17 21:16, Jason Gunthorpe wrote:
> > On Thu, Jan 13, 2022 at 11:03:50AM +0800, Xiao Yang wrote:
> >> +static enum resp_states process_atomic_write(struct rxe_qp *qp,
> >> +					     struct rxe_pkt_info *pkt)
> >> +{
> >> +	struct rxe_mr *mr = qp->resp.mr;
> >> +
> >> +	u64 *src = payload_addr(pkt);
> >> +
> >> +	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
> >> +	if (!dst || (uintptr_t)dst & 7)
> >> +		return RESPST_ERR_MISALIGNED_ATOMIC;
> > It looks to me like iova_to_vaddr is completely broken, where is the
> > kmap on that flow?
> Hi Jason,
> 
> I think rxe_mr_init_user() maps the user addr space to the kernel addr 
> space during memory region registration, the mapping records are saved 
> into mr->cur_map_set->map[x].

There is no way to touch user memory from the CPU in the kernel
without calling one of the kmap's, so I don't know what this thinks it
is doing.

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-18 12:35       ` Jason Gunthorpe
@ 2022-01-19  1:54         ` lizhijian
  2022-01-19 12:36           ` Jason Gunthorpe
  0 siblings, 1 reply; 27+ messages in thread
From: lizhijian @ 2022-01-19  1:54 UTC (permalink / raw)
  To: Jason Gunthorpe, yangx.jy
  Cc: linux-rdma, tom, yanjun.zhu, rpearsonhpe, y-goto, tomasz.gromadzki



On 18/01/2022 20:35, Jason Gunthorpe wrote:
> On Tue, Jan 18, 2022 at 08:01:59AM +0000, yangx.jy@fujitsu.com wrote:
>> On 2022/1/17 21:16, Jason Gunthorpe wrote:
>>> On Thu, Jan 13, 2022 at 11:03:50AM +0800, Xiao Yang wrote:
>>>> +static enum resp_states process_atomic_write(struct rxe_qp *qp,
>>>> +					     struct rxe_pkt_info *pkt)
>>>> +{
>>>> +	struct rxe_mr *mr = qp->resp.mr;
>>>> +
>>>> +	u64 *src = payload_addr(pkt);
>>>> +
>>>> +	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
>>>> +	if (!dst || (uintptr_t)dst & 7)
>>>> +		return RESPST_ERR_MISALIGNED_ATOMIC;
>>> It looks to me like iova_to_vaddr is completely broken, where is the
>>> kmap on that flow?
>> Hi Jason,
>>
>> I think rxe_mr_init_user() maps the user addr space to the kernel addr
>> space during memory region registration, the mapping records are saved
>> into mr->cur_map_set->map[x].
> There is no way to touch user memory from the CPU in the kernel
That's absolutely right, but I don't think it references that user memory directly.

> without calling one of the kmap's, so I don't know what this thinks it
> is doing.
>
> Jason

IMHO, for rxe, rxe_mr_init_user() will call get_user_pages() to pin the iova's pages first, and then
the page addresses will be recorded into mr->cur_map_set->map[x], so that when we want
to reference the iova's kernel address, we can call iova_to_vaddr(), which retrieves the kernel
address by traversing mr->cur_map_set->map[x].

Currently RDMA WRITE, RDMA ATOMIC, etc. use the same scheme to reference the iova.
Feel free to correct me if I missed something :)

Do you mean we should retrieve the iova's page first, and then reference the kernel
address via kmap()?  Sorry for my stupid question.

Thanks
Zhijian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-19  1:54         ` lizhijian
@ 2022-01-19 12:36           ` Jason Gunthorpe
  2022-01-19 16:47             ` Ira Weiny
  2022-01-20 12:07             ` Li, Zhijian
  0 siblings, 2 replies; 27+ messages in thread
From: Jason Gunthorpe @ 2022-01-19 12:36 UTC (permalink / raw)
  To: lizhijian
  Cc: yangx.jy, linux-rdma, tom, yanjun.zhu, rpearsonhpe, y-goto,
	tomasz.gromadzki

On Wed, Jan 19, 2022 at 01:54:32AM +0000, lizhijian@fujitsu.com wrote:
> 
> 
> On 18/01/2022 20:35, Jason Gunthorpe wrote:
> > On Tue, Jan 18, 2022 at 08:01:59AM +0000, yangx.jy@fujitsu.com wrote:
> >> On 2022/1/17 21:16, Jason Gunthorpe wrote:
> >>> On Thu, Jan 13, 2022 at 11:03:50AM +0800, Xiao Yang wrote:
> >>>> +static enum resp_states process_atomic_write(struct rxe_qp *qp,
> >>>> +					     struct rxe_pkt_info *pkt)
> >>>> +{
> >>>> +	struct rxe_mr *mr = qp->resp.mr;
> >>>> +
> >>>> +	u64 *src = payload_addr(pkt);
> >>>> +
> >>>> +	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
> >>>> +	if (!dst || (uintptr_t)dst & 7)
> >>>> +		return RESPST_ERR_MISALIGNED_ATOMIC;
> >>> It looks to me like iova_to_vaddr is completely broken, where is the
> >>> kmap on that flow?
> >> Hi Jason,
> >>
> >> I think rxe_mr_init_user() maps the user addr space to the kernel addr
> >> space during memory region registration, the mapping records are saved
> >> into mr->cur_map_set->map[x].
> > There is no way to touch user memory from the CPU in the kernel
> That's absolutely right, but I don't think it references that user memory directly.
> 
> > without calling one of the kmap's, so I don't know what this thinks it
> > is doing.
> >
> > Jason
> 
> IMHO, for the rxe, rxe_mr_init_user() will call get_user_page() to pin iova first, and then
> the page address will be recorded into mr->cur_map_set->map[x]. So that when we want
> to reference iova's kernel address, we can call iova_to_vaddr() where it will retrieve its kernel
> address by travel the mr->cur_map_set->map[x].

That flow needs a kmap

> Do you mean we should retrieve iova's page first, and the reference the kernel address by
> kmap(), sorry for my stupid question ?

Going from struct page to something the kernel can touch requires
kmap

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-19 12:36           ` Jason Gunthorpe
@ 2022-01-19 16:47             ` Ira Weiny
  2022-01-20 12:07             ` Li, Zhijian
  1 sibling, 0 replies; 27+ messages in thread
From: Ira Weiny @ 2022-01-19 16:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: lizhijian, yangx.jy, linux-rdma, tom, yanjun.zhu, rpearsonhpe,
	y-goto, tomasz.gromadzki

On Wed, Jan 19, 2022 at 08:36:35AM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 19, 2022 at 01:54:32AM +0000, lizhijian@fujitsu.com wrote:

[snip]

> > 
> > IMHO, for the rxe, rxe_mr_init_user() will call get_user_page() to pin iova first, and then
> > the page address will be recorded into mr->cur_map_set->map[x]. So that when we want
> > to reference iova's kernel address, we can call iova_to_vaddr() where it will retrieve its kernel
> > address by travel the mr->cur_map_set->map[x].
> 
> That flow needs a kmap
> 
> > Do you mean we should retrieve iova's page first, and the reference the kernel address by
> > kmap(), sorry for my stupid question ?
> 
> Going from struct page to something the kernel can touch requires
> kmap

The kmap() call is being deprecated.  Please use kmap_local_page() if possible.

https://lore.kernel.org/lkml/20211210232404.4098157-1-ira.weiny@intel.com/

Thanks,
Ira


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-19 12:36           ` Jason Gunthorpe
  2022-01-19 16:47             ` Ira Weiny
@ 2022-01-20 12:07             ` Li, Zhijian
  2022-01-21 12:58               ` Jason Gunthorpe
  1 sibling, 1 reply; 27+ messages in thread
From: Li, Zhijian @ 2022-01-20 12:07 UTC (permalink / raw)
  To: Jason Gunthorpe, lizhijian
  Cc: yangx.jy, linux-rdma, tom, yanjun.zhu, rpearsonhpe, y-goto,
	tomasz.gromadzki

Hi Jason


on 2022/1/19 20:36, Jason Gunthorpe wrote:
> On Wed, Jan 19, 2022 at 01:54:32AM +0000, lizhijian@fujitsu.com wrote:
>>
>> On 18/01/2022 20:35, Jason Gunthorpe wrote:
>>> On Tue, Jan 18, 2022 at 08:01:59AM +0000, yangx.jy@fujitsu.com wrote:
>>>> On 2022/1/17 21:16, Jason Gunthorpe wrote:
>>>>> On Thu, Jan 13, 2022 at 11:03:50AM +0800, Xiao Yang wrote:
>>>>>> +static enum resp_states process_atomic_write(struct rxe_qp *qp,
>>>>>> +					     struct rxe_pkt_info *pkt)
>>>>>> +{
>>>>>> +	struct rxe_mr *mr = qp->resp.mr;
>>>>>> +
>>>>>> +	u64 *src = payload_addr(pkt);
>>>>>> +
>>>>>> +	u64 *dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64));
>>>>>> +	if (!dst || (uintptr_t)dst & 7)
>>>>>> +		return RESPST_ERR_MISALIGNED_ATOMIC;
>>>>> It looks to me like iova_to_vaddr is completely broken, where is the
>>>>> kmap on that flow?
>>>> Hi Jason,
>>>>
>>>> I think rxe_mr_init_user() maps the user addr space to the kernel addr
>>>> space during memory region registration, the mapping records are saved
>>>> into mr->cur_map_set->map[x].
>>> There is no way to touch user memory from the CPU in the kernel
>> That's absolutely right, but I don't think it references that user memory directly.
>>
>>> without calling one of the kmap's, so I don't know what this thinks it
>>> is doing.
>>>
>>> Jason
>> IMHO, for the rxe, rxe_mr_init_user() will call get_user_page() to pin iova first, and then
>> the page address will be recorded into mr->cur_map_set->map[x]. So that when we want
>> to reference iova's kernel address, we can call iova_to_vaddr() where it will retrieve its kernel
>> address by travel the mr->cur_map_set->map[x].
> That flow needs a kmap

IIUC, this was an existing issue in iova_to_vaddr(), right?
Alternatively, can we apply the changes below to fix it simply?

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 0621d387ccba..978fdd23665c 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -260,7 +260,8 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
 				num_buf = 0;
 			}
 
-			vaddr = page_address(sg_page_iter_page(&sg_iter));
+			// FIXME: don't forget to kunmap_local(vaddr)
+			vaddr = kmap_local_page(sg_page_iter_page(&sg_iter));
 			if (!vaddr) {
 				pr_warn("%s: Unable to get virtual address\n",
 						__func__);



>
>> Do you mean we should retrieve iova's page first, and the reference the kernel address by
>> kmap(), sorry for my stupid question ?
> Going from struct page to something the kernel can touch requires
> kmap

Got it

Thanks

Zhijian


>
> Jason



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-20 12:07             ` Li, Zhijian
@ 2022-01-21 12:58               ` Jason Gunthorpe
  2022-01-21 16:06                 ` Ira Weiny
  0 siblings, 1 reply; 27+ messages in thread
From: Jason Gunthorpe @ 2022-01-21 12:58 UTC (permalink / raw)
  To: Li, Zhijian
  Cc: lizhijian, yangx.jy, linux-rdma, tom, yanjun.zhu, rpearsonhpe,
	y-goto, tomasz.gromadzki

On Thu, Jan 20, 2022 at 08:07:36PM +0800, Li, Zhijian wrote:

> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c
> b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 0621d387ccba..978fdd23665c 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -260,7 +260,8 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64
> length, u64 iova,
>                                 num_buf = 0;
>                         }
> 
> -                       vaddr = page_address(sg_page_iter_page(&sg_iter));
> +                       // FIXME: don't forget to kunmap_local(vaddr)
> +                       vaddr = kmap_local_page(sg_page_iter_page(&sg_iter));

No, you can't leave the kmap open for a long time. The kmap has to
just be around the usage.

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-21 12:58               ` Jason Gunthorpe
@ 2022-01-21 16:06                 ` Ira Weiny
  2022-01-21 16:08                   ` Ira Weiny
  2022-01-27  9:37                   ` yangx.jy
  0 siblings, 2 replies; 27+ messages in thread
From: Ira Weiny @ 2022-01-21 16:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Li, Zhijian, lizhijian, yangx.jy, linux-rdma, tom, yanjun.zhu,
	rpearsonhpe, y-goto, tomasz.gromadzki

On Fri, Jan 21, 2022 at 08:58:37AM -0400, Jason Gunthorpe wrote:
> On Thu, Jan 20, 2022 at 08:07:36PM +0800, Li, Zhijian wrote:
> 
> > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c
> > b/drivers/infiniband/sw/rxe/rxe_mr.c
> > index 0621d387ccba..978fdd23665c 100644
> > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > @@ -260,7 +260,8 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64
> > length, u64 iova,
> >                                 num_buf = 0;
> >                         }
> > 
> > -                       vaddr = page_address(sg_page_iter_page(&sg_iter));
> > +                       // FIXME: don't forget to kunmap_local(vaddr)
> > +                       vaddr = kmap_local_page(sg_page_iter_page(&sg_iter));
> 
> No, you can't leave the kmap open for a long time. The kmap has to
> just be around the usage.

Indeed Jason is correct here.  A couple of details.  First,
kmap_local_page() is only valid within the current thread of execution, so
what you propose above will not work at all.  Second, kmap() is to be avoided.

Finally, that page_address() should be avoided IMO and will be broken, at least
for persistent memory pages, once some of my work lands.[*]  Jason would know
better, but I think page_address should be avoided in all driver code.  But
there is no clear documentation on that.

Taking a quick look at rxe_mr.c buf->addr is only used in rxe_mr_init_user().
You need to kmap_local_page() around that access.  What else is struct
rxe_phys_buf->addr used for?  Can you just map the page when you need it in
rxe_mr_init_user()?

If you must create a mapping that is permanent you could look at vmap().

Ira

> 
> Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-21 16:06                 ` Ira Weiny
@ 2022-01-21 16:08                   ` Ira Weiny
  2022-01-24  3:47                     ` lizhijian
  2022-01-27  9:37                   ` yangx.jy
  1 sibling, 1 reply; 27+ messages in thread
From: Ira Weiny @ 2022-01-21 16:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Li, Zhijian, lizhijian, yangx.jy, linux-rdma, tom, yanjun.zhu,
	rpearsonhpe, y-goto, tomasz.gromadzki

On Fri, Jan 21, 2022 at 08:06:54AM -0800, 'Ira Weiny' wrote:
> On Fri, Jan 21, 2022 at 08:58:37AM -0400, Jason Gunthorpe wrote:
> > On Thu, Jan 20, 2022 at 08:07:36PM +0800, Li, Zhijian wrote:
> > 
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c
> > > b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > index 0621d387ccba..978fdd23665c 100644
> > > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > > @@ -260,7 +260,8 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64
> > > length, u64 iova,
> > >                                 num_buf = 0;
> > >                         }
> > > 
> > > -                       vaddr = page_address(sg_page_iter_page(&sg_iter));
> > > +                       // FIXME: don't forget to kunmap_local(vaddr)
> > > +                       vaddr = kmap_local_page(sg_page_iter_page(&sg_iter));
> > 
> > No, you can't leave the kmap open for a long time. The kmap has to
> > just be around the usage.
> 
> Indeed Jason is correct here.  A couple of details here.  First
> kmap_local_page() is only valid within the current thread of execution.  So
> what you propose above will not work at all.  Second, kmap() is to be avoided.
> 
> Finally, that page_address() should be avoided IMO and will be broken, at least
> for persistent memory pages, once some of my work lands.[*]  Jason would know
> better, but I think page_address should be avoided in all driver code.  But
> there is no clear documentation on that.
> 
> Taking a quick look at rxe_mr.c buf->addr is only used in rxe_mr_init_user().
                                                            ^^^^^^^^^^^^^^
Sorry...                                            I meant rxe_mr_copy()...

> You need to kmap_local_page() around that access.  What else is struct
> rxe_phys_buf->addr used for?  Can you just map the page when you need it in
> rxe_mr_init_user()?

rxe_mr_copy()...

Ira

> 
> If you must create a mapping that is permanent you could look at vmap().
> 
> Ira
> 
> > 
> > Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-21 16:08                   ` Ira Weiny
@ 2022-01-24  3:47                     ` lizhijian
  0 siblings, 0 replies; 27+ messages in thread
From: lizhijian @ 2022-01-24  3:47 UTC (permalink / raw)
  To: Ira Weiny, Jason Gunthorpe
  Cc: lizhijian, yangx.jy, linux-rdma, tom, yanjun.zhu, rpearsonhpe,
	y-goto, tomasz.gromadzki

Jason & Ira


On 22/01/2022 00:08, Ira Weiny wrote:
> On Fri, Jan 21, 2022 at 08:06:54AM -0800, 'Ira Weiny' wrote:
>> On Fri, Jan 21, 2022 at 08:58:37AM -0400, Jason Gunthorpe wrote:
>>> On Thu, Jan 20, 2022 at 08:07:36PM +0800, Li, Zhijian wrote:
>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c
>>>> b/drivers/infiniband/sw/rxe/rxe_mr.c
>>>> index 0621d387ccba..978fdd23665c 100644
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>>> @@ -260,7 +260,8 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64
>>>> length, u64 iova,
>>>>                                  num_buf = 0;
>>>>                          }
>>>>
>>>> -                       vaddr = page_address(sg_page_iter_page(&sg_iter));
>>>> +                       // FIXME: don't forget to kunmap_local(vaddr)
>>>> +                       vaddr = kmap_local_page(sg_page_iter_page(&sg_iter));
>>> No, you can't leave the kmap open for a long time. The kmap has to
>>> just be around the usage.
>> Indeed Jason is correct here.  A couple of details here.  First
>> kmap_local_page() is only valid within the current thread of execution.  So
>> what you propose above will not work at all.

Thanks you all for the details explanation. It *does* make sense.



>>   Second, kmap() is to be avoided.
>>
>> Finally, that page_address() should be avoided IMO and will be broken, at least
>> for persistent memory pages, once some of my work lands.[*]  Jason would know
>> better, but I think page_address should be avoided in all driver code.  But
>> there is no clear documentation on that.
>>
>> Taking a quick look at rxe_mr.c buf->addr is only used in rxe_mr_init_user().
>                                                              ^^^^^^^^^^^^^^
> Sorry...                                            I meant rxe_mr_copy()...

iova_to_vaddr() is another user, called from process_atomic().



>
>> You need to kmap_local_page() around that access.  What else is struct
>> rxe_phys_buf->addr used for?

IIUC, rxe_phys_buf->addr is used to permanently map a user space address to a kernel space address,
so that in the RDMA case the REMOTE side can directly access (read/write) LOCAL memory allocated by a user space application.


>> Can you just map the page when you need it in
>> rxe_mr_init_user()?

It can be, in theory, but I'm not sure. @Jason, what's your opinion?


> rxe_mr_copy()...
>
> Ira
>
>> If you must create a mapping that is permanent you could look at vmap().

Well, let me see


Thanks
Zhijian

>>
>> Ira
>>
>>> Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-21 16:06                 ` Ira Weiny
  2022-01-21 16:08                   ` Ira Weiny
@ 2022-01-27  9:37                   ` yangx.jy
  2022-01-27  9:57                     ` hch
  1 sibling, 1 reply; 27+ messages in thread
From: yangx.jy @ 2022-01-27  9:37 UTC (permalink / raw)
  To: Ira Weiny, Jason Gunthorpe
  Cc: lizhijian, lizhijian, linux-rdma, tom, yanjun.zhu, rpearsonhpe,
	y-goto, tomasz.gromadzki, hch

On 2022/1/22 0:06, Ira Weiny wrote:
> On Fri, Jan 21, 2022 at 08:58:37AM -0400, Jason Gunthorpe wrote:
>> On Thu, Jan 20, 2022 at 08:07:36PM +0800, Li, Zhijian wrote:
>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index 0621d387ccba..978fdd23665c 100644
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -260,7 +260,8 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64
>>> length, u64 iova,
>>>                                  num_buf = 0;
>>>                          }
>>>
>>> -                       vaddr = page_address(sg_page_iter_page(&sg_iter));
>>> +                       // FIXME: don't forget to kunmap_local(vaddr)
>>> +                       vaddr = kmap_local_page(sg_page_iter_page(&sg_iter));
>> No, you can't leave the kmap open for a long time. The kmap has to
>> just be around the usage.

Cc Christoph Hellwig

Hi Jason, Ira

Sorry for the late reply.

Do you mean we have to consider that some allocated pages come from high 
memory?

I think INFINIBAND_VIRT_DMA kconfig[1] has ensured that all allocated 
pages have a kernel virtual address.

In this case, is it OK to call page_address() directly?

[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b1e678bf290db5a76f1b6a9f7c381310e03440d6


> Indeed Jason is correct here.  A couple of details here.  First
> kmap_local_page() is only valid within the current thread of execution.  So
> what you propose above will not work at all.  Second, kmap() is to be avoided.
>
> Finally, that page_address() should be avoided IMO and will be broken, at least
> for persistent memory pages, once some of my work lands.[*]  Jason would know
> better, but I think page_address should be avoided in all driver code.  But
> there is no clear documentation on that.

Could you explain why page_address() will be broken for persistent 
memory pages?

Sorry, I am not familiar with the details of persistent memory.

Best Regards,

Xiao Yang

>
> Taking a quick look at rxe_mr.c buf->addr is only used in rxe_mr_init_user().
> You need to kmap_local_page() around that access.  What else is struct
> rxe_phys_buf->addr used for?  Can you just map the page when you need it in
> rxe_mr_init_user()?
>
> If you must create a mapping that is permanent you could look at vmap().
>
> Ira
>
>> Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-27  9:37                   ` yangx.jy
@ 2022-01-27  9:57                     ` hch
  2022-01-27 18:08                       ` Ira Weiny
  0 siblings, 1 reply; 27+ messages in thread
From: hch @ 2022-01-27  9:57 UTC (permalink / raw)
  To: yangx.jy
  Cc: Ira Weiny, Jason Gunthorpe, lizhijian, linux-rdma, tom,
	yanjun.zhu, rpearsonhpe, y-goto, tomasz.gromadzki, hch

On Thu, Jan 27, 2022 at 09:37:59AM +0000, yangx.jy@fujitsu.com wrote:
> Do you mean we have to consider that some allocated pages come from high 
> memory?
>
> I think INFINIBAND_VIRT_DMA kconfig[1] has ensured that all allocated 
> pages have a kernel virtual address.

rxe and siw depend on INFINIBAND_VIRT_DMA which depends on !HIGHMEM,
so you don't need kmap here at all.

> In this case, is it OK to call page_address() directly?

Yes.
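For reference, the dependency chain described here comes from commit b1e678bf290d; roughly (paraphrased, the exact Kconfig wording in the tree may differ):

```kconfig
# drivers/infiniband/Kconfig -- only set when every page has a
# kernel virtual address, i.e. the build has no highmem
config INFINIBAND_VIRT_DMA
	def_bool !HIGHMEM

# drivers/infiniband/sw/rxe/Kconfig -- rxe (and likewise siw) can
# only be built when that holds, so page_address() is always valid
config RDMA_RXE
	tristate "Software RDMA over Ethernet (RoCE) driver"
	depends on INET && PCI && INFINIBAND
	depends on INFINIBAND_VIRT_DMA
```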

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-27  9:57                     ` hch
@ 2022-01-27 18:08                       ` Ira Weiny
  2022-01-28  6:16                         ` hch
  0 siblings, 1 reply; 27+ messages in thread
From: Ira Weiny @ 2022-01-27 18:08 UTC (permalink / raw)
  To: hch
  Cc: yangx.jy, Jason Gunthorpe, lizhijian, linux-rdma, tom,
	yanjun.zhu, rpearsonhpe, y-goto, tomasz.gromadzki

On Thu, Jan 27, 2022 at 10:57:30AM +0100, Christoph Hellwig wrote:
> On Thu, Jan 27, 2022 at 09:37:59AM +0000, yangx.jy@fujitsu.com wrote:
> > Do you mean we have to consider that some allocated pages come from high 
> > memory?
> >
> > I think INFINIBAND_VIRT_DMA kconfig[1] has ensured that all allocated 
> > pages have a kernel virtual address.
> 
> rxe and siw depend on INFINIBAND_VIRT_DMA which depends on !HIGHMEM,
> so you don't need kmap here at all.

Until/if I get PKS protecting pmem.[1]  Then if the page is pmem, page_address()
will give you an address which you will fault on when you access it.

> 
> > In this case, is it OK to call page_address() directly?
> 
> Yes.

For now yes...  But please use kmap_local_page() and it will do the right thing
(by default it calls page_address() on !HIGHMEM systems).

IMO page_address() is a holdover from a time when memory management was much
simpler and, again IMO, today its use assumes too much for drivers like this.
As more protections on memory are implemented, it presents a series of land
mines to be found.

While kmap is also a holdover, I'm trying to redefine it to be 'get the kernel
mapping' rather than 'map this into highmem'...  But perhaps I'm losing that
battle...

Ira

[1] https://lore.kernel.org/lkml/20220127175505.851391-1-ira.weiny@intel.com/T/#mcd60ea9a9c7b90e63b8d333c9270186fc7e47707


* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-27 18:08                       ` Ira Weiny
@ 2022-01-28  6:16                         ` hch
  2022-01-28 19:15                           ` Ira Weiny
  0 siblings, 1 reply; 27+ messages in thread
From: hch @ 2022-01-28  6:16 UTC (permalink / raw)
  To: Ira Weiny
  Cc: hch, yangx.jy, Jason Gunthorpe, lizhijian, linux-rdma, tom,
	yanjun.zhu, rpearsonhpe, y-goto, tomasz.gromadzki

On Thu, Jan 27, 2022 at 10:08:33AM -0800, Ira Weiny wrote:
> On Thu, Jan 27, 2022 at 10:57:30AM +0100, Christoph Hellwig wrote:
> > On Thu, Jan 27, 2022 at 09:37:59AM +0000, yangx.jy@fujitsu.com wrote:
> > > Do you mean we have to consider that some allocated pages come from high 
> > > memory?
> > >
> > > I think INFINIBAND_VIRT_DMA kconfig[1] has ensured that all allocated 
> > > pages have a kernel virtual address.
> > 
> > rxe and siw depend on INFINIBAND_VIRT_DMA which depends on !HIGHMEM,
> > so you don't need kmap here at all.
> 
> Until/if I get PKS protecting pmem.[1]  Then if the page is pmem, page_address()
> will give you an address which you will fault on when you access it.

In that case we'll have problems all over the drivers that select
INFINIBAND_VIRT_DMA, so this one doesn't make much of a difference...


* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-28  6:16                         ` hch
@ 2022-01-28 19:15                           ` Ira Weiny
  2022-02-10 11:06                             ` yangx.jy
  0 siblings, 1 reply; 27+ messages in thread
From: Ira Weiny @ 2022-01-28 19:15 UTC (permalink / raw)
  To: hch
  Cc: yangx.jy, Jason Gunthorpe, lizhijian, linux-rdma, tom,
	yanjun.zhu, rpearsonhpe, y-goto, tomasz.gromadzki

On Fri, Jan 28, 2022 at 07:16:18AM +0100, Christoph Hellwig wrote:
> On Thu, Jan 27, 2022 at 10:08:33AM -0800, Ira Weiny wrote:
> > On Thu, Jan 27, 2022 at 10:57:30AM +0100, Christoph Hellwig wrote:
> > > On Thu, Jan 27, 2022 at 09:37:59AM +0000, yangx.jy@fujitsu.com wrote:
> > > > Do you mean we have to consider that some allocated pages come from high 
> > > > memory?
> > > >
> > > > I think INFINIBAND_VIRT_DMA kconfig[1] has ensured that all allocated 
> > > > pages have a kernel virtual address.
> > > 
> > > rxe and siw depend on INFINIBAND_VIRT_DMA which depends on !HIGHMEM,
> > > so you don't need kmap here at all.
> > 
> > Until/if I get PKS protecting pmem.[1]  Then if the page is pmem, page_address()
> > will give you an address which you will fault on when you access it.
> 
> In that case we'll have problems all over the drivers that select
> INFINIBAND_VIRT_DMA, so this one doesn't make much of a difference...

Ah...  ok...

Obviously I've not kept up with the software providers...

For this use case, can kmap_local_page() work?

Ira


* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-01-28 19:15                           ` Ira Weiny
@ 2022-02-10 11:06                             ` yangx.jy
  2022-02-11 18:30                               ` Ira Weiny
  0 siblings, 1 reply; 27+ messages in thread
From: yangx.jy @ 2022-02-10 11:06 UTC (permalink / raw)
  To: Ira Weiny, hch
  Cc: Jason Gunthorpe, lizhijian, linux-rdma, tom, yanjun.zhu,
	rpearsonhpe, y-goto, tomasz.gromadzki

On 2022/1/29 3:15, Ira Weiny wrote:

> Ah...  ok...
>
> Obviously I've not kept up with the software providers...
>
> For this use case, can kmap_local_page() work?
>
> Ira

Hi Ira,

I applied your patch set on my branch but ibv_reg_mr() with /dev/dax0.0 
didn't trigger any warning or panic after enabling DEVMAP_ACCESS_PROTECTION.

PS: ibv_reg_mr() calls page_address() in kernel.

Do you know if the hardware (e.g. the CPU) needs to support the PKS 
feature? How can I check for the feature on the hardware? Or do I need 
more steps?

Best Regards,

Xiao Yang


* Re: [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation
  2022-02-10 11:06                             ` yangx.jy
@ 2022-02-11 18:30                               ` Ira Weiny
  0 siblings, 0 replies; 27+ messages in thread
From: Ira Weiny @ 2022-02-11 18:30 UTC (permalink / raw)
  To: yangx.jy
  Cc: hch, Jason Gunthorpe, lizhijian, linux-rdma, tom, yanjun.zhu,
	rpearsonhpe, y-goto, tomasz.gromadzki

On Thu, Feb 10, 2022 at 11:06:55AM +0000, yangx.jy@fujitsu.com wrote:
> On 2022/1/29 3:15, Ira Weiny wrote:
> 
> > Ah...  ok...
> >
> > Obviously I've not kept up with the software providers...
> >
> > For this use case, can kmap_local_page() work?
> >
> > Ira
> 
> Hi Ira,
> 
> I applied your patch set on my branch but ibv_reg_mr() with /dev/dax0.0 
> didn't trigger any warning or panic after enabling DEVMAP_ACCESS_PROTECTION.
> 
> PS: ibv_reg_mr() calls page_address() in kernel.
> 
> Do you know if the hardware (e.g. the CPU) needs to support the PKS feature?

Yes, you will need Sapphire Rapids hardware or newer, or QEMU 6.0, which also
supports PKS.

>
> How can I check for the feature on the hardware? Or do I need more steps?

My lscpu on QEMU reports pks.  I suspect you don't have PKS on the hardware,
which is why the access still works.

Ira


end of thread, other threads:[~2022-02-11 18:30 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-13  3:03 [RFC PATCH v2 0/2] RDMA/rxe: Add RDMA Atomic Write operation Xiao Yang
2022-01-13  3:03 ` [RFC PATCH v2 1/2] RDMA/rxe: Rename send_atomic_ack() and atomic member of struct resp_res Xiao Yang
2022-01-13  3:03 ` [RFC PATCH v2 2/2] RDMA/rxe: Support RDMA Atomic Write operation Xiao Yang
2022-01-13 11:35   ` kernel test robot
2022-01-13 12:16   ` kernel test robot
2022-01-13 12:16     ` kernel test robot
2022-01-17 13:16   ` Jason Gunthorpe
2022-01-18  8:01     ` yangx.jy
2022-01-18 12:35       ` Jason Gunthorpe
2022-01-19  1:54         ` lizhijian
2022-01-19 12:36           ` Jason Gunthorpe
2022-01-19 16:47             ` Ira Weiny
2022-01-20 12:07             ` Li, Zhijian
2022-01-21 12:58               ` Jason Gunthorpe
2022-01-21 16:06                 ` Ira Weiny
2022-01-21 16:08                   ` Ira Weiny
2022-01-24  3:47                     ` lizhijian
2022-01-27  9:37                   ` yangx.jy
2022-01-27  9:57                     ` hch
2022-01-27 18:08                       ` Ira Weiny
2022-01-28  6:16                         ` hch
2022-01-28 19:15                           ` Ira Weiny
2022-02-10 11:06                             ` yangx.jy
2022-02-11 18:30                               ` Ira Weiny
2022-01-18  8:02     ` yangx.jy
2022-01-18  8:04     ` yangx.jy
2022-01-18  9:03       ` yangx.jy
