linux-rdma.vger.kernel.org archive mirror
* [PATCH for-next v5 0/6] Replace AV by AH in UD sends
@ 2021-10-06  1:58 Bob Pearson
  2021-10-06  1:58 ` [PATCH for-next v5 1/6] RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr Bob Pearson
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Bob Pearson @ 2021-10-06  1:58 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently the rdma_rxe driver and its user space provider exchange
addressing information for UD sends by having the provider compute an
address vector (AV) and send it with each WQE. This is not the way
that the RDMA verbs API was intended to operate.

This series of patches changes how UD send WQEs carry addressing
information: the provider passes a 4 byte index identifying the AH in
place of the 88 byte AV. To avoid breaking compatibility with the
existing API, the rdma_rxe driver recognises when an older version of
the provider is not sending an index (i.e. it is 0) and uses the AV
instead.

This series of patches is identical to the previous version
but rebased to 5.15.0-rc2+. It applies cleanly to

    commit: d30ef6d5c013c19e907f2a3a3d6eee04fcd3de0d (for-next)

---
v5:
  Rebase to 5.15.0-rc2+

v4:
  Rebase to 5.15.0-rc1+

v3:
  Split up commits into smaller steps.

v2:
  Rearranged AV in rxe_send_wqe to be in the ud struct but padded to the
  same offset as the original preserving ABI compatibility.

Bob Pearson (6):
  RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr
  RDMA/rxe: Change AH objects to indexed
  RDMA/rxe: Create AH index and return to user space
  RDMA/rxe: Replace ah->pd by ah->ibah.pd
  RDMA/rxe: Lookup kernel AH from ah index in UD WQEs
  RDMA/rxe: Convert kernel UD post send to use ah_num

 drivers/infiniband/sw/rxe/rxe_av.c    | 20 +++++++++++++-
 drivers/infiniband/sw/rxe/rxe_param.h |  4 ++-
 drivers/infiniband/sw/rxe/rxe_pool.c  |  4 ++-
 drivers/infiniband/sw/rxe/rxe_req.c   |  8 +++---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 39 ++++++++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_verbs.h |  8 +++++-
 include/uapi/rdma/rdma_user_rxe.h     | 10 ++++++-
 7 files changed, 79 insertions(+), 14 deletions(-)

-- 
2.30.2



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH for-next v5 1/6] RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr
  2021-10-06  1:58 [PATCH for-next v5 0/6] Replace AV by AH in UD sends Bob Pearson
@ 2021-10-06  1:58 ` Bob Pearson
  2021-10-06  1:58 ` [PATCH for-next v5 2/6] RDMA/rxe: Change AH objects to indexed Bob Pearson
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Bob Pearson @ 2021-10-06  1:58 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Move the struct rxe_av av from struct rxe_send_wqe to struct rxe_send_wr
placing it in wr.ud at the same offset as it was previously. This has the
effect of increasing the size of struct rxe_send_wr while keeping the size of
struct rxe_send_wqe the same. This better reflects the use of this field
which is only used for UD sends. This change has no effect on ABI
compatibility so the modified rxe driver will operate with older versions
of rdma-core.
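
The offset argument above can be checked at compile time. The following is a minimal sketch under stated assumptions, not the driver's headers: `av_model` is a stand-in for struct rxe_av (88 bytes, as quoted in the cover letter), `other_model` is a hypothetical 32 byte union member standing in for the largest non-UD work request, and all other fields of the real structures are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rxe_av: 88 bytes (assumption, not the real layout). */
struct av_model {
	uint64_t words[11];
};

/* Hypothetical largest non-UD union member, modelled as 32 bytes. */
struct other_model {
	uint64_t addr;
	uint64_t length;
	uint32_t keys[4];
};

/* Before the patch: the AV follows the work request inside the WQE. */
struct old_wr {
	union {
		struct {
			uint32_t remote_qpn;
			uint32_t remote_qkey;
			uint16_t pkey_index;
		} ud;
		struct other_model other;
	} wr;
};

struct old_wqe {
	struct old_wr wr;
	struct av_model av;
};

/* After the patch: the AV lives in wr.ud, padded out to the old offset. */
struct new_wr {
	union {
		struct {
			uint32_t remote_qpn;
			uint32_t remote_qkey;
			uint16_t pkey_index;
			uint16_t reserved;
			uint32_t pad[5];
			struct av_model av;
		} ud;
		struct other_model other;
	} wr;
};

struct new_wqe {
	struct new_wr wr;
};

/* The AV must not move, and the WQE must not change size. */
_Static_assert(offsetof(struct old_wqe, av) ==
	       offsetof(struct new_wqe, wr.wr.ud.av),
	       "AV offset within the WQE changed");
_Static_assert(sizeof(struct old_wqe) == sizeof(struct new_wqe),
	       "WQE size changed");

static size_t old_av_offset(void)
{
	return offsetof(struct old_wqe, av);
}

static size_t new_av_offset(void)
{
	return offsetof(struct new_wqe, wr.wr.ud.av);
}
```

In this model the work request grows from 32 to 120 bytes while the WQE stays at 120 bytes, which is the property the commit message relies on.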

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_av.c    | 2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c | 3 ++-
 include/uapi/rdma/rdma_user_rxe.h     | 4 +++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index da2e867a1ed9..85580ea5eed0 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -107,5 +107,5 @@ struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)
 	if (qp_type(pkt->qp) == IB_QPT_RC || qp_type(pkt->qp) == IB_QPT_UC)
 		return &pkt->qp->pri_av;
 
-	return (pkt->wqe) ? &pkt->wqe->av : NULL;
+	return (pkt->wqe) ? &pkt->wqe->wr.wr.ud.av : NULL;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 9d0bb9aa7514..c09e1c25ce66 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -584,7 +584,8 @@ static void init_send_wqe(struct rxe_qp *qp, const struct ib_send_wr *ibwr,
 	if (qp_type(qp) == IB_QPT_UD ||
 	    qp_type(qp) == IB_QPT_SMI ||
 	    qp_type(qp) == IB_QPT_GSI)
-		memcpy(&wqe->av, &to_rah(ud_wr(ibwr)->ah)->av, sizeof(wqe->av));
+		memcpy(&wqe->wr.wr.ud.av, &to_rah(ud_wr(ibwr)->ah)->av,
+		       sizeof(struct rxe_av));
 
 	if (unlikely(ibwr->send_flags & IB_SEND_INLINE))
 		copy_inline_data_to_wqe(wqe, ibwr);
diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h
index e283c2220aba..2f1ebbe96434 100644
--- a/include/uapi/rdma/rdma_user_rxe.h
+++ b/include/uapi/rdma/rdma_user_rxe.h
@@ -98,6 +98,9 @@ struct rxe_send_wr {
 			__u32	remote_qpn;
 			__u32	remote_qkey;
 			__u16	pkey_index;
+			__u16	reserved;
+			__u32	pad[5];
+			struct rxe_av av;
 		} ud;
 		struct {
 			__aligned_u64	addr;
@@ -148,7 +151,6 @@ struct rxe_dma_info {
 
 struct rxe_send_wqe {
 	struct rxe_send_wr	wr;
-	struct rxe_av		av;
 	__u32			status;
 	__u32			state;
 	__aligned_u64		iova;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH for-next v5 2/6] RDMA/rxe: Change AH objects to indexed
  2021-10-06  1:58 [PATCH for-next v5 0/6] Replace AV by AH in UD sends Bob Pearson
  2021-10-06  1:58 ` [PATCH for-next v5 1/6] RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr Bob Pearson
@ 2021-10-06  1:58 ` Bob Pearson
  2021-10-06  1:58 ` [PATCH for-next v5 3/6] RDMA/rxe: Create AH index and return to user space Bob Pearson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Bob Pearson @ 2021-10-06  1:58 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Make changes to rxe_param.h and rxe_pool.c to allow indexing of AH
objects. Valid indices are non-zero so older providers can be detected.
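
Why a non-zero index range lets 0 act as "no index" can be shown with a small model. This is a hedged sketch, not the rxe pool code: `index_pool`, `index_pool_alloc()` and `index_pool_free()` are hypothetical names, and a linear scan stands in for whatever allocator the pool really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
	MIN_AH_INDEX = 1,	/* mirrors RXE_MIN_AH_INDEX */
	MAX_AH_INDEX = 16383,	/* mirrors RXE_MAX_AH_INDEX */
};

/* Hypothetical pool model; slot 0 exists but is never handed out. */
struct index_pool {
	bool used[MAX_AH_INDEX + 1];
};

/* Return a free index in [MIN_AH_INDEX, MAX_AH_INDEX], or 0 if none is left. */
static uint32_t index_pool_alloc(struct index_pool *pool)
{
	for (uint32_t i = MIN_AH_INDEX; i <= MAX_AH_INDEX; i++) {
		if (!pool->used[i]) {
			pool->used[i] = true;
			return i;
		}
	}
	return 0;
}

/* Release an index; out-of-range values, including 0, are ignored. */
static void index_pool_free(struct index_pool *pool, uint32_t index)
{
	if (index >= MIN_AH_INDEX && index <= MAX_AH_INDEX)
		pool->used[index] = false;
}
```

Because every valid index is at least 1, a WQE field that an older provider leaves zeroed can never be mistaken for a real AH.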

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_param.h | 4 +++-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index b5a70cbe94aa..d92d7edd712b 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -67,7 +67,9 @@ enum rxe_device_param {
 	RXE_MAX_MCAST_GRP		= 8192,
 	RXE_MAX_MCAST_QP_ATTACH		= 56,
 	RXE_MAX_TOT_MCAST_QP_ATTACH	= 0x70000,
-	RXE_MAX_AH			= 100,
+	RXE_MAX_AH			= 16383,
+	RXE_MIN_AH_INDEX		= 1,
+	RXE_MAX_AH_INDEX		= 16383,
 	RXE_MAX_SRQ_WR			= 0x4000,
 	RXE_MIN_SRQ_WR			= 1,
 	RXE_MAX_SRQ_SGE			= 27,
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index ffa8420b4765..7b4cb46edfd9 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -26,7 +26,9 @@ struct rxe_type_info rxe_type_info[RXE_NUM_TYPES] = {
 		.name		= "rxe-ah",
 		.size		= sizeof(struct rxe_ah),
 		.elem_offset	= offsetof(struct rxe_ah, pelem),
-		.flags		= RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.min_index	= RXE_MIN_AH_INDEX,
+		.max_index	= RXE_MAX_AH_INDEX,
 	},
 	[RXE_TYPE_SRQ] = {
 		.name		= "rxe-srq",
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH for-next v5 3/6] RDMA/rxe: Create AH index and return to user space
  2021-10-06  1:58 [PATCH for-next v5 0/6] Replace AV by AH in UD sends Bob Pearson
  2021-10-06  1:58 ` [PATCH for-next v5 1/6] RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr Bob Pearson
  2021-10-06  1:58 ` [PATCH for-next v5 2/6] RDMA/rxe: Change AH objects to indexed Bob Pearson
@ 2021-10-06  1:58 ` Bob Pearson
  2021-10-06  1:58 ` [PATCH for-next v5 4/6] RDMA/rxe: Replace ah->pd by ah->ibah.pd Bob Pearson
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Bob Pearson @ 2021-10-06  1:58 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Make changes to rdma_user_rxe.h to allow indexing AH objects, passing
the index in UD send WRs to the driver and returning the index to the rxe
provider.

Modify rxe_create_ah() to add an index to AH when created and if
called from a new user provider return it to user space. If called
from an old provider mark the AH as not having a useful index.
Modify rxe_destroy_ah to drop the index before deleting the object.
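
The three caller cases described above (kernel client, new provider, old provider) can be sketched in plain C. This is an illustration under assumptions, not the driver code: `udata_model` is a hypothetical stand-in for struct ib_udata, `report_ah_index()` is an invented helper, and `memcpy()` stands in for copy_to_user().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct rxe_create_ah_resp from the patch. */
struct create_ah_resp {
	uint32_t ah_num;
	uint32_t reserved;
};

/* Hypothetical stand-in for the response side of struct ib_udata. */
struct udata_model {
	void *outbuf;
	size_t outlen;
};

/*
 * Return the ah_num to store in the AH for a freshly allocated index.
 * udata == NULL models a kernel caller.
 */
static uint32_t report_ah_index(const struct udata_model *udata,
				uint32_t index)
{
	if (!udata)
		return index;	/* kernel client keeps the index */

	if (udata->outlen >= sizeof(struct create_ah_resp)) {
		struct create_ah_resp resp = { .ah_num = index };

		/* stands in for copy_to_user() */
		memcpy(udata->outbuf, &resp, sizeof(resp));
		return index;	/* new provider receives the index */
	}

	return 0;		/* old provider: no usable index */
}
```

The response buffer length is the only signal used here to tell providers apart, which matches the `udata->outlen` test in the diff below.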

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 31 ++++++++++++++++++++++++++-
 drivers/infiniband/sw/rxe/rxe_verbs.h |  2 ++
 include/uapi/rdma/rdma_user_rxe.h     |  8 ++++++-
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index c09e1c25ce66..8854ace63acd 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -158,9 +158,19 @@ static int rxe_create_ah(struct ib_ah *ibah,
 			 struct ib_udata *udata)
 
 {
-	int err;
 	struct rxe_dev *rxe = to_rdev(ibah->device);
 	struct rxe_ah *ah = to_rah(ibah);
+	struct rxe_create_ah_resp __user *uresp = NULL;
+	int err;
+
+	if (udata) {
+		/* test if new user provider */
+		if (udata->outlen >= sizeof(*uresp))
+			uresp = udata->outbuf;
+		ah->is_user = true;
+	} else {
+		ah->is_user = false;
+	}
 
 	err = rxe_av_chk_attr(rxe, init_attr->ah_attr);
 	if (err)
@@ -170,6 +180,24 @@ static int rxe_create_ah(struct ib_ah *ibah,
 	if (err)
 		return err;
 
+	/* create index > 0 */
+	rxe_add_index(ah);
+	ah->ah_num = ah->pelem.index;
+
+	if (uresp) {
+		/* only if new user provider */
+		err = copy_to_user(&uresp->ah_num, &ah->ah_num,
+					 sizeof(uresp->ah_num));
+		if (err) {
+			rxe_drop_index(ah);
+			rxe_drop_ref(ah);
+			return -EFAULT;
+		}
+	} else if (ah->is_user) {
+		/* only if old user provider */
+		ah->ah_num = 0;
+	}
+
 	rxe_init_av(init_attr->ah_attr, &ah->av);
 	return 0;
 }
@@ -202,6 +230,7 @@ static int rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
 {
 	struct rxe_ah *ah = to_rah(ibah);
 
+	rxe_drop_index(ah);
 	rxe_drop_ref(ah);
 	return 0;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index c807639435eb..9cd203f1fa22 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -48,6 +48,8 @@ struct rxe_ah {
 	struct rxe_pool_entry	pelem;
 	struct rxe_pd		*pd;
 	struct rxe_av		av;
+	bool			is_user;
+	int			ah_num;
 };
 
 struct rxe_cqe {
diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h
index 2f1ebbe96434..dc9f7a5e203a 100644
--- a/include/uapi/rdma/rdma_user_rxe.h
+++ b/include/uapi/rdma/rdma_user_rxe.h
@@ -99,7 +99,8 @@ struct rxe_send_wr {
 			__u32	remote_qkey;
 			__u16	pkey_index;
 			__u16	reserved;
-			__u32	pad[5];
+			__u32	ah_num;
+			__u32	pad[4];
 			struct rxe_av av;
 		} ud;
 		struct {
@@ -170,6 +171,11 @@ struct rxe_recv_wqe {
 	struct rxe_dma_info	dma;
 };
 
+struct rxe_create_ah_resp {
+	__u32 ah_num;
+	__u32 reserved;
+};
+
 struct rxe_create_cq_resp {
 	struct mminfo mi;
 };
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH for-next v5 4/6] RDMA/rxe: Replace ah->pd by ah->ibah.pd
  2021-10-06  1:58 [PATCH for-next v5 0/6] Replace AV by AH in UD sends Bob Pearson
                   ` (2 preceding siblings ...)
  2021-10-06  1:58 ` [PATCH for-next v5 3/6] RDMA/rxe: Create AH index and return to user space Bob Pearson
@ 2021-10-06  1:58 ` Bob Pearson
  2021-10-06  1:58 ` [PATCH for-next v5 5/6] RDMA/rxe: Lookup kernel AH from ah index in UD WQEs Bob Pearson
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Bob Pearson @ 2021-10-06  1:58 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

The pd field in struct rxe_ah is redundant with the pd field in the
RDMA core's struct ib_ah. Eliminate the pd field in rxe_ah and add an
inline helper to extract the pd from the ibah field.
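
The accessor pattern can be illustrated outside the kernel. The sketch below is a simplified model with hypothetical type names (`ib_pd_model`, `pd_model`, `ah_model`), and it defines its own `container_of` since the kernel macro is not available in user space.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel's container_of() (simplified). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins for struct ib_pd and struct ib_ah. */
struct ib_pd_model {
	int handle;
};

struct ib_ah_model {
	struct ib_pd_model *pd;
};

/* Stand-in for struct rxe_pd: the core object is embedded. */
struct pd_model {
	struct ib_pd_model ibpd;
	int driver_state;
};

/* Stand-in for struct rxe_ah: no private pd pointer is kept. */
struct ah_model {
	struct ib_ah_model ibah;
};

/* Map the embedded core object back to the driver object. */
static struct pd_model *to_pd(struct ib_pd_model *ibpd)
{
	return ibpd ? container_of(ibpd, struct pd_model, ibpd) : NULL;
}

/* Derive the driver PD from the core AH instead of caching it. */
static struct pd_model *ah_pd(struct ah_model *ah)
{
	return to_pd(ah->ibah.pd);
}
```

Deriving the pointer on demand leaves a single copy of the AH to PD relationship, so the two cannot drift apart.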

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 9cd203f1fa22..881a5159a7d0 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -46,7 +46,6 @@ struct rxe_pd {
 struct rxe_ah {
 	struct ib_ah		ibah;
 	struct rxe_pool_entry	pelem;
-	struct rxe_pd		*pd;
 	struct rxe_av		av;
 	bool			is_user;
 	int			ah_num;
@@ -471,6 +470,11 @@ static inline struct rxe_mw *to_rmw(struct ib_mw *mw)
 	return mw ? container_of(mw, struct rxe_mw, ibmw) : NULL;
 }
 
+static inline struct rxe_pd *rxe_ah_pd(struct rxe_ah *ah)
+{
+	return to_rpd(ah->ibah.pd);
+}
+
 static inline struct rxe_pd *mr_pd(struct rxe_mr *mr)
 {
 	return to_rpd(mr->ibmr.pd);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH for-next v5 5/6] RDMA/rxe: Lookup kernel AH from ah index in UD WQEs
  2021-10-06  1:58 [PATCH for-next v5 0/6] Replace AV by AH in UD sends Bob Pearson
                   ` (3 preceding siblings ...)
  2021-10-06  1:58 ` [PATCH for-next v5 4/6] RDMA/rxe: Replace ah->pd by ah->ibah.pd Bob Pearson
@ 2021-10-06  1:58 ` Bob Pearson
  2021-10-06 11:55   ` Zhu Yanjun
  2021-10-06  1:58 ` [PATCH for-next v5 6/6] RDMA/rxe: Convert kernel UD post send to use ah_num Bob Pearson
  2021-10-06 19:37 ` [PATCH for-next v5 0/6] Replace AV by AH in UD sends Jason Gunthorpe
  6 siblings, 1 reply; 16+ messages in thread
From: Bob Pearson @ 2021-10-06  1:58 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Add code to rxe_get_av() in rxe_av.c to use the AH index in UD send WQEs
to look up the kernel AH. For old user providers continue to use the AV
passed in WQEs. Move setting pkt->rxe to before the call to rxe_get_av()
to get access to the AH pool.
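
The lookup and its fallback can be modelled as follows. This is a sketch under assumptions rather than the driver implementation: the pool is a hypothetical fixed array of 16 slots, `resolve_av()` is an invented name, and reference counting is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SLOTS 16	/* hypothetical; the patch allows up to 16383 */

struct av_model {
	uint8_t port_num;
};

struct ah_model {
	uint32_t ah_num;	/* 0 when created by an old provider */
	const void *pd;
	struct av_model av;
};

/* The UD part of a send WR: an index plus the legacy embedded AV. */
struct ud_wr_model {
	uint32_t ah_num;
	struct av_model av;
};

struct ah_pool_model {
	struct ah_model *slot[POOL_SLOTS];
};

/* Pick the AV for a UD send, or return NULL if the index is bogus. */
static struct av_model *resolve_av(struct ah_pool_model *pool,
				   struct ud_wr_model *wr,
				   const void *qp_pd)
{
	struct ah_model *ah;

	if (wr->ah_num == 0)
		return &wr->av;		/* old provider: AV is in the WQE */

	if (wr->ah_num >= POOL_SLOTS)
		return NULL;

	ah = pool->slot[wr->ah_num];
	if (!ah || ah->ah_num != wr->ah_num || ah->pd != qp_pd)
		return NULL;		/* stale index or wrong PD */

	return &ah->av;
}
```

The PD comparison matters because the index comes from user space: without it, a QP could name an AH that belongs to a different protection domain.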

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_av.c  | 20 +++++++++++++++++++-
 drivers/infiniband/sw/rxe/rxe_req.c |  8 +++++---
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index 85580ea5eed0..38c7b6fb39d7 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -101,11 +101,29 @@ void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr)
 
 struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)
 {
+	struct rxe_ah *ah;
+	u32 ah_num;
+
 	if (!pkt || !pkt->qp)
 		return NULL;
 
 	if (qp_type(pkt->qp) == IB_QPT_RC || qp_type(pkt->qp) == IB_QPT_UC)
 		return &pkt->qp->pri_av;
 
-	return (pkt->wqe) ? &pkt->wqe->wr.wr.ud.av : NULL;
+	if (!pkt->wqe)
+		return NULL;
+
+	ah_num = pkt->wqe->wr.wr.ud.ah_num;
+	if (ah_num) {
+		/* only new user provider or kernel client */
+		ah = rxe_pool_get_index(&pkt->rxe->ah_pool, ah_num);
+		if (!ah || ah->ah_num != ah_num || rxe_ah_pd(ah) != pkt->qp->pd) {
+			pr_warn("Unable to find AH matching ah_num\n");
+			return NULL;
+		}
+		return &ah->av;
+	}
+
+	/* only old user provider for UD sends*/
+	return &pkt->wqe->wr.wr.ud.av;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index fe275fcaffbd..0c9d2af15f3d 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -379,9 +379,8 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 	/* length from start of bth to end of icrc */
 	paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE;
 
-	/* pkt->hdr, rxe, port_num and mask are initialized in ifc
-	 * layer
-	 */
+	/* pkt->hdr, port_num and mask are initialized in ifc layer */
+	pkt->rxe	= rxe;
 	pkt->opcode	= opcode;
 	pkt->qp		= qp;
 	pkt->psn	= qp->req.psn;
@@ -391,6 +390,9 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 
 	/* init skb */
 	av = rxe_get_av(pkt);
+	if (!av)
+		return NULL;
+
 	skb = rxe_init_packet(rxe, av, paylen, pkt);
 	if (unlikely(!skb))
 		return NULL;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH for-next v5 6/6] RDMA/rxe: Convert kernel UD post send to use ah_num
  2021-10-06  1:58 [PATCH for-next v5 0/6] Replace AV by AH in UD sends Bob Pearson
                   ` (4 preceding siblings ...)
  2021-10-06  1:58 ` [PATCH for-next v5 5/6] RDMA/rxe: Lookup kernel AH from ah index in UD WQEs Bob Pearson
@ 2021-10-06  1:58 ` Bob Pearson
  2021-10-06 19:37 ` [PATCH for-next v5 0/6] Replace AV by AH in UD sends Jason Gunthorpe
  6 siblings, 0 replies; 16+ messages in thread
From: Bob Pearson @ 2021-10-06  1:58 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Modify ib_post_send for kernel UD sends to put the AH index into the
WQE instead of the address vector.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 8854ace63acd..b808777e2221 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -537,8 +537,11 @@ static void init_send_wr(struct rxe_qp *qp, struct rxe_send_wr *wr,
 	if (qp_type(qp) == IB_QPT_UD ||
 	    qp_type(qp) == IB_QPT_SMI ||
 	    qp_type(qp) == IB_QPT_GSI) {
+		struct ib_ah *ibah = ud_wr(ibwr)->ah;
+
 		wr->wr.ud.remote_qpn = ud_wr(ibwr)->remote_qpn;
 		wr->wr.ud.remote_qkey = ud_wr(ibwr)->remote_qkey;
+		wr->wr.ud.ah_num = to_rah(ibah)->ah_num;
 		if (qp_type(qp) == IB_QPT_GSI)
 			wr->wr.ud.pkey_index = ud_wr(ibwr)->pkey_index;
 		if (wr->opcode == IB_WR_SEND_WITH_IMM)
@@ -610,12 +613,6 @@ static void init_send_wqe(struct rxe_qp *qp, const struct ib_send_wr *ibwr,
 		return;
 	}
 
-	if (qp_type(qp) == IB_QPT_UD ||
-	    qp_type(qp) == IB_QPT_SMI ||
-	    qp_type(qp) == IB_QPT_GSI)
-		memcpy(&wqe->wr.wr.ud.av, &to_rah(ud_wr(ibwr)->ah)->av,
-		       sizeof(struct rxe_av));
-
 	if (unlikely(ibwr->send_flags & IB_SEND_INLINE))
 		copy_inline_data_to_wqe(wqe, ibwr);
 	else
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next v5 5/6] RDMA/rxe: Lookup kernel AH from ah index in UD WQEs
  2021-10-06  1:58 ` [PATCH for-next v5 5/6] RDMA/rxe: Lookup kernel AH from ah index in UD WQEs Bob Pearson
@ 2021-10-06 11:55   ` Zhu Yanjun
  2021-10-06 14:42     ` Pearson, Robert B
  0 siblings, 1 reply; 16+ messages in thread
From: Zhu Yanjun @ 2021-10-06 11:55 UTC (permalink / raw)
  To: Bob Pearson; +Cc: Jason Gunthorpe, RDMA mailing list

On Wed, Oct 6, 2021 at 9:58 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> Add code to rxe_get_av in rxe_av.c to use the AH index in UD send WQEs
> to lookup the kernel AH. For old user providers continue to use the AV
> passed in WQEs. Move setting pkt->rxe to before the call to rxe_get_av()
> to get access to the AH pool.
>
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_av.c  | 20 +++++++++++++++++++-
>  drivers/infiniband/sw/rxe/rxe_req.c |  8 +++++---
>  2 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
> index 85580ea5eed0..38c7b6fb39d7 100644
> --- a/drivers/infiniband/sw/rxe/rxe_av.c
> +++ b/drivers/infiniband/sw/rxe/rxe_av.c
> @@ -101,11 +101,29 @@ void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr)
>
>  struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)
>  {
> +       struct rxe_ah *ah;
> +       u32 ah_num;
> +
>         if (!pkt || !pkt->qp)
>                 return NULL;
>
>         if (qp_type(pkt->qp) == IB_QPT_RC || qp_type(pkt->qp) == IB_QPT_UC)
>                 return &pkt->qp->pri_av;
>
> -       return (pkt->wqe) ? &pkt->wqe->wr.wr.ud.av : NULL;
> +       if (!pkt->wqe)
> +               return NULL;
> +
> +       ah_num = pkt->wqe->wr.wr.ud.ah_num;
> +       if (ah_num) {
> +               /* only new user provider or kernel client */

struct rxe_ah *ah;
ah is only used in this snippet. Is it better to move to here?
It is only a trivial problem.

Zhu Yanjun
> +               ah = rxe_pool_get_index(&pkt->rxe->ah_pool, ah_num);
> +               if (!ah || ah->ah_num != ah_num || rxe_ah_pd(ah) != pkt->qp->pd) {
> +                       pr_warn("Unable to find AH matching ah_num\n");
> +                       return NULL;
> +               }
> +               return &ah->av;
> +       }
> +
> +       /* only old user provider for UD sends*/
> +       return &pkt->wqe->wr.wr.ud.av;
>  }
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index fe275fcaffbd..0c9d2af15f3d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -379,9 +379,8 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
>         /* length from start of bth to end of icrc */
>         paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE;
>
> -       /* pkt->hdr, rxe, port_num and mask are initialized in ifc
> -        * layer
> -        */
> +       /* pkt->hdr, port_num and mask are initialized in ifc layer */
> +       pkt->rxe        = rxe;
>         pkt->opcode     = opcode;
>         pkt->qp         = qp;
>         pkt->psn        = qp->req.psn;
> @@ -391,6 +390,9 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
>
>         /* init skb */
>         av = rxe_get_av(pkt);
> +       if (!av)
> +               return NULL;
> +
>         skb = rxe_init_packet(rxe, av, paylen, pkt);
>         if (unlikely(!skb))
>                 return NULL;
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH for-next v5 5/6] RDMA/rxe: Lookup kernel AH from ah index in UD WQEs
  2021-10-06 11:55   ` Zhu Yanjun
@ 2021-10-06 14:42     ` Pearson, Robert B
  2021-10-07  3:12       ` Zhu Yanjun
  0 siblings, 1 reply; 16+ messages in thread
From: Pearson, Robert B @ 2021-10-06 14:42 UTC (permalink / raw)
  To: Zhu Yanjun, Bob Pearson; +Cc: Jason Gunthorpe, RDMA mailing list

Zhu,

It's a matter of preference. I find that for me always putting all the local variables at the top of a subroutine saves time and reduces bugs. I know where to look. They're always there. And there are no tricky scope issues to think about. If you can't see them because they are off the screen the subroutine is probably too big.

BTW do you have a new email address? I just saw one go by.

Bob

-----Original Message-----
From: Zhu Yanjun <zyjzyj2000@gmail.com> 
Sent: Wednesday, October 6, 2021 6:56 AM
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH for-next v5 5/6] RDMA/rxe: Lookup kernel AH from ah index in UD WQEs

On Wed, Oct 6, 2021 at 9:58 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> Add code to rxe_get_av in rxe_av.c to use the AH index in UD send WQEs 
> to lookup the kernel AH. For old user providers continue to use the AV 
> passed in WQEs. Move setting pkt->rxe to before the call to 
> rxe_get_av() to get access to the AH pool.
>
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_av.c  | 20 +++++++++++++++++++-  
> drivers/infiniband/sw/rxe/rxe_req.c |  8 +++++---
>  2 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_av.c 
> b/drivers/infiniband/sw/rxe/rxe_av.c
> index 85580ea5eed0..38c7b6fb39d7 100644
> --- a/drivers/infiniband/sw/rxe/rxe_av.c
> +++ b/drivers/infiniband/sw/rxe/rxe_av.c
> @@ -101,11 +101,29 @@ void rxe_av_fill_ip_info(struct rxe_av *av, 
> struct rdma_ah_attr *attr)
>
>  struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)  {
> +       struct rxe_ah *ah;
> +       u32 ah_num;
> +
>         if (!pkt || !pkt->qp)
>                 return NULL;
>
>         if (qp_type(pkt->qp) == IB_QPT_RC || qp_type(pkt->qp) == IB_QPT_UC)
>                 return &pkt->qp->pri_av;
>
> -       return (pkt->wqe) ? &pkt->wqe->wr.wr.ud.av : NULL;
> +       if (!pkt->wqe)
> +               return NULL;
> +
> +       ah_num = pkt->wqe->wr.wr.ud.ah_num;
> +       if (ah_num) {
> +               /* only new user provider or kernel client */

struct rxe_ah *ah;
ah is only used in this snippet. Is it better to move to here?
It is only a trivial problem.

Zhu Yanjun
> +               ah = rxe_pool_get_index(&pkt->rxe->ah_pool, ah_num);
> +               if (!ah || ah->ah_num != ah_num || rxe_ah_pd(ah) != pkt->qp->pd) {
> +                       pr_warn("Unable to find AH matching ah_num\n");
> +                       return NULL;
> +               }
> +               return &ah->av;
> +       }
> +
> +       /* only old user provider for UD sends*/
> +       return &pkt->wqe->wr.wr.ud.av;
>  }
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c 
> b/drivers/infiniband/sw/rxe/rxe_req.c
> index fe275fcaffbd..0c9d2af15f3d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -379,9 +379,8 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
>         /* length from start of bth to end of icrc */
>         paylen = rxe_opcode[opcode].length + payload + pad + 
> RXE_ICRC_SIZE;
>
> -       /* pkt->hdr, rxe, port_num and mask are initialized in ifc
> -        * layer
> -        */
> +       /* pkt->hdr, port_num and mask are initialized in ifc layer */
> +       pkt->rxe        = rxe;
>         pkt->opcode     = opcode;
>         pkt->qp         = qp;
>         pkt->psn        = qp->req.psn;
> @@ -391,6 +390,9 @@ static struct sk_buff *init_req_packet(struct 
> rxe_qp *qp,
>
>         /* init skb */
>         av = rxe_get_av(pkt);
> +       if (!av)
> +               return NULL;
> +
>         skb = rxe_init_packet(rxe, av, paylen, pkt);
>         if (unlikely(!skb))
>                 return NULL;
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next v5 0/6] Replace AV by AH in UD sends
  2021-10-06  1:58 [PATCH for-next v5 0/6] Replace AV by AH in UD sends Bob Pearson
                   ` (5 preceding siblings ...)
  2021-10-06  1:58 ` [PATCH for-next v5 6/6] RDMA/rxe: Convert kernel UD post send to use ah_num Bob Pearson
@ 2021-10-06 19:37 ` Jason Gunthorpe
       [not found]   ` <8fb347bb-81b2-2ba6-a97c-16a5db86541d@gmail.com>
  6 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-10-06 19:37 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Tue, Oct 05, 2021 at 08:58:09PM -0500, Bob Pearson wrote:
> Currently the rdma_rxe driver and its user space provider exchange
> addressing information for UD sends by having the provider compute an
> address vector (AV) and send it with each WQE. This is not the way
> that the RDMA verbs API was intended to operate.
> 
> This series of patches modifies the way UD send WQEs work by exchanging
> an index identifying the AH replacing the 88 byte AV by a 4 byte AH
> index. In order to not break compatibility with the existing API the
> rdma_rxe driver will recognise when an older version of the provider
> is not sending an index (i.e. it is 0) and will use the AV instead.
> 
> This series of patches is identical to the previous version
> but rebased to 5.15.0-rc2+. It applies cleanly to
> 
>     commit: d30ef6d5c013c19e907f2a3a3d6eee04fcd3de0d (for-next)
> 
> v5:
>   Rebase to 5.15.0-rc2+

This is not the right base. I said you needed something past Rao's
patch, like current rdma for-next, since it gets conflicts:

Applying: RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr
Applying: RDMA/rxe: Change AH objects to indexed
Using index info to reconstruct a base tree...
M	drivers/infiniband/sw/rxe/rxe_param.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/infiniband/sw/rxe/rxe_param.h
CONFLICT (content): Merge conflict in drivers/infiniband/sw/rxe/rxe_param.h
error: Failed to merge in the changes.
Patch failed at 0002 RDMA/rxe: Change AH objects to indexed
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Try again

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next v5 5/6] RDMA/rxe: Lookup kernel AH from ah index in UD WQEs
  2021-10-06 14:42     ` Pearson, Robert B
@ 2021-10-07  3:12       ` Zhu Yanjun
  0 siblings, 0 replies; 16+ messages in thread
From: Zhu Yanjun @ 2021-10-07  3:12 UTC (permalink / raw)
  To: Pearson, Robert B; +Cc: Bob Pearson, Jason Gunthorpe, RDMA mailing list

On Wed, Oct 6, 2021 at 10:42 PM Pearson, Robert B
<robert.pearson2@hpe.com> wrote:
>
> Zhu,
>
> It's a matter of preference. I find that for me always putting all the local variables at the top of a subroutine saves time and reduces bugs. I know where to look. They're always there. And there are no tricky scope issues to think about. If you can't see them because they are off the screen the subroutine is probably too big.
>

Yeah. It is a matter of preference. I like to put all the variables
near where they are used.
Do not worry. I am fine with your preference.

Zhu Yanjun
> BTW do you have a new email address? I just saw one go by.
>
> Bob
>
> -----Original Message-----
> From: Zhu Yanjun <zyjzyj2000@gmail.com>
> Sent: Wednesday, October 6, 2021 6:56 AM
> To: Bob Pearson <rpearsonhpe@gmail.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
> Subject: Re: [PATCH for-next v5 5/6] RDMA/rxe: Lookup kernel AH from ah index in UD WQEs
>
> On Wed, Oct 6, 2021 at 9:58 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> >
> > Add code to rxe_get_av in rxe_av.c to use the AH index in UD send WQEs
> > to lookup the kernel AH. For old user providers continue to use the AV
> > passed in WQEs. Move setting pkt->rxe to before the call to
> > rxe_get_av() to get access to the AH pool.
> >
> > Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> > ---
> >  drivers/infiniband/sw/rxe/rxe_av.c  | 20 +++++++++++++++++++-
> > drivers/infiniband/sw/rxe/rxe_req.c |  8 +++++---
> >  2 files changed, 24 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_av.c
> > b/drivers/infiniband/sw/rxe/rxe_av.c
> > index 85580ea5eed0..38c7b6fb39d7 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_av.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_av.c
> > @@ -101,11 +101,29 @@ void rxe_av_fill_ip_info(struct rxe_av *av,
> > struct rdma_ah_attr *attr)
> >
> >  struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)  {
> > +       struct rxe_ah *ah;
> > +       u32 ah_num;
> > +
> >         if (!pkt || !pkt->qp)
> >                 return NULL;
> >
> >         if (qp_type(pkt->qp) == IB_QPT_RC || qp_type(pkt->qp) == IB_QPT_UC)
> >                 return &pkt->qp->pri_av;
> >
> > -       return (pkt->wqe) ? &pkt->wqe->wr.wr.ud.av : NULL;
> > +       if (!pkt->wqe)
> > +               return NULL;
> > +
> > +       ah_num = pkt->wqe->wr.wr.ud.ah_num;
> > +       if (ah_num) {
> > +               /* only new user provider or kernel client */
>
> struct rxe_ah *ah;
> ah is only used in this snippet. Is it better to move to here?
> It is only a trivial problem.
>
> Zhu Yanjun
> > +               ah = rxe_pool_get_index(&pkt->rxe->ah_pool, ah_num);
> > +               if (!ah || ah->ah_num != ah_num || rxe_ah_pd(ah) != pkt->qp->pd) {
> > +                       pr_warn("Unable to find AH matching ah_num\n");
> > +                       return NULL;
> > +               }
> > +               return &ah->av;
> > +       }
> > +
> > +       /* only old user provider for UD sends*/
> > +       return &pkt->wqe->wr.wr.ud.av;
> >  }
> > diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> > index fe275fcaffbd..0c9d2af15f3d 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_req.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> > @@ -379,9 +379,8 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
> >         /* length from start of bth to end of icrc */
> >         paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE;
> >
> > -       /* pkt->hdr, rxe, port_num and mask are initialized in ifc
> > -        * layer
> > -        */
> > +       /* pkt->hdr, port_num and mask are initialized in ifc layer */
> > +       pkt->rxe        = rxe;
> >         pkt->opcode     = opcode;
> >         pkt->qp         = qp;
> >         pkt->psn        = qp->req.psn;
> > @@ -391,6 +390,9 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
> >
> >         /* init skb */
> >         av = rxe_get_av(pkt);
> > +       if (!av)
> > +               return NULL;
> > +
> >         skb = rxe_init_packet(rxe, av, paylen, pkt);
> >         if (unlikely(!skb))
> >                 return NULL;
> > --
> > 2.30.2
> >
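
For readers following the patch above, the selection logic in rxe_get_av() for
UD sends can be modelled in user space roughly as follows. This is only a
sketch under assumptions: the types (struct av, struct ah, struct ud_wqe,
struct ah_table), the fixed-size table standing in for the AH pool, and the
function names are all hypothetical simplifications and are not the driver's
definitions. It only illustrates the rule "a nonzero ah_num means look up the
AH and validate it, zero means fall back to the AV carried in the WQE".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct av { int dummy; };

struct ah {
	uint32_t ah_num;	/* index handed back to the provider */
	int pd;			/* protection domain id (simplified) */
	struct av av;
};

struct ud_wqe {
	uint32_t ah_num;	/* 0 means "old provider, use embedded av" */
	struct av av;
};

#define AH_TABLE_SIZE 8

/* A flat array standing in for the indexed AH pool. */
struct ah_table { struct ah *slot[AH_TABLE_SIZE]; };

static struct ah *ah_lookup(struct ah_table *t, uint32_t ah_num)
{
	return ah_num < AH_TABLE_SIZE ? t->slot[ah_num] : NULL;
}

/* Pick the address vector for a UD send WQE. */
static struct av *get_ud_av(struct ah_table *t, int qp_pd, struct ud_wqe *wqe)
{
	if (!wqe)
		return NULL;

	if (wqe->ah_num) {
		/* new provider or kernel client: index must resolve and match */
		struct ah *ah = ah_lookup(t, wqe->ah_num);

		if (!ah || ah->ah_num != wqe->ah_num || ah->pd != qp_pd)
			return NULL;
		return &ah->av;
	}

	/* old provider: the AV travels inside the WQE */
	return &wqe->av;
}
```

In this model a caller has to treat a NULL return as a failed send, which is
the reason the patch adds the "if (!av) return NULL;" check in
init_req_packet().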


* Re: [PATCH for-next v5 0/6] Replace AV by AH in UD sends
       [not found]         ` <20211007190543.GM2744544@nvidia.com>
@ 2021-10-07 19:51           ` Bob Pearson
  2021-10-07 19:57             ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Bob Pearson @ 2021-10-07 19:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Rao Shoaib, Zhu Yanjun, linux-rdma

On 10/7/21 2:05 PM, Jason Gunthorpe wrote:
> On Thu, Oct 07, 2021 at 01:53:27PM -0500, Bob Pearson wrote:
> 
>> On looking, Rao's patch is not in for-next. Last one was
>> January. Which branch are you looking at?
> 
> Oh, it is still in the wip branch, try now
> 
> Jason
> 

I see the issue. Rao is asking for 2^20 objects max by default which will
require 128KiB of memory in the index reservation bit mask for each of them.
There are 4 indexed objects: QP by qpn, SRQ by srqn, MR by rkey and MW by rkey.
That's 512KiB of memory which seems excessive to me for many use cases where the
number of objects is fairly small.
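
The arithmetic behind those figures can be checked with a small sketch. It
assumes one bit per possible index and nothing else; the helper names are
hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a bit mask with one bit per possible index. */
static size_t bitmask_bytes(uint64_t max_index)
{
	return (size_t)((max_index + 7) / 8);
}

/* Total for several pools that each reserve the same index range. */
static size_t total_bitmask_bytes(uint64_t max_index, unsigned int num_pools)
{
	return bitmask_bytes(max_index) * num_pools;
}
```

With max_index = 2^20 the first helper gives 2^20 / 8 = 131072 bytes, which is
128KiB, and four such pools give 512KiB.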

The bit mask is used to allocate and free the indices, and there is also a red-black
tree that is used to look up objects by their index (or key if they use keys instead).

If there is a usual way to address these kinds of issues in Linux, maybe we should
consider that. If not, there are a couple of approaches we could take. The first would
be to get rid of the index bit mask and just hand out randomly selected indices in
a bigger range, detect collisions when we insert the object into the red-black
tree, and retry. This is basically what happens with 'keys', for example mgids for
multicast group elements. Alternatively, we could leave the max big but limit the
allocated indices to a smaller amount until the total number of allocated indices
reaches some threshold, and then extend the bit mask table. Then only the use cases
that really need the big index range would pay the price for the memory.
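
A rough user-space sketch of the first approach (random index, detect the
collision at insert time, retry) might look like this. Everything here is an
assumption made for illustration: a plain unbalanced binary search tree stands
in for the red-black tree, a xorshift32 generator stands in for whatever random
source the driver would use, and the struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct node {
	uint32_t index;
	struct node *left, *right;
};

struct pool {
	struct node *root;	/* lookup tree keyed by index */
	uint32_t rng_state;	/* must be nonzero for xorshift32 */
	uint32_t index_mask;	/* limits indices to the allowed range */
};

/* xorshift32: a simple deterministic stand-in for a random source. */
static uint32_t next_random(struct pool *p)
{
	uint32_t x = p->rng_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	p->rng_state = x;
	return x;
}

/* Insert n into the tree; return 0 on success, -1 if the index is taken. */
static int tree_insert(struct node **root, struct node *n)
{
	while (*root) {
		if (n->index == (*root)->index)
			return -1;
		root = n->index < (*root)->index ? &(*root)->left
						 : &(*root)->right;
	}
	n->left = n->right = NULL;
	*root = n;
	return 0;
}

static struct node *pool_lookup(struct pool *p, uint32_t index)
{
	struct node *n = p->root;

	while (n && n->index != index)
		n = index < n->index ? n->left : n->right;
	return n;
}

/* Allocate an object with a random index, retrying on collision. */
static struct node *pool_alloc(struct pool *p, int max_tries)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	for (int i = 0; i < max_tries; i++) {
		n->index = next_random(p) & p->index_mask;
		if (tree_insert(&p->root, n) == 0)
			return n;
	}
	free(n);
	return NULL;
}
```

No bit mask is needed in this scheme, so memory scales with the number of live
objects rather than with the size of the index range. The cost is that
allocation can fail after max_tries collisions when the range is nearly full.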

Random indices would slightly reduce some of the security issues that have been
pointed out about the InfiniBand transport.

I am looking for suggestions on how to go forward here.

Bob


* Re: [PATCH for-next v5 0/6] Replace AV by AH in UD sends
  2021-10-07 19:51           ` Bob Pearson
@ 2021-10-07 19:57             ` Jason Gunthorpe
  2021-10-07 20:40               ` Shoaib Rao
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-10-07 19:57 UTC (permalink / raw)
  To: Bob Pearson; +Cc: Rao Shoaib, Zhu Yanjun, linux-rdma

On Thu, Oct 07, 2021 at 02:51:11PM -0500, Bob Pearson wrote:
> On 10/7/21 2:05 PM, Jason Gunthorpe wrote:
> > On Thu, Oct 07, 2021 at 01:53:27PM -0500, Bob Pearson wrote:
> > 
> >> On looking, Rao's patch is not in for-next. Last one was
> >> January. Which branch are you looking at?
> > 
> > Oh, it is still in the wip branch, try now
> > 
> > Jason
> > 
> 
> I see the issue. Rao is asking for 2^20 objects max by default which will
> require 128KiB of memory in the index reservation bit mask for each of them.
> There are 4 indexed objects QP by qpn, SRQ by srqn, MR by rkey and MW by rkey.
> That's 512KiB of memory which seems excessive to me for many use cases where the
> number of objects is fairly small.
> 
> The bit mask is used to allocate and free the indices and there is also a red black
> tree that is used to look up objects by their index (or key if they use keys instead.)
> 
> If there is a usual way to address these kinds of issues in Linux maybe we should
> consider that.

Use an allocating xarray

But for these AV patches just fix the merge conflict to something sane
and go ahead

Jason


* Re: [PATCH for-next v5 0/6] Replace AV by AH in UD sends
  2021-10-07 19:57             ` Jason Gunthorpe
@ 2021-10-07 20:40               ` Shoaib Rao
  2021-10-07 22:00                 ` Bob Pearson
  0 siblings, 1 reply; 16+ messages in thread
From: Shoaib Rao @ 2021-10-07 20:40 UTC (permalink / raw)
  To: Jason Gunthorpe, Bob Pearson; +Cc: Zhu Yanjun, linux-rdma


On 10/7/21 12:57 PM, Jason Gunthorpe wrote:
> On Thu, Oct 07, 2021 at 02:51:11PM -0500, Bob Pearson wrote:
>> On 10/7/21 2:05 PM, Jason Gunthorpe wrote:
>>> On Thu, Oct 07, 2021 at 01:53:27PM -0500, Bob Pearson wrote:
>>>
>>>> On looking, Rao's patch is not in for-next. Last one was
>>>> January. Which branch are you looking at?
>>> Oh, it is still in the wip branch, try now
>>>
>>> Jason
>>>
>> I see the issue. Rao is asking for 2^20 objects max by default which will
>> require 128KiB of memory in the index reservation bit mask for each of them.
>> There are 4 indexed objects QP by qpn, SRQ by srqn, MR by rkey and MW by rkey.
>> That's 512KiB of memory which seems excessive to me for many use cases where the
>> number of objects is fairly small.
>>
>> The bit mask is used to allocate and free the indices and there is also a red black
>> tree that is used to look up objects by their index (or key if they use keys instead.)
>>
>> If there is a usual way to address these kinds of issues in Linux maybe we should
>> consider that.
> Use an allocating xarray
>
> But for these AV patches just fix the merge conflict to something sane
> and go ahead
>
> Jason

I did not want to increase the values too much, but we discussed it, so I
did. Let me know if I need to modify the patch and reduce the values.

Shoaib



* Re: [PATCH for-next v5 0/6] Replace AV by AH in UD sends
  2021-10-07 20:40               ` Shoaib Rao
@ 2021-10-07 22:00                 ` Bob Pearson
  2021-10-07 22:53                   ` Shoaib Rao
  0 siblings, 1 reply; 16+ messages in thread
From: Bob Pearson @ 2021-10-07 22:00 UTC (permalink / raw)
  To: Shoaib Rao, Jason Gunthorpe; +Cc: Zhu Yanjun, linux-rdma

On 10/7/21 3:40 PM, Shoaib Rao wrote:
> 
> On 10/7/21 12:57 PM, Jason Gunthorpe wrote:
>> On Thu, Oct 07, 2021 at 02:51:11PM -0500, Bob Pearson wrote:
>>> On 10/7/21 2:05 PM, Jason Gunthorpe wrote:
>>>> On Thu, Oct 07, 2021 at 01:53:27PM -0500, Bob Pearson wrote:
>>>>
>>>>> On looking, Rao's patch is not in for-next. Last one was
>>>>> January. Which branch are you looking at?
>>>> Oh, it is still in the wip branch, try now
>>>>
>>>> Jason
>>>>
>>> I see the issue. Rao is asking for 2^20 objects max by default which will
>>> require 128KiB of memory in the index reservation bit mask for each of them.
>>> There are 4 indexed objects QP by qpn, SRQ by srqn, MR by rkey and MW by rkey.
>>> That's 512KiB of memory which seems excessive to me for many use cases where the
>>> number of objects is fairly small.
>>>
>>> The bit mask is used to allocate and free the indices and there is also a red black
>>> tree that is used to look up objects by their index (or key if they use keys instead.)
>>>
>>> If there is a usual way to address these kinds of issues in Linux maybe we should
>>> consider that.
>> Use an allocating xarray
>>
>> But for these AV patches just fix the merge conflict to something sane
>> and go ahead
>>
>> Jason
> 
> I did not want to increase the values too high but we discussed it so I did. Let me know if I need to modify the patch and reduce the values.
> 
> Shoaib
> 

If we convert the rxe_pools to use xarrays as Jason suggests it looks like this issue
goes away. I'm looking at that.

Bob


* Re: [PATCH for-next v5 0/6] Replace AV by AH in UD sends
  2021-10-07 22:00                 ` Bob Pearson
@ 2021-10-07 22:53                   ` Shoaib Rao
  0 siblings, 0 replies; 16+ messages in thread
From: Shoaib Rao @ 2021-10-07 22:53 UTC (permalink / raw)
  To: Bob Pearson, Jason Gunthorpe; +Cc: Zhu Yanjun, linux-rdma


On 10/7/21 3:00 PM, Bob Pearson wrote:
> On 10/7/21 3:40 PM, Shoaib Rao wrote:
>> On 10/7/21 12:57 PM, Jason Gunthorpe wrote:
>>> On Thu, Oct 07, 2021 at 02:51:11PM -0500, Bob Pearson wrote:
>>>> On 10/7/21 2:05 PM, Jason Gunthorpe wrote:
>>>>> On Thu, Oct 07, 2021 at 01:53:27PM -0500, Bob Pearson wrote:
>>>>>
>>>>>> On looking, Rao's patch is not in for-next. Last one was
>>>>>> January. Which branch are you looking at?
>>>>> Oh, it is still in the wip branch, try now
>>>>>
>>>>> Jason
>>>>>
>>>> I see the issue. Rao is asking for 2^20 objects max by default which will
>>>> require 128KiB of memory in the index reservation bit mask for each of them.
>>>> There are 4 indexed objects QP by qpn, SRQ by srqn, MR by rkey and MW by rkey.
>>>> That's 512KiB of memory which seems excessive to me for many use cases where the
>>>> number of objects is fairly small.
>>>>
>>>> The bit mask is used to allocate and free the indices and there is also a red black
>>>> tree that is used to look up objects by their index (or key if they use keys instead.)
>>>>
>>>> If there is a usual way to address these kinds of issues in Linux maybe we should
>>>> consider that.
>>> Use an allocating xarray
>>>
>>> But for these AV patches just fix the merge conflict to something sane
>>> and go ahead
>>>
>>> Jason
>> I did not want to increase the values too high but we discussed it so I did. Let me know if I need to modify the patch and reduce the values.
>>
>> Shoaib
>>
> If we convert the rxe_pools to use xarrays as Jason suggests it looks like this issue
> goes away. I'm looking at that.
>
> Bob

Thanks Bob. Let me know if there is anything that I can help out with.

Shoaib


Thread overview: 16+ messages
2021-10-06  1:58 [PATCH for-next v5 0/6] Replace AV by AH in UD sends Bob Pearson
2021-10-06  1:58 ` [PATCH for-next v5 1/6] RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr Bob Pearson
2021-10-06  1:58 ` [PATCH for-next v5 2/6] RDMA/rxe: Change AH objects to indexed Bob Pearson
2021-10-06  1:58 ` [PATCH for-next v5 3/6] RDMA/rxe: Create AH index and return to user space Bob Pearson
2021-10-06  1:58 ` [PATCH for-next v5 4/6] RDMA/rxe: Replace ah->pd by ah->ibah.pd Bob Pearson
2021-10-06  1:58 ` [PATCH for-next v5 5/6] RDMA/rxe: Lookup kernel AH from ah index in UD WQEs Bob Pearson
2021-10-06 11:55   ` Zhu Yanjun
2021-10-06 14:42     ` Pearson, Robert B
2021-10-07  3:12       ` Zhu Yanjun
2021-10-06  1:58 ` [PATCH for-next v5 6/6] RDMA/rxe: Convert kernel UD post send to use ah_num Bob Pearson
2021-10-06 19:37 ` [PATCH for-next v5 0/6] Replace AV by AH in UD sends Jason Gunthorpe
     [not found]   ` <8fb347bb-81b2-2ba6-a97c-16a5db86541d@gmail.com>
     [not found]     ` <20211006224906.GE2744544@nvidia.com>
     [not found]       ` <086698cc-9e50-49be-aea8-7a4426f2e502@gmail.com>
     [not found]         ` <20211007190543.GM2744544@nvidia.com>
2021-10-07 19:51           ` Bob Pearson
2021-10-07 19:57             ` Jason Gunthorpe
2021-10-07 20:40               ` Shoaib Rao
2021-10-07 22:00                 ` Bob Pearson
2021-10-07 22:53                   ` Shoaib Rao
