* [PATCH rdma-core 0/8] Completion timestamping support in mlx4
@ 2017-01-25 14:49 Yishai Hadas
From: Yishai Hadas @ 2017-01-25 14:49 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

This series from Ariel implements the libibverbs 'Completion timestamping' API
in the mlx4 provider.

It implements the new iterator-style CQ polling API, with support for
querying specific completion fields, among them the completion timestamp.

Benchmarks we ran in our test lab found that the new approach generally
matches the legacy API's performance and is never worse. Since the new
API also allows extending the set of polled fields, it is overall an
improvement over the legacy one.
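
For illustration only (not part of the series), reading the timestamp
with the new API boils down to a loop of this shape, built from the
generic libibverbs wrappers; handle_ts is a placeholder, and the CQ is
assumed to have been created with IBV_WC_EX_WITH_COMPLETION_TIMESTAMP:

    struct ibv_poll_cq_attr attr = {};

    if (!ibv_start_poll(cq_ex, &attr)) {    /* ENOENT when CQ is empty */
            do {
                    if (cq_ex->status == IBV_WC_SUCCESS)
                            handle_ts(cq_ex->wr_id,
                                      ibv_wc_read_completion_ts(cq_ex));
            } while (!ibv_next_poll(cq_ex));
            ibv_end_poll(cq_ex);
    }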

Pull request was sent:
https://github.com/linux-rdma/rdma-core/pull/61

Yishai

Ariel Levkovich (8):
  mlx4: sl_vid field in struct mlx4_cqe should be 16 bit
  mlx4: Refactor mlx4_poll_one
  mlx4: Add lazy CQ polling
  mlx4: Add inline functions to read completion's attributes
  mlx4: Add ability to poll CQs through iterator's style API
  mlx4: Add support for creating an extended CQ
  mlx4: Add ibv_query_device_ex support
  mlx4: Add ibv_query_rt_values

 providers/mlx4/cq.c       | 492 +++++++++++++++++++++++++++++++++++++---------
 providers/mlx4/mlx4-abi.h |  27 +++
 providers/mlx4/mlx4.c     |  41 +++-
 providers/mlx4/mlx4.h     |  56 +++++-
 providers/mlx4/verbs.c    | 238 ++++++++++++++++++++--
 5 files changed, 735 insertions(+), 119 deletions(-)

-- 
1.8.3.1

* [PATCH rdma-core 1/8] mlx4: sl_vid field in struct mlx4_cqe should be 16 bit
From: Yishai Hadas @ 2017-01-25 14:49 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This fix adjusts the size of the sl_vid field in struct mlx4_cqe
to be 16 bits.

Fixes: 9a90176880 ("Add support for 64B CQEs")
Signed-off-by: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Acked-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 providers/mlx4/mlx4.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
index b851e95..5a73ed6 100644
--- a/providers/mlx4/mlx4.h
+++ b/providers/mlx4/mlx4.h
@@ -272,8 +272,7 @@ struct mlx4_cqe {
 	uint32_t	vlan_my_qpn;
 	uint32_t	immed_rss_invalid;
 	uint32_t	g_mlpath_rqpn;
-	uint8_t		sl_vid;
-	uint8_t		reserved1;
+	uint16_t	sl_vid;
 	uint16_t	rlid;
 	uint32_t	status;
 	uint32_t	byte_cnt;
-- 
1.8.3.1

* [PATCH rdma-core 2/8] mlx4: Refactor mlx4_poll_one
From: Yishai Hadas @ 2017-01-25 14:49 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Since downstream patches aim to provide lazy CQE polling, which
lets the user read the CQE's attributes via inline functions, we
refactor mlx4_poll_one:
* Return a status instead of writing directly to the WC as part of
  handle_error_cqe.
* Introduce mlx4_get_next_cqe, which will be used to advance the
  CQE iterator.

Signed-off-by: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Acked-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 providers/mlx4/cq.c   | 88 +++++++++++++++++++++++++++------------------------
 providers/mlx4/mlx4.h |  6 ++++
 2 files changed, 52 insertions(+), 42 deletions(-)

diff --git a/providers/mlx4/cq.c b/providers/mlx4/cq.c
index 23cc3ed..8f67c90 100644
--- a/providers/mlx4/cq.c
+++ b/providers/mlx4/cq.c
@@ -114,7 +114,7 @@ static struct mlx4_cqe *next_cqe_sw(struct mlx4_cq *cq)
 	return get_sw_cqe(cq, cq->cons_index);
 }
 
-static void mlx4_handle_error_cqe(struct mlx4_err_cqe *cqe, struct ibv_wc *wc)
+static enum ibv_wc_status mlx4_handle_error_cqe(struct mlx4_err_cqe *cqe)
 {
 	if (cqe->syndrome == MLX4_CQE_SYNDROME_LOCAL_QP_OP_ERR)
 		printf(PFX "local QP operation err "
@@ -126,64 +126,43 @@ static void mlx4_handle_error_cqe(struct mlx4_err_cqe *cqe, struct ibv_wc *wc)
 
 	switch (cqe->syndrome) {
 	case MLX4_CQE_SYNDROME_LOCAL_LENGTH_ERR:
-		wc->status = IBV_WC_LOC_LEN_ERR;
-		break;
+		return IBV_WC_LOC_LEN_ERR;
 	case MLX4_CQE_SYNDROME_LOCAL_QP_OP_ERR:
-		wc->status = IBV_WC_LOC_QP_OP_ERR;
-		break;
+		return IBV_WC_LOC_QP_OP_ERR;
 	case MLX4_CQE_SYNDROME_LOCAL_PROT_ERR:
-		wc->status = IBV_WC_LOC_PROT_ERR;
-		break;
+		return IBV_WC_LOC_PROT_ERR;
 	case MLX4_CQE_SYNDROME_WR_FLUSH_ERR:
-		wc->status = IBV_WC_WR_FLUSH_ERR;
-		break;
+		return IBV_WC_WR_FLUSH_ERR;
 	case MLX4_CQE_SYNDROME_MW_BIND_ERR:
-		wc->status = IBV_WC_MW_BIND_ERR;
-		break;
+		return IBV_WC_MW_BIND_ERR;
 	case MLX4_CQE_SYNDROME_BAD_RESP_ERR:
-		wc->status = IBV_WC_BAD_RESP_ERR;
-		break;
+		return IBV_WC_BAD_RESP_ERR;
 	case MLX4_CQE_SYNDROME_LOCAL_ACCESS_ERR:
-		wc->status = IBV_WC_LOC_ACCESS_ERR;
-		break;
+		return IBV_WC_LOC_ACCESS_ERR;
 	case MLX4_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR:
-		wc->status = IBV_WC_REM_INV_REQ_ERR;
-		break;
+		return IBV_WC_REM_INV_REQ_ERR;
 	case MLX4_CQE_SYNDROME_REMOTE_ACCESS_ERR:
-		wc->status = IBV_WC_REM_ACCESS_ERR;
-		break;
+		return IBV_WC_REM_ACCESS_ERR;
 	case MLX4_CQE_SYNDROME_REMOTE_OP_ERR:
-		wc->status = IBV_WC_REM_OP_ERR;
-		break;
+		return IBV_WC_REM_OP_ERR;
 	case MLX4_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR:
-		wc->status = IBV_WC_RETRY_EXC_ERR;
-		break;
+		return IBV_WC_RETRY_EXC_ERR;
 	case MLX4_CQE_SYNDROME_RNR_RETRY_EXC_ERR:
-		wc->status = IBV_WC_RNR_RETRY_EXC_ERR;
-		break;
+		return IBV_WC_RNR_RETRY_EXC_ERR;
 	case MLX4_CQE_SYNDROME_REMOTE_ABORTED_ERR:
-		wc->status = IBV_WC_REM_ABORT_ERR;
-		break;
+		return IBV_WC_REM_ABORT_ERR;
 	default:
-		wc->status = IBV_WC_GENERAL_ERR;
-		break;
+		return IBV_WC_GENERAL_ERR;
 	}
-
-	wc->vendor_err = cqe->vendor_err;
 }
 
-static int mlx4_poll_one(struct mlx4_cq *cq,
-			 struct mlx4_qp **cur_qp,
-			 struct ibv_wc *wc)
+static inline int mlx4_get_next_cqe(struct mlx4_cq *cq,
+				    struct mlx4_cqe **pcqe)
+				    ALWAYS_INLINE;
+static inline int mlx4_get_next_cqe(struct mlx4_cq *cq,
+				    struct mlx4_cqe **pcqe)
 {
-	struct mlx4_wq *wq;
 	struct mlx4_cqe *cqe;
-	struct mlx4_srq *srq;
-	uint32_t qpn;
-	uint32_t g_mlpath_rqpn;
-	uint16_t wqe_index;
-	int is_error;
-	int is_send;
 
 	cqe = next_cqe_sw(cq);
 	if (!cqe)
@@ -202,6 +181,28 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 	 */
 	rmb();
 
+	*pcqe = cqe;
+
+	return CQ_OK;
+}
+
+static int mlx4_poll_one(struct mlx4_cq *cq,
+			 struct mlx4_qp **cur_qp,
+			 struct ibv_wc *wc)
+{
+	struct mlx4_wq *wq;
+	struct mlx4_cqe *cqe;
+	struct mlx4_srq *srq;
+	uint32_t qpn;
+	uint32_t g_mlpath_rqpn;
+	uint16_t wqe_index;
+	struct mlx4_err_cqe *ecqe;
+	int is_error;
+	int is_send;
+
+	if  (mlx4_get_next_cqe(cq, &cqe) == CQ_EMPTY)
+		return CQ_EMPTY;
+
 	qpn = ntohl(cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK;
 	wc->qp_num = qpn;
 
@@ -250,7 +251,10 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 	}
 
 	if (is_error) {
-		mlx4_handle_error_cqe((struct mlx4_err_cqe *) cqe, wc);
+		ecqe = (struct mlx4_err_cqe *)cqe;
+		wc->status = mlx4_handle_error_cqe(ecqe);
+		wc->vendor_err = ecqe->vendor_err;
+
 		return CQ_OK;
 	}
 
diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
index 5a73ed6..af21eeb 100644
--- a/providers/mlx4/mlx4.h
+++ b/providers/mlx4/mlx4.h
@@ -45,6 +45,12 @@
 
 #include <valgrind/memcheck.h>
 
+#ifdef HAVE_FUNC_ATTRIBUTE_ALWAYS_INLINE
+#define ALWAYS_INLINE __attribute__((always_inline))
+#else
+#define ALWAYS_INLINE
+#endif
+
 #define PFX		"mlx4: "
 
 enum {
-- 
1.8.3.1

* [PATCH rdma-core 3/8] mlx4: Add lazy CQ polling
From: Yishai Hadas @ 2017-01-25 14:49 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Currently, when a user wants to poll a CQ for a completion, there is
no choice but to get the whole work completion (WC). This has several
implications, for example:
* Extending the WC is limited, as adding new fields makes the WC
  larger and could take more cache lines.
* Every field is copied to the WC, even fields the user doesn't
  care about.

This patch adds support for handling the CQE in a lazy manner; the
new lazy mode is wired up in downstream patches.

We parse only the fields that are mandatory to interpret the CQE,
such as its type, status and wr_id.

To share code with the legacy mode without a performance penalty,
the legacy code was refactored and the 'always_inline' mechanism is
used so that branch conditions are dropped at compile time.
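
To illustrate the always_inline idea in isolation, here is a toy
sketch (not code from this patch): a compile-time-constant flag
passed to an always-inline helper lets the compiler drop the dead
branch in each specialized wrapper.

    struct result { int status; int byte_len; };

    static inline void parse(struct result *res, int lazy)
            __attribute__((always_inline));
    static inline void parse(struct result *res, int lazy)
    {
            res->status = 0;            /* always-needed field */
            if (!lazy)
                    res->byte_len = 42; /* dead code when lazy == 1 */
    }

    static void parse_lazy(struct result *res)   { parse(res, 1); }
    static void parse_legacy(struct result *res) { parse(res, 0); }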

Signed-off-by: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Acked-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 providers/mlx4/cq.c    | 165 +++++++++++++++++++++++++++++++------------------
 providers/mlx4/mlx4.h  |   9 ++-
 providers/mlx4/verbs.c |   6 +-
 3 files changed, 116 insertions(+), 64 deletions(-)

diff --git a/providers/mlx4/cq.c b/providers/mlx4/cq.c
index 8f67c90..6c4b3c4 100644
--- a/providers/mlx4/cq.c
+++ b/providers/mlx4/cq.c
@@ -156,6 +156,46 @@ static enum ibv_wc_status mlx4_handle_error_cqe(struct mlx4_err_cqe *cqe)
 	}
 }
 
+static inline void handle_good_req(struct ibv_wc *wc, struct mlx4_cqe *cqe)
+{
+	wc->wc_flags = 0;
+	switch (cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) {
+	case MLX4_OPCODE_RDMA_WRITE_IMM:
+		wc->wc_flags |= IBV_WC_WITH_IMM;
+	case MLX4_OPCODE_RDMA_WRITE:
+		wc->opcode    = IBV_WC_RDMA_WRITE;
+		break;
+	case MLX4_OPCODE_SEND_IMM:
+		wc->wc_flags |= IBV_WC_WITH_IMM;
+	case MLX4_OPCODE_SEND:
+	case MLX4_OPCODE_SEND_INVAL:
+		wc->opcode    = IBV_WC_SEND;
+		break;
+	case MLX4_OPCODE_RDMA_READ:
+		wc->opcode    = IBV_WC_RDMA_READ;
+		wc->byte_len  = ntohl(cqe->byte_cnt);
+		break;
+	case MLX4_OPCODE_ATOMIC_CS:
+		wc->opcode    = IBV_WC_COMP_SWAP;
+		wc->byte_len  = 8;
+		break;
+	case MLX4_OPCODE_ATOMIC_FA:
+		wc->opcode    = IBV_WC_FETCH_ADD;
+		wc->byte_len  = 8;
+		break;
+	case MLX4_OPCODE_LOCAL_INVAL:
+		wc->opcode    = IBV_WC_LOCAL_INV;
+		break;
+	case MLX4_OPCODE_BIND_MW:
+		wc->opcode    = IBV_WC_BIND_MW;
+		break;
+	default:
+		/* assume it's a send completion */
+		wc->opcode    = IBV_WC_SEND;
+		break;
+	}
+}
+
 static inline int mlx4_get_next_cqe(struct mlx4_cq *cq,
 				    struct mlx4_cqe **pcqe)
 				    ALWAYS_INLINE;
@@ -186,25 +226,35 @@ static inline int mlx4_get_next_cqe(struct mlx4_cq *cq,
 	return CQ_OK;
 }
 
-static int mlx4_poll_one(struct mlx4_cq *cq,
-			 struct mlx4_qp **cur_qp,
-			 struct ibv_wc *wc)
+static inline int mlx4_parse_cqe(struct mlx4_cq *cq,
+					struct mlx4_cqe *cqe,
+					struct mlx4_qp **cur_qp,
+					struct ibv_wc *wc, int lazy)
+					ALWAYS_INLINE;
+static inline int mlx4_parse_cqe(struct mlx4_cq *cq,
+					struct mlx4_cqe *cqe,
+					struct mlx4_qp **cur_qp,
+					struct ibv_wc *wc, int lazy)
 {
 	struct mlx4_wq *wq;
-	struct mlx4_cqe *cqe;
 	struct mlx4_srq *srq;
 	uint32_t qpn;
 	uint32_t g_mlpath_rqpn;
+	uint64_t *pwr_id;
 	uint16_t wqe_index;
 	struct mlx4_err_cqe *ecqe;
+	struct mlx4_context *mctx;
 	int is_error;
 	int is_send;
+	enum ibv_wc_status *pstatus;
 
-	if  (mlx4_get_next_cqe(cq, &cqe) == CQ_EMPTY)
-		return CQ_EMPTY;
-
+	mctx = to_mctx(cq->ibv_cq.context);
 	qpn = ntohl(cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK;
-	wc->qp_num = qpn;
+	if (lazy) {
+		cq->cqe = cqe;
+		cq->flags &= (~MLX4_CQ_FLAGS_RX_CSUM_VALID);
+	} else
+		wc->qp_num = qpn;
 
 	is_send  = cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK;
 	is_error = (cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
@@ -216,7 +266,7 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 		 * because CQs will be locked while SRQs are removed
 		 * from the table.
 		 */
-		srq = mlx4_find_xsrq(&to_mctx(cq->ibv_cq.context)->xsrq_table,
+		srq = mlx4_find_xsrq(&mctx->xsrq_table,
 				     ntohl(cqe->g_mlpath_rqpn) & MLX4_CQE_QPN_MASK);
 		if (!srq)
 			return CQ_POLL_ERR;
@@ -227,78 +277,46 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 			 * because CQs will be locked while QPs are removed
 			 * from the table.
 			 */
-			*cur_qp = mlx4_find_qp(to_mctx(cq->ibv_cq.context), qpn);
+			*cur_qp = mlx4_find_qp(mctx, qpn);
 			if (!*cur_qp)
 				return CQ_POLL_ERR;
 		}
 		srq = ((*cur_qp)->verbs_qp.qp.srq) ? to_msrq((*cur_qp)->verbs_qp.qp.srq) : NULL;
 	}
 
+	pwr_id = lazy ? &cq->ibv_cq.wr_id : &wc->wr_id;
 	if (is_send) {
 		wq = &(*cur_qp)->sq;
 		wqe_index = ntohs(cqe->wqe_index);
 		wq->tail += (uint16_t) (wqe_index - (uint16_t) wq->tail);
-		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+		*pwr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
 	} else if (srq) {
 		wqe_index = htons(cqe->wqe_index);
-		wc->wr_id = srq->wrid[wqe_index];
+		*pwr_id = srq->wrid[wqe_index];
 		mlx4_free_srq_wqe(srq, wqe_index);
 	} else {
 		wq = &(*cur_qp)->rq;
-		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+		*pwr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
 	}
 
+	pstatus = lazy ? &cq->ibv_cq.status : &wc->status;
 	if (is_error) {
 		ecqe = (struct mlx4_err_cqe *)cqe;
-		wc->status = mlx4_handle_error_cqe(ecqe);
-		wc->vendor_err = ecqe->vendor_err;
-
+		*pstatus = mlx4_handle_error_cqe(ecqe);
+		if (!lazy)
+			wc->vendor_err = ecqe->vendor_err;
 		return CQ_OK;
 	}
 
-	wc->status = IBV_WC_SUCCESS;
-
-	if (is_send) {
-		wc->wc_flags = 0;
-		switch (cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) {
-		case MLX4_OPCODE_RDMA_WRITE_IMM:
-			wc->wc_flags |= IBV_WC_WITH_IMM;
-		case MLX4_OPCODE_RDMA_WRITE:
-			wc->opcode    = IBV_WC_RDMA_WRITE;
-			break;
-		case MLX4_OPCODE_SEND_IMM:
-			wc->wc_flags |= IBV_WC_WITH_IMM;
-		case MLX4_OPCODE_SEND:
-			wc->opcode    = IBV_WC_SEND;
-			break;
-		case MLX4_OPCODE_RDMA_READ:
-			wc->opcode    = IBV_WC_RDMA_READ;
-			wc->byte_len  = ntohl(cqe->byte_cnt);
-			break;
-		case MLX4_OPCODE_ATOMIC_CS:
-			wc->opcode    = IBV_WC_COMP_SWAP;
-			wc->byte_len  = 8;
-			break;
-		case MLX4_OPCODE_ATOMIC_FA:
-			wc->opcode    = IBV_WC_FETCH_ADD;
-			wc->byte_len  = 8;
-			break;
-		case MLX4_OPCODE_LOCAL_INVAL:
-			wc->opcode    = IBV_WC_LOCAL_INV;
-			break;
-		case MLX4_OPCODE_BIND_MW:
-			wc->opcode    = IBV_WC_BIND_MW;
-			break;
-		case MLX4_OPCODE_SEND_INVAL:
-			wc->opcode    = IBV_WC_SEND;
-			break;
-		default:
-			/* assume it's a send completion */
-			wc->opcode    = IBV_WC_SEND;
-			break;
-		}
+	*pstatus = IBV_WC_SUCCESS;
+	if (lazy) {
+		if (!is_send)
+			if ((*cur_qp) && ((*cur_qp)->qp_cap_cache & MLX4_RX_CSUM_VALID))
+				cq->flags |= MLX4_CQ_FLAGS_RX_CSUM_VALID;
+	} else if (is_send) {
+		handle_good_req(wc, cqe);
 	} else {
 		wc->byte_len = ntohl(cqe->byte_cnt);
 
@@ -331,7 +349,7 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 		wc->wc_flags	  |= g_mlpath_rqpn & 0x80000000 ? IBV_WC_GRH : 0;
 		wc->pkey_index     = ntohl(cqe->immed_rss_invalid) & 0x7f;
 		/* When working with xrc srqs, don't have qp to check link layer.
-		  * Using IB SL, should consider Roce. (TBD)
+		* Using IB SL, should consider Roce. (TBD)
 		*/
 		if ((*cur_qp) && (*cur_qp)->link_layer == IBV_LINK_LAYER_ETHERNET)
 			wc->sl	   = ntohs(cqe->sl_vid) >> 13;
@@ -340,14 +358,41 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 
 		if ((*cur_qp) && ((*cur_qp)->qp_cap_cache & MLX4_RX_CSUM_VALID)) {
 			wc->wc_flags |= ((cqe->status & htonl(MLX4_CQE_STATUS_IPV4_CSUM_OK)) ==
-					 htonl(MLX4_CQE_STATUS_IPV4_CSUM_OK)) <<
-					IBV_WC_IP_CSUM_OK_SHIFT;
+				 htonl(MLX4_CQE_STATUS_IPV4_CSUM_OK)) <<
+				IBV_WC_IP_CSUM_OK_SHIFT;
 		}
 	}
 
 	return CQ_OK;
 }
 
+static inline int mlx4_parse_lazy_cqe(struct mlx4_cq *cq,
+				      struct mlx4_cqe *cqe)
+				      ALWAYS_INLINE;
+static inline int mlx4_parse_lazy_cqe(struct mlx4_cq *cq,
+				      struct mlx4_cqe *cqe)
+{
+	return mlx4_parse_cqe(cq, cqe, &cq->cur_qp, NULL, 1);
+}
+
+static inline int mlx4_poll_one(struct mlx4_cq *cq,
+			 struct mlx4_qp **cur_qp,
+			 struct ibv_wc *wc)
+			 ALWAYS_INLINE;
+static inline int mlx4_poll_one(struct mlx4_cq *cq,
+			 struct mlx4_qp **cur_qp,
+			 struct ibv_wc *wc)
+{
+	struct mlx4_cqe *cqe;
+	int err;
+
+	err = mlx4_get_next_cqe(cq, &cqe);
+	if (err == CQ_EMPTY)
+		return err;
+
+	return mlx4_parse_cqe(cq, cqe, cur_qp, wc, 0);
+}
+
 int mlx4_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc)
 {
 	struct mlx4_cq *cq = to_mcq(ibcq);
diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
index af21eeb..4428d30 100644
--- a/providers/mlx4/mlx4.h
+++ b/providers/mlx4/mlx4.h
@@ -179,8 +179,12 @@ struct mlx4_pd {
 	uint32_t			pdn;
 };
 
+enum {
+	MLX4_CQ_FLAGS_RX_CSUM_VALID = 1 << 0,
+};
+
 struct mlx4_cq {
-	struct ibv_cq			ibv_cq;
+	struct ibv_cq_ex		ibv_cq;
 	struct mlx4_buf			buf;
 	struct mlx4_buf			resize_buf;
 	pthread_spinlock_t		lock;
@@ -190,6 +194,9 @@ struct mlx4_cq {
 	uint32_t		       *arm_db;
 	int				arm_sn;
 	int				cqe_size;
+	struct mlx4_qp			*cur_qp;
+	struct mlx4_cqe			*cqe;
+	uint32_t			flags;
 };
 
 struct mlx4_srq {
diff --git a/providers/mlx4/verbs.c b/providers/mlx4/verbs.c
index 21ec1d1..83c971d 100644
--- a/providers/mlx4/verbs.c
+++ b/providers/mlx4/verbs.c
@@ -346,14 +346,14 @@ struct ibv_cq *mlx4_create_cq(struct ibv_context *context, int cqe,
 	cmd.db_addr  = (uintptr_t) cq->set_ci_db;
 
 	ret = ibv_cmd_create_cq(context, cqe - 1, channel, comp_vector,
-				&cq->ibv_cq, &cmd.ibv_cmd, sizeof cmd,
-				&resp.ibv_resp, sizeof resp);
+				ibv_cq_ex_to_cq(&cq->ibv_cq), &cmd.ibv_cmd, sizeof(cmd),
+				&resp.ibv_resp, sizeof(resp));
 	if (ret)
 		goto err_db;
 
 	cq->cqn = resp.cqn;
 
-	return &cq->ibv_cq;
+	return ibv_cq_ex_to_cq(&cq->ibv_cq);
 
 err_db:
 	mlx4_free_db(to_mctx(context), MLX4_DB_TYPE_CQ, cq->set_ci_db);
-- 
1.8.3.1

* [PATCH rdma-core 4/8] mlx4: Add inline functions to read completion's attributes
From: Yishai Hadas @ 2017-01-25 14:49 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add inline functions to read a completion's various attributes.
These functions will be assigned to the ibv_cq_ex structure's
function pointers so that the user can read the completion's
attributes.

Signed-off-by: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Acked-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 providers/mlx4/cq.c   | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++
 providers/mlx4/mlx4.h |  13 +++--
 2 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/providers/mlx4/cq.c b/providers/mlx4/cq.c
index 6c4b3c4..a80b2fb 100644
--- a/providers/mlx4/cq.c
+++ b/providers/mlx4/cq.c
@@ -416,6 +416,153 @@ int mlx4_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc)
 	return err == CQ_POLL_ERR ? err : npolled;
 }
 
+static inline enum ibv_wc_opcode mlx4_cq_read_wc_opcode(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+
+	if (cq->cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK) {
+		switch (cq->cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) {
+		case MLX4_OPCODE_RDMA_WRITE_IMM:
+		case MLX4_OPCODE_RDMA_WRITE:
+			return IBV_WC_RDMA_WRITE;
+		case MLX4_OPCODE_SEND_INVAL:
+		case MLX4_OPCODE_SEND_IMM:
+		case MLX4_OPCODE_SEND:
+			return IBV_WC_SEND;
+		case MLX4_OPCODE_RDMA_READ:
+			return IBV_WC_RDMA_READ;
+		case MLX4_OPCODE_ATOMIC_CS:
+			return IBV_WC_COMP_SWAP;
+		case MLX4_OPCODE_ATOMIC_FA:
+			return IBV_WC_FETCH_ADD;
+		case MLX4_OPCODE_LOCAL_INVAL:
+			return IBV_WC_LOCAL_INV;
+		case MLX4_OPCODE_BIND_MW:
+			return IBV_WC_BIND_MW;
+		}
+	} else {
+		switch (cq->cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) {
+		case MLX4_RECV_OPCODE_RDMA_WRITE_IMM:
+			return IBV_WC_RECV_RDMA_WITH_IMM;
+		case MLX4_RECV_OPCODE_SEND_INVAL:
+		case MLX4_RECV_OPCODE_SEND_IMM:
+		case MLX4_RECV_OPCODE_SEND:
+			return IBV_WC_RECV;
+		}
+	}
+
+	return 0;
+}
+
+static inline uint32_t mlx4_cq_read_wc_qp_num(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+
+	return ntohl(cq->cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK;
+}
+
+static inline int mlx4_cq_read_wc_flags(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+	int is_send  = cq->cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK;
+	int wc_flags = 0;
+
+	if (is_send) {
+		switch (cq->cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) {
+		case MLX4_OPCODE_RDMA_WRITE_IMM:
+		case MLX4_OPCODE_SEND_IMM:
+			wc_flags |= IBV_WC_WITH_IMM;
+			break;
+		}
+	} else {
+		if (cq->flags & MLX4_CQ_FLAGS_RX_CSUM_VALID)
+			wc_flags |= ((cq->cqe->status &
+				htonl(MLX4_CQE_STATUS_IPV4_CSUM_OK)) ==
+				htonl(MLX4_CQE_STATUS_IPV4_CSUM_OK)) <<
+				IBV_WC_IP_CSUM_OK_SHIFT;
+
+		switch (cq->cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) {
+		case MLX4_RECV_OPCODE_RDMA_WRITE_IMM:
+		case MLX4_RECV_OPCODE_SEND_IMM:
+			wc_flags |= IBV_WC_WITH_IMM;
+			break;
+		case MLX4_RECV_OPCODE_SEND_INVAL:
+			wc_flags |= IBV_WC_WITH_INV;
+			break;
+		}
+		wc_flags |= (ntohl(cq->cqe->g_mlpath_rqpn) & 0x80000000) ? IBV_WC_GRH : 0;
+	}
+
+	return wc_flags;
+}
+
+static inline uint32_t mlx4_cq_read_wc_byte_len(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+
+	return ntohl(cq->cqe->byte_cnt);
+}
+
+static inline uint32_t mlx4_cq_read_wc_vendor_err(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+	struct mlx4_err_cqe *ecqe = (struct mlx4_err_cqe *)cq->cqe;
+
+	return ecqe->vendor_err;
+}
+
+static inline uint32_t mlx4_cq_read_wc_imm_data(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+
+	switch (cq->cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) {
+	case MLX4_RECV_OPCODE_SEND_INVAL:
+		return ntohl(cq->cqe->immed_rss_invalid);
+	default:
+		return cq->cqe->immed_rss_invalid;
+	}
+}
+
+static inline uint32_t mlx4_cq_read_wc_slid(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+
+	return (uint32_t)ntohs(cq->cqe->rlid);
+}
+
+static inline uint8_t mlx4_cq_read_wc_sl(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+
+	if ((cq->cur_qp) && (cq->cur_qp->link_layer == IBV_LINK_LAYER_ETHERNET))
+		return ntohs(cq->cqe->sl_vid) >> 13;
+	else
+		return ntohs(cq->cqe->sl_vid) >> 12;
+}
+
+static inline uint32_t mlx4_cq_read_wc_src_qp(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+
+	return ntohl(cq->cqe->g_mlpath_rqpn) & 0xffffff;
+}
+
+static inline uint8_t mlx4_cq_read_wc_dlid_path_bits(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+
+	return (ntohl(cq->cqe->g_mlpath_rqpn) >> 24) & 0x7f;
+}
+
+static inline uint64_t mlx4_cq_read_wc_completion_ts(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+
+	return ((uint64_t)ntohl(cq->cqe->ts_47_16) << 16) |
+			       (cq->cqe->ts_15_8   <<  8) |
+			       (cq->cqe->ts_7_0);
+}
+
 int mlx4_arm_cq(struct ibv_cq *ibvcq, int solicited)
 {
 	struct mlx4_cq *cq = to_mcq(ibvcq);
diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
index 4428d30..5ab083c 100644
--- a/providers/mlx4/mlx4.h
+++ b/providers/mlx4/mlx4.h
@@ -285,13 +285,20 @@ struct mlx4_cqe {
 	uint32_t	vlan_my_qpn;
 	uint32_t	immed_rss_invalid;
 	uint32_t	g_mlpath_rqpn;
-	uint16_t	sl_vid;
-	uint16_t	rlid;
+	union {
+		struct {
+			uint16_t	sl_vid;
+			uint16_t	rlid;
+		};
+		uint32_t ts_47_16;
+	};
 	uint32_t	status;
 	uint32_t	byte_cnt;
 	uint16_t	wqe_index;
 	uint16_t	checksum;
-	uint8_t		reserved3[3];
+	uint8_t		reserved3;
+	uint8_t		ts_15_8;
+	uint8_t		ts_7_0;
 	uint8_t		owner_sr_opcode;
 };
 
-- 
1.8.3.1

* [PATCH rdma-core 5/8] mlx4: Add ability to poll CQs through iterator's style API
From: Yishai Hadas @ 2017-01-25 14:49 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The new poll CQ API is an iterator-style API.
The user calls start_poll_cq and next_poll_cq, queries whichever
valid and initialized attributes are needed (initialized attributes
are those requested when the CQ was created), and calls end_poll_cq
at the end.

This patch implements this methodology in the mlx4 user-space
provider. To make start and end efficient, we use specialized
functions for each case: locked and single-threaded (unlocked).
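
For clarity, a hedged sketch of the intended call sequence from the
application's point of view (process_wc is a placeholder; the CQ is
assumed to come from ibv_create_cq_ex):

    static int drain_cq(struct ibv_cq_ex *cq_ex)
    {
            struct ibv_poll_cq_attr attr = {};
            int ret = ibv_start_poll(cq_ex, &attr); /* may take the lock */

            if (ret)
                    return ret == ENOENT ? 0 : ret; /* ENOENT: CQ empty */

            do {
                    /* wr_id and status are read directly from the CQ;
                     * other attributes go through the read_* functions. */
                    process_wc(cq_ex->wr_id, cq_ex->status);
                    ret = ibv_next_poll(cq_ex);     /* ENOENT: no more CQEs */
            } while (!ret);

            ibv_end_poll(cq_ex);    /* update consumer index and unlock */
            return 0;
    }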

Signed-off-by: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Acked-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 providers/mlx4/cq.c   | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++
 providers/mlx4/mlx4.h | 10 ++++++-
 2 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/providers/mlx4/cq.c b/providers/mlx4/cq.c
index a80b2fb..728efde 100644
--- a/providers/mlx4/cq.c
+++ b/providers/mlx4/cq.c
@@ -416,6 +416,89 @@ int mlx4_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc)
 	return err == CQ_POLL_ERR ? err : npolled;
 }
 
+static inline void _mlx4_end_poll(struct ibv_cq_ex *ibcq, int lock)
+				  ALWAYS_INLINE;
+static inline void _mlx4_end_poll(struct ibv_cq_ex *ibcq, int lock)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+
+	mlx4_update_cons_index(cq);
+
+	if (lock)
+		pthread_spin_unlock(&cq->lock);
+}
+
+static inline int _mlx4_start_poll(struct ibv_cq_ex *ibcq,
+				   struct ibv_poll_cq_attr *attr,
+				   int lock)
+				   ALWAYS_INLINE;
+static inline int _mlx4_start_poll(struct ibv_cq_ex *ibcq,
+				   struct ibv_poll_cq_attr *attr,
+				   int lock)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+	struct mlx4_cqe *cqe;
+	int err;
+
+	if (unlikely(attr->comp_mask))
+		return EINVAL;
+
+	if (lock)
+		pthread_spin_lock(&cq->lock);
+
+	cq->cur_qp = NULL;
+
+	err = mlx4_get_next_cqe(cq, &cqe);
+	if (err == CQ_EMPTY) {
+		if (lock)
+			pthread_spin_unlock(&cq->lock);
+		return ENOENT;
+	}
+
+	err = mlx4_parse_lazy_cqe(cq, cqe);
+	if (lock && err)
+		pthread_spin_unlock(&cq->lock);
+
+	return err;
+}
+
+static inline int mlx4_next_poll(struct ibv_cq_ex *ibcq)
+				 ALWAYS_INLINE;
+static inline int mlx4_next_poll(struct ibv_cq_ex *ibcq)
+{
+	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
+	struct mlx4_cqe *cqe;
+	int err;
+
+	err = mlx4_get_next_cqe(cq, &cqe);
+	if (err == CQ_EMPTY)
+		return ENOENT;
+
+	return mlx4_parse_lazy_cqe(cq, cqe);
+}
+
+static inline void mlx4_end_poll(struct ibv_cq_ex *ibcq)
+{
+	_mlx4_end_poll(ibcq, 0);
+}
+
+static inline void mlx4_end_poll_lock(struct ibv_cq_ex *ibcq)
+{
+	_mlx4_end_poll(ibcq, 1);
+}
+
+static inline int mlx4_start_poll(struct ibv_cq_ex *ibcq,
+				  struct ibv_poll_cq_attr *attr)
+{
+	return _mlx4_start_poll(ibcq, attr, 0);
+}
+
+static inline int mlx4_start_poll_lock(struct ibv_cq_ex *ibcq,
+				       struct ibv_poll_cq_attr *attr)
+{
+	return _mlx4_start_poll(ibcq, attr, 1);
+}
+
 static inline enum ibv_wc_opcode mlx4_cq_read_wc_opcode(struct ibv_cq_ex *ibcq)
 {
 	struct mlx4_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));
diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
index 5ab083c..cb4c8d4 100644
--- a/providers/mlx4/mlx4.h
+++ b/providers/mlx4/mlx4.h
@@ -59,12 +59,20 @@ enum {
 
 #ifndef likely
 #ifdef __GNUC__
-#define likely(x)       __builtin_expect(!!(x),1)
+#define likely(x)       __builtin_expect(!!(x), 1)
 #else
 #define likely(x)      (x)
 #endif
 #endif
 
+#ifndef unlikely
+#ifdef __GNUC__
+#define unlikely(x)	    __builtin_expect(!!(x), 0)
+#else
+#define unlikely(x)	   (x)
+#endif
+#endif
+
 enum {
 	MLX4_QP_TABLE_BITS		= 8,
 	MLX4_QP_TABLE_SIZE		= 1 << MLX4_QP_TABLE_BITS,
-- 
1.8.3.1

* [PATCH rdma-core 6/8] mlx4: Add support for creating an extended CQ
From: Yishai Hadas @ 2017-01-25 14:49 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch adds support for creating an extended CQ.
This means we support:
- The new polling mechanism.
- A CQ which is single-threaded and thus doesn't waste CPU cycles
  on locking.
- Getting the completion timestamp from the CQ.
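
As a hedged usage sketch (values illustrative; note that this patch
rejects IBV_WC_EX_WITH_SLID/SL combined with the timestamp flag):

    struct ibv_cq_init_attr_ex attr = {
            .cqe         = 256,
            .comp_vector = 0,
            .wc_flags    = IBV_WC_EX_WITH_BYTE_LEN |
                           IBV_WC_EX_WITH_QP_NUM |
                           IBV_WC_EX_WITH_COMPLETION_TIMESTAMP,
            .comp_mask   = IBV_CQ_INIT_ATTR_MASK_FLAGS,
            .flags       = IBV_CREATE_CQ_ATTR_SINGLE_THREADED,
    };
    struct ibv_cq_ex *cq_ex = ibv_create_cq_ex(ctx, &attr);

    if (!cq_ex)
            perror("ibv_create_cq_ex"); /* EINVAL or ENOTSUP, see above */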

Signed-off-by: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Acked-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 providers/mlx4/cq.c       |  33 ++++++++++
 providers/mlx4/mlx4-abi.h |  12 ++++
 providers/mlx4/mlx4.c     |   1 +
 providers/mlx4/mlx4.h     |   5 ++
 providers/mlx4/verbs.c    | 157 ++++++++++++++++++++++++++++++++++++++++------
 5 files changed, 190 insertions(+), 18 deletions(-)

diff --git a/providers/mlx4/cq.c b/providers/mlx4/cq.c
index 728efde..22fdbf2 100644
--- a/providers/mlx4/cq.c
+++ b/providers/mlx4/cq.c
@@ -646,6 +646,39 @@ static inline uint64_t mlx4_cq_read_wc_completion_ts(struct ibv_cq_ex *ibcq)
 			       (cq->cqe->ts_7_0);
 }
 
+void mlx4_cq_fill_pfns(struct mlx4_cq *cq, const struct ibv_cq_init_attr_ex *cq_attr)
+{
+
+	if (cq->flags & MLX4_CQ_FLAGS_SINGLE_THREADED) {
+		cq->ibv_cq.start_poll = mlx4_start_poll;
+		cq->ibv_cq.end_poll = mlx4_end_poll;
+	} else {
+		cq->ibv_cq.start_poll = mlx4_start_poll_lock;
+		cq->ibv_cq.end_poll = mlx4_end_poll_lock;
+	}
+	cq->ibv_cq.next_poll = mlx4_next_poll;
+
+	cq->ibv_cq.read_opcode = mlx4_cq_read_wc_opcode;
+	cq->ibv_cq.read_vendor_err = mlx4_cq_read_wc_vendor_err;
+	cq->ibv_cq.read_wc_flags = mlx4_cq_read_wc_flags;
+	if (cq_attr->wc_flags & IBV_WC_EX_WITH_BYTE_LEN)
+		cq->ibv_cq.read_byte_len = mlx4_cq_read_wc_byte_len;
+	if (cq_attr->wc_flags & IBV_WC_EX_WITH_IMM)
+		cq->ibv_cq.read_imm_data = mlx4_cq_read_wc_imm_data;
+	if (cq_attr->wc_flags & IBV_WC_EX_WITH_QP_NUM)
+		cq->ibv_cq.read_qp_num = mlx4_cq_read_wc_qp_num;
+	if (cq_attr->wc_flags & IBV_WC_EX_WITH_SRC_QP)
+		cq->ibv_cq.read_src_qp = mlx4_cq_read_wc_src_qp;
+	if (cq_attr->wc_flags & IBV_WC_EX_WITH_SLID)
+		cq->ibv_cq.read_slid = mlx4_cq_read_wc_slid;
+	if (cq_attr->wc_flags & IBV_WC_EX_WITH_SL)
+		cq->ibv_cq.read_sl = mlx4_cq_read_wc_sl;
+	if (cq_attr->wc_flags & IBV_WC_EX_WITH_DLID_PATH_BITS)
+		cq->ibv_cq.read_dlid_path_bits = mlx4_cq_read_wc_dlid_path_bits;
+	if (cq_attr->wc_flags & IBV_WC_EX_WITH_COMPLETION_TIMESTAMP)
+		cq->ibv_cq.read_completion_ts = mlx4_cq_read_wc_completion_ts;
+}
+
 int mlx4_arm_cq(struct ibv_cq *ibvcq, int solicited)
 {
 	struct mlx4_cq *cq = to_mcq(ibvcq);
diff --git a/providers/mlx4/mlx4-abi.h b/providers/mlx4/mlx4-abi.h
index ac21fa8..3b8bac5 100644
--- a/providers/mlx4/mlx4-abi.h
+++ b/providers/mlx4/mlx4-abi.h
@@ -78,6 +78,18 @@ struct mlx4_create_cq_resp {
 	__u32				reserved;
 };
 
+struct mlx4_create_cq_ex {
+	struct ibv_create_cq_ex		ibv_cmd;
+	__u64				buf_addr;
+	__u64				db_addr;
+};
+
+struct mlx4_create_cq_resp_ex {
+	struct ibv_create_cq_resp_ex	ibv_resp;
+	__u32				cqn;
+	__u32				reserved;
+};
+
 struct mlx4_resize_cq {
 	struct ibv_resize_cq		ibv_cmd;
 	__u64				buf_addr;
diff --git a/providers/mlx4/mlx4.c b/providers/mlx4/mlx4.c
index b59c202..3f29d1a 100644
--- a/providers/mlx4/mlx4.c
+++ b/providers/mlx4/mlx4.c
@@ -216,6 +216,7 @@ static int mlx4_init_context(struct verbs_device *v_device,
 	verbs_set_ctx_op(verbs_ctx, open_qp, mlx4_open_qp);
 	verbs_set_ctx_op(verbs_ctx, ibv_create_flow, ibv_cmd_create_flow);
 	verbs_set_ctx_op(verbs_ctx, ibv_destroy_flow, ibv_cmd_destroy_flow);
+	verbs_set_ctx_op(verbs_ctx, create_cq_ex, mlx4_create_cq_ex);
 
 	return 0;
 
diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
index cb4c8d4..9d43b63 100644
--- a/providers/mlx4/mlx4.h
+++ b/providers/mlx4/mlx4.h
@@ -189,6 +189,8 @@ struct mlx4_pd {
 
 enum {
 	MLX4_CQ_FLAGS_RX_CSUM_VALID = 1 << 0,
+	MLX4_CQ_FLAGS_EXTENDED = 1 << 1,
+	MLX4_CQ_FLAGS_SINGLE_THREADED = 1 << 2,
 };
 
 struct mlx4_cq {
@@ -396,6 +398,9 @@ int mlx4_bind_mw(struct ibv_qp *qp, struct ibv_mw *mw,
 struct ibv_cq *mlx4_create_cq(struct ibv_context *context, int cqe,
 			       struct ibv_comp_channel *channel,
 			       int comp_vector);
+struct ibv_cq_ex *mlx4_create_cq_ex(struct ibv_context *context,
+				    struct ibv_cq_init_attr_ex *cq_attr);
+void mlx4_cq_fill_pfns(struct mlx4_cq *cq, const struct ibv_cq_init_attr_ex *cq_attr);
 int mlx4_alloc_cq_buf(struct mlx4_device *dev, struct mlx4_buf *buf, int nent,
 		      int entry_size);
 int mlx4_resize_cq(struct ibv_cq *cq, int cqe);
diff --git a/providers/mlx4/verbs.c b/providers/mlx4/verbs.c
index 83c971d..4b00550 100644
--- a/providers/mlx4/verbs.c
+++ b/providers/mlx4/verbs.c
@@ -304,19 +304,103 @@ int align_queue_size(int req)
 	return nent;
 }
 
-struct ibv_cq *mlx4_create_cq(struct ibv_context *context, int cqe,
-			       struct ibv_comp_channel *channel,
-			       int comp_vector)
+enum {
+	CREATE_CQ_SUPPORTED_WC_FLAGS = IBV_WC_STANDARD_FLAGS	|
+				       IBV_WC_EX_WITH_COMPLETION_TIMESTAMP
+};
+
+enum {
+	CREATE_CQ_SUPPORTED_COMP_MASK = IBV_CQ_INIT_ATTR_MASK_FLAGS
+};
+
+enum {
+	CREATE_CQ_SUPPORTED_FLAGS = IBV_CREATE_CQ_ATTR_SINGLE_THREADED
+};
+
+
+static int mlx4_cmd_create_cq(struct ibv_context *context,
+			      struct ibv_cq_init_attr_ex *cq_attr,
+			      struct mlx4_cq *cq)
+{
+	struct mlx4_create_cq      cmd = {};
+	struct mlx4_create_cq_resp resp = {};
+	int ret;
+
+	cmd.buf_addr = (uintptr_t) cq->buf.buf;
+	cmd.db_addr  = (uintptr_t) cq->set_ci_db;
+
+	ret = ibv_cmd_create_cq(context, cq_attr->cqe, cq_attr->channel,
+				cq_attr->comp_vector,
+				ibv_cq_ex_to_cq(&cq->ibv_cq),
+				&cmd.ibv_cmd, sizeof(cmd),
+				&resp.ibv_resp, sizeof(resp));
+	if (!ret)
+		cq->cqn = resp.cqn;
+
+	return ret;
+
+}
+
+static int mlx4_cmd_create_cq_ex(struct ibv_context *context,
+				 struct ibv_cq_init_attr_ex *cq_attr,
+				 struct mlx4_cq *cq)
+{
+	struct mlx4_create_cq_ex      cmd = {};
+	struct mlx4_create_cq_resp_ex resp = {};
+	int ret;
+
+	cmd.buf_addr = (uintptr_t) cq->buf.buf;
+	cmd.db_addr  = (uintptr_t) cq->set_ci_db;
+
+	ret = ibv_cmd_create_cq_ex(context, cq_attr,
+				   &cq->ibv_cq, &cmd.ibv_cmd,
+				   sizeof(cmd.ibv_cmd),
+				   sizeof(cmd),
+				   &resp.ibv_resp,
+				   sizeof(resp.ibv_resp),
+				   sizeof(resp));
+	if (!ret)
+		cq->cqn = resp.cqn;
+
+	return ret;
+}
+
+static struct ibv_cq_ex *create_cq(struct ibv_context *context,
+				   struct ibv_cq_init_attr_ex *cq_attr,
+				   int cq_alloc_flags)
 {
-	struct mlx4_create_cq      cmd;
-	struct mlx4_create_cq_resp resp;
-	struct mlx4_cq		  *cq;
-	int			   ret;
-	struct mlx4_context       *mctx = to_mctx(context);
+	struct mlx4_cq      *cq;
+	int                  ret;
+	struct mlx4_context *mctx = to_mctx(context);
 
 	/* Sanity check CQ size before proceeding */
-	if (cqe > 0x3fffff)
+	if (cq_attr->cqe > 0x3fffff) {
+		errno = EINVAL;
+		return NULL;
+	}
+
+	if (cq_attr->comp_mask & ~CREATE_CQ_SUPPORTED_COMP_MASK) {
+		errno = ENOTSUP;
+		return NULL;
+	}
+
+	if (cq_attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_FLAGS &&
+	    cq_attr->flags & ~CREATE_CQ_SUPPORTED_FLAGS) {
+		errno = ENOTSUP;
+		return NULL;
+	}
+
+	if (cq_attr->wc_flags & ~CREATE_CQ_SUPPORTED_WC_FLAGS)
+		return NULL;
+
+	/* mlx4 devices don't support slid and sl in cqe when completion
+	 * timestamp is enabled in the CQ
+	*/
+	if ((cq_attr->wc_flags & (IBV_WC_EX_WITH_SLID | IBV_WC_EX_WITH_SL)) &&
+	    (cq_attr->wc_flags & IBV_WC_EX_WITH_COMPLETION_TIMESTAMP)) {
+		errno = ENOTSUP;
 		return NULL;
+	}
 
 	cq = malloc(sizeof *cq);
 	if (!cq)
@@ -327,9 +411,9 @@ struct ibv_cq *mlx4_create_cq(struct ibv_context *context, int cqe,
 	if (pthread_spin_init(&cq->lock, PTHREAD_PROCESS_PRIVATE))
 		goto err;
 
-	cqe = align_queue_size(cqe + 1);
+	cq_attr->cqe = align_queue_size(cq_attr->cqe + 1);
 
-	if (mlx4_alloc_cq_buf(to_mdev(context->device), &cq->buf, cqe, mctx->cqe_size))
+	if (mlx4_alloc_cq_buf(to_mdev(context->device), &cq->buf, cq_attr->cqe, mctx->cqe_size))
 		goto err;
 
 	cq->cqe_size = mctx->cqe_size;
@@ -341,19 +425,26 @@ struct ibv_cq *mlx4_create_cq(struct ibv_context *context, int cqe,
 	*cq->arm_db    = 0;
 	cq->arm_sn     = 1;
 	*cq->set_ci_db = 0;
+	cq->flags = cq_alloc_flags;
 
-	cmd.buf_addr = (uintptr_t) cq->buf.buf;
-	cmd.db_addr  = (uintptr_t) cq->set_ci_db;
+	if (cq_attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_FLAGS &&
+	    cq_attr->flags & IBV_CREATE_CQ_ATTR_SINGLE_THREADED)
+		cq->flags |= MLX4_CQ_FLAGS_SINGLE_THREADED;
+
+	--cq_attr->cqe;
+	if (cq_alloc_flags & MLX4_CQ_FLAGS_EXTENDED)
+		ret = mlx4_cmd_create_cq_ex(context, cq_attr, cq);
+	else
+		ret = mlx4_cmd_create_cq(context, cq_attr, cq);
 
-	ret = ibv_cmd_create_cq(context, cqe - 1, channel, comp_vector,
-				ibv_cq_ex_to_cq(&cq->ibv_cq), &cmd.ibv_cmd, sizeof(cmd),
-				&resp.ibv_resp, sizeof(resp));
 	if (ret)
 		goto err_db;
 
-	cq->cqn = resp.cqn;
 
-	return ibv_cq_ex_to_cq(&cq->ibv_cq);
+	if (cq_alloc_flags & MLX4_CQ_FLAGS_EXTENDED)
+		mlx4_cq_fill_pfns(cq, cq_attr);
+
+	return &cq->ibv_cq;
 
 err_db:
 	mlx4_free_db(to_mctx(context), MLX4_DB_TYPE_CQ, cq->set_ci_db);
@@ -367,6 +458,36 @@ err:
 	return NULL;
 }
 
+struct ibv_cq *mlx4_create_cq(struct ibv_context *context, int cqe,
+			      struct ibv_comp_channel *channel,
+			      int comp_vector)
+{
+	struct ibv_cq_ex *cq;
+	struct ibv_cq_init_attr_ex cq_attr = {.cqe = cqe, .channel = channel,
+					      .comp_vector = comp_vector,
+					      .wc_flags = IBV_WC_STANDARD_FLAGS};
+
+	cq = create_cq(context, &cq_attr, 0);
+	return cq ? ibv_cq_ex_to_cq(cq) : NULL;
+}
+
+struct ibv_cq_ex *mlx4_create_cq_ex(struct ibv_context *context,
+				    struct ibv_cq_init_attr_ex *cq_attr)
+{
+	/*
+	 * Make local copy since some attributes might be adjusted
+	 * for internal use.
+	 */
+	struct ibv_cq_init_attr_ex cq_attr_c = {.cqe = cq_attr->cqe,
+						.channel = cq_attr->channel,
+						.comp_vector = cq_attr->comp_vector,
+						.wc_flags = cq_attr->wc_flags,
+						.comp_mask = cq_attr->comp_mask,
+						.flags = cq_attr->flags};
+
+	return create_cq(context, &cq_attr_c, MLX4_CQ_FLAGS_EXTENDED);
+}
+
 int mlx4_resize_cq(struct ibv_cq *ibcq, int cqe)
 {
 	struct mlx4_cq *cq = to_mcq(ibcq);
-- 
1.8.3.1

* [PATCH rdma-core 7/8] mlx4: Add ibv_query_device_ex support
From: Yishai Hadas @ 2017-01-25 14:49 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The extended query_device verb allows the user to get additional
vendor-specific device parameters. With this commit,
ibv_query_device_ex can be used to retrieve the HCA's core clock
offset.
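
A minimal caller sketch (illustrative; the clock offset itself is
cached inside the mlx4 context for later mmap rather than returned
to the application):

    struct ibv_device_attr_ex attr = {};

    if (!ibv_query_device_ex(ctx, NULL, &attr))
            printf("fw %s, max_qp_wr %d\n",
                   attr.orig_attr.fw_ver, attr.orig_attr.max_qp_wr);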

Signed-off-by: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Acked-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 providers/mlx4/mlx4-abi.h | 15 +++++++++++++++
 providers/mlx4/mlx4.c     |  1 +
 providers/mlx4/mlx4.h     |  4 ++++
 providers/mlx4/verbs.c    | 37 +++++++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+)

diff --git a/providers/mlx4/mlx4-abi.h b/providers/mlx4/mlx4-abi.h
index 3b8bac5..7d89505 100644
--- a/providers/mlx4/mlx4-abi.h
+++ b/providers/mlx4/mlx4-abi.h
@@ -51,6 +51,10 @@ struct mlx4_alloc_ucontext_resp_v3 {
 	__u16				bf_regs_per_page;
 };
 
+enum mlx4_query_dev_ex_resp_mask {
+	MLX4_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET = 1UL << 0,
+};
+
 struct mlx4_alloc_ucontext_resp {
 	struct ibv_get_context_resp	ibv_resp;
 	__u32				dev_caps;
@@ -95,6 +99,17 @@ struct mlx4_resize_cq {
 	__u64				buf_addr;
 };
 
+struct mlx4_query_device_ex_resp {
+	struct ibv_query_device_resp_ex ibv_resp;
+	__u32				comp_mask;
+	__u32				response_length;
+	__u64				hca_core_clock_offset;
+};
+
+struct mlx4_query_device_ex {
+	struct ibv_query_device_ex	ibv_cmd;
+};
+
 struct mlx4_create_srq {
 	struct ibv_create_srq		ibv_cmd;
 	__u64				buf_addr;
diff --git a/providers/mlx4/mlx4.c b/providers/mlx4/mlx4.c
index 3f29d1a..755768e 100644
--- a/providers/mlx4/mlx4.c
+++ b/providers/mlx4/mlx4.c
@@ -217,6 +217,7 @@ static int mlx4_init_context(struct verbs_device *v_device,
 	verbs_set_ctx_op(verbs_ctx, ibv_create_flow, ibv_cmd_create_flow);
 	verbs_set_ctx_op(verbs_ctx, ibv_destroy_flow, ibv_cmd_destroy_flow);
 	verbs_set_ctx_op(verbs_ctx, create_cq_ex, mlx4_create_cq_ex);
+	verbs_set_ctx_op(verbs_ctx, query_device_ex, mlx4_query_device_ex);
 
 	return 0;
 
diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
index 9d43b63..8c01ec1 100644
--- a/providers/mlx4/mlx4.h
+++ b/providers/mlx4/mlx4.h
@@ -375,6 +375,10 @@ void mlx4_free_db(struct mlx4_context *context, enum mlx4_db_type type, uint32_t
 
 int mlx4_query_device(struct ibv_context *context,
 		       struct ibv_device_attr *attr);
+int mlx4_query_device_ex(struct ibv_context *context,
+			 const struct ibv_query_device_ex_input *input,
+			 struct ibv_device_attr_ex *attr,
+			 size_t attr_size);
 int mlx4_query_port(struct ibv_context *context, uint8_t port,
 		     struct ibv_port_attr *attr);
 
diff --git a/providers/mlx4/verbs.c b/providers/mlx4/verbs.c
index 4b00550..c523c41 100644
--- a/providers/mlx4/verbs.c
+++ b/providers/mlx4/verbs.c
@@ -64,6 +64,43 @@ int mlx4_query_device(struct ibv_context *context, struct ibv_device_attr *attr)
 	return 0;
 }
 
+int mlx4_query_device_ex(struct ibv_context *context,
+			 const struct ibv_query_device_ex_input *input,
+			 struct ibv_device_attr_ex *attr,
+			 size_t attr_size)
+{
+	struct mlx4_context *mctx = to_mctx(context);
+	struct mlx4_query_device_ex_resp resp = {};
+	struct mlx4_query_device_ex cmd = {};
+	uint64_t raw_fw_ver;
+	unsigned sub_minor;
+	unsigned major;
+	unsigned minor;
+	int err;
+
+	err = ibv_cmd_query_device_ex(context, input, attr, attr_size,
+				      &raw_fw_ver,
+				      &cmd.ibv_cmd, sizeof(cmd.ibv_cmd), sizeof(cmd),
+				      &resp.ibv_resp, sizeof(resp.ibv_resp),
+				      sizeof(resp));
+	if (err)
+		return err;
+
+	if (resp.comp_mask & MLX4_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET) {
+		mctx->core_clock.offset = resp.hca_core_clock_offset;
+		mctx->core_clock.offset_valid = 1;
+	}
+
+	major     = (raw_fw_ver >> 32) & 0xffff;
+	minor     = (raw_fw_ver >> 16) & 0xffff;
+	sub_minor = raw_fw_ver & 0xffff;
+
+	snprintf(attr->orig_attr.fw_ver, sizeof attr->orig_attr.fw_ver,
+		 "%d.%d.%03d", major, minor, sub_minor);
+
+	return 0;
+}
+
 int mlx4_query_port(struct ibv_context *context, uint8_t port,
 		     struct ibv_port_attr *attr)
 {
-- 
1.8.3.1

* [PATCH rdma-core 8/8] mlx4: Add ibv_query_rt_values
From: Yishai Hadas @ 2017-01-25 14:49 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

In order to query the HCA's current core clock, libmlx4 should
support the ibv_query_rt_values verb. Querying the hardware's cycles
register is done by mmapping this register into user space.
Therefore, when libmlx4 initializes, we mmap the cycles register.
This assumes the machine's architecture places the PCI and memory in
the same address space.
The page offset is retrieved by calling ibv_query_device_ex.
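
A hedged example of reading the raw clock through the generic
wrapper (use_cycles is a placeholder):

    struct ibv_values_ex values = {
            .comp_mask = IBV_VALUES_MASK_RAW_CLOCK,
    };

    if (!ibv_query_rt_values_ex(ctx, &values) &&
        (values.comp_mask & IBV_VALUES_MASK_RAW_CLOCK))
            use_cycles(values.raw_clock.tv_nsec); /* raw cycle counter */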

Signed-off-by: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Acked-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 providers/mlx4/mlx4.c  | 39 ++++++++++++++++++++++++++++++++++-----
 providers/mlx4/mlx4.h  |  8 +++++++-
 providers/mlx4/verbs.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 87 insertions(+), 6 deletions(-)

diff --git a/providers/mlx4/mlx4.c b/providers/mlx4/mlx4.c
index 755768e..8e1a0dd 100644
--- a/providers/mlx4/mlx4.c
+++ b/providers/mlx4/mlx4.c
@@ -118,6 +118,28 @@ static struct ibv_context_ops mlx4_ctx_ops = {
 	.detach_mcast  = ibv_cmd_detach_mcast
 };
 
+static int mlx4_map_internal_clock(struct mlx4_device *mdev,
+				   struct ibv_context *ibv_ctx)
+{
+	struct mlx4_context *context = to_mctx(ibv_ctx);
+	void *hca_clock_page;
+
+	hca_clock_page = mmap(NULL, mdev->page_size,
+			      PROT_READ, MAP_SHARED, ibv_ctx->cmd_fd,
+			      mdev->page_size * 3);
+
+	if (hca_clock_page == MAP_FAILED) {
+		fprintf(stderr, PFX
+			"Warning: Timestamp available,\n"
+			"but failed to mmap() hca core clock page.\n");
+		return -1;
+	}
+
+	context->hca_core_clock = hca_clock_page +
+		(context->core_clock.offset & (mdev->page_size - 1));
+	return 0;
+}
+
 static int mlx4_init_context(struct verbs_device *v_device,
 				struct ibv_context *ibv_ctx, int cmd_fd)
 {
@@ -129,7 +151,7 @@ static int mlx4_init_context(struct verbs_device *v_device,
 	__u16				bf_reg_size;
 	struct mlx4_device              *dev = to_mdev(&v_device->device);
 	struct verbs_context *verbs_ctx = verbs_get_ctx(ibv_ctx);
-	struct ibv_device_attr		dev_attrs;
+	struct ibv_device_attr_ex	dev_attrs;
 
 	/* memory footprint of mlx4_context and verbs_context share
 	* struct ibv_context.
@@ -200,10 +222,14 @@ static int mlx4_init_context(struct verbs_device *v_device,
 	pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
 	ibv_ctx->ops = mlx4_ctx_ops;
 
+	context->hca_core_clock = NULL;
 	memset(&dev_attrs, 0, sizeof(dev_attrs));
-	if (!mlx4_query_device(ibv_ctx, &dev_attrs)) {
-		context->max_qp_wr = dev_attrs.max_qp_wr;
-		context->max_sge = dev_attrs.max_sge;
+	if (!mlx4_query_device_ex(ibv_ctx, NULL, &dev_attrs,
+				  sizeof(struct ibv_device_attr_ex))) {
+		context->max_qp_wr = dev_attrs.orig_attr.max_qp_wr;
+		context->max_sge = dev_attrs.orig_attr.max_sge;
+		if (context->core_clock.offset_valid)
+			mlx4_map_internal_clock(dev, ibv_ctx);
 	}
 
 	verbs_ctx->has_comp_mask = VERBS_CONTEXT_XRCD | VERBS_CONTEXT_SRQ |
@@ -218,6 +244,7 @@ static int mlx4_init_context(struct verbs_device *v_device,
 	verbs_set_ctx_op(verbs_ctx, ibv_destroy_flow, ibv_cmd_destroy_flow);
 	verbs_set_ctx_op(verbs_ctx, create_cq_ex, mlx4_create_cq_ex);
 	verbs_set_ctx_op(verbs_ctx, query_device_ex, mlx4_query_device_ex);
+	verbs_set_ctx_op(verbs_ctx, query_rt_values, mlx4_query_rt_values);
 
 	return 0;
 
@@ -231,7 +258,9 @@ static void mlx4_uninit_context(struct verbs_device *v_device,
 	munmap(context->uar, to_mdev(&v_device->device)->page_size);
 	if (context->bf_page)
 		munmap(context->bf_page, to_mdev(&v_device->device)->page_size);
-
+	if (context->hca_core_clock)
+		munmap(context->hca_core_clock - context->core_clock.offset,
+		       to_mdev(&v_device->device)->page_size);
 }
 
 static struct verbs_device *mlx4_driver_init(const char *uverbs_sys_path, int abi_version)
diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
index 8c01ec1..2bcfc7f 100644
--- a/providers/mlx4/mlx4.h
+++ b/providers/mlx4/mlx4.h
@@ -175,6 +175,11 @@ struct mlx4_context {
 		uint8_t                 link_layer;
 		enum ibv_port_cap_flags caps;
 	} port_query_cache[MLX4_PORTS_NUM];
+	struct {
+		uint64_t                offset;
+		uint8_t                 offset_valid;
+	} core_clock;
+	void			       *hca_core_clock;
 };
 
 struct mlx4_buf {
@@ -381,7 +386,8 @@ int mlx4_query_device_ex(struct ibv_context *context,
 			 size_t attr_size);
 int mlx4_query_port(struct ibv_context *context, uint8_t port,
 		     struct ibv_port_attr *attr);
-
+int mlx4_query_rt_values(struct ibv_context *context,
+			 struct ibv_values_ex *values);
 struct ibv_pd *mlx4_alloc_pd(struct ibv_context *context);
 int mlx4_free_pd(struct ibv_pd *pd);
 struct ibv_xrcd *mlx4_open_xrcd(struct ibv_context *context,
diff --git a/providers/mlx4/verbs.c b/providers/mlx4/verbs.c
index c523c41..e2f798f 100644
--- a/providers/mlx4/verbs.c
+++ b/providers/mlx4/verbs.c
@@ -101,6 +101,52 @@ int mlx4_query_device_ex(struct ibv_context *context,
 	return 0;
 }
 
+#define READL(ptr) (*((uint32_t *)(ptr)))
+static int mlx4_read_clock(struct ibv_context *context, uint64_t *cycles)
+{
+	unsigned int clockhi, clocklo, clockhi1;
+	int i;
+	struct mlx4_context *ctx = to_mctx(context);
+
+	if (!ctx->hca_core_clock)
+		return -EOPNOTSUPP;
+
+	/* Handle wraparound */
+	for (i = 0; i < 2; i++) {
+		clockhi = ntohl(READL(ctx->hca_core_clock));
+		clocklo = ntohl(READL(ctx->hca_core_clock + 4));
+		clockhi1 = ntohl(READL(ctx->hca_core_clock));
+		if (clockhi == clockhi1)
+			break;
+	}
+
+	*cycles = (uint64_t)clockhi << 32 | (uint64_t)clocklo;
+
+	return 0;
+}
+
+int mlx4_query_rt_values(struct ibv_context *context,
+			 struct ibv_values_ex *values)
+{
+	uint32_t comp_mask = 0;
+	int err = 0;
+
+	if (values->comp_mask & IBV_VALUES_MASK_RAW_CLOCK) {
+		uint64_t cycles;
+
+		err = mlx4_read_clock(context, &cycles);
+		if (!err) {
+			values->raw_clock.tv_sec = 0;
+			values->raw_clock.tv_nsec = cycles;
+			comp_mask |= IBV_VALUES_MASK_RAW_CLOCK;
+		}
+	}
+
+	values->comp_mask = comp_mask;
+
+	return err;
+}
+
 int mlx4_query_port(struct ibv_context *context, uint8_t port,
 		     struct ibv_port_attr *attr)
 {
-- 
1.8.3.1

* Re: [PATCH rdma-core 2/8] mlx4: Refactor mlx4_poll_one
From: Jason Gunthorpe @ 2017-01-25 17:00 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

On Wed, Jan 25, 2017 at 04:49:45PM +0200, Yishai Hadas wrote:

> +#ifdef HAVE_FUNC_ATTRIBUTE_ALWAYS_INLINE
> +#define ALWAYS_INLINE __attribute__((always_inline))
> +#else
> +#define ALWAYS_INLINE
> +#endif

Please put this and the copy in mlx5.h into util/compiler.h

Jason

* Re: [PATCH rdma-core 5/8] mlx4: Add ability to poll CQs through iterator's style API
       [not found]     ` <1485355791-27646-6-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-01-25 17:04       ` Jason Gunthorpe
       [not found]         ` <20170125170413.GC16579-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2017-01-25 17:04 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

On Wed, Jan 25, 2017 at 04:49:48PM +0200, Yishai Hadas wrote:
> diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
> index 5ab083c..cb4c8d4 100644
> +++ b/providers/mlx4/mlx4.h
> @@ -59,12 +59,20 @@ enum {
>  
>  #ifndef likely
>  #ifdef __GNUC__
> -#define likely(x)       __builtin_expect(!!(x),1)
> +#define likely(x)       __builtin_expect(!!(x), 1)
>  #else
>  #define likely(x)      (x)
>  #endif
>  #endif
>  
> +#ifndef unlikely
> +#ifdef __GNUC__
> +#define unlikely(x)	    __builtin_expect(!!(x), 0)
> +#else
> +#define unlikely(x)	   (x)
> +#endif
> +#endif

Also move all the copies of these into util/compiler.h instead of
adding more.

Jason
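
Consolidated, the branch-hint macros would sit next to ALWAYS_INLINE
in the same util/compiler.h (again only a sketch of the suggestion):

	#ifdef __GNUC__
	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)
	#else
	#define likely(x)	(x)
	#define unlikely(x)	(x)
	#endif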

* Re: [PATCH rdma-core 4/8] mlx4: Add inline functions to read completion's attributes
       [not found]     ` <1485355791-27646-5-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-01-25 17:09       ` Jason Gunthorpe
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2017-01-25 17:09 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

On Wed, Jan 25, 2017 at 04:49:47PM +0200, Yishai Hadas wrote:
> From: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add inline functions in order to read various completion's
> attributes. These functions will be assigned in the ibv_cq_ex
> structure in order to allow the user to read the completion's
> attributes.
> 
> Signed-off-by: Ariel Levkovich <lariel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Acked-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>  providers/mlx4/cq.c   | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  providers/mlx4/mlx4.h |  13 +++--
>  2 files changed, 157 insertions(+), 3 deletions(-)
> 
> diff --git a/providers/mlx4/cq.c b/providers/mlx4/cq.c
> index 6c4b3c4..a80b2fb 100644
> +++ b/providers/mlx4/cq.c
> @@ -416,6 +416,153 @@ int mlx4_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc)
>  	return err == CQ_POLL_ERR ? err : npolled;
>  }
>  
> +static inline enum ibv_wc_opcode mlx4_cq_read_wc_opcode(struct ibv_cq_ex *ibcq)
> +{

Why are these inline?

At the end of the series the only user of this function is here:

+	cq->ibv_cq.read_opcode = mlx4_cq_read_wc_opcode;

That is using it as a function pointer, so the call cannot be inlined.

Drop all the unnecessary 'static inline's; they are confusing.

Jason
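
The point in miniature (a toy example, not provider code; compare the
generated code with and without the indirection):

	#include <stdio.h>

	static inline int add_one(int x) { return x + 1; }

	int main(void)
	{
		/* Taking the address forces an out-of-line copy;
		 * the call through fp is indirect and is not inlined,
		 * so the 'inline' hint buys nothing here. */
		int (*fp)(int) = add_one;
		printf("%d\n", fp(41));
		return 0;
	}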

* Re: [PATCH rdma-core 5/8] mlx4: Add ability to poll CQs through iterator's style API
       [not found]         ` <20170125170413.GC16579-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-01-25 17:11           ` Leon Romanovsky
       [not found]             ` <20170125171112.GR6005-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Leon Romanovsky @ 2017-01-25 17:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w


On Wed, Jan 25, 2017 at 10:04:13AM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 25, 2017 at 04:49:48PM +0200, Yishai Hadas wrote:
> > diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
> > index 5ab083c..cb4c8d4 100644
> > +++ b/providers/mlx4/mlx4.h
> > @@ -59,12 +59,20 @@ enum {
> >
> >  #ifndef likely
> >  #ifdef __GNUC__
> > -#define likely(x)       __builtin_expect(!!(x),1)
> > +#define likely(x)       __builtin_expect(!!(x), 1)
> >  #else
> >  #define likely(x)      (x)
> >  #endif
> >  #endif
> >
> > +#ifndef unlikely
> > +#ifdef __GNUC__
> > +#define unlikely(x)	    __builtin_expect(!!(x), 0)
> > +#else
> > +#define unlikely(x)	   (x)
> > +#endif
> > +#endif
>
> Also move all the copies of these into util/compiler.h instead of
> adding more.

I tried once :)
https://www.spinics.net/lists/linux-rdma/msg43086.html

>
> Jason

* Re: [PATCH rdma-core 5/8] mlx4: Add ability to poll CQs through iterator's style API
       [not found]             ` <20170125171112.GR6005-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-01-25 17:15               ` Jason Gunthorpe
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2017-01-25 17:15 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Yishai Hadas, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, lariel-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

On Wed, Jan 25, 2017 at 07:11:12PM +0200, Leon Romanovsky wrote:
> On Wed, Jan 25, 2017 at 10:04:13AM -0700, Jason Gunthorpe wrote:

> > Also move all the copies of these into util/compiler.h instead of
> > adding more.
> 
> I tried once :)
> https://www.spinics.net/lists/linux-rdma/msg43086.html

I thought I remembered this had been done.

All things considered, if you are not going to make the ccan debug
facility work, I'd just add the trivial macros to util/compiler.h;
the ccan version has nothing else special about it.

Jason

Thread overview: 14+ messages
2017-01-25 14:49 [PATCH rdma-core 0/8] Completion timestamping support in mlx4 Yishai Hadas
     [not found] ` <1485355791-27646-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-01-25 14:49   ` [PATCH rdma-core 1/8] mlx4: sl_vid field in struct mlx4_cqe should be 16 bit Yishai Hadas
2017-01-25 14:49   ` [PATCH rdma-core 2/8] mlx4: Refactor mlx4_poll_one Yishai Hadas
     [not found]     ` <1485355791-27646-3-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-01-25 17:00       ` Jason Gunthorpe
2017-01-25 14:49   ` [PATCH rdma-core 3/8] mlx4: Add lazy CQ polling Yishai Hadas
2017-01-25 14:49   ` [PATCH rdma-core 4/8] mlx4: Add inline functions to read completion's attributes Yishai Hadas
     [not found]     ` <1485355791-27646-5-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-01-25 17:09       ` Jason Gunthorpe
2017-01-25 14:49   ` [PATCH rdma-core 5/8] mlx4: Add ability to poll CQs through iterator's style API Yishai Hadas
     [not found]     ` <1485355791-27646-6-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-01-25 17:04       ` Jason Gunthorpe
     [not found]         ` <20170125170413.GC16579-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-01-25 17:11           ` Leon Romanovsky
     [not found]             ` <20170125171112.GR6005-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-01-25 17:15               ` Jason Gunthorpe
2017-01-25 14:49   ` [PATCH rdma-core 6/8] mlx4: Add support for creating an extended CQ Yishai Hadas
2017-01-25 14:49   ` [PATCH rdma-core 7/8] mlx4: Add ibv_query_device_ex support Yishai Hadas
2017-01-25 14:49   ` [PATCH rdma-core 8/8] mlx4: Add ibv_query_rt_values Yishai Hadas
