* [PATCH libmlx5 V1 0/6] Completion timestamping
From: Matan Barak @ 2015-12-03 16:02 UTC (permalink / raw)
  To: Eli Cohen
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Matan Barak,
	Eran Ben Elisha, Christoph Lameter

Hi Eli,

This series adds support for completion timestamps. In order to
support this feature, several extended verbs were implemented
(as prescribed by libibverbs).

ibv_query_device_ex was extended to support reading the
hca_core_clock and timestamp mask.
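
For illustration, a minimal sketch of reading these attributes, assuming
the extended-attribute fields proposed in the companion libibverbs series
(hca_core_clock, completion_timestamp_mask; treat the names as
assumptions):

#include <stdio.h>
#include <infiniband/verbs.h>

static int print_clock_caps(struct ibv_context *ctx)
{
	struct ibv_device_attr_ex attr = {};

	if (ibv_query_device_ex(ctx, NULL, &attr))
		return -1;

	/* A zero mask would mean no timestamp support is reported. */
	printf("hca_core_clock: %llu\n",
	       (unsigned long long)attr.hca_core_clock);
	printf("completion_timestamp_mask: 0x%llx\n",
	       (unsigned long long)attr.completion_timestamp_mask);
	return 0;
}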

The init_context verb's vendor-specific data was changed to
conform to the verbs extension format. This makes it easy to
extend the response data to pass the page offset of the free
running clock register, which is mandatory for mapping this
register to user space. This mapping is done when libmlx5
initializes.
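
A rough sketch of the mapping step described above (the helper name and
parameters are hypothetical; the real code is in the ibv_query_values
patch later in this series):

#include <stdint.h>
#include <sys/mman.h>

/* The kernel response carries the clock page's index; mmap that page
 * read-only and keep a pointer to the register within it.
 */
static void *map_core_clock(int cmd_fd, long page_size,
			    off_t clock_page_index, uint64_t clock_offset)
{
	void *page = mmap(NULL, page_size, PROT_READ, MAP_SHARED,
			  cmd_fd, page_size * clock_page_index);

	if (page == MAP_FAILED)
		return NULL;
	return page + (clock_offset & (page_size - 1));
}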

In order to support CQ completion timestamp reporting, we implement
the ibv_create_cq_ex verb. This verb is used both to create a CQ
that supports timestamps and to state which fields should be
returned via the WC. Returning this data is done by implementing
ibv_poll_cq_ex. We check the CQ's requested wc_flags for every field
the user has asked for and populate it according to the carried
network operation and WC status.
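
Putting the two verbs together, a rough usage sketch against the API
proposed in the companion libibverbs series (struct layouts, flag names
and the packed-buffer convention are assumptions based on this series):

#include <string.h>
#include <stdint.h>
#include <infiniband/verbs.h>

static void poll_with_timestamp(struct ibv_context *ctx)
{
	struct ibv_create_cq_attr_ex cq_attr = {
		.cqe		= 128,
		.wc_flags	= IBV_WC_STANDARD_FLAGS |
				  IBV_WC_EX_WITH_COMPLETION_TIMESTAMP,
	};
	struct ibv_cq *cq = ibv_create_cq_ex(ctx, &cq_attr);
	/* Room for one completion: fixed header plus packed fields. */
	uint8_t wc_mem[sizeof(struct ibv_wc_ex) + 64]
		__attribute__((aligned(8)));
	struct ibv_wc_ex *wc = (struct ibv_wc_ex *)wc_mem;
	struct ibv_poll_cq_ex_attr attr = { .max_entries = 1 };

	if (cq && ibv_poll_cq_ex(cq, wc, &attr) > 0 &&
	    (wc->wc_flags & IBV_WC_EX_WITH_COMPLETION_TIMESTAMP)) {
		uint64_t ts;

		/* The timestamp is the first packed field (see patch 2). */
		memcpy(&ts, wc->buffer, sizeof(ts));
		/* ... ts holds raw HCA core-clock cycles ... */
	}
}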

Last but not least, ibv_poll_cq_ex was optimized in order to eliminate
the if statements and OR operations for common combinations of wc
fields. This is done by inlining a custom poll_one_ex function for
each of these combinations.

This series depends on '[PATCH libibverbs 0/5] Completion timestamping'
and is rebased on top of '[PATCH libmlx5 v1 0/5] Support CQE'.

Thanks,
Matan

Changes from V0:
 * Use mlx5_init_context in order to pass hca_core_clock_offset.

Matan Barak (6):
  Add ibv_poll_cq_ex support
  Add timestamp support for ibv_poll_cq_ex
  Add ibv_create_cq_ex support
  Add ibv_query_values support
  Optimize poll_cq
  Add always_inline check

 configure.ac   |  17 +
 src/cq.c       | 959 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 src/mlx5-abi.h |  10 +-
 src/mlx5.c     |  43 +++
 src/mlx5.h     |  42 ++-
 src/verbs.c    | 115 ++++++-
 6 files changed, 1037 insertions(+), 149 deletions(-)

-- 
2.1.0


* [PATCH libmlx5 V1 1/6] Add ibv_poll_cq_ex support
From: Matan Barak @ 2015-12-03 16:02 UTC (permalink / raw)
  To: Eli Cohen
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Matan Barak,
	Eran Ben Elisha, Christoph Lameter

The extended poll_cq writes only the work completion fields the
user asked for. Add support for this extended verb.
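
A note on the layout this implies for consumers: the fields the CQ was
created with are packed after the fixed ibv_wc_ex header largest-first
(the 64-bit timestamp when present, then 32-bit, 16-bit and 8-bit
fields). A field's offset depends on the CQ's wc_flags, while its
validity is indicated by the completion's wc_flags. A hedged sketch of
extracting the QP number (the helper is hypothetical):

#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

static int wc_ex_qp_num(const struct ibv_wc_ex *wc, uint64_t cq_wc_flags,
			uint32_t *qp_num)
{
	const uint8_t *p = (const uint8_t *)wc->buffer;

	if (!(wc->wc_flags & IBV_WC_EX_WITH_QP_NUM))
		return -1;
	/* Skip the fields that precede qp_num in the packed layout. */
	if (cq_wc_flags & IBV_WC_EX_WITH_COMPLETION_TIMESTAMP)
		p += sizeof(uint64_t);
	if (cq_wc_flags & IBV_WC_EX_WITH_BYTE_LEN)
		p += sizeof(uint32_t);
	if (cq_wc_flags & IBV_WC_EX_WITH_IMM)
		p += sizeof(uint32_t);
	memcpy(qp_num, p, sizeof(*qp_num));
	return 0;
}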

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 src/cq.c   | 699 +++++++++++++++++++++++++++++++++++++++++++++++++------------
 src/mlx5.c |   5 +
 src/mlx5.h |  14 ++
 3 files changed, 584 insertions(+), 134 deletions(-)

diff --git a/src/cq.c b/src/cq.c
index 32f0dd4..0185696 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -200,6 +200,85 @@ static void handle_good_req(struct ibv_wc *wc, struct mlx5_cqe64 *cqe)
 	}
 }
 
+union wc_buffer {
+	uint8_t		*b8;
+	uint16_t	*b16;
+	uint32_t	*b32;
+	uint64_t	*b64;
+};
+
+static inline void handle_good_req_ex(struct ibv_wc_ex *wc_ex,
+				      union wc_buffer *pwc_buffer,
+				      struct mlx5_cqe64 *cqe,
+				      uint64_t wc_flags,
+				      uint32_t qpn)
+{
+	union wc_buffer wc_buffer = *pwc_buffer;
+
+	switch (ntohl(cqe->sop_drop_qpn) >> 24) {
+	case MLX5_OPCODE_RDMA_WRITE_IMM:
+		wc_ex->wc_flags |= IBV_WC_EX_IMM;
+	case MLX5_OPCODE_RDMA_WRITE:
+		wc_ex->opcode    = IBV_WC_RDMA_WRITE;
+		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN)
+			wc_buffer.b32++;
+		if (wc_flags & IBV_WC_EX_WITH_IMM)
+			wc_buffer.b32++;
+		break;
+	case MLX5_OPCODE_SEND_IMM:
+		wc_ex->wc_flags |= IBV_WC_EX_IMM;
+	case MLX5_OPCODE_SEND:
+	case MLX5_OPCODE_SEND_INVAL:
+		wc_ex->opcode    = IBV_WC_SEND;
+		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN)
+			wc_buffer.b32++;
+		if (wc_flags & IBV_WC_EX_WITH_IMM)
+			wc_buffer.b32++;
+		break;
+	case MLX5_OPCODE_RDMA_READ:
+		wc_ex->opcode    = IBV_WC_RDMA_READ;
+		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN) {
+			*wc_buffer.b32++ = ntohl(cqe->byte_cnt);
+			wc_ex->wc_flags |= IBV_WC_EX_WITH_BYTE_LEN;
+		}
+		if (wc_flags & IBV_WC_EX_WITH_IMM)
+			wc_buffer.b32++;
+		break;
+	case MLX5_OPCODE_ATOMIC_CS:
+		wc_ex->opcode    = IBV_WC_COMP_SWAP;
+		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN) {
+			*wc_buffer.b32++ = 8;
+			wc_ex->wc_flags |= IBV_WC_EX_WITH_BYTE_LEN;
+		}
+		if (wc_flags & IBV_WC_EX_WITH_IMM)
+			wc_buffer.b32++;
+		break;
+	case MLX5_OPCODE_ATOMIC_FA:
+		wc_ex->opcode    = IBV_WC_FETCH_ADD;
+		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN) {
+			*wc_buffer.b32++ = 8;
+			wc_ex->wc_flags |= IBV_WC_EX_WITH_BYTE_LEN;
+		}
+		if (wc_flags & IBV_WC_EX_WITH_IMM)
+			wc_buffer.b32++;
+		break;
+	case MLX5_OPCODE_BIND_MW:
+		wc_ex->opcode    = IBV_WC_BIND_MW;
+		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN)
+			wc_buffer.b32++;
+		if (wc_flags & IBV_WC_EX_WITH_IMM)
+			wc_buffer.b32++;
+		break;
+	}
+
+	if (wc_flags & IBV_WC_EX_WITH_QP_NUM) {
+		*wc_buffer.b32++ = qpn;
+		wc_ex->wc_flags |= IBV_WC_EX_WITH_QP_NUM;
+	}
+
+	*pwc_buffer = wc_buffer;
+}
+
 static int handle_responder(struct ibv_wc *wc, struct mlx5_cqe64 *cqe,
 			    struct mlx5_qp *qp, struct mlx5_srq *srq)
 {
@@ -262,6 +341,103 @@ static int handle_responder(struct ibv_wc *wc, struct mlx5_cqe64 *cqe,
 	return IBV_WC_SUCCESS;
 }
 
+static inline int handle_responder_ex(struct ibv_wc_ex *wc_ex,
+				      union wc_buffer *pwc_buffer,
+				      struct mlx5_cqe64 *cqe,
+				      struct mlx5_qp *qp, struct mlx5_srq *srq,
+				      uint64_t wc_flags, uint32_t qpn)
+{
+	uint16_t wqe_ctr;
+	struct mlx5_wq *wq;
+	uint8_t g;
+	union wc_buffer wc_buffer = *pwc_buffer;
+	int err = 0;
+	uint32_t byte_len = ntohl(cqe->byte_cnt);
+
+	if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN) {
+		*wc_buffer.b32++ = byte_len;
+		wc_ex->wc_flags |= IBV_WC_EX_WITH_BYTE_LEN;
+	}
+	if (srq) {
+		wqe_ctr = ntohs(cqe->wqe_counter);
+		wc_ex->wr_id = srq->wrid[wqe_ctr];
+		mlx5_free_srq_wqe(srq, wqe_ctr);
+		if (cqe->op_own & MLX5_INLINE_SCATTER_32)
+			err = mlx5_copy_to_recv_srq(srq, wqe_ctr, cqe,
+						    byte_len);
+		else if (cqe->op_own & MLX5_INLINE_SCATTER_64)
+			err = mlx5_copy_to_recv_srq(srq, wqe_ctr, cqe - 1,
+						    byte_len);
+	} else {
+		wq	  = &qp->rq;
+		wqe_ctr = wq->tail & (wq->wqe_cnt - 1);
+		wc_ex->wr_id = wq->wrid[wqe_ctr];
+		++wq->tail;
+		if (cqe->op_own & MLX5_INLINE_SCATTER_32)
+			err = mlx5_copy_to_recv_wqe(qp, wqe_ctr, cqe,
+						    byte_len);
+		else if (cqe->op_own & MLX5_INLINE_SCATTER_64)
+			err = mlx5_copy_to_recv_wqe(qp, wqe_ctr, cqe - 1,
+						    byte_len);
+	}
+	if (err)
+		return err;
+
+	switch (cqe->op_own >> 4) {
+	case MLX5_CQE_RESP_WR_IMM:
+		wc_ex->opcode	= IBV_WC_RECV_RDMA_WITH_IMM;
+		wc_ex->wc_flags	= IBV_WC_EX_IMM;
+		if (wc_flags & IBV_WC_EX_WITH_IMM) {
+			*wc_buffer.b32++ = ntohl(cqe->imm_inval_pkey);
+			wc_ex->wc_flags |= IBV_WC_EX_WITH_IMM;
+		}
+		break;
+	case MLX5_CQE_RESP_SEND:
+		wc_ex->opcode   = IBV_WC_RECV;
+		if (wc_flags & IBV_WC_EX_WITH_IMM)
+			wc_buffer.b32++;
+		break;
+	case MLX5_CQE_RESP_SEND_IMM:
+		wc_ex->opcode	= IBV_WC_RECV;
+		wc_ex->wc_flags	= IBV_WC_EX_WITH_IMM;
+		if (wc_flags & IBV_WC_EX_WITH_IMM) {
+			*wc_buffer.b32++ = ntohl(cqe->imm_inval_pkey);
+			wc_ex->wc_flags |= IBV_WC_EX_WITH_IMM;
+		}
+		break;
+	}
+	if (wc_flags & IBV_WC_EX_WITH_QP_NUM) {
+		*wc_buffer.b32++ = qpn;
+		wc_ex->wc_flags |= IBV_WC_EX_WITH_QP_NUM;
+	}
+	if (wc_flags & IBV_WC_EX_WITH_SRC_QP) {
+		*wc_buffer.b32++ = ntohl(cqe->flags_rqpn) & 0xffffff;
+		wc_ex->wc_flags |= IBV_WC_EX_WITH_SRC_QP;
+	}
+	if (wc_flags & IBV_WC_EX_WITH_PKEY_INDEX) {
+		*wc_buffer.b16++ = ntohl(cqe->imm_inval_pkey) & 0xffff;
+		wc_ex->wc_flags |= IBV_WC_EX_WITH_PKEY_INDEX;
+	}
+	if (wc_flags & IBV_WC_EX_WITH_SLID) {
+		*wc_buffer.b16++ = ntohs(cqe->slid);
+		wc_ex->wc_flags |= IBV_WC_EX_WITH_SLID;
+	}
+	if (wc_flags & IBV_WC_EX_WITH_SL) {
+		*wc_buffer.b8++ = (ntohl(cqe->flags_rqpn) >> 24) & 0xf;
+		wc_ex->wc_flags |= IBV_WC_EX_WITH_SL;
+	}
+	if (wc_flags & IBV_WC_EX_WITH_DLID_PATH_BITS) {
+		*wc_buffer.b8++ = cqe->ml_path & 0x7f;
+		wc_ex->wc_flags |= IBV_WC_EX_WITH_DLID_PATH_BITS;
+	}
+
+	g = (ntohl(cqe->flags_rqpn) >> 28) & 3;
+	wc_ex->wc_flags |= g ? IBV_WC_EX_GRH : 0;
+
+	*pwc_buffer = wc_buffer;
+	return IBV_WC_SUCCESS;
+}
+
 static void dump_cqe(FILE *fp, void *buf)
 {
 	uint32_t *p = buf;
@@ -273,54 +449,55 @@ static void dump_cqe(FILE *fp, void *buf)
 }
 
 static void mlx5_handle_error_cqe(struct mlx5_err_cqe *cqe,
-				  struct ibv_wc *wc)
+				  uint32_t *pwc_status,
+				  uint32_t *pwc_vendor_err)
 {
 	switch (cqe->syndrome) {
 	case MLX5_CQE_SYNDROME_LOCAL_LENGTH_ERR:
-		wc->status = IBV_WC_LOC_LEN_ERR;
+		*pwc_status = IBV_WC_LOC_LEN_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_LOCAL_QP_OP_ERR:
-		wc->status = IBV_WC_LOC_QP_OP_ERR;
+		*pwc_status = IBV_WC_LOC_QP_OP_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_LOCAL_PROT_ERR:
-		wc->status = IBV_WC_LOC_PROT_ERR;
+		*pwc_status = IBV_WC_LOC_PROT_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_WR_FLUSH_ERR:
-		wc->status = IBV_WC_WR_FLUSH_ERR;
+		*pwc_status = IBV_WC_WR_FLUSH_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_MW_BIND_ERR:
-		wc->status = IBV_WC_MW_BIND_ERR;
+		*pwc_status = IBV_WC_MW_BIND_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_BAD_RESP_ERR:
-		wc->status = IBV_WC_BAD_RESP_ERR;
+		*pwc_status = IBV_WC_BAD_RESP_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_LOCAL_ACCESS_ERR:
-		wc->status = IBV_WC_LOC_ACCESS_ERR;
+		*pwc_status = IBV_WC_LOC_ACCESS_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR:
-		wc->status = IBV_WC_REM_INV_REQ_ERR;
+		*pwc_status = IBV_WC_REM_INV_REQ_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR:
-		wc->status = IBV_WC_REM_ACCESS_ERR;
+		*pwc_status = IBV_WC_REM_ACCESS_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_REMOTE_OP_ERR:
-		wc->status = IBV_WC_REM_OP_ERR;
+		*pwc_status = IBV_WC_REM_OP_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR:
-		wc->status = IBV_WC_RETRY_EXC_ERR;
+		*pwc_status = IBV_WC_RETRY_EXC_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_RNR_RETRY_EXC_ERR:
-		wc->status = IBV_WC_RNR_RETRY_EXC_ERR;
+		*pwc_status = IBV_WC_RNR_RETRY_EXC_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_REMOTE_ABORTED_ERR:
-		wc->status = IBV_WC_REM_ABORT_ERR;
+		*pwc_status = IBV_WC_REM_ABORT_ERR;
 		break;
 	default:
-		wc->status = IBV_WC_GENERAL_ERR;
+		*pwc_status = IBV_WC_GENERAL_ERR;
 		break;
 	}
 
-	wc->vendor_err = cqe->vendor_err_synd;
+	*pwc_vendor_err = cqe->vendor_err_synd;
 }
 
 #if defined(__x86_64__) || defined (__i386__)
@@ -453,6 +630,171 @@ static inline int get_srq_ctx(struct mlx5_context *mctx,
 	return CQ_OK;
 }
 
+static inline void dump_cqe_debug(struct mlx5_cq *cq, FILE *fp, struct mlx5_cqe64 *cqe64)
+	__attribute__((always_inline));
+static inline void dump_cqe_debug(struct mlx5_cq *cq, FILE *fp, struct mlx5_cqe64 *cqe64)
+{
+#ifdef MLX5_DEBUG
+	if (mlx5_debug_mask & MLX5_DBG_CQ_CQE) {
+		mlx5_dbg(fp, MLX5_DBG_CQ_CQE, "dump cqe for cqn 0x%x:\n", cq->cqn);
+		dump_cqe(fp, cqe64);
+	}
+#endif
+}
+
+inline int mlx5_poll_one_cqe_req(struct mlx5_cq *cq,
+				 struct mlx5_resource **cur_rsc,
+				 void *cqe, uint32_t qpn, int cqe_ver,
+				 uint64_t *wr_id) __attribute__((always_inline));
+inline int mlx5_poll_one_cqe_req(struct mlx5_cq *cq,
+				 struct mlx5_resource **cur_rsc,
+				 void *cqe, uint32_t qpn, int cqe_ver,
+				 uint64_t *wr_id)
+{
+	struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context);
+	struct mlx5_qp *mqp = NULL;
+	struct mlx5_cqe64 *cqe64 = (cq->cqe_sz == 64) ? cqe : cqe + 64;
+	uint32_t byte_len = ntohl(cqe64->byte_cnt);
+	struct mlx5_wq *wq;
+	uint16_t wqe_ctr;
+	int err;
+	int idx;
+
+	mqp = get_req_context(mctx, cur_rsc,
+			      (cqe_ver ? (ntohl(cqe64->srqn_uidx) & 0xffffff) : qpn),
+			      cqe_ver);
+	if (unlikely(!mqp))
+		return CQ_POLL_ERR;
+	wq = &mqp->sq;
+	wqe_ctr = ntohs(cqe64->wqe_counter);
+	idx = wqe_ctr & (wq->wqe_cnt - 1);
+	if (cqe64->op_own & MLX5_INLINE_SCATTER_32)
+		err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe,
+					    byte_len);
+	else if (cqe64->op_own & MLX5_INLINE_SCATTER_64)
+		err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe - 1,
+					    byte_len);
+	else
+		err = 0;
+
+	wq->tail = wq->wqe_head[idx] + 1;
+	*wr_id = wq->wrid[idx];
+
+	return err;
+}
+
+inline int mlx5_poll_one_cqe_resp(struct mlx5_context *mctx,
+				  struct mlx5_resource **cur_rsc,
+				  struct mlx5_srq **cur_srq,
+				  struct mlx5_cqe64 *cqe64, int cqe_ver,
+				  uint32_t qpn, int *is_srq)
+	__attribute__((always_inline));
+inline int mlx5_poll_one_cqe_resp(struct mlx5_context *mctx,
+				  struct mlx5_resource **cur_rsc,
+				  struct mlx5_srq **cur_srq,
+				  struct mlx5_cqe64 *cqe64, int cqe_ver,
+				  uint32_t qpn, int *is_srq)
+{
+	uint32_t srqn_uidx = ntohl(cqe64->srqn_uidx) & 0xffffff;
+	int err;
+
+	if (cqe_ver) {
+		err = get_resp_cxt_v1(mctx, cur_rsc, cur_srq, srqn_uidx, is_srq);
+	} else {
+		if (srqn_uidx) {
+			err = get_srq_ctx(mctx, cur_srq, srqn_uidx);
+			*is_srq = 1;
+		} else {
+			err = get_resp_ctx(mctx, cur_rsc, qpn);
+		}
+	}
+
+	return err;
+}
+
+inline int mlx5_poll_one_cqe_err(struct mlx5_context *mctx,
+				 struct mlx5_resource **cur_rsc,
+				 struct mlx5_srq **cur_srq,
+				 struct mlx5_cqe64 *cqe64, int cqe_ver,
+				 uint32_t qpn, uint32_t *pwc_status,
+				 uint32_t *pwc_vendor_err,
+				 uint64_t *pwc_wr_id, uint8_t opcode)
+	__attribute__((always_inline));
+inline int mlx5_poll_one_cqe_err(struct mlx5_context *mctx,
+				 struct mlx5_resource **cur_rsc,
+				 struct mlx5_srq **cur_srq,
+				 struct mlx5_cqe64 *cqe64, int cqe_ver,
+				 uint32_t qpn, uint32_t *pwc_status,
+				 uint32_t *pwc_vendor_err,
+				 uint64_t *pwc_wr_id, uint8_t opcode)
+{
+	uint32_t srqn_uidx = ntohl(cqe64->srqn_uidx) & 0xffffff;
+	struct mlx5_err_cqe *ecqe = (struct mlx5_err_cqe *)cqe64;
+	int err = CQ_OK;
+
+	mlx5_handle_error_cqe(ecqe, pwc_status, pwc_vendor_err);
+	if (unlikely(ecqe->syndrome != MLX5_CQE_SYNDROME_WR_FLUSH_ERR &&
+		     ecqe->syndrome != MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR)) {
+		FILE *fp = mctx->dbg_fp;
+
+		fprintf(fp, PFX "%s: got completion with error:\n",
+			mctx->hostname);
+		dump_cqe(fp, ecqe);
+		if (mlx5_freeze_on_error_cqe) {
+			fprintf(fp, PFX "freezing at poll cq...");
+			while (1)
+				sleep(10);
+		}
+	}
+
+	if (opcode == MLX5_CQE_REQ_ERR) {
+		struct mlx5_qp *mqp = NULL;
+		struct mlx5_wq *wq;
+		uint16_t wqe_ctr;
+		int idx;
+
+		mqp = get_req_context(mctx, cur_rsc, (cqe_ver ? srqn_uidx : qpn), cqe_ver);
+		if (unlikely(!mqp))
+			return CQ_POLL_ERR;
+		wq = &mqp->sq;
+		wqe_ctr = ntohs(cqe64->wqe_counter);
+		idx = wqe_ctr & (wq->wqe_cnt - 1);
+		*pwc_wr_id = wq->wrid[idx];
+		wq->tail = wq->wqe_head[idx] + 1;
+	} else {
+		int is_srq = 0;
+
+		if (cqe_ver) {
+			err = get_resp_cxt_v1(mctx, cur_rsc, cur_srq, srqn_uidx, &is_srq);
+		} else {
+			if (srqn_uidx) {
+				err = get_srq_ctx(mctx, cur_srq, srqn_uidx);
+				is_srq = 1;
+			} else {
+				err = get_resp_ctx(mctx, cur_rsc, qpn);
+			}
+		}
+		if (unlikely(err))
+			return CQ_POLL_ERR;
+
+		if (is_srq) {
+			uint16_t wqe_ctr = ntohs(cqe64->wqe_counter);
+
+			*pwc_wr_id = (*cur_srq)->wrid[wqe_ctr];
+			mlx5_free_srq_wqe(*cur_srq, wqe_ctr);
+		} else {
+			struct mlx5_qp *mqp = rsc_to_mqp(*cur_rsc);
+			struct mlx5_wq *wq;
+
+			wq = &mqp->rq;
+			*pwc_wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+			++wq->tail;
+		}
+	}
+
+	return err;
+}
+
 static inline int mlx5_poll_one(struct mlx5_cq *cq,
 			 struct mlx5_resource **cur_rsc,
 			 struct mlx5_srq **cur_srq,
@@ -464,17 +806,10 @@ static inline int mlx5_poll_one(struct mlx5_cq *cq,
 			 struct ibv_wc *wc, int cqe_ver)
 {
 	struct mlx5_cqe64 *cqe64;
-	struct mlx5_wq *wq;
-	uint16_t wqe_ctr;
 	void *cqe;
 	uint32_t qpn;
-	uint32_t srqn_uidx;
-	int idx;
 	uint8_t opcode;
-	struct mlx5_err_cqe *ecqe;
 	int err;
-	int is_srq = 0;
-	struct mlx5_qp *mqp = NULL;
 	struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context);
 
 	cqe = next_cqe_sw(cq);
@@ -494,137 +829,165 @@ static inline int mlx5_poll_one(struct mlx5_cq *cq,
 	 */
 	rmb();
 
-#ifdef MLX5_DEBUG
-	if (mlx5_debug_mask & MLX5_DBG_CQ_CQE) {
-		FILE *fp = mctx->dbg_fp;
-
-		mlx5_dbg(fp, MLX5_DBG_CQ_CQE, "dump cqe for cqn 0x%x:\n", cq->cqn);
-		dump_cqe(fp, cqe64);
-	}
-#endif
+	dump_cqe_debug(cq, mctx->dbg_fp, cqe64);
 
 	qpn = ntohl(cqe64->sop_drop_qpn) & 0xffffff;
 	wc->wc_flags = 0;
 
 	switch (opcode) {
 	case MLX5_CQE_REQ:
-		mqp = get_req_context(mctx, cur_rsc,
-				      (cqe_ver ? (ntohl(cqe64->srqn_uidx) & 0xffffff) : qpn),
-				      cqe_ver);
-		if (unlikely(!mqp))
-			return CQ_POLL_ERR;
-		wq = &mqp->sq;
-		wqe_ctr = ntohs(cqe64->wqe_counter);
-		idx = wqe_ctr & (wq->wqe_cnt - 1);
+		err = mlx5_poll_one_cqe_req(cq, cur_rsc, cqe, qpn, cqe_ver,
+					    &wc->wr_id);
 		handle_good_req(wc, cqe64);
-		if (cqe64->op_own & MLX5_INLINE_SCATTER_32)
-			err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe,
-						    wc->byte_len);
-		else if (cqe64->op_own & MLX5_INLINE_SCATTER_64)
-			err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe - 1,
-						    wc->byte_len);
-		else
-			err = 0;
-
-		wc->wr_id = wq->wrid[idx];
-		wq->tail = wq->wqe_head[idx] + 1;
 		wc->status = err;
 		break;
+
 	case MLX5_CQE_RESP_WR_IMM:
 	case MLX5_CQE_RESP_SEND:
 	case MLX5_CQE_RESP_SEND_IMM:
-	case MLX5_CQE_RESP_SEND_INV:
-		srqn_uidx = ntohl(cqe64->srqn_uidx) & 0xffffff;
-		if (cqe_ver) {
-			err = get_resp_cxt_v1(mctx, cur_rsc, cur_srq, srqn_uidx, &is_srq);
-		} else {
-			if (srqn_uidx) {
-				err = get_srq_ctx(mctx, cur_srq, srqn_uidx);
-				is_srq = 1;
-			} else {
-				err = get_resp_ctx(mctx, cur_rsc, qpn);
-			}
-		}
+	case MLX5_CQE_RESP_SEND_INV: {
+		int is_srq;
+
+		err = mlx5_poll_one_cqe_resp(mctx, cur_rsc, cur_srq, cqe64,
+					     cqe_ver, qpn, &is_srq);
 		if (unlikely(err))
 			return err;
 
 		wc->status = handle_responder(wc, cqe64, rsc_to_mqp(*cur_rsc),
 					      is_srq ? *cur_srq : NULL);
 		break;
+	}
 	case MLX5_CQE_RESIZE_CQ:
 		break;
 	case MLX5_CQE_REQ_ERR:
 	case MLX5_CQE_RESP_ERR:
-		srqn_uidx = ntohl(cqe64->srqn_uidx) & 0xffffff;
-		ecqe = (struct mlx5_err_cqe *)cqe64;
-		mlx5_handle_error_cqe(ecqe, wc);
-		if (unlikely(ecqe->syndrome != MLX5_CQE_SYNDROME_WR_FLUSH_ERR &&
-			     ecqe->syndrome != MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR)) {
-			FILE *fp = mctx->dbg_fp;
-			fprintf(fp, PFX "%s: got completion with error:\n",
-				mctx->hostname);
-			dump_cqe(fp, ecqe);
-			if (mlx5_freeze_on_error_cqe) {
-				fprintf(fp, PFX "freezing at poll cq...");
-				while (1)
-					sleep(10);
-			}
-		}
+		err = mlx5_poll_one_cqe_err(mctx, cur_rsc, cur_srq, cqe64,
+					    cqe_ver, qpn, &wc->status,
+					    &wc->vendor_err, &wc->wr_id,
+					    opcode);
+		if (err != CQ_OK)
+			return err;
+		break;
+	}
 
-		if (opcode == MLX5_CQE_REQ_ERR) {
-			mqp = get_req_context(mctx, cur_rsc, (cqe_ver ? srqn_uidx : qpn), cqe_ver);
-			if (unlikely(!mqp))
-				return CQ_POLL_ERR;
-			wq = &mqp->sq;
-			wqe_ctr = ntohs(cqe64->wqe_counter);
-			idx = wqe_ctr & (wq->wqe_cnt - 1);
-			wc->wr_id = wq->wrid[idx];
-			wq->tail = wq->wqe_head[idx] + 1;
-		} else {
-			if (cqe_ver) {
-				err = get_resp_cxt_v1(mctx, cur_rsc, cur_srq, srqn_uidx, &is_srq);
-			} else {
-				if (srqn_uidx) {
-					err = get_srq_ctx(mctx, cur_srq, srqn_uidx);
-					is_srq = 1;
-				} else {
-					err = get_resp_ctx(mctx, cur_rsc, qpn);
-				}
-			}
-			if (unlikely(err))
-				return CQ_POLL_ERR;
+	wc->qp_num = qpn;
+	return CQ_OK;
+}
 
-			if (is_srq) {
-				wqe_ctr = ntohs(cqe64->wqe_counter);
-				wc->wr_id = (*cur_srq)->wrid[wqe_ctr];
-				mlx5_free_srq_wqe(*cur_srq, wqe_ctr);
-			} else {
-				mqp = rsc_to_mqp(*cur_rsc);
-				wq = &mqp->rq;
-				wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
-				++wq->tail;
-			}
-		}
+inline int mlx5_poll_one_ex(struct mlx5_cq *cq,
+			    struct mlx5_resource **cur_rsc,
+			    struct mlx5_srq **cur_srq,
+			    struct ibv_wc_ex **pwc_ex, uint64_t wc_flags,
+			    int cqe_ver)
+{
+	struct mlx5_cqe64 *cqe64;
+	void *cqe;
+	uint32_t qpn;
+	uint8_t opcode;
+	int err;
+	struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context);
+	struct ibv_wc_ex *wc_ex = *pwc_ex;
+	union wc_buffer wc_buffer;
+
+	cqe = next_cqe_sw(cq);
+	if (!cqe)
+		return CQ_EMPTY;
+
+	cqe64 = (cq->cqe_sz == 64) ? cqe : cqe + 64;
+
+	opcode = cqe64->op_own >> 4;
+	++cq->cons_index;
+
+	VALGRIND_MAKE_MEM_DEFINED(cqe64, sizeof *cqe64);
+
+	/*
+	 * Make sure we read CQ entry contents after we've checked the
+	 * ownership bit.
+	 */
+	rmb();
+
+	dump_cqe_debug(cq, mctx->dbg_fp, cqe64);
+
+	qpn = ntohl(cqe64->sop_drop_qpn) & 0xffffff;
+	wc_buffer.b64 = (uint64_t *)&wc_ex->buffer;
+	wc_ex->wc_flags = 0;
+	wc_ex->reserved = 0;
+
+	switch (opcode) {
+	case MLX5_CQE_REQ:
+		err = mlx5_poll_one_cqe_req(cq, cur_rsc, cqe, qpn, cqe_ver,
+					    &wc_ex->wr_id);
+		handle_good_req_ex(wc_ex, &wc_buffer, cqe64, wc_flags, qpn);
+		wc_ex->status = err;
+		if (wc_flags & IBV_WC_EX_WITH_SRC_QP)
+			wc_buffer.b32++;
+		if (wc_flags & IBV_WC_EX_WITH_PKEY_INDEX)
+			wc_buffer.b16++;
+		if (wc_flags & IBV_WC_EX_WITH_SLID)
+			wc_buffer.b16++;
+		if (wc_flags & IBV_WC_EX_WITH_SL)
+			wc_buffer.b8++;
+		if (wc_flags & IBV_WC_EX_WITH_DLID_PATH_BITS)
+			wc_buffer.b8++;
+		break;
+
+	case MLX5_CQE_RESP_WR_IMM:
+	case MLX5_CQE_RESP_SEND:
+	case MLX5_CQE_RESP_SEND_IMM:
+	case MLX5_CQE_RESP_SEND_INV: {
+		int is_srq;
+
+		err = mlx5_poll_one_cqe_resp(mctx, cur_rsc, cur_srq, cqe64,
+					     cqe_ver, qpn, &is_srq);
+		if (unlikely(err))
+			return err;
+
+		wc_ex->status = handle_responder_ex(wc_ex, &wc_buffer, cqe64,
+						    rsc_to_mqp(*cur_rsc),
+						    is_srq ? *cur_srq : NULL,
+						    wc_flags, qpn);
 		break;
 	}
+	case MLX5_CQE_REQ_ERR:
+	case MLX5_CQE_RESP_ERR:
+		err = mlx5_poll_one_cqe_err(mctx, cur_rsc, cur_srq, cqe64,
+					    cqe_ver, qpn, &wc_ex->status,
+					    &wc_ex->vendor_err, &wc_ex->wr_id,
+					    opcode);
+		if (err != CQ_OK)
+			return err;
 
-	wc->qp_num = qpn;
+	case MLX5_CQE_RESIZE_CQ:
+		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN)
+			wc_buffer.b32++;
+		if (wc_flags & IBV_WC_EX_WITH_IMM)
+			wc_buffer.b32++;
+		if (wc_flags & IBV_WC_EX_WITH_QP_NUM) {
+			*wc_buffer.b32++ = qpn;
+			wc_ex->wc_flags |= IBV_WC_EX_WITH_QP_NUM;
+		}
+		if (wc_flags & IBV_WC_EX_WITH_SRC_QP)
+			wc_buffer.b32++;
+		if (wc_flags & IBV_WC_EX_WITH_PKEY_INDEX)
+			wc_buffer.b16++;
+		if (wc_flags & IBV_WC_EX_WITH_SLID)
+			wc_buffer.b16++;
+		if (wc_flags & IBV_WC_EX_WITH_SL)
+			wc_buffer.b8++;
+		if (wc_flags & IBV_WC_EX_WITH_DLID_PATH_BITS)
+			wc_buffer.b8++;
+		break;
+	}
 
+	*pwc_ex = (struct ibv_wc_ex *)((uintptr_t)(wc_buffer.b8 + sizeof(uint64_t) - 1) &
+				       ~(sizeof(uint64_t) - 1));
 	return CQ_OK;
 }
 
-static inline int poll_cq(struct ibv_cq *ibcq, int ne,
-		      struct ibv_wc *wc, int cqe_ver)
-		      __attribute__((always_inline));
-static inline int poll_cq(struct ibv_cq *ibcq, int ne,
-		      struct ibv_wc *wc, int cqe_ver)
+static inline void mlx5_poll_cq_stall_start(struct mlx5_cq *cq)
+__attribute__((always_inline));
+static inline void mlx5_poll_cq_stall_start(struct mlx5_cq *cq)
 {
-	struct mlx5_cq *cq = to_mcq(ibcq);
-	struct mlx5_resource *rsc = NULL;
-	struct mlx5_srq *srq = NULL;
-	int npolled;
-	int err = CQ_OK;
-
 	if (cq->stall_enable) {
 		if (cq->stall_adaptive_enable) {
 			if (cq->stall_last_count)
@@ -634,19 +997,13 @@ static inline int poll_cq(struct ibv_cq *ibcq, int ne,
 			mlx5_stall_poll_cq();
 		}
 	}
+}
 
-	mlx5_spin_lock(&cq->lock);
-
-	for (npolled = 0; npolled < ne; ++npolled) {
-		err = mlx5_poll_one(cq, &rsc, &srq, wc + npolled, cqe_ver);
-		if (err != CQ_OK)
-			break;
-	}
-
-	update_cons_index(cq);
-
-	mlx5_spin_unlock(&cq->lock);
-
+static inline void mlx5_poll_cq_stall_end(struct mlx5_cq *cq, int ne,
+					  int npolled, int err) __attribute__((always_inline));
+static inline void mlx5_poll_cq_stall_end(struct mlx5_cq *cq, int ne,
+					  int npolled, int err)
+{
 	if (cq->stall_enable) {
 		if (cq->stall_adaptive_enable) {
 			if (npolled == 0) {
@@ -666,6 +1023,34 @@ static inline int poll_cq(struct ibv_cq *ibcq, int ne,
 			cq->stall_next_poll = 1;
 		}
 	}
+}
+
+static inline int poll_cq(struct ibv_cq *ibcq, int ne,
+			  struct ibv_wc *wc, int cqe_ver)
+	__attribute__((always_inline));
+static inline int poll_cq(struct ibv_cq *ibcq, int ne,
+			  struct ibv_wc *wc, int cqe_ver)
+{
+	struct mlx5_cq *cq = to_mcq(ibcq);
+	struct mlx5_resource *rsc = NULL;
+	struct mlx5_srq *srq = NULL;
+	int npolled;
+	int err = CQ_OK;
+
+	mlx5_poll_cq_stall_start(cq);
+	mlx5_spin_lock(&cq->lock);
+
+	for (npolled = 0; npolled < ne; ++npolled) {
+		err = mlx5_poll_one(cq, &rsc, &srq, wc + npolled, cqe_ver);
+		if (err != CQ_OK)
+			break;
+	}
+
+	update_cons_index(cq);
+
+	mlx5_spin_unlock(&cq->lock);
+
+	mlx5_poll_cq_stall_end(cq, ne, npolled, err);
 
 	return err == CQ_POLL_ERR ? err : npolled;
 }
@@ -680,6 +1065,52 @@ int mlx5_poll_cq_v1(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc)
 	return poll_cq(ibcq, ne, wc, 1);
 }
 
+static inline int poll_cq_ex(struct ibv_cq *ibcq, struct ibv_wc_ex *wc,
+			     struct ibv_poll_cq_ex_attr *attr, int cqe_ver)
+{
+	struct mlx5_cq *cq = to_mcq(ibcq);
+	struct mlx5_resource *rsc = NULL;
+	struct mlx5_srq *srq = NULL;
+	int npolled;
+	int err = CQ_OK;
+	int (*poll_fn)(struct mlx5_cq *cq, struct mlx5_resource **rsc,
+		       struct mlx5_srq **cur_srq,
+		       struct ibv_wc_ex **pwc_ex, uint64_t wc_flags,
+		       int cqe_ver) =
+		cq->poll_one;
+	uint64_t wc_flags = cq->wc_flags;
+	unsigned int ne = attr->max_entries;
+
+	mlx5_poll_cq_stall_start(cq);
+	mlx5_spin_lock(&cq->lock);
+
+	for (npolled = 0; npolled < ne; ++npolled) {
+		err = poll_fn(cq, &rsc, &srq, &wc, wc_flags, cqe_ver);
+		if (err != CQ_OK)
+			break;
+	}
+
+	update_cons_index(cq);
+
+	mlx5_spin_unlock(&cq->lock);
+
+	mlx5_poll_cq_stall_end(cq, ne, npolled, err);
+
+	return err == CQ_POLL_ERR ? err : npolled;
+}
+
+int mlx5_poll_cq_ex(struct ibv_cq *ibcq, struct ibv_wc_ex *wc,
+		    struct ibv_poll_cq_ex_attr *attr)
+{
+	return poll_cq_ex(ibcq, wc, attr, 0);
+}
+
+int mlx5_poll_cq_v1_ex(struct ibv_cq *ibcq, struct ibv_wc_ex *wc,
+		       struct ibv_poll_cq_ex_attr *attr)
+{
+	return poll_cq_ex(ibcq, wc, attr, 1);
+}
+
 int mlx5_arm_cq(struct ibv_cq *ibvcq, int solicited)
 {
 	struct mlx5_cq *cq = to_mcq(ibvcq);
diff --git a/src/mlx5.c b/src/mlx5.c
index 5e9b61c..eac332b 100644
--- a/src/mlx5.c
+++ b/src/mlx5.c
@@ -664,6 +664,11 @@ static int mlx5_init_context(struct verbs_device *vdev,
 	verbs_set_ctx_op(v_ctx, create_srq_ex, mlx5_create_srq_ex);
 	verbs_set_ctx_op(v_ctx, get_srq_num, mlx5_get_srq_num);
 	verbs_set_ctx_op(v_ctx, query_device_ex, mlx5_query_device_ex);
+	if (context->cqe_version == 1)
+		verbs_set_ctx_op(v_ctx, poll_cq_ex, mlx5_poll_cq_v1_ex);
+	else
+		verbs_set_ctx_op(v_ctx, poll_cq_ex, mlx5_poll_cq_ex);
+
 
 	return 0;
 
diff --git a/src/mlx5.h b/src/mlx5.h
index b57c7c7..91aafbe 100644
--- a/src/mlx5.h
+++ b/src/mlx5.h
@@ -345,6 +345,11 @@ enum {
 
 struct mlx5_cq {
 	struct ibv_cq			ibv_cq;
+	uint64_t			wc_flags;
+	int (*poll_one)(struct mlx5_cq *cq, struct mlx5_resource **cur_rsc,
+			struct mlx5_srq **cur_srq,
+			struct ibv_wc_ex **pwc_ex, uint64_t wc_flags,
+			int cqe_ver);
 	struct mlx5_buf			buf_a;
 	struct mlx5_buf			buf_b;
 	struct mlx5_buf		       *active_buf;
@@ -595,6 +600,15 @@ int mlx5_dereg_mr(struct ibv_mr *mr);
 struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe,
 			       struct ibv_comp_channel *channel,
 			       int comp_vector);
+int mlx5_poll_cq_ex(struct ibv_cq *ibcq, struct ibv_wc_ex *wc,
+		    struct ibv_poll_cq_ex_attr *attr);
+int mlx5_poll_cq_v1_ex(struct ibv_cq *ibcq, struct ibv_wc_ex *wc,
+		       struct ibv_poll_cq_ex_attr *attr);
+int mlx5_poll_one_ex(struct mlx5_cq *cq,
+		     struct mlx5_resource **cur_rsc,
+		     struct mlx5_srq **cur_srq,
+		     struct ibv_wc_ex **pwc_ex, uint64_t wc_flags,
+		     int cqe_ver);
 int mlx5_alloc_cq_buf(struct mlx5_context *mctx, struct mlx5_cq *cq,
 		      struct mlx5_buf *buf, int nent, int cqe_sz);
 int mlx5_free_cq_buf(struct mlx5_context *ctx, struct mlx5_buf *buf);
-- 
2.1.0


* [PATCH libmlx5 V1 2/6] Add timestamp support for ibv_poll_cq_ex
From: Matan Barak @ 2015-12-03 16:02 UTC (permalink / raw)
  To: Eli Cohen
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Matan Barak,
	Eran Ben Elisha, Christoph Lameter

Add support for filling the timestamp field in ibv_poll_cq_ex
(if requested by the user).
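
Raw timestamps are free-running core-clock cycles. A minimal sketch of
comparing two of them, assuming the timestamp mask reported via
ibv_query_device_ex (the wraparound handling is the caller's job):

#include <stdint.h>

static uint64_t ts_delta_cycles(uint64_t start, uint64_t end,
				uint64_t timestamp_mask)
{
	/* Masked subtraction handles counter wraparound. */
	return (end - start) & timestamp_mask;
}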

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 src/cq.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/cq.c b/src/cq.c
index 0185696..5e06990 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -913,6 +913,11 @@ inline int mlx5_poll_one_ex(struct mlx5_cq *cq,
 	wc_ex->wc_flags = 0;
 	wc_ex->reserved = 0;
 
+	if (wc_flags & IBV_WC_EX_WITH_COMPLETION_TIMESTAMP) {
+		*wc_buffer.b64++ = ntohll(cqe64->timestamp);
+		wc_ex->wc_flags |= IBV_WC_EX_WITH_COMPLETION_TIMESTAMP;
+	}
+
 	switch (opcode) {
 	case MLX5_CQE_REQ:
 		err = mlx5_poll_one_cqe_req(cq, cur_rsc, cqe, qpn, cqe_ver,
-- 
2.1.0


* [PATCH libmlx5 V1 3/6] Add ibv_create_cq_ex support
From: Matan Barak @ 2015-12-03 16:02 UTC (permalink / raw)
  To: Eli Cohen
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Matan Barak,
	Eran Ben Elisha, Christoph Lameter

In order to create a CQ that supports timestamps, the user needs
to pass the timestamp flag to ibv_create_cq_ex.
Add support for ibv_create_cq_ex in mlx5's vendor library.
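
A rough sketch of the creation call this enables (flag and field names
per the companion libibverbs series; treat them as assumptions):

#include <infiniband/verbs.h>

static struct ibv_cq *create_ts_cq(struct ibv_context *ctx)
{
	struct ibv_create_cq_attr_ex cq_attr = {
		.cqe		= 256,
		.comp_mask	= IBV_CREATE_CQ_ATTR_FLAGS,
		.flags		= IBV_CREATE_CQ_ATTR_COMPLETION_TIMESTAMP,
		.wc_flags	= IBV_WC_STANDARD_FLAGS |
				  IBV_WC_EX_WITH_COMPLETION_TIMESTAMP,
	};

	return ibv_create_cq_ex(ctx, &cq_attr);
}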

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 src/mlx5.c  |  1 +
 src/mlx5.h  |  2 ++
 src/verbs.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/src/mlx5.c b/src/mlx5.c
index eac332b..229d99d 100644
--- a/src/mlx5.c
+++ b/src/mlx5.c
@@ -664,6 +664,7 @@ static int mlx5_init_context(struct verbs_device *vdev,
 	verbs_set_ctx_op(v_ctx, create_srq_ex, mlx5_create_srq_ex);
 	verbs_set_ctx_op(v_ctx, get_srq_num, mlx5_get_srq_num);
 	verbs_set_ctx_op(v_ctx, query_device_ex, mlx5_query_device_ex);
+	verbs_set_ctx_op(v_ctx, create_cq_ex, mlx5_create_cq_ex);
 	if (context->cqe_version == 1)
 		verbs_set_ctx_op(v_ctx, poll_cq_ex, mlx5_poll_cq_v1_ex);
 	else
diff --git a/src/mlx5.h b/src/mlx5.h
index 91aafbe..0c0b027 100644
--- a/src/mlx5.h
+++ b/src/mlx5.h
@@ -600,6 +600,8 @@ int mlx5_dereg_mr(struct ibv_mr *mr);
 struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe,
 			       struct ibv_comp_channel *channel,
 			       int comp_vector);
+struct ibv_cq *mlx5_create_cq_ex(struct ibv_context *context,
+				 struct ibv_create_cq_attr_ex *cq_attr);
 int mlx5_poll_cq_ex(struct ibv_cq *ibcq, struct ibv_wc_ex *wc,
 		    struct ibv_poll_cq_ex_attr *attr);
 int mlx5_poll_cq_v1_ex(struct ibv_cq *ibcq, struct ibv_wc_ex *wc,
diff --git a/src/verbs.c b/src/verbs.c
index 92f273d..1dbee60 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -240,9 +240,21 @@ static int qp_sig_enabled(void)
 	return 0;
 }
 
-struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe,
-			      struct ibv_comp_channel *channel,
-			      int comp_vector)
+enum {
+	CREATE_CQ_SUPPORTED_WC_FLAGS = IBV_WC_STANDARD_FLAGS	|
+				       IBV_WC_EX_WITH_COMPLETION_TIMESTAMP
+};
+
+enum {
+	CREATE_CQ_SUPPORTED_COMP_MASK = IBV_CREATE_CQ_ATTR_FLAGS
+};
+
+enum {
+	CREATE_CQ_SUPPORTED_FLAGS = IBV_CREATE_CQ_ATTR_COMPLETION_TIMESTAMP
+};
+
+static struct ibv_cq *create_cq(struct ibv_context *context,
+				const struct ibv_create_cq_attr_ex *cq_attr)
 {
 	struct mlx5_create_cq		cmd;
 	struct mlx5_create_cq_resp	resp;
@@ -254,12 +266,33 @@ struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe,
 	FILE *fp = to_mctx(context)->dbg_fp;
 #endif
 
-	if (!cqe) {
-		mlx5_dbg(fp, MLX5_DBG_CQ, "\n");
+	if (!cq_attr->cqe) {
+		mlx5_dbg(fp, MLX5_DBG_CQ, "CQE invalid\n");
+		errno = EINVAL;
+		return NULL;
+	}
+
+	if (cq_attr->comp_mask & ~CREATE_CQ_SUPPORTED_COMP_MASK) {
+		mlx5_dbg(fp, MLX5_DBG_CQ,
+			 "Unsupported comp_mask for create_cq\n");
+		errno = EINVAL;
+		return NULL;
+	}
+
+	if (cq_attr->comp_mask & IBV_CREATE_CQ_ATTR_FLAGS &&
+	    cq_attr->flags & ~CREATE_CQ_SUPPORTED_FLAGS) {
+		mlx5_dbg(fp, MLX5_DBG_CQ,
+			 "Unsupported creation flags requested for create_cq\n");
 		errno = EINVAL;
 		return NULL;
 	}
 
+	if (cq_attr->wc_flags & ~CREATE_CQ_SUPPORTED_WC_FLAGS) {
+		mlx5_dbg(fp, MLX5_DBG_CQ,
+			 "Unsupported wc_flags requested for create_cq\n");
+		errno = ENOTSUP;
+		return NULL;
+	}
+
 	cq =  calloc(1, sizeof *cq);
 	if (!cq) {
 		mlx5_dbg(fp, MLX5_DBG_CQ, "\n");
@@ -273,14 +306,14 @@ struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe,
 		goto err;
 
 	/* The additional entry is required for resize CQ */
-	if (cqe <= 0) {
+	if (cq_attr->cqe <= 0) {
 		mlx5_dbg(fp, MLX5_DBG_CQ, "\n");
 		errno = EINVAL;
 		goto err_spl;
 	}
 
-	ncqe = align_queue_size(cqe + 1);
-	if ((ncqe > (1 << 24)) || (ncqe < (cqe + 1))) {
+	ncqe = align_queue_size(cq_attr->cqe + 1);
+	if ((ncqe > (1 << 24)) || (ncqe < (cq_attr->cqe + 1))) {
 		mlx5_dbg(fp, MLX5_DBG_CQ, "ncqe %d\n", ncqe);
 		errno = EINVAL;
 		goto err_spl;
@@ -313,7 +346,8 @@ struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe,
 	cmd.db_addr  = (uintptr_t) cq->dbrec;
 	cmd.cqe_size = cqe_sz;
 
-	ret = ibv_cmd_create_cq(context, ncqe - 1, channel, comp_vector,
+	ret = ibv_cmd_create_cq(context, ncqe - 1, cq_attr->channel,
+				cq_attr->comp_vector,
 				&cq->ibv_cq, &cmd.ibv_cmd, sizeof cmd,
 				&resp.ibv_resp, sizeof resp);
 	if (ret) {
@@ -328,6 +362,9 @@ struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe,
 	cq->stall_adaptive_enable = to_mctx(context)->stall_adaptive_enable;
 	cq->stall_cycles = to_mctx(context)->stall_cycles;
 
+	cq->wc_flags = cq_attr->wc_flags;
+	cq->poll_one = mlx5_poll_one_ex;
+
 	return &cq->ibv_cq;
 
 err_db:
@@ -345,6 +382,23 @@ err:
 	return NULL;
 }
 
+struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe,
+			      struct ibv_comp_channel *channel,
+			      int comp_vector)
+{
+	struct ibv_create_cq_attr_ex cq_attr = {.cqe = cqe, .channel = channel,
+						.comp_vector = comp_vector,
+						.wc_flags = IBV_WC_STANDARD_FLAGS};
+
+	return create_cq(context, &cq_attr);
+}
+
+struct ibv_cq *mlx5_create_cq_ex(struct ibv_context *context,
+				 struct ibv_create_cq_attr_ex *cq_attr)
+{
+	return create_cq(context, cq_attr);
+}
+
 int mlx5_resize_cq(struct ibv_cq *ibcq, int cqe)
 {
 	struct mlx5_cq *cq = to_mcq(ibcq);
-- 
2.1.0


* [PATCH libmlx5 V1 4/6] Add ibv_query_values support
From: Matan Barak @ 2015-12-03 16:02 UTC (permalink / raw)
  To: Eli Cohen
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Matan Barak,
	Eran Ben Elisha, Christoph Lameter

In order to query the HCA's current core clock, libmlx5 should
support the ibv_query_values verb. Querying the hardware's cycles
register is done by mmapping this register into user space.
Therefore, when libmlx5 initializes, we mmap the cycles register.
This assumes the machine's architecture places the PCI and memory in
the same address space.
The page offset is passed through the init_context vendor data.
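
A hedged usage sketch of the verb (as named in the companion libibverbs
series; per this patch the raw cycle count comes back in
raw_clock.tv_nsec):

#include <stdint.h>
#include <infiniband/verbs.h>

static int read_raw_hca_clock(struct ibv_context *ctx, uint64_t *cycles)
{
	struct ibv_values_ex values = {
		.comp_mask = IBV_VALUES_MASK_RAW_CLOCK,
	};

	if (ibv_query_values(ctx, &values) ||
	    !(values.comp_mask & IBV_VALUES_MASK_RAW_CLOCK))
		return -1;

	*cycles = values.raw_clock.tv_nsec;
	return 0;
}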

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 src/mlx5-abi.h | 10 +++++++++-
 src/mlx5.c     | 37 +++++++++++++++++++++++++++++++++++++
 src/mlx5.h     | 10 +++++++++-
 src/verbs.c    | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/src/mlx5-abi.h b/src/mlx5-abi.h
index 769ea81..43d4906 100644
--- a/src/mlx5-abi.h
+++ b/src/mlx5-abi.h
@@ -55,7 +55,11 @@ struct mlx5_alloc_ucontext {
 	__u32				total_num_uuars;
 	__u32				num_low_latency_uuars;
 	__u32				flags;
-	__u32				reserved;
+	__u32				comp_mask;
+};
+
+enum mlx5_ib_alloc_ucontext_resp_mask {
+	MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET = 1UL << 0,
 };
 
 struct mlx5_alloc_ucontext_resp {
@@ -72,6 +76,10 @@ struct mlx5_alloc_ucontext_resp {
 	__u16				num_ports;
 	__u8				cqe_version;
 	__u8				reserved;
+	__u32				comp_mask;
+	__u32				response_length;
+	__u32				reserved2;
+	__u64				hca_core_clock_offset;
 };
 
 struct mlx5_alloc_pd_resp {
diff --git a/src/mlx5.c b/src/mlx5.c
index 229d99d..c455c08 100644
--- a/src/mlx5.c
+++ b/src/mlx5.c
@@ -524,6 +524,30 @@ static int single_threaded_app(void)
 	return 0;
 }
 
+static int mlx5_map_internal_clock(struct mlx5_device *mdev,
+				   struct ibv_context *ibv_ctx)
+{
+	struct mlx5_context *context = to_mctx(ibv_ctx);
+	void *hca_clock_page;
+	off_t offset = 0;
+
+	set_command(MLX5_MMAP_GET_CORE_CLOCK_CMD, &offset);
+	hca_clock_page = mmap(NULL, mdev->page_size,
+			      PROT_READ, MAP_SHARED, ibv_ctx->cmd_fd,
+			      mdev->page_size * offset);
+
+	if (hca_clock_page == MAP_FAILED) {
+		fprintf(stderr, PFX
+			"Warning: Timestamp available,\n"
+			"but failed to mmap() hca core clock page.\n");
+		return -1;
+	}
+
+	context->hca_core_clock = hca_clock_page +
+		(context->core_clock.offset & (mdev->page_size - 1));
+	return 0;
+}
+
 static int mlx5_init_context(struct verbs_device *vdev,
 			     struct ibv_context *ctx, int cmd_fd)
 {
@@ -647,6 +671,15 @@ static int mlx5_init_context(struct verbs_device *vdev,
 		context->bfs[j].uuarn = j;
 	}
 
+	context->hca_core_clock = NULL;
+	if (resp.response_length + sizeof(resp.ibv_resp) >=
+	    offsetof(struct mlx5_alloc_ucontext_resp, hca_core_clock_offset) +
+	    sizeof(resp.hca_core_clock_offset) &&
+	    resp.comp_mask & MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET) {
+		context->core_clock.offset = resp.hca_core_clock_offset;
+		mlx5_map_internal_clock(mdev, ctx);
+	}
+
 	mlx5_spinlock_init(&context->lock32);
 
 	context->prefer_bf = get_always_bf();
@@ -664,6 +697,7 @@ static int mlx5_init_context(struct verbs_device *vdev,
 	verbs_set_ctx_op(v_ctx, create_srq_ex, mlx5_create_srq_ex);
 	verbs_set_ctx_op(v_ctx, get_srq_num, mlx5_get_srq_num);
 	verbs_set_ctx_op(v_ctx, query_device_ex, mlx5_query_device_ex);
+	verbs_set_ctx_op(v_ctx, query_values, mlx5_query_values);
 	verbs_set_ctx_op(v_ctx, create_cq_ex, mlx5_create_cq_ex);
 	if (context->cqe_version && context->cqe_version == 1)
 		verbs_set_ctx_op(v_ctx, poll_cq_ex, mlx5_poll_cq_v1_ex);
@@ -697,6 +731,9 @@ static void mlx5_cleanup_context(struct verbs_device *device,
 		if (context->uar[i])
 			munmap(context->uar[i], page_size);
 	}
+	if (context->hca_core_clock)
+		munmap(context->hca_core_clock - context->core_clock.offset,
+		       page_size);
 	close_debug_file(context);
 }
 
diff --git a/src/mlx5.h b/src/mlx5.h
index 0c0b027..b5bcfaa 100644
--- a/src/mlx5.h
+++ b/src/mlx5.h
@@ -117,7 +117,8 @@ enum {
 
 enum {
 	MLX5_MMAP_GET_REGULAR_PAGES_CMD    = 0,
-	MLX5_MMAP_GET_CONTIGUOUS_PAGES_CMD = 1
+	MLX5_MMAP_GET_CONTIGUOUS_PAGES_CMD = 1,
+	MLX5_MMAP_GET_CORE_CLOCK_CMD    = 5
 };
 
 #define MLX5_CQ_PREFIX "MLX_CQ"
@@ -307,6 +308,11 @@ struct mlx5_context {
 	struct mlx5_spinlock            hugetlb_lock;
 	struct list_head                hugetlb_list;
 	uint8_t				cqe_version;
+	struct {
+		uint64_t                offset;
+		uint64_t                mask;
+	} core_clock;
+	void			       *hca_core_clock;
 };
 
 struct mlx5_bitmap {
@@ -585,6 +591,8 @@ int mlx5_query_device_ex(struct ibv_context *context,
 			 const struct ibv_query_device_ex_input *input,
 			 struct ibv_device_attr_ex *attr,
 			 size_t attr_size);
+int mlx5_query_values(struct ibv_context *context,
+		      struct ibv_values_ex *values);
 struct ibv_qp *mlx5_create_qp_ex(struct ibv_context *context,
 				 struct ibv_qp_init_attr_ex *attr);
 int mlx5_query_port(struct ibv_context *context, uint8_t port,
diff --git a/src/verbs.c b/src/verbs.c
index 1dbee60..5d732a2 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -79,6 +79,52 @@ int mlx5_query_device(struct ibv_context *context, struct ibv_device_attr *attr)
 	return 0;
 }
 
+#define READL(ptr) (*((uint32_t *)(ptr)))
+static int mlx5_read_clock(struct ibv_context *context, uint64_t *cycles)
+{
+	unsigned int clockhi, clocklo, clockhi1;
+	int i;
+	struct mlx5_context *ctx = to_mctx(context);
+
+	if (!ctx->hca_core_clock)
+		return -EOPNOTSUPP;
+
+	/* Handle wraparound */
+	for (i = 0; i < 2; i++) {
+		clockhi = ntohl(READL(ctx->hca_core_clock));
+		clocklo = ntohl(READL(ctx->hca_core_clock + 4));
+		clockhi1 = ntohl(READL(ctx->hca_core_clock));
+		if (clockhi == clockhi1)
+			break;
+	}
+
+	*cycles = (uint64_t)clockhi << 32 | (uint64_t)clocklo;
+
+	return 0;
+}
+
+int mlx5_query_values(struct ibv_context *context,
+		      struct ibv_values_ex *values)
+{
+	uint32_t comp_mask = 0;
+	int err = 0;
+
+	if (values->comp_mask & IBV_VALUES_MASK_RAW_CLOCK) {
+		uint64_t cycles;
+
+		err = mlx5_read_clock(context, &cycles);
+		if (!err) {
+			values->raw_clock.tv_sec = 0;
+			values->raw_clock.tv_nsec = cycles;
+			comp_mask |= IBV_VALUES_MASK_RAW_CLOCK;
+		}
+	}
+
+	values->comp_mask = comp_mask;
+
+	return err;
+}
+
 int mlx5_query_port(struct ibv_context *context, uint8_t port,
 		     struct ibv_port_attr *attr)
 {
-- 
2.1.0


* [PATCH libmlx5 V1 5/6] Optimize poll_cq
From: Matan Barak @ 2015-12-03 16:02 UTC (permalink / raw)
  To: Eli Cohen
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Matan Barak,
	Eran Ben Elisha, Christoph Lameter

The current ibv_poll_cq_ex mechanism needs to check, for every
field, whether the user requested it. In order to avoid this
penalty at runtime, add optimized functions for common cases.
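
The idiom in miniature: an always-inline worker takes "yes"/"no" masks
that are compile-time constants in each specialized wrapper, so the
compiler folds the per-field branches away (a generic sketch, not the
patch's exact macros):

#include <stdint.h>

#define FLAG_A	(1 << 0)
#define FLAG_B	(1 << 1)

static inline int worker(uint64_t flags, uint64_t yes, uint64_t no)
	__attribute__((always_inline));
static inline int worker(uint64_t flags, uint64_t yes, uint64_t no)
{
	int out = 0;

	/* Each test folds to a constant when its bit is in yes or no. */
	if ((yes & FLAG_A) || (!(no & FLAG_A) && (flags & FLAG_A)))
		out |= 1;
	if ((yes & FLAG_B) || (!(no & FLAG_B) && (flags & FLAG_B)))
		out |= 2;
	return out;
}

/* Specialization: A known set, B known clear; no branches remain. */
static int worker_a_not_b(uint64_t flags)
{
	return worker(flags, FLAG_A, FLAG_B);
}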

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 src/cq.c    | 363 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
 src/mlx5.h  |  10 ++
 src/verbs.c |   9 +-
 3 files changed, 310 insertions(+), 72 deletions(-)

diff --git a/src/cq.c b/src/cq.c
index 5e06990..fcb4237 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -41,6 +41,7 @@
 #include <netinet/in.h>
 #include <string.h>
 #include <errno.h>
+#include <assert.h>
 #include <unistd.h>
 
 #include <infiniband/opcode.h>
@@ -207,73 +208,91 @@ union wc_buffer {
 	uint64_t	*b64;
 };
 
+#define IS_IN_WC_FLAGS(yes, no, maybe, flag) (((yes) & (flag)) ||    \
+					      (!((no) & (flag)) && \
+					       ((maybe) & (flag))))
 static inline void handle_good_req_ex(struct ibv_wc_ex *wc_ex,
 				      union wc_buffer *pwc_buffer,
 				      struct mlx5_cqe64 *cqe,
 				      uint64_t wc_flags,
-				      uint32_t qpn)
+				      uint64_t wc_flags_yes,
+				      uint64_t wc_flags_no,
+				      uint32_t qpn, uint64_t *wc_flags_out)
 {
 	union wc_buffer wc_buffer = *pwc_buffer;
 
 	switch (ntohl(cqe->sop_drop_qpn) >> 24) {
 	case MLX5_OPCODE_RDMA_WRITE_IMM:
-		wc_ex->wc_flags |= IBV_WC_EX_IMM;
+		*wc_flags_out |= IBV_WC_EX_IMM;
 	case MLX5_OPCODE_RDMA_WRITE:
 		wc_ex->opcode    = IBV_WC_RDMA_WRITE;
-		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_BYTE_LEN))
 			wc_buffer.b32++;
-		if (wc_flags & IBV_WC_EX_WITH_IMM)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_IMM))
 			wc_buffer.b32++;
 		break;
 	case MLX5_OPCODE_SEND_IMM:
-		wc_ex->wc_flags |= IBV_WC_EX_IMM;
+		*wc_flags_out |= IBV_WC_EX_IMM;
 	case MLX5_OPCODE_SEND:
 	case MLX5_OPCODE_SEND_INVAL:
 		wc_ex->opcode    = IBV_WC_SEND;
-		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_BYTE_LEN))
 			wc_buffer.b32++;
-		if (wc_flags & IBV_WC_EX_WITH_IMM)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_IMM))
 			wc_buffer.b32++;
 		break;
 	case MLX5_OPCODE_RDMA_READ:
 		wc_ex->opcode    = IBV_WC_RDMA_READ;
-		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN) {
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_BYTE_LEN)) {
 			*wc_buffer.b32++ = ntohl(cqe->byte_cnt);
-			wc_ex->wc_flags |= IBV_WC_EX_WITH_BYTE_LEN;
+			*wc_flags_out |= IBV_WC_EX_WITH_BYTE_LEN;
 		}
-		if (wc_flags & IBV_WC_EX_WITH_IMM)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_IMM))
 			wc_buffer.b32++;
 		break;
 	case MLX5_OPCODE_ATOMIC_CS:
 		wc_ex->opcode    = IBV_WC_COMP_SWAP;
-		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN) {
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_BYTE_LEN)) {
 			*wc_buffer.b32++ = 8;
-			wc_ex->wc_flags |= IBV_WC_EX_WITH_BYTE_LEN;
+			*wc_flags_out |= IBV_WC_EX_WITH_BYTE_LEN;
 		}
-		if (wc_flags & IBV_WC_EX_WITH_IMM)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_IMM))
 			wc_buffer.b32++;
 		break;
 	case MLX5_OPCODE_ATOMIC_FA:
 		wc_ex->opcode    = IBV_WC_FETCH_ADD;
-		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN) {
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_BYTE_LEN)) {
 			*wc_buffer.b32++ = 8;
-			wc_ex->wc_flags |= IBV_WC_EX_WITH_BYTE_LEN;
+			*wc_flags_out |= IBV_WC_EX_WITH_BYTE_LEN;
 		}
-		if (wc_flags & IBV_WC_EX_WITH_IMM)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_IMM))
 			wc_buffer.b32++;
 		break;
 	case MLX5_OPCODE_BIND_MW:
 		wc_ex->opcode    = IBV_WC_BIND_MW;
-		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_BYTE_LEN))
 			wc_buffer.b32++;
-		if (wc_flags & IBV_WC_EX_WITH_IMM)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_IMM))
 			wc_buffer.b32++;
 		break;
 	}
 
-	if (wc_flags & IBV_WC_EX_WITH_QP_NUM) {
+	if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+			   IBV_WC_EX_WITH_QP_NUM)) {
 		*wc_buffer.b32++ = qpn;
-		wc_ex->wc_flags |= IBV_WC_EX_WITH_QP_NUM;
+		*wc_flags_out |= IBV_WC_EX_WITH_QP_NUM;
 	}
 
 	*pwc_buffer = wc_buffer;
@@ -345,7 +364,9 @@ static inline int handle_responder_ex(struct ibv_wc_ex *wc_ex,
 				      union wc_buffer *pwc_buffer,
 				      struct mlx5_cqe64 *cqe,
 				      struct mlx5_qp *qp, struct mlx5_srq *srq,
-				      uint64_t wc_flags, uint32_t qpn)
+				      uint64_t wc_flags, uint64_t wc_flags_yes,
+				      uint64_t wc_flags_no, uint32_t qpn,
+				      uint64_t *wc_flags_out)
 {
 	uint16_t wqe_ctr;
 	struct mlx5_wq *wq;
@@ -354,9 +375,10 @@ static inline int handle_responder_ex(struct ibv_wc_ex *wc_ex,
 	int err = 0;
 	uint32_t byte_len = ntohl(cqe->byte_cnt);
 
-	if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN) {
+	if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+			   IBV_WC_EX_WITH_BYTE_LEN)) {
 		*wc_buffer.b32++ = byte_len;
-		wc_ex->wc_flags |= IBV_WC_EX_WITH_BYTE_LEN;
+		*wc_flags_out |= IBV_WC_EX_WITH_BYTE_LEN;
 	}
 	if (srq) {
 		wqe_ctr = ntohs(cqe->wqe_counter);
@@ -386,53 +408,62 @@ static inline int handle_responder_ex(struct ibv_wc_ex *wc_ex,
 	switch (cqe->op_own >> 4) {
 	case MLX5_CQE_RESP_WR_IMM:
 		wc_ex->opcode	= IBV_WC_RECV_RDMA_WITH_IMM;
-		wc_ex->wc_flags	= IBV_WC_EX_IMM;
-		if (wc_flags & IBV_WC_EX_WITH_IMM) {
+		*wc_flags_out	= IBV_WC_EX_IMM;
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_IMM)) {
 			*wc_buffer.b32++ = ntohl(cqe->imm_inval_pkey);
-			wc_ex->wc_flags |= IBV_WC_EX_WITH_IMM;
+			*wc_flags_out |= IBV_WC_EX_WITH_IMM;
 		}
 		break;
 	case MLX5_CQE_RESP_SEND:
 		wc_ex->opcode   = IBV_WC_RECV;
-		if (wc_flags & IBV_WC_EX_WITH_IMM)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_IMM))
 			wc_buffer.b32++;
 		break;
 	case MLX5_CQE_RESP_SEND_IMM:
 		wc_ex->opcode	= IBV_WC_RECV;
-		wc_ex->wc_flags	= IBV_WC_EX_WITH_IMM;
-		if (wc_flags & IBV_WC_EX_WITH_IMM) {
+		*wc_flags_out	= IBV_WC_EX_WITH_IMM;
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_IMM)) {
 			*wc_buffer.b32++ = ntohl(cqe->imm_inval_pkey);
-			wc_ex->wc_flags |= IBV_WC_EX_WITH_IMM;
+			*wc_flags_out |= IBV_WC_EX_WITH_IMM;
 		}
 		break;
 	}
-	if (wc_flags & IBV_WC_EX_WITH_QP_NUM) {
+	if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+			   IBV_WC_EX_WITH_QP_NUM)) {
 		*wc_buffer.b32++ = qpn;
-		wc_ex->wc_flags |= IBV_WC_EX_WITH_QP_NUM;
+		*wc_flags_out |= IBV_WC_EX_WITH_QP_NUM;
 	}
-	if (wc_flags & IBV_WC_EX_WITH_SRC_QP) {
+	if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+			   IBV_WC_EX_WITH_SRC_QP)) {
 		*wc_buffer.b32++ = ntohl(cqe->flags_rqpn) & 0xffffff;
-		wc_ex->wc_flags |= IBV_WC_EX_WITH_SRC_QP;
+		*wc_flags_out |= IBV_WC_EX_WITH_SRC_QP;
 	}
-	if (wc_flags & IBV_WC_EX_WITH_PKEY_INDEX) {
+	if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+			   IBV_WC_EX_WITH_PKEY_INDEX)) {
 		*wc_buffer.b16++ = ntohl(cqe->imm_inval_pkey) & 0xffff;
-		wc_ex->wc_flags |= IBV_WC_EX_WITH_PKEY_INDEX;
+		*wc_flags_out |= IBV_WC_EX_WITH_PKEY_INDEX;
 	}
-	if (wc_flags & IBV_WC_EX_WITH_SLID) {
+	if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+			   IBV_WC_EX_WITH_SLID)) {
 		*wc_buffer.b16++ = ntohs(cqe->slid);
-		wc_ex->wc_flags |= IBV_WC_EX_WITH_SLID;
+		*wc_flags_out |= IBV_WC_EX_WITH_SLID;
 	}
-	if (wc_flags & IBV_WC_EX_WITH_SL) {
+	if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+			   IBV_WC_EX_WITH_SL)) {
 		*wc_buffer.b8++ = (ntohl(cqe->flags_rqpn) >> 24) & 0xf;
-		wc_ex->wc_flags |= IBV_WC_EX_WITH_SL;
+		*wc_flags_out |= IBV_WC_EX_WITH_SL;
 	}
-	if (wc_flags & IBV_WC_EX_WITH_DLID_PATH_BITS) {
+	if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+			   IBV_WC_EX_WITH_DLID_PATH_BITS)) {
 		*wc_buffer.b8++ = cqe->ml_path & 0x7f;
-		wc_ex->wc_flags |= IBV_WC_EX_WITH_DLID_PATH_BITS;
+		*wc_flags_out |= IBV_WC_EX_WITH_DLID_PATH_BITS;
 	}
 
 	g = (ntohl(cqe->flags_rqpn) >> 28) & 3;
-	wc_ex->wc_flags |= g ? IBV_WC_EX_GRH : 0;
+	*wc_flags_out |= g ? IBV_WC_EX_GRH : 0;
 
 	*pwc_buffer = wc_buffer;
 	return IBV_WC_SUCCESS;
@@ -795,6 +826,9 @@ inline int mlx5_poll_one_cqe_err(struct mlx5_context *mctx,
 	return err;
 }
 
+#define IS_IN_WC_FLAGS(yes, no, maybe, flag) (((yes) & (flag)) ||    \
+					      (!((no) & (flag)) && \
+					       ((maybe) & (flag))))
 static inline int mlx5_poll_one(struct mlx5_cq *cq,
 			 struct mlx5_resource **cur_rsc,
 			 struct mlx5_srq **cur_srq,
@@ -874,11 +908,21 @@ static inline int mlx5_poll_one(struct mlx5_cq *cq,
 	return CQ_OK;
 }
 
-inline int mlx5_poll_one_ex(struct mlx5_cq *cq,
-			    struct mlx5_resource **cur_rsc,
-			    struct mlx5_srq **cur_srq,
-			    struct ibv_wc_ex **pwc_ex, uint64_t wc_flags,
-			    int cqe_ver)
+static inline int _mlx5_poll_one_ex(struct mlx5_cq *cq,
+				    struct mlx5_resource **cur_rsc,
+				    struct mlx5_srq **cur_srq,
+				    struct ibv_wc_ex **pwc_ex,
+				    uint64_t wc_flags,
+				    uint64_t wc_flags_yes, uint64_t wc_flags_no,
+				    int cqe_ver)
+	__attribute__((always_inline));
+static inline int _mlx5_poll_one_ex(struct mlx5_cq *cq,
+				    struct mlx5_resource **cur_rsc,
+				    struct mlx5_srq **cur_srq,
+				    struct ibv_wc_ex **pwc_ex,
+				    uint64_t wc_flags,
+				    uint64_t wc_flags_yes, uint64_t wc_flags_no,
+				    int cqe_ver)
 {
 	struct mlx5_cqe64 *cqe64;
 	void *cqe;
@@ -888,6 +932,7 @@ inline int mlx5_poll_one_ex(struct mlx5_cq *cq,
 	struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context);
 	struct ibv_wc_ex *wc_ex = *pwc_ex;
 	union wc_buffer wc_buffer;
+	uint64_t wc_flags_out = 0;
 
 	cqe = next_cqe_sw(cq);
 	if (!cqe)
@@ -913,26 +958,34 @@ inline int mlx5_poll_one_ex(struct mlx5_cq *cq,
 	wc_ex->wc_flags = 0;
 	wc_ex->reserved = 0;
 
-	if (wc_flags & IBV_WC_EX_WITH_COMPLETION_TIMESTAMP) {
+	if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+			   IBV_WC_EX_WITH_COMPLETION_TIMESTAMP)) {
 		*wc_buffer.b64++ = ntohll(cqe64->timestamp);
-		wc_ex->wc_flags |= IBV_WC_EX_WITH_COMPLETION_TIMESTAMP;
+		wc_flags_out |= IBV_WC_EX_WITH_COMPLETION_TIMESTAMP;
 	}
 
 	switch (opcode) {
 	case MLX5_CQE_REQ:
 		err = mlx5_poll_one_cqe_req(cq, cur_rsc, cqe, qpn, cqe_ver,
 					    &wc_ex->wr_id);
-		handle_good_req_ex(wc_ex, &wc_buffer, cqe64, wc_flags, qpn);
+		handle_good_req_ex(wc_ex, &wc_buffer, cqe64, wc_flags,
+				   wc_flags_yes, wc_flags_no, qpn,
+				   &wc_flags_out);
 		wc_ex->status = err;
-		if (wc_flags & IBV_WC_EX_WITH_SRC_QP)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_SRC_QP))
 			wc_buffer.b32++;
-		if (wc_flags & IBV_WC_EX_WITH_PKEY_INDEX)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_PKEY_INDEX))
 			wc_buffer.b16++;
-		if (wc_flags & IBV_WC_EX_WITH_SLID)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_SLID))
 			wc_buffer.b16++;
-		if (wc_flags & IBV_WC_EX_WITH_SL)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_SL))
 			wc_buffer.b8++;
-		if (wc_flags & IBV_WC_EX_WITH_DLID_PATH_BITS)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_DLID_PATH_BITS))
 			wc_buffer.b8++;
 		break;
 
@@ -950,7 +1003,9 @@ inline int mlx5_poll_one_ex(struct mlx5_cq *cq,
 		wc_ex->status = handle_responder_ex(wc_ex, &wc_buffer, cqe64,
 						    rsc_to_mqp(*cur_rsc),
 						    is_srq ? *cur_srq : NULL,
-						    wc_flags, qpn);
+						    wc_flags, wc_flags_yes,
+						    wc_flags_no, qpn,
+						    &wc_flags_out);
 		break;
 	}
 	case MLX5_CQE_REQ_ERR:
@@ -963,32 +1018,208 @@ inline int mlx5_poll_one_ex(struct mlx5_cq *cq,
 			return err;
 
 	case MLX5_CQE_RESIZE_CQ:
-		if (wc_flags & IBV_WC_EX_WITH_BYTE_LEN)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_BYTE_LEN))
 			wc_buffer.b32++;
-		if (wc_flags & IBV_WC_EX_WITH_IMM)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_IMM))
 			wc_buffer.b32++;
-		if (wc_flags & IBV_WC_EX_WITH_QP_NUM) {
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_QP_NUM)) {
 			*wc_buffer.b32++ = qpn;
-			wc_ex->wc_flags |= IBV_WC_EX_WITH_QP_NUM;
+			wc_flags_out |= IBV_WC_EX_WITH_QP_NUM;
 		}
-		if (wc_flags & IBV_WC_EX_WITH_SRC_QP)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_SRC_QP))
 			wc_buffer.b32++;
-		if (wc_flags & IBV_WC_EX_WITH_PKEY_INDEX)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_PKEY_INDEX))
 			wc_buffer.b16++;
-		if (wc_flags & IBV_WC_EX_WITH_SLID)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_SLID))
 			wc_buffer.b16++;
-		if (wc_flags & IBV_WC_EX_WITH_SL)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_SL))
 			wc_buffer.b8++;
-		if (wc_flags & IBV_WC_EX_WITH_DLID_PATH_BITS)
+		if (IS_IN_WC_FLAGS(wc_flags_yes, wc_flags_no, wc_flags,
+				   IBV_WC_EX_WITH_DLID_PATH_BITS))
 			wc_buffer.b8++;
 		break;
 	}
 
+	wc_ex->wc_flags = wc_flags_out;
 	*pwc_ex = (struct ibv_wc_ex *)((uintptr_t)(wc_buffer.b8 + sizeof(uint64_t) - 1) &
 				       ~(sizeof(uint64_t) - 1));
 	return CQ_OK;
 }
 
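+/* Generic variant: with wc_flags_yes == wc_flags_no == 0, every field is a
+ * run-time "maybe" decided by the caller's wc_flags.
+ */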
+int mlx5_poll_one_ex(struct mlx5_cq *cq,
+		     struct mlx5_resource **cur_rsc,
+		     struct mlx5_srq **cur_srq,
+		     struct ibv_wc_ex **pwc_ex, uint64_t wc_flags,
+		     int cqe_ver)
+{
+	return _mlx5_poll_one_ex(cq, cur_rsc, cur_srq, pwc_ex, wc_flags, 0, 0,
+				 cqe_ver);
+}
+
+#define MLX5_POLL_ONE_EX_WC_FLAGS_NAME(wc_flags_yes, wc_flags_no) \
+	mlx5_poll_one_ex_custom##wc_flags_yes ## _ ## wc_flags_no
+
+/* The compiler will create one function per wc_flags combination. Since
+ * _mlx5_poll_one_ex is always inlined (for compilers that support that),
+ * the compiler drops the if statements and merges all wc_flags_out ORs/ANDs.
+ */
+#define MLX5_POLL_ONE_EX_WC_FLAGS(wc_flags_yes, wc_flags_no)	\
+static int MLX5_POLL_ONE_EX_WC_FLAGS_NAME(wc_flags_yes, wc_flags_no)		\
+						(struct mlx5_cq *cq,		\
+						 struct mlx5_resource **cur_rsc,\
+						 struct mlx5_srq **cur_srq,	\
+						 struct ibv_wc_ex **pwc_ex,	\
+						 uint64_t wc_flags,		\
+						 int cqe_ver)			\
+{									        \
+	return _mlx5_poll_one_ex(cq, cur_rsc, cur_srq, pwc_ex, wc_flags,        \
+				 wc_flags_yes, wc_flags_no, cqe_ver);	        \
+}
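+
+/* For example, MLX5_POLL_ONE_EX_WC_FLAGS(4, 1016) defines
+ * mlx5_poll_one_ex_custom4_1016(), in which wc_flags_yes/wc_flags_no are
+ * compile-time constants, so the IS_IN_WC_FLAGS() checks on those bits
+ * fold away.
+ */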
+
+/*
+	Since we use the preprocessor here, we have to calculate the OR value
+	ourselves:
+	IBV_WC_EX_GRH			= 1 << 0,
+	IBV_WC_EX_IMM			= 1 << 1,
+	IBV_WC_EX_WITH_BYTE_LEN		= 1 << 2,
+	IBV_WC_EX_WITH_IMM		= 1 << 3,
+	IBV_WC_EX_WITH_QP_NUM		= 1 << 4,
+	IBV_WC_EX_WITH_SRC_QP		= 1 << 5,
+	IBV_WC_EX_WITH_PKEY_INDEX	= 1 << 6,
+	IBV_WC_EX_WITH_SLID		= 1 << 7,
+	IBV_WC_EX_WITH_SL		= 1 << 8,
+	IBV_WC_EX_WITH_DLID_PATH_BITS	= 1 << 9,
+	IBV_WC_EX_WITH_COMPLETION_TIMESTAMP = 1 << 10,
+*/
+
+/* Bitwise OR of all flags between IBV_WC_EX_WITH_BYTE_LEN and
+ * IBV_WC_EX_WITH_COMPLETION_TIMESTAMP.
+ */
+#define SUPPORTED_WC_ALL_FLAGS	2045
+/* Bitwise OR of all flags between IBV_WC_EX_WITH_BYTE_LEN and
+ * IBV_WC_EX_WITH_DLID_PATH_BITS (all the fields that are available
+ * in the legacy WC).
+ */
+#define SUPPORTED_WC_STD_FLAGS  1020
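+/* E.g. SUPPORTED_WC_STD_FLAGS = (1 << 2) | (1 << 3) | ... | (1 << 9)
+ *			       = 0x3fc = 1020.
+ */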
+
+#define OPTIMIZE_POLL_CQ	/* All maybe - must be in table! */	    \
+				OP(0, 0)				SEP \
+				/* No options */			    \
+				OP(0, SUPPORTED_WC_ALL_FLAGS)		SEP \
+				/* All options */			    \
+				OP(SUPPORTED_WC_ALL_FLAGS, 0)		SEP \
+				/* All standard options */		    \
+				OP(SUPPORTED_WC_STD_FLAGS, 1024)	SEP \
+				/* Just Bytelen - for DPDK */		    \
+				OP(4, 1016)				SEP \
+				/* Timestamp only, for FSI */		    \
+				OP(1024, 1020)				SEP
+
+#define OP	MLX5_POLL_ONE_EX_WC_FLAGS
+#define SEP	;
+
+/* Declare optimized poll_one functions for popular scenarios. Each function
+ * has a name of the form
+ * mlx5_poll_one_ex_custom<supported_wc_flags>_<not_supported_wc_flags>.
+ * Since the supported and unsupported wc_flags are known beforehand,
+ * the compiler can optimize away the if and OR statements and generate
+ * specialized code.
+ */
+OPTIMIZE_POLL_CQ
+
+#define ADD_POLL_ONE(_wc_flags_yes, _wc_flags_no)			\
+				{.wc_flags_yes = _wc_flags_yes,		\
+				 .wc_flags_no = _wc_flags_no,		\
+				 .fn = MLX5_POLL_ONE_EX_WC_FLAGS_NAME(  \
+					_wc_flags_yes, _wc_flags_no)	\
+				}
+
+#undef OP
+#undef SEP
+#define OP	ADD_POLL_ONE
+#define SEP	,
+
+struct {
+	int (*fn)(struct mlx5_cq *cq,
+		  struct mlx5_resource **cur_rsc,
+		  struct mlx5_srq **cur_srq,
+		  struct ibv_wc_ex **pwc_ex, uint64_t wc_flags,
+		  int cqe_ver);
+	uint64_t wc_flags_yes;
+	uint64_t wc_flags_no;
+} mlx5_poll_one_ex_fns[] = {
+	/* This array contains all the custom poll_one functions. Every entry
+	 * in this array looks like:
+	 * {.wc_flags_yes = <flags that are always in the wc>,
+	 *  .wc_flags_no = <flags that are never in the wc>,
+	 *  .fn = <the custom poll_one function>}.
+	 * The .fn function is optimized according to the .wc_flags_yes and
+	 * .wc_flags_no flags. Other flags keep their run-time "if" checks.
+	 */
+	OPTIMIZE_POLL_CQ
+};
+
+/* This function gets wc_flags as an argument and returns a function pointer
+ * of type:
+ *	int (*fn)(struct mlx5_cq *cq,
+ *		  struct mlx5_resource **cur_rsc,
+ *		  struct mlx5_srq **cur_srq,
+ *		  struct ibv_wc_ex **pwc_ex, uint64_t wc_flags,
+ *		  int cqe_ver);
+ * The returned function is one of the custom poll_one functions declared in
+ * mlx5_poll_one_ex_fns. It is chosen as the entry with the smallest number
+ * of wc_flags_maybe bits (the fields that are in neither the yes nor the no
+ * set).
+ */
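+/* For example, a caller requesting only IBV_WC_EX_WITH_BYTE_LEN matches
+ * both OP(0, 0), which leaves every field as a run-time "maybe", and
+ * OP(4, 1016), which leaves only the timestamp bit undecided, so the more
+ * specialized OP(4, 1016) variant is returned.
+ */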
+int (*mlx5_get_poll_one_fn(uint64_t wc_flags))(struct mlx5_cq *cq,
+					       struct mlx5_resource **cur_rsc,
+					       struct mlx5_srq **cur_srq,
+					       struct ibv_wc_ex **pwc_ex, uint64_t wc_flags,
+					       int cqe_ver)
+{
+	unsigned int i = 0;
+	uint8_t min_bits = -1;
+	int min_index = -1;
+
+	for (i = 0;
+	     i < sizeof(mlx5_poll_one_ex_fns) / sizeof(mlx5_poll_one_ex_fns[0]);
+	     i++) {
+		uint64_t bits;
+		uint8_t nbits;
+
+		/* Can't have required flags in "no" */
+		if (wc_flags & mlx5_poll_one_ex_fns[i].wc_flags_no)
+			continue;
+
+		/* Can't have non-required flags in "yes" */
+		if (~wc_flags & mlx5_poll_one_ex_fns[i].wc_flags_yes)
+			continue;
+
+		/* Number of wc_flags_maybe. See above comment for more details */
+		bits = (wc_flags  ^ mlx5_poll_one_ex_fns[i].wc_flags_yes) |
+		       ((~wc_flags ^ mlx5_poll_one_ex_fns[i].wc_flags_no) &
+			CREATE_CQ_SUPPORTED_WC_FLAGS);
+
+		nbits = ibv_popcount64(bits);
+
+		/* Look for the minimum number of bits */
+		if (nbits < min_bits) {
+			min_bits = nbits;
+			min_index = i;
+		}
+	}
+
+	assert(min_index >= 0);
+
+	return mlx5_poll_one_ex_fns[min_index].fn;
+}
+
 static inline void mlx5_poll_cq_stall_start(struct mlx5_cq *cq)
 __attribute__((always_inline));
 static inline void mlx5_poll_cq_stall_start(struct mlx5_cq *cq)
diff --git a/src/mlx5.h b/src/mlx5.h
index b5bcfaa..6c40d6d 100644
--- a/src/mlx5.h
+++ b/src/mlx5.h
@@ -109,6 +109,10 @@
 
 #define PFX		"mlx5: "
 
+enum {
+	CREATE_CQ_SUPPORTED_WC_FLAGS = IBV_WC_STANDARD_FLAGS	|
+				       IBV_WC_EX_WITH_COMPLETION_TIMESTAMP
+};
 
 enum {
 	MLX5_IB_MMAP_CMD_SHIFT	= 8,
@@ -619,6 +623,12 @@ int mlx5_poll_one_ex(struct mlx5_cq *cq,
 		     struct mlx5_srq **cur_srq,
 		     struct ibv_wc_ex **pwc_ex, uint64_t wc_flags,
 		     int cqe_ver);
+int (*mlx5_get_poll_one_fn(uint64_t wc_flags))(struct mlx5_cq *cq,
+					       struct mlx5_resource **cur_rsc,
+					       struct mlx5_srq **cur_srq,
+					       struct ibv_wc_ex **pwc_ex,
+					       uint64_t wc_flags,
+					       int cqe_ver);
 int mlx5_alloc_cq_buf(struct mlx5_context *mctx, struct mlx5_cq *cq,
 		      struct mlx5_buf *buf, int nent, int cqe_sz);
 int mlx5_free_cq_buf(struct mlx5_context *ctx, struct mlx5_buf *buf);
diff --git a/src/verbs.c b/src/verbs.c
index 5d732a2..504aacf 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -287,11 +287,6 @@ static int qp_sig_enabled(void)
 }
 
 enum {
-	CREATE_CQ_SUPPORTED_WC_FLAGS = IBV_WC_STANDARD_FLAGS	|
-				       IBV_WC_EX_WITH_COMPLETION_TIMESTAMP
-};
-
-enum {
 	CREATE_CQ_SUPPORTED_COMP_MASK = IBV_CREATE_CQ_ATTR_FLAGS
 };
 
@@ -409,7 +404,9 @@ static struct ibv_cq *create_cq(struct ibv_context *context,
 	cq->stall_cycles = to_mctx(context)->stall_cycles;
 
 	cq->wc_flags = cq_attr->wc_flags;
-	cq->poll_one = mlx5_poll_one_ex;
+	cq->poll_one = mlx5_get_poll_one_fn(cq->wc_flags);
+	if (!cq->poll_one)
+		cq->poll_one = mlx5_poll_one_ex;
 
 	return &cq->ibv_cq;
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH libmlx5 V1 6/6] Add always_inline check
       [not found] ` <1449158571-26228-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-12-03 16:02   ` [PATCH libmlx5 V1 5/6] Optimize poll_cq Matan Barak
@ 2015-12-03 16:02   ` Matan Barak
       [not found]     ` <1449158571-26228-7-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  5 siblings, 1 reply; 9+ messages in thread
From: Matan Barak @ 2015-12-03 16:02 UTC (permalink / raw)
  To: Eli Cohen
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Matan Barak,
	Eran Ben Elisha, Christoph Lameter

Always inline isn't supported by every compiler. Add a check to
configure.ac so the attribute is used only when possible.
Inline the other poll_one data path functions in order to eliminate
"ifs".

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 configure.ac | 17 +++++++++++++++++
 src/cq.c     | 42 +++++++++++++++++++++++++++++-------------
 src/mlx5.h   |  6 ++++++
 3 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/configure.ac b/configure.ac
index fca0b46..50b4f9c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -65,6 +65,23 @@ AC_CHECK_FUNC(ibv_read_sysfs_file, [],
     AC_MSG_ERROR([ibv_read_sysfs_file() not found.  libmlx5 requires libibverbs >= 1.0.3.]))
 AC_CHECK_FUNCS(ibv_dontfork_range ibv_dofork_range ibv_register_driver)
 
+AC_MSG_CHECKING("always inline")
+CFLAGS_BAK="$CFLAGS"
+CFLAGS="$CFLAGS -Werror"
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+	static inline int f(void)
+		__attribute__((always_inline));
+	static inline int f(void)
+	{
+		return 1;
+	}
+]],[[
+		int a = f();
+		a = a;
+]])], [AC_MSG_RESULT([yes]) AC_DEFINE([HAVE_ALWAYS_INLINE], [1], [Define if __attribute((always_inline)).])],
+[AC_MSG_RESULT([no])])
+CFLAGS="$CFLAGS_BAK"
+
 dnl Now check if for libibverbs 1.0 vs 1.1
 dummy=if$$
 cat <<IBV_VERSION > $dummy.c
diff --git a/src/cq.c b/src/cq.c
index fcb4237..41751b7 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -218,6 +218,14 @@ static inline void handle_good_req_ex(struct ibv_wc_ex *wc_ex,
 				      uint64_t wc_flags_yes,
 				      uint64_t wc_flags_no,
 				      uint32_t qpn, uint64_t *wc_flags_out)
+	ALWAYS_INLINE;
+static inline void handle_good_req_ex(struct ibv_wc_ex *wc_ex,
+				      union wc_buffer *pwc_buffer,
+				      struct mlx5_cqe64 *cqe,
+				      uint64_t wc_flags,
+				      uint64_t wc_flags_yes,
+				      uint64_t wc_flags_no,
+				      uint32_t qpn, uint64_t *wc_flags_out)
 {
 	union wc_buffer wc_buffer = *pwc_buffer;
 
@@ -367,6 +375,14 @@ static inline int handle_responder_ex(struct ibv_wc_ex *wc_ex,
 				      uint64_t wc_flags, uint64_t wc_flags_yes,
 				      uint64_t wc_flags_no, uint32_t qpn,
 				      uint64_t *wc_flags_out)
+	ALWAYS_INLINE;
+static inline int handle_responder_ex(struct ibv_wc_ex *wc_ex,
+				      union wc_buffer *pwc_buffer,
+				      struct mlx5_cqe64 *cqe,
+				      struct mlx5_qp *qp, struct mlx5_srq *srq,
+				      uint64_t wc_flags, uint64_t wc_flags_yes,
+				      uint64_t wc_flags_no, uint32_t qpn,
+				      uint64_t *wc_flags_out)
 {
 	uint16_t wqe_ctr;
 	struct mlx5_wq *wq;
@@ -573,7 +589,7 @@ static void mlx5_get_cycles(uint64_t *cycles)
 static inline struct mlx5_qp *get_req_context(struct mlx5_context *mctx,
 					      struct mlx5_resource **cur_rsc,
 					      uint32_t rsn, int cqe_ver)
-					      __attribute__((always_inline));
+					      ALWAYS_INLINE;
 static inline struct mlx5_qp *get_req_context(struct mlx5_context *mctx,
 					      struct mlx5_resource **cur_rsc,
 					      uint32_t rsn, int cqe_ver)
@@ -589,7 +605,7 @@ static inline int get_resp_cxt_v1(struct mlx5_context *mctx,
 				  struct mlx5_resource **cur_rsc,
 				  struct mlx5_srq **cur_srq,
 				  uint32_t uidx, int *is_srq)
-				  __attribute__((always_inline));
+				  ALWAYS_INLINE;
 static inline int get_resp_cxt_v1(struct mlx5_context *mctx,
 				  struct mlx5_resource **cur_rsc,
 				  struct mlx5_srq **cur_srq,
@@ -625,7 +641,7 @@ static inline int get_resp_cxt_v1(struct mlx5_context *mctx,
 static inline int get_resp_ctx(struct mlx5_context *mctx,
 			       struct mlx5_resource **cur_rsc,
 			       uint32_t qpn)
-			       __attribute__((always_inline));
+			       ALWAYS_INLINE;
 static inline int get_resp_ctx(struct mlx5_context *mctx,
 			       struct mlx5_resource **cur_rsc,
 			       uint32_t qpn)
@@ -647,7 +663,7 @@ static inline int get_resp_ctx(struct mlx5_context *mctx,
 static inline int get_srq_ctx(struct mlx5_context *mctx,
 			      struct mlx5_srq **cur_srq,
 			      uint32_t srqn_uidx)
-			      __attribute__((always_inline));
+			      ALWAYS_INLINE;
 static inline int get_srq_ctx(struct mlx5_context *mctx,
 			      struct mlx5_srq **cur_srq,
 			      uint32_t srqn)
@@ -662,7 +678,7 @@ static inline int get_srq_ctx(struct mlx5_context *mctx,
 }
 
 static inline void dump_cqe_debug(FILE *fp, struct mlx5_cqe64 *cqe64)
-	__attribute__((always_inline));
+	ALWAYS_INLINE;
 static inline void dump_cqe_debug(FILE *fp, struct mlx5_cqe64 *cqe64)
 {
 #ifdef MLX5_DEBUG
@@ -676,7 +692,7 @@ static inline void dump_cqe_debug(FILE *fp, struct mlx5_cqe64 *cqe64)
 inline int mlx5_poll_one_cqe_req(struct mlx5_cq *cq,
 				 struct mlx5_resource **cur_rsc,
 				 void *cqe, uint32_t qpn, int cqe_ver,
-				 uint64_t *wr_id) __attribute__((always_inline));
+				 uint64_t *wr_id) ALWAYS_INLINE;
 inline int mlx5_poll_one_cqe_req(struct mlx5_cq *cq,
 				 struct mlx5_resource **cur_rsc,
 				 void *cqe, uint32_t qpn, int cqe_ver,
@@ -719,7 +735,7 @@ inline int mlx5_poll_one_cqe_resp(struct mlx5_context *mctx,
 				  struct mlx5_srq **cur_srq,
 				  struct mlx5_cqe64 *cqe64, int cqe_ver,
 				  uint32_t qpn, int *is_srq)
-	__attribute__((always_inline));
+	ALWAYS_INLINE;
 inline int mlx5_poll_one_cqe_resp(struct mlx5_context *mctx,
 				  struct mlx5_resource **cur_rsc,
 				  struct mlx5_srq **cur_srq,
@@ -750,7 +766,7 @@ inline int mlx5_poll_one_cqe_err(struct mlx5_context *mctx,
 				 uint32_t qpn, uint32_t *pwc_status,
 				 uint32_t *pwc_vendor_err,
 				 uint64_t *pwc_wr_id, uint8_t opcode)
-	__attribute__((always_inline));
+	ALWAYS_INLINE;
 inline int mlx5_poll_one_cqe_err(struct mlx5_context *mctx,
 				 struct mlx5_resource **cur_rsc,
 				 struct mlx5_srq **cur_srq,
@@ -833,7 +849,7 @@ static inline int mlx5_poll_one(struct mlx5_cq *cq,
 			 struct mlx5_resource **cur_rsc,
 			 struct mlx5_srq **cur_srq,
 			 struct ibv_wc *wc, int cqe_ver)
-			 __attribute__((always_inline));
+			 ALWAYS_INLINE;
 static inline int mlx5_poll_one(struct mlx5_cq *cq,
 			 struct mlx5_resource **cur_rsc,
 			 struct mlx5_srq **cur_srq,
@@ -915,7 +931,7 @@ static inline int _mlx5_poll_one_ex(struct mlx5_cq *cq,
 				    uint64_t wc_flags,
 				    uint64_t wc_flags_yes, uint64_t wc_flags_no,
 				    int cqe_ver)
-	__attribute__((always_inline));
+	ALWAYS_INLINE;
 static inline int _mlx5_poll_one_ex(struct mlx5_cq *cq,
 				    struct mlx5_resource **cur_rsc,
 				    struct mlx5_srq **cur_srq,
@@ -1221,7 +1237,7 @@ int (*mlx5_get_poll_one_fn(uint64_t wc_flags))(struct mlx5_cq *cq,
 }
 
 static inline void mlx5_poll_cq_stall_start(struct mlx5_cq *cq)
-__attribute__((always_inline));
+ALWAYS_INLINE;
 static inline void mlx5_poll_cq_stall_start(struct mlx5_cq *cq)
 {
 	if (cq->stall_enable) {
@@ -1236,7 +1252,7 @@ static inline void mlx5_poll_cq_stall_start(struct mlx5_cq *cq)
 }
 
 static inline void mlx5_poll_cq_stall_end(struct mlx5_cq *cq, int ne,
-					  int npolled, int err) __attribute__((always_inline));
+					  int npolled, int err) ALWAYS_INLINE;
 static inline void mlx5_poll_cq_stall_end(struct mlx5_cq *cq, int ne,
 					  int npolled, int err)
 {
@@ -1263,7 +1279,7 @@ static inline void mlx5_poll_cq_stall_end(struct mlx5_cq *cq, int ne,
 
 static inline int poll_cq(struct ibv_cq *ibcq, int ne,
 			  struct ibv_wc *wc, int cqe_ver)
-	__attribute__((always_inline));
+	ALWAYS_INLINE;
 static inline int poll_cq(struct ibv_cq *ibcq, int ne,
 			  struct ibv_wc *wc, int cqe_ver)
 {
diff --git a/src/mlx5.h b/src/mlx5.h
index 6c40d6d..2f2e2f7 100644
--- a/src/mlx5.h
+++ b/src/mlx5.h
@@ -114,6 +114,12 @@ enum {
 				       IBV_WC_EX_WITH_COMPLETION_TIMESTAMP
 };
 
+#ifdef HAVE_ALWAYS_INLINE
+#define ALWAYS_INLINE __attribute__((always_inline))
+#else
+#define ALWAYS_INLINE
+#endif
+
 enum {
 	MLX5_IB_MMAP_CMD_SHIFT	= 8,
 	MLX5_IB_MMAP_CMD_MASK	= 0xff,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH libmlx5 V1 6/6] Add always_inline check
       [not found]     ` <1449158571-26228-7-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-12-07 13:07       ` Haggai Eran
       [not found]         ` <5665849E.4000304-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Haggai Eran @ 2015-12-07 13:07 UTC (permalink / raw)
  To: Matan Barak, Eli Cohen
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Eran Ben Elisha,
	Christoph Lameter

On 12/03/2015 06:02 PM, Matan Barak wrote:
> Always inline isn't supported by every compiler. Add a check to
> configure.ac so the attribute is used only when possible.
> Inline the other poll_one data path functions in order to eliminate
> "ifs".
> 
> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  configure.ac | 17 +++++++++++++++++
>  src/cq.c     | 42 +++++++++++++++++++++++++++++-------------
>  src/mlx5.h   |  6 ++++++
>  3 files changed, 52 insertions(+), 13 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index fca0b46..50b4f9c 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -65,6 +65,23 @@ AC_CHECK_FUNC(ibv_read_sysfs_file, [],
>      AC_MSG_ERROR([ibv_read_sysfs_file() not found.  libmlx5 requires libibverbs >= 1.0.3.]))
>  AC_CHECK_FUNCS(ibv_dontfork_range ibv_dofork_range ibv_register_driver)
>  
> +AC_MSG_CHECKING("always inline")
Did you consider using an existing script like AX_GCC_FUNC_ATTRIBUTE [1]?
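
Untested sketch, but if I read the autoconf-archive macro right,

    AX_GCC_FUNC_ATTRIBUTE([always_inline])

defines HAVE_FUNC_ATTRIBUTE_ALWAYS_INLINE when the attribute compiles,
so mlx5.h could key ALWAYS_INLINE off that instead of a hand-rolled
HAVE_ALWAYS_INLINE.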

> +CFLAGS_BAK="$CFLAGS"
> +CFLAGS="$CFLAGS -Werror"
> +AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
> +	static inline int f(void)
> +		__attribute__((always_inline));
> +	static inline int f(void)
> +	{
> +		return 1;
> +	}
> +]],[[
> +		int a = f();
> +		a = a;
> +]])], [AC_MSG_RESULT([yes]) AC_DEFINE([HAVE_ALWAYS_INLINE], [1], [Define if __attribute((always_inline)).])]
The description here doesn't look right. How about "Define if
__attribute__((always_inline)) is supported"?

Regards,
Haggai

[1] https://www.gnu.org/software/autoconf-archive/ax_gcc_func_attribute.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH libmlx5 V1 6/6] Add always_inline check
       [not found]         ` <5665849E.4000304-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-12-10 14:57           ` Matan Barak
  0 siblings, 0 replies; 9+ messages in thread
From: Matan Barak @ 2015-12-10 14:57 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Matan Barak, Eli Cohen, linux-rdma, Doug Ledford,
	Eran Ben Elisha, Christoph Lameter

On Mon, Dec 7, 2015 at 3:07 PM, Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 12/03/2015 06:02 PM, Matan Barak wrote:
>> Always inline isn't supported by every compiler. Add a check to
>> configure.ac so the attribute is used only when possible.
>> Inline the other poll_one data path functions in order to eliminate
>> "ifs".
>>
>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>>  configure.ac | 17 +++++++++++++++++
>>  src/cq.c     | 42 +++++++++++++++++++++++++++++-------------
>>  src/mlx5.h   |  6 ++++++
>>  3 files changed, 52 insertions(+), 13 deletions(-)
>>
>> diff --git a/configure.ac b/configure.ac
>> index fca0b46..50b4f9c 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -65,6 +65,23 @@ AC_CHECK_FUNC(ibv_read_sysfs_file, [],
>>      AC_MSG_ERROR([ibv_read_sysfs_file() not found.  libmlx5 requires libibverbs >= 1.0.3.]))
>>  AC_CHECK_FUNCS(ibv_dontfork_range ibv_dofork_range ibv_register_driver)
>>
>> +AC_MSG_CHECKING("always inline")
> Did you consider using an existing script like AX_GCC_FUNC_ATTRIBUTE [1]?
>

That's probably better, I'll change.

>> +CFLAGS_BAK="$CFLAGS"
>> +CFLAGS="$CFLAGS -Werror"
>> +AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
>> +     static inline int f(void)
>> +             __attribute__((always_inline));
>> +     static inline int f(void)
>> +     {
>> +             return 1;
>> +     }
>> +]],[[
>> +             int a = f();
>> +             a = a;
>> +]])], [AC_MSG_RESULT([yes]) AC_DEFINE([HAVE_ALWAYS_INLINE], [1], [Define if __attribute((always_inline)).])]
> The description here doesn't look right. How about "Define if
> __attribute__((always_inline)) is supported"?
>

If I use AX_GCC_FUNC_ATTRIBUTE, I don't need this anymore.

> Regards,
> Haggai
>
> [1] https://www.gnu.org/software/autoconf-archive/ax_gcc_func_attribute.html

Thanks for the review.

Regards,
Matan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, newest message: 2015-12-10 14:57 UTC

Thread overview: 9+ messages
2015-12-03 16:02 [PATCH libmlx5 V1 0/6] Completion timestamping Matan Barak
     [not found] ` <1449158571-26228-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-12-03 16:02   ` [PATCH libmlx5 V1 1/6] Add ibv_poll_cq_ex support Matan Barak
2015-12-03 16:02   ` [PATCH libmlx5 V1 2/6] Add timestmap support for ibv_poll_cq_ex Matan Barak
2015-12-03 16:02   ` [PATCH libmlx5 V1 3/6] Add ibv_create_cq_ex support Matan Barak
2015-12-03 16:02   ` [PATCH libmlx5 V1 4/6] Add ibv_query_values support Matan Barak
2015-12-03 16:02   ` [PATCH libmlx5 V1 5/6] Optimize poll_cq Matan Barak
2015-12-03 16:02   ` [PATCH libmlx5 V1 6/6] Add always_inline check Matan Barak
     [not found]     ` <1449158571-26228-7-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-12-07 13:07       ` Haggai Eran
     [not found]         ` <5665849E.4000304-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-12-10 14:57           ` Matan Barak
