DPDK-dev Archive on lore.kernel.org
* [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters management
@ 2019-07-08 14:07 Matan Azrad
  2019-07-08 14:07 ` [dpdk-dev] [PATCH 1/4] net/mlx5: accelerate DV flow counter transactions Matan Azrad
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Matan Azrad @ 2019-07-08 14:07 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko; +Cc: dev

New DevX features for allocating and querying flow counters with batch
commands make it possible to accelerate flow counter creation,
destruction and query.

Matan Azrad (4):
  net/mlx5: accelerate DV flow counter transactions
  net/mlx5: resize a full counter container
  net/mlx5: accelerate DV flow counter query
  net/mlx5: allow basic counter management fallback

 doc/guides/rel_notes/release_19_08.rst |   6 +-
 drivers/net/mlx5/Makefile              |   7 +-
 drivers/net/mlx5/meson.build           |   2 +
 drivers/net/mlx5/mlx5.c                | 102 ++++++
 drivers/net/mlx5/mlx5.h                | 145 +++++++-
 drivers/net/mlx5/mlx5_devx_cmds.c      | 225 +++++++++---
 drivers/net/mlx5/mlx5_ethdev.c         |  85 ++++-
 drivers/net/mlx5/mlx5_flow.c           | 147 ++++++++
 drivers/net/mlx5/mlx5_flow.h           |  27 +-
 drivers/net/mlx5/mlx5_flow_dv.c        | 616 ++++++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow_verbs.c     |  15 +-
 drivers/net/mlx5/mlx5_glue.c           |  91 +++++
 drivers/net/mlx5/mlx5_glue.h           |  20 ++
 drivers/net/mlx5/mlx5_prm.h            | 116 ++++++-
 14 files changed, 1464 insertions(+), 140 deletions(-)

-- 
1.8.3.1



* [dpdk-dev] [PATCH 1/4] net/mlx5: accelerate DV flow counter transactions
  2019-07-08 14:07 [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters management Matan Azrad
@ 2019-07-08 14:07 ` Matan Azrad
  2019-07-08 14:07 ` [dpdk-dev] [PATCH 2/4] net/mlx5: resize a full counter container Matan Azrad
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Matan Azrad @ 2019-07-08 14:07 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko; +Cc: dev

The DevX interface exposes a new feature to the PMD: allocating a
batch of counters with a single FW command. This can improve the flow
transaction rate for flows with a count action.

Add a new counter pool mechanism to manage HW counters in the PMD.
When a flow with a count action is created, the PMD first looks for a
free counter in its pools container; only if no free counter exists
does it allocate a new batch of DevX counters.
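
As a rough illustration of the lookup order described above (not the
code of this patch), the sketch below reuses a free counter from an
existing pool and creates a new pool only when none is left. All
sketch_* names are hypothetical, and the FW batch command is replaced
by a plain calloc() plus a call counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_COUNTERS_PER_POOL 512
#define SKETCH_MAX_POOLS 64

/* Hypothetical stand-in for a flow counter. */
struct sketch_counter {
	struct sketch_counter *next_free; /* Next counter in the free list. */
	unsigned int in_use;
};

/* Hypothetical stand-in for a counter pool. */
struct sketch_pool {
	struct sketch_counter *free_head; /* Head of the free list. */
	struct sketch_counter counters[SKETCH_COUNTERS_PER_POOL];
};

/* Hypothetical stand-in for a pools container. */
struct sketch_container {
	unsigned int n_valid; /* Number of pools created so far. */
	unsigned int batch_allocs; /* Simulated FW batch commands issued. */
	struct sketch_pool *pools[SKETCH_MAX_POOLS];
};

/* Simulate one batch allocation: a whole pool of free counters. */
static struct sketch_pool *
sketch_pool_create(struct sketch_container *cont)
{
	struct sketch_pool *pool;
	int i;

	if (cont->n_valid == SKETCH_MAX_POOLS)
		return NULL;
	pool = calloc(1, sizeof(*pool));
	if (pool == NULL)
		return NULL;
	/* Push in reverse order so that counters[0] ends up at the head. */
	for (i = SKETCH_COUNTERS_PER_POOL - 1; i >= 0; --i) {
		pool->counters[i].next_free = pool->free_head;
		pool->free_head = &pool->counters[i];
	}
	cont->pools[cont->n_valid++] = pool;
	cont->batch_allocs++;
	return pool;
}

/* Take a free counter; create a new pool only if every pool is full. */
static struct sketch_counter *
sketch_counter_alloc(struct sketch_container *cont)
{
	struct sketch_pool *pool = NULL;
	struct sketch_counter *cnt;
	unsigned int i;

	for (i = 0; i < cont->n_valid; ++i) {
		if (cont->pools[i]->free_head != NULL) {
			pool = cont->pools[i];
			break;
		}
	}
	if (pool == NULL) {
		pool = sketch_pool_create(cont);
		if (pool == NULL)
			return NULL;
	}
	cnt = pool->free_head;
	assert(cnt != NULL);
	pool->free_head = cnt->next_free;
	cnt->next_free = NULL;
	cnt->in_use = 1;
	return cnt;
}
```

With this shape, the first 512 allocations cost one simulated batch
command, and the 513th triggers the second one.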

Batch counters cannot currently be used for group 0 flows, so create
two container types: one that allocates counters one by one and one
that allocates counters in bulk with the batch feature.
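
A minimal sketch of the container choice implied by this restriction,
assuming the flow group is the only input. The enum and the helper are
hypothetical; the patch itself may take further conditions into
account when picking a container.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container indexes: 0 for single, 1 for batch. */
enum sketch_cont_type {
	SKETCH_CONT_SINGLE = 0,
	SKETCH_CONT_BATCH = 1,
};

/*
 * Group 0 flows use the one-by-one container; any other group may use
 * the batch container. Assumption: no other condition is considered.
 */
static enum sketch_cont_type
sketch_container_for_group(uint16_t group)
{
	return group == 0 ? SKETCH_CONT_SINGLE : SKETCH_CONT_BATCH;
}
```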

Allocated counter objects are never released back to the HW, on the
assumption that the peak number of flows stays close to the number of
flows actually in use.
A dynamic release mechanism can be added later.

The counters are contained in pools of 512 counters each.
The pools are contained in counter containers according to the
allocation resolution type: single or batch.
The cached counter statistics are stored as raw data per pool.
The raw data memory for all the pools of a container is obtained in
one memory allocation and is managed by a counter_stats_mem_mng
structure, which registers all the raw memory with the HW.
Each pool points to one raw data structure.
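
The single-allocation layout can be pictured with the sketch below:
the statistics of all pools come first, then the per-pool raw
descriptors, and the management structure sits at the very end of the
block. The layout_* names are hypothetical, calloc() stands in for the
page-aligned DPDK allocation, and the umem/mkey registration with the
HW is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LAYOUT_COUNTERS_PER_POOL 512

/* Hypothetical per-counter statistics record. */
struct layout_stats {
	uint64_t hits;
	uint64_t bytes;
};

struct layout_mng;

/* Hypothetical raw descriptor: one per pool. */
struct layout_raw {
	struct layout_mng *mng; /* Owning management structure. */
	struct layout_stats *data; /* First record of this pool. */
};

/* Hypothetical management structure placed at the end of the block. */
struct layout_mng {
	struct layout_raw *raws; /* Array of raws_n descriptors. */
	size_t data_size; /* Bytes that would be registered to the HW. */
};

/* Block layout: [stats of all pools][raw descriptors][layout_mng]. */
static struct layout_mng *
layout_mng_create(size_t raws_n)
{
	size_t data_size = sizeof(struct layout_stats) *
			   LAYOUT_COUNTERS_PER_POOL * raws_n;
	size_t total = data_size + sizeof(struct layout_raw) * raws_n +
		       sizeof(struct layout_mng);
	uint8_t *mem;
	struct layout_mng *mng;
	size_t i;

	assert(raws_n > 0);
	mem = calloc(1, total);
	if (mem == NULL)
		return NULL;
	mng = (struct layout_mng *)(mem + total) - 1;
	mng->raws = (struct layout_raw *)(mem + data_size);
	mng->data_size = data_size;
	for (i = 0; i < raws_n; ++i) {
		mng->raws[i].mng = mng;
		mng->raws[i].data = (struct layout_stats *)mem +
				    i * LAYOUT_COUNTERS_PER_POOL;
	}
	return mng;
}

/* The first raw's data pointer is the start of the whole block. */
static void
layout_mng_destroy(struct layout_mng *mng)
{
	if (mng != NULL)
		free(mng->raws[0].data);
}
```

Only the leading data_size bytes would need to be visible to the HW;
the descriptors and the management structure are host-only metadata
that merely share the allocation.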

The query operation works at pool resolution: one operation updates
the raw data of all the counters in the pool.
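
A rough sketch of what pool resolution means for a reader of one
counter: a single dump refreshes the big-endian raw records of the
whole pool, and a counter value is then read from its own slot and
converted to host order. The query_* names are hypothetical and the FW
command is simulated by a plain loop.

```c
#include <assert.h>
#include <stdint.h>

#define QUERY_COUNTERS_PER_POOL 512

/* Hypothetical raw record, stored big-endian as the HW would write it. */
struct query_stats_be {
	uint8_t hits[8];
	uint8_t bytes[8];
};

/* Store a 64-bit value in big-endian byte order. */
static void
query_store_be64(uint8_t out[8], uint64_t v)
{
	int i;

	for (i = 7; i >= 0; --i) {
		out[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Load a big-endian 64-bit value into host order. */
static uint64_t
query_load_be64(const uint8_t in[8])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; ++i)
		v = (v << 8) | in[i];
	return v;
}

/* Stand-in for the one command that dumps a whole pool to memory. */
static void
query_pool_dump(struct query_stats_be *raw, const uint64_t *hw_hits,
		const uint64_t *hw_bytes)
{
	unsigned int i;

	for (i = 0; i < QUERY_COUNTERS_PER_POOL; ++i) {
		query_store_be64(raw[i].hits, hw_hits[i]);
		query_store_be64(raw[i].bytes, hw_bytes[i]);
	}
}

/* Read one counter from the pool raw data refreshed by the dump. */
static void
query_counter_read(const struct query_stats_be *raw, unsigned int idx,
		   uint64_t *pkts, uint64_t *bytes)
{
	assert(idx < QUERY_COUNTERS_PER_POOL);
	*pkts = query_load_be64(raw[idx].hits);
	*bytes = query_load_be64(raw[idx].bytes);
}
```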

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/Makefile          |   2 +-
 drivers/net/mlx5/mlx5.c            |  85 +++++++
 drivers/net/mlx5/mlx5.h            | 115 ++++++++-
 drivers/net/mlx5/mlx5_devx_cmds.c  | 185 ++++++++++----
 drivers/net/mlx5/mlx5_flow.h       |  19 --
 drivers/net/mlx5/mlx5_flow_dv.c    | 485 ++++++++++++++++++++++++++++++++-----
 drivers/net/mlx5/mlx5_flow_verbs.c |  15 +-
 drivers/net/mlx5/mlx5_glue.c       |  29 +++
 drivers/net/mlx5/mlx5_glue.h       |   5 +
 drivers/net/mlx5/mlx5_prm.h        | 112 ++++++++-
 10 files changed, 901 insertions(+), 151 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 619e6b6..b210c80 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -8,7 +8,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_pmd_mlx5.a
 LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
 LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
-LIB_GLUE_VERSION = 19.05.0
+LIB_GLUE_VERSION = 19.08.0
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d93f92d..62be141 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -157,6 +157,89 @@ struct mlx5_dev_spawn_data {
 static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 
 /**
+ * Initialize the counters management structure.
+ *
+ * @param[in] sh
+ *   Pointer to mlx5_ibv_shared object to initialize.
+ */
+static void
+mlx5_flow_counters_mng_init(struct mlx5_ibv_shared *sh)
+{
+	uint8_t i;
+
+	TAILQ_INIT(&sh->cmng.flow_counters);
+	for (i = 0; i < RTE_DIM(sh->cmng.ccont); ++i)
+		TAILQ_INIT(&sh->cmng.ccont[i].pool_list);
+}
+
+/**
+ * Destroy all the resources allocated for a counter memory management.
+ *
+ * @param[in] mng
+ *   Pointer to the memory management structure.
+ */
+static void
+mlx5_flow_destroy_counter_stat_mem_mng(struct mlx5_counter_stats_mem_mng *mng)
+{
+	uint8_t *mem = (uint8_t *)(uintptr_t)mng->raws[0].data;
+
+	LIST_REMOVE(mng, next);
+	claim_zero(mlx5_devx_cmd_destroy(mng->dm));
+	claim_zero(mlx5_glue->devx_umem_dereg(mng->umem));
+	rte_free(mem);
+}
+
+/**
+ * Close and release all the resources of the counters management.
+ *
+ * @param[in] sh
+ *   Pointer to mlx5_ibv_shared object to free.
+ */
+static void
+mlx5_flow_counters_mng_close(struct mlx5_ibv_shared *sh)
+{
+	struct mlx5_counter_stats_mem_mng *mng;
+	uint8_t i;
+	int j;
+
+	for (i = 0; i < RTE_DIM(sh->cmng.ccont); ++i) {
+		struct mlx5_flow_counter_pool *pool;
+		uint32_t batch = !!(i % 2);
+
+		if (!sh->cmng.ccont[i].pools)
+			continue;
+		pool = TAILQ_FIRST(&sh->cmng.ccont[i].pool_list);
+		while (pool) {
+			if (batch) {
+				if (pool->min_dcs)
+					claim_zero
+					(mlx5_devx_cmd_destroy(pool->min_dcs));
+			}
+			for (j = 0; j < MLX5_COUNTERS_PER_POOL; ++j) {
+				if (pool->counters_raw[j].action)
+					claim_zero
+					(mlx5_glue->destroy_flow_action
+					       (pool->counters_raw[j].action));
+				if (!batch && pool->counters_raw[j].dcs)
+					claim_zero(mlx5_devx_cmd_destroy
+						  (pool->counters_raw[j].dcs));
+			}
+			TAILQ_REMOVE(&sh->cmng.ccont[i].pool_list, pool,
+				     next);
+			rte_free(pool);
+			pool = TAILQ_FIRST(&sh->cmng.ccont[i].pool_list);
+		}
+		rte_free(sh->cmng.ccont[i].pools);
+	}
+	mng = LIST_FIRST(&sh->cmng.mem_mngs);
+	while (mng) {
+		mlx5_flow_destroy_counter_stat_mem_mng(mng);
+		mng = LIST_FIRST(&sh->cmng.mem_mngs);
+	}
+	memset(&sh->cmng, 0, sizeof(sh->cmng));
+}
+
+/**
  * Allocate shared IB device context. If there is multiport device the
  * master and representors will share this context, if there is single
  * port dedicated IB device, the context will be used by only given
@@ -260,6 +343,7 @@ struct mlx5_dev_spawn_data {
 		err = rte_errno;
 		goto error;
 	}
+	mlx5_flow_counters_mng_init(sh);
 	LIST_INSERT_HEAD(&mlx5_ibv_list, sh, next);
 exit:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
@@ -314,6 +398,7 @@ struct mlx5_dev_spawn_data {
 	 *  Ensure there is no async event handler installed.
 	 *  Only primary process handles async device events.
 	 **/
+	mlx5_flow_counters_mng_close(sh);
 	assert(!sh->intr_cnt);
 	if (sh->intr_cnt)
 		mlx5_intr_callback_unregister
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5af3f41..3944b5f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -152,15 +152,23 @@ struct mlx5_stats_ctrl {
 	uint64_t imissed_base;
 };
 
-/* devx counter object */
-struct mlx5_devx_counter_set {
-	struct mlx5dv_devx_obj *obj;
-	int id; /* Flow counter ID */
+/* devX creation object */
+struct mlx5_devx_obj {
+	struct mlx5dv_devx_obj *obj; /* The DV object. */
+	int id; /* The object ID. */
+};
+
+struct mlx5_devx_mkey_attr {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t umem_id;
+	uint32_t pd;
 };
 
 /* HCA attributes. */
 struct mlx5_hca_attr {
 	uint32_t eswitch_manager:1;
+	uint8_t flow_counter_bulk_alloc_bitmap;
 };
 
 /* Flow list . */
@@ -248,6 +256,87 @@ struct mlx5_drop {
 	struct mlx5_rxq_ibv *rxq; /* Verbs Rx queue. */
 };
 
+#define MLX5_COUNTERS_PER_POOL 512
+
+struct mlx5_flow_counter_pool;
+
+struct flow_counter_stats {
+	uint64_t hits;
+	uint64_t bytes;
+};
+
+/* Counters information. */
+struct mlx5_flow_counter {
+	TAILQ_ENTRY(mlx5_flow_counter) next;
+	/**< Pointer to the next flow counter structure. */
+	uint32_t shared:1; /**< Share counter ID with other flow rules. */
+	uint32_t batch: 1;
+	/**< Whether the counter was allocated by batch command. */
+	uint32_t ref_cnt:30; /**< Reference counter. */
+	uint32_t id; /**< Counter ID. */
+	union {  /**< Holds the counters for the rule. */
+#if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
+		struct ibv_counter_set *cs;
+#elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
+		struct ibv_counters *cs;
+#endif
+		struct mlx5_devx_obj *dcs; /**< Counter Devx object. */
+		struct mlx5_flow_counter_pool *pool; /**< The counter pool. */
+	};
+	uint64_t hits; /**< Reset value of hits packets. */
+	uint64_t bytes; /**< Reset value of bytes. */
+	void *action; /**< Pointer to the dv action. */
+};
+
+TAILQ_HEAD(mlx5_counters, mlx5_flow_counter);
+
+/* Counter pool structure - query is in pool resolution. */
+struct mlx5_flow_counter_pool {
+	TAILQ_ENTRY(mlx5_flow_counter_pool) next;
+	struct mlx5_counters counters; /* Free counter list. */
+	struct mlx5_devx_obj *min_dcs;
+	/* The devx object of the minimum counter ID in the pool. */
+	struct mlx5_counter_stats_raw *raw; /* The counter stats memory raw. */
+	struct mlx5_flow_counter counters_raw[]; /* The counters memory. */
+};
+
+struct mlx5_counter_stats_raw;
+
+/* Memory management structure for group of counter statistics raws. */
+struct mlx5_counter_stats_mem_mng {
+	LIST_ENTRY(mlx5_counter_stats_mem_mng) next;
+	struct mlx5_counter_stats_raw *raws;
+	struct mlx5_devx_obj *dm;
+	struct mlx5dv_devx_umem *umem;
+};
+
+/* Raw memory structure for the counter statistics values of a pool. */
+struct mlx5_counter_stats_raw {
+	LIST_ENTRY(mlx5_counter_stats_raw) next;
+	int min_dcs_id;
+	struct mlx5_counter_stats_mem_mng *mem_mng;
+	volatile struct flow_counter_stats *data;
+};
+
+TAILQ_HEAD(mlx5_counter_pools, mlx5_flow_counter_pool);
+
+/* Container structure for counter pools. */
+struct mlx5_pools_container {
+	uint16_t n_valid; /* Number of valid pools. */
+	uint16_t n; /* Number of pools. */
+	struct mlx5_counter_pools pool_list; /* Counter pool list. */
+	struct mlx5_flow_counter_pool **pools; /* Counter pool array. */
+	struct mlx5_counter_stats_mem_mng *init_mem_mng;
+	/* Hold the memory management for the next allocated pools raws. */
+};
+
+/* Counter global management structure. */
+struct mlx5_flow_counter_mng {
+	struct mlx5_pools_container ccont[2];
+	struct mlx5_counters flow_counters; /* Legacy flow counter list. */
+	LIST_HEAD(mem_mngs, mlx5_counter_stats_mem_mng) mem_mngs;
+};
+
 /* Per port data of shared IB device. */
 struct mlx5_ibv_shared_port {
 	uint32_t ih_port_id;
@@ -314,6 +403,7 @@ struct mlx5_ibv_shared {
 	LIST_HEAD(jump, mlx5_flow_dv_jump_tbl_resource) jump_tbl;
 	LIST_HEAD(port_id_action_list, mlx5_flow_dv_port_id_action_resource)
 		port_id_action_list; /* List of port ID actions. */
+	struct mlx5_flow_counter_mng cmng; /* Counters management structure. */
 	/* Shared interrupt handler section. */
 	pthread_mutex_t intr_mutex; /* Interrupt config mutex. */
 	uint32_t intr_cnt; /* Interrupt handler reference counter. */
@@ -362,8 +452,6 @@ struct mlx5_priv {
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
 	struct mlx5_flows flows; /* RTE Flow rules. */
 	struct mlx5_flows ctrl_flows; /* Control flow rules. */
-	LIST_HEAD(counters, mlx5_flow_counter) flow_counters;
-	/* Flow counters. */
 	LIST_HEAD(rxq, mlx5_rxq_ctrl) rxqsctrl; /* DPDK Rx queues. */
 	LIST_HEAD(rxqibv, mlx5_rxq_ibv) rxqsibv; /* Verbs Rx queues. */
 	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
@@ -584,12 +672,15 @@ int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 
 /* mlx5_devx_cmds.c */
 
-int mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-				     struct mlx5_devx_counter_set *dcx);
-int mlx5_devx_cmd_flow_counter_free(struct mlx5dv_devx_obj *obj);
-int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_counter_set *dcx,
-				     int clear,
-				     uint64_t *pkts, uint64_t *bytes);
+struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
+						       uint32_t bulk_sz);
+int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
+int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				     int clear, uint32_t n_counters,
+				     uint64_t *pkts, uint64_t *bytes,
+				     uint32_t mkey, void *addr);
 int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
 				 struct mlx5_hca_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+					     struct mlx5_devx_mkey_attr *attr);
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index e5776c4..92f2fc8 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -2,6 +2,8 @@
 /* Copyright 2018 Mellanox Technologies, Ltd */
 
 #include <rte_flow_driver.h>
+#include <rte_malloc.h>
+#include <unistd.h>
 
 #include "mlx5.h"
 #include "mlx5_glue.h"
@@ -14,47 +16,37 @@
  *   ibv contexts returned from mlx5dv_open_device.
  * @param dcs
  *   Pointer to counters properties structure to be filled by the routine.
+ * @param bulk_n_128
+ *   Number of counters to allocate, in units of 128 counters.
  *
  * @return
- *   0 on success, a negative value otherwise.
+ *   Pointer to the counter object on success, NULL otherwise and
+ *   rte_errno is set.
  */
-int mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-				     struct mlx5_devx_counter_set *dcs)
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx, uint32_t bulk_n_128)
 {
+	struct mlx5_devx_obj *dcs = rte_zmalloc("dcs", sizeof(*dcs), 0);
 	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
 	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
-	int status, syndrome;
 
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
 	MLX5_SET(alloc_flow_counter_in, in, opcode,
 		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, bulk_n_128);
 	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
 					      sizeof(in), out, sizeof(out));
-	if (!dcs->obj)
-		return -errno;
-	status = MLX5_GET(query_flow_counter_out, out, status);
-	syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to create devx counters, "
-			"status %x, syndrome %x", status, syndrome);
-		return -1;
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d\n", errno);
+		rte_errno = errno;
+		rte_free(dcs);
+		return NULL;
 	}
-	dcs->id = MLX5_GET(alloc_flow_counter_out,
-			   out, flow_counter_id);
-	return 0;
-}
-
-/**
- * Free flow counters obtained via devx interface.
- *
- * @param[in] obj
- *   devx object that was obtained from mlx5_devx_cmd_fc_alloc.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int mlx5_devx_cmd_flow_counter_free(struct mlx5dv_devx_obj *obj)
-{
-	return mlx5_glue->devx_obj_destroy(obj);
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
 }
 
 /**
@@ -64,49 +56,140 @@ int mlx5_devx_cmd_flow_counter_free(struct mlx5dv_devx_obj *obj)
  *   devx object that was obtained from mlx5_devx_cmd_fc_alloc.
  * @param[in] clear
  *   Whether hardware should clear the counters after the query or not.
+ * @param[in] n_counters
+ *   The number of counters to read.
  *  @param pkts
  *   The number of packets that matched the flow.
  *  @param bytes
  *    The number of bytes that matched the flow.
+ *  @param mkey
+ *   The mkey key for batch query.
+ *  @param addr
+ *    The address in the mkey range for batch query.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
 int
-mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_counter_set *dcs,
-				 int clear __rte_unused,
-				 uint64_t *pkts, uint64_t *bytes)
+mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs, int clear,
+				 uint32_t n_counters, uint64_t *pkts,
+				 uint64_t *bytes, uint32_t mkey, void *addr)
 {
-	uint32_t out[MLX5_ST_SZ_BYTES(query_flow_counter_out) +
-		MLX5_ST_SZ_BYTES(traffic_counter)]   = {0};
+	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
+			MLX5_ST_SZ_BYTES(traffic_counter);
+	uint32_t out[out_len];
 	uint32_t in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
 	void *stats;
-	int status, syndrome, rc;
+	int rc;
 
 	MLX5_SET(query_flow_counter_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
 	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
 	MLX5_SET(query_flow_counter_in, in, flow_counter_id, dcs->id);
-	rc = mlx5_glue->devx_obj_query(dcs->obj,
-				       in, sizeof(in), out, sizeof(out));
-	if (rc)
-		return rc;
-	status = MLX5_GET(query_flow_counter_out, out, status);
-	syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query devx counters, "
-			"id %d, status %x, syndrome = %x",
-			status, syndrome, dcs->id);
-		return -1;
+	MLX5_SET(query_flow_counter_in, in, clear, !!clear);
+
+	if (n_counters) {
+		MLX5_SET(query_flow_counter_in, in, num_of_counters,
+			 n_counters);
+		MLX5_SET(query_flow_counter_in, in, dump_to_memory, 1);
+		MLX5_SET(query_flow_counter_in, in, mkey, mkey);
+		MLX5_SET64(query_flow_counter_in, in, address,
+			   (uint64_t)(uintptr_t)addr);
+	}
+	rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out, out_len);
+	if (rc) {
+		DRV_LOG(ERR, "Failed to query devx counters with rc %d\n ", rc);
+		rte_errno = rc;
+		return -rc;
+	}
+	if (!n_counters) {
+		stats = MLX5_ADDR_OF(query_flow_counter_out,
+				     out, flow_statistics);
+		*pkts = MLX5_GET64(traffic_counter, stats, packets);
+		*bytes = MLX5_GET64(traffic_counter, stats, octets);
 	}
-	stats = MLX5_ADDR_OF(query_flow_counter_out,
-			     out, flow_statistics);
-	*pkts = MLX5_GET64(traffic_counter, stats, packets);
-	*bytes = MLX5_GET64(traffic_counter, stats, octets);
 	return 0;
 }
 
 /**
+ * Create a new mkey.
+ *
+ * @param[in] ctx
+ *   ibv contexts returned from mlx5dv_open_device.
+ * @param[in] attr
+ *   Attributes of the requested mkey.
+ *
+ * @return
+ *   Pointer to the DevX mkey object on success, NULL otherwise and rte_errno
+ *   is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+			  struct mlx5_devx_mkey_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
+	void *mkc;
+	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
+	size_t pgsize;
+	uint32_t translation_size;
+
+	if (!mkey) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	pgsize = sysconf(_SC_PAGESIZE);
+	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
+	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
+		 translation_size);
+	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(mkc, mkc, lw, 0x1);
+	MLX5_SET(mkc, mkc, lr, 0x1);
+	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+	MLX5_SET(mkc, mkc, qpn, 0xffffff);
+	MLX5_SET(mkc, mkc, pd, attr->pd);
+	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
+	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
+	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
+	MLX5_SET64(mkc, mkc, len, attr->size);
+	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
+	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+					       sizeof(out));
+	if (!mkey->obj) {
+		DRV_LOG(ERR, "Can't create mkey - error %d\n", errno);
+		rte_errno = errno;
+		rte_free(mkey);
+		return NULL;
+	}
+	mkey->id = MLX5_GET(create_mkey_out, out, mkey_index);
+	mkey->id = (mkey->id << 8) | (attr->umem_id & 0xFF);
+	return mkey;
+}
+
+/**
+ * Destroy any object allocated by a Devx API.
+ *
+ * @param[in] obj
+ *   Pointer to a general object.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
+{
+	int ret;
+
+	if (!obj)
+		return 0;
+	ret =  mlx5_glue->devx_obj_destroy(obj->obj);
+	rte_free(obj);
+	return ret;
+}
+
+/**
  * Query HCA attributes.
  * Using those attributes we can check on run time if the device
  * is having the required capabilities.
@@ -146,6 +229,8 @@ int mlx5_devx_cmd_flow_counter_free(struct mlx5dv_devx_obj *obj)
 		return -1;
 	}
 	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	attr->flow_counter_bulk_alloc_bitmap =
+			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 3d7fcf7..fbd09d0 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -354,25 +354,6 @@ struct mlx5_flow {
 	};
 };
 
-/* Counters information. */
-struct mlx5_flow_counter {
-	LIST_ENTRY(mlx5_flow_counter) next; /**< Pointer to the next counter. */
-	uint32_t shared:1; /**< Share counter ID with other flow rules. */
-	uint32_t ref_cnt:31; /**< Reference counter. */
-	uint32_t id; /**< Counter ID. */
-	union {  /**< Holds the counters for the rule. */
-#if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
-		struct ibv_counter_set *cs;
-#elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
-		struct ibv_counters *cs;
-#endif
-		struct mlx5_devx_counter_set *dcs;
-	};
-	uint64_t hits; /**< Number of packets matched by the rule. */
-	uint64_t bytes; /**< Number of bytes matched by the rule. */
-	void *action; /**< Pointer to the dv action. */
-};
-
 /* Flow structure. */
 struct rte_flow {
 	TAILQ_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 9cc09e7..9654c3b 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -6,6 +6,7 @@
 #include <stdalign.h>
 #include <stdint.h>
 #include <string.h>
+#include <unistd.h>
 
 /* Verbs header. */
 /* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
@@ -2113,8 +2114,344 @@ struct field_modify_info modify_tcp[] = {
 	return 0;
 }
 
+#define MLX5_CNT_CONTAINER_SIZE 64
+#define MLX5_CNT_CONTAINER(priv, batch) (&(priv)->sh->cmng.ccont[batch])
+
+/**
+ * Get a pool by a counter.
+ *
+ * @param[in] cnt
+ *   Pointer to the counter.
+ *
+ * @return
+ *   The counter pool.
+ */
+static struct mlx5_flow_counter_pool *
+flow_dv_counter_pool_get(struct mlx5_flow_counter *cnt)
+{
+	if (!cnt->batch) {
+		cnt -= cnt->dcs->id % MLX5_COUNTERS_PER_POOL;
+		return (struct mlx5_flow_counter_pool *)cnt - 1;
+	}
+	return cnt->pool;
+}
+
+/**
+ * Get a pool by devx counter ID.
+ *
+ * @param[in] cont
+ *   Pointer to the counter container.
+ * @param[in] id
+ *   The counter devx ID.
+ *
+ * @return
+ *   The counter pool pointer if it exists, NULL otherwise.
+ */
+static struct mlx5_flow_counter_pool *
+flow_dv_find_pool_by_id(struct mlx5_pools_container *cont, int id)
+{
+	struct mlx5_flow_counter_pool *pool;
+
+	TAILQ_FOREACH(pool, &cont->pool_list, next) {
+		int base = (pool->min_dcs->id / MLX5_COUNTERS_PER_POOL) *
+				MLX5_COUNTERS_PER_POOL;
+
+		if (id >= base && id < base + MLX5_COUNTERS_PER_POOL)
+			return pool;
+	};
+	return NULL;
+}
+
+/**
+ * Allocate a new memory for the counter values wrapped by all the needed
+ * management.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] raws_n
+ *   The raw memory areas - each one for MLX5_COUNTERS_PER_POOL counters.
+ *
+ * @return
+ *   The new memory management pointer on success, otherwise NULL and rte_errno
+ *   is set.
+ */
+static struct mlx5_counter_stats_mem_mng *
+flow_dv_create_counter_stat_mem_mng(struct rte_eth_dev *dev, int raws_n)
+{
+	struct mlx5_ibv_shared *sh = ((struct mlx5_priv *)
+					(dev->data->dev_private))->sh;
+	struct mlx5dv_pd dv_pd;
+	struct mlx5dv_obj dv_obj;
+	struct mlx5_devx_mkey_attr mkey_attr;
+	struct mlx5_counter_stats_mem_mng *mem_mng;
+	volatile struct flow_counter_stats *raw_data;
+	int size = (sizeof(struct flow_counter_stats) *
+			MLX5_COUNTERS_PER_POOL +
+			sizeof(struct mlx5_counter_stats_raw)) * raws_n +
+			sizeof(struct mlx5_counter_stats_mem_mng);
+	uint8_t *mem = rte_calloc(__func__, 1, size, sysconf(_SC_PAGESIZE));
+	int i;
+
+	if (!mem) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	mem_mng = (struct mlx5_counter_stats_mem_mng *)(mem + size) - 1;
+	size = sizeof(*raw_data) * MLX5_COUNTERS_PER_POOL * raws_n;
+	mem_mng->umem = mlx5_glue->devx_umem_reg(sh->ctx, mem, size,
+						 IBV_ACCESS_LOCAL_WRITE);
+	if (!mem_mng->umem) {
+		rte_errno = errno;
+		rte_free(mem);
+		return NULL;
+	}
+	dv_obj.pd.in = sh->pd;
+	dv_obj.pd.out = &dv_pd;
+	mlx5_glue->dv_init_obj(&dv_obj, MLX5DV_OBJ_PD);
+	mkey_attr.addr = (uintptr_t)mem;
+	mkey_attr.size = size;
+	mkey_attr.umem_id = mem_mng->umem->umem_id;
+	mkey_attr.pd = dv_pd.pdn;
+	mem_mng->dm = mlx5_devx_cmd_mkey_create(sh->ctx, &mkey_attr);
+	if (!mem_mng->dm) {
+		mlx5_glue->devx_umem_dereg(mem_mng->umem);
+		rte_errno = errno;
+		rte_free(mem);
+		return NULL;
+	}
+	mem_mng->raws = (struct mlx5_counter_stats_raw *)(mem + size);
+	raw_data = (volatile struct flow_counter_stats *)mem;
+	for (i = 0; i < raws_n; ++i) {
+		mem_mng->raws[i].mem_mng = mem_mng;
+		mem_mng->raws[i].data = raw_data + i * MLX5_COUNTERS_PER_POOL;
+	}
+	LIST_INSERT_HEAD(&sh->cmng.mem_mngs, mem_mng, next);
+	return mem_mng;
+}
+
+/**
+ * Prepare a counter container.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] batch
+ *   Whether the pool is for counter that was allocated by batch command.
+ *
+ * @return
+ *   The container pointer on success, otherwise NULL and rte_errno is set.
+ */
+static struct mlx5_pools_container *
+flow_dv_container_prepare(struct rte_eth_dev *dev, uint32_t batch)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
+	struct mlx5_counter_stats_mem_mng *mem_mng;
+	uint32_t size = MLX5_CNT_CONTAINER_SIZE;
+	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * size;
+
+	cont->pools = rte_calloc(__func__, 1, mem_size, 0);
+	if (!cont->pools) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	mem_mng = flow_dv_create_counter_stat_mem_mng(dev, size);
+	if (!mem_mng) {
+		rte_free(cont->pools);
+		return NULL;
+	}
+	cont->n = size;
+	TAILQ_INIT(&cont->pool_list);
+	cont->init_mem_mng = mem_mng;
+	return cont;
+}
+
+/**
+ * Query a devx flow counter.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] cnt
+ *   Pointer to the flow counter.
+ * @param[out] pkts
+ *   The statistics value of packets.
+ * @param[out] bytes
+ *   The statistics value of bytes.
+ *
+ * @return
+ *   0 on success, otherwise a negative errno value and rte_errno is set.
+ */
+static inline int
+_flow_dv_query_count(struct rte_eth_dev *dev __rte_unused,
+		     struct mlx5_flow_counter *cnt, uint64_t *pkts,
+		     uint64_t *bytes)
+{
+	struct mlx5_flow_counter_pool *pool =
+			flow_dv_counter_pool_get(cnt);
+	uint16_t offset = pool->min_dcs->id % MLX5_COUNTERS_PER_POOL;
+	int ret = mlx5_devx_cmd_flow_counter_query
+		(pool->min_dcs, 0, MLX5_COUNTERS_PER_POOL - offset, NULL,
+		 NULL, pool->raw->mem_mng->dm->id,
+		 (void *)(uintptr_t)(pool->raw->data +
+		 offset));
+
+	if (ret) {
+		DRV_LOG(ERR, "Failed to trigger synchronous"
+			" query for dcs ID %d\n",
+			pool->min_dcs->id);
+		return ret;
+	}
+	offset = cnt - &pool->counters_raw[0];
+	*pkts = rte_be_to_cpu_64(pool->raw->data[offset].hits);
+	*bytes = rte_be_to_cpu_64(pool->raw->data[offset].bytes);
+	return 0;
+}
+
+/**
+ * Create and initialize a new counter pool.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[out] dcs
+ *   The devX counter handle.
+ * @param[in] batch
+ *   Whether the pool is for counter that was allocated by batch command.
+ *
+ * @return
+ *   A new pool pointer on success, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_flow_counter_pool *
+flow_dv_pool_create(struct rte_eth_dev *dev, struct mlx5_devx_obj *dcs,
+		    uint32_t batch)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_counter_pool *pool;
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
+	uint32_t size;
+
+	if (!cont->n) {
+		cont = flow_dv_container_prepare(dev, batch);
+		if (!cont)
+			return NULL;
+	} else if (cont->n == cont->n_valid) {
+		DRV_LOG(ERR, "No space in container to allocate a new pool\n");
+		rte_errno = ENOSPC;
+		return NULL;
+	}
+	size = sizeof(*pool) + MLX5_COUNTERS_PER_POOL *
+			sizeof(struct mlx5_flow_counter);
+	pool = rte_calloc(__func__, 1, size, 0);
+	if (!pool) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	pool->min_dcs = dcs;
+	pool->raw = cont->init_mem_mng->raws + cont->n_valid;
+	TAILQ_INIT(&pool->counters);
+	TAILQ_INSERT_TAIL(&cont->pool_list, pool, next);
+	cont->pools[cont->n_valid] = pool;
+	cont->n_valid++;
+	return pool;
+}
+
 /**
- * Get or create a flow counter.
+ * Prepare a new counter and/or a new counter pool.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[out] cnt_free
+ *   Where to put the pointer of a new counter.
+ * @param[in] batch
+ *   Whether the pool is for counter that was allocated by batch command.
+ *
+ * @return
+ *   The free counter pool pointer and @p cnt_free is set on success,
+ *   NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_flow_counter_pool *
+flow_dv_counter_pool_prepare(struct rte_eth_dev *dev,
+			     struct mlx5_flow_counter **cnt_free,
+			     uint32_t batch)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_counter_pool *pool;
+	struct mlx5_devx_obj *dcs = NULL;
+	struct mlx5_flow_counter *cnt;
+	uint32_t i;
+
+	if (!batch) {
+		/* bulk_bitmap must be 0 for single counter allocation. */
+		dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0);
+		if (!dcs)
+			return NULL;
+		pool = flow_dv_find_pool_by_id(MLX5_CNT_CONTAINER(priv, batch),
+					       dcs->id);
+		if (!pool) {
+			pool = flow_dv_pool_create(dev, dcs, batch);
+			if (!pool) {
+				mlx5_devx_cmd_destroy(dcs);
+				return NULL;
+			}
+		} else if (dcs->id < pool->min_dcs->id) {
+			pool->min_dcs->id = dcs->id;
+		}
+		cnt = &pool->counters_raw[dcs->id % MLX5_COUNTERS_PER_POOL];
+		TAILQ_INSERT_HEAD(&pool->counters, cnt, next);
+		cnt->dcs = dcs;
+		*cnt_free = cnt;
+		return pool;
+	}
+	/* bulk_bitmap is in 128 counters units. */
+	if (priv->config.hca_attr.flow_counter_bulk_alloc_bitmap & 0x4)
+		dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0x4);
+	if (!dcs) {
+		rte_errno = ENODATA;
+		return NULL;
+	}
+	pool = flow_dv_pool_create(dev, dcs, batch);
+	if (!pool) {
+		mlx5_devx_cmd_destroy(dcs);
+		return NULL;
+	}
+	for (i = 0; i < MLX5_COUNTERS_PER_POOL; ++i) {
+		cnt = &pool->counters_raw[i];
+		cnt->pool = pool;
+		TAILQ_INSERT_HEAD(&pool->counters, cnt, next);
+	}
+	*cnt_free = &pool->counters_raw[0];
+	return pool;
+}
+
+/**
+ * Search for an existing shared counter.
+ *
+ * @param[in] cont
+ *   Pointer to the relevant counter pool container.
+ * @param[in] id
+ *   The shared counter ID to search.
+ *
+ * @return
+ *   NULL if it does not exist, otherwise pointer to the shared counter.
+ */
+static struct mlx5_flow_counter *
+flow_dv_counter_shared_search(struct mlx5_pools_container *cont,
+			      uint32_t id)
+{
+	struct mlx5_flow_counter *cnt;
+	struct mlx5_flow_counter_pool *pool;
+	int i;
+
+	TAILQ_FOREACH(pool, &cont->pool_list, next) {
+		for (i = 0; i < MLX5_COUNTERS_PER_POOL; ++i) {
+			cnt = &pool->counters_raw[i];
+			if (cnt->ref_cnt && cnt->shared && cnt->id == id)
+				return cnt;
+		}
+	}
+	return NULL;
+}
+
+/**
+ * Allocate a flow counter.
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
@@ -2122,80 +2459,110 @@ struct field_modify_info modify_tcp[] = {
  *   Indicate if this counter is shared with other flows.
  * @param[in] id
  *   Counter identifier.
+ * @param[in] group
+ *   Counter flow group.
  *
  * @return
  *   pointer to flow counter on success, NULL otherwise and rte_errno is set.
  */
 static struct mlx5_flow_counter *
-flow_dv_counter_new(struct rte_eth_dev *dev, uint32_t shared, uint32_t id)
+flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t shared, uint32_t id,
+		      uint16_t group)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter *cnt = NULL;
-	struct mlx5_devx_counter_set *dcs = NULL;
-	int ret;
+	struct mlx5_flow_counter_pool *pool = NULL;
+	struct mlx5_flow_counter *cnt_free = NULL;
+	/*
+	 * Currently group 0 flow counter cannot be assigned to a flow if it is
+	 * not the first one in the batch counter allocation, so it is better
+	 * to allocate counters one by one for these flows in a separate
+	 * container.
+	 * A counter can be shared between different groups, so shared
+	 * counters must be taken from the single-allocation container.
+	 */
+	uint32_t batch = (group && !shared) ? 1 : 0;
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
 
 	if (!priv->config.devx) {
-		ret = -ENOTSUP;
-		goto error_exit;
+		rte_errno = ENOTSUP;
+		return NULL;
 	}
 	if (shared) {
-		LIST_FOREACH(cnt, &priv->flow_counters, next) {
-			if (cnt->shared && cnt->id == id) {
-				cnt->ref_cnt++;
-				return cnt;
+		cnt_free = flow_dv_counter_shared_search(cont, id);
+		if (cnt_free) {
+			if (cnt_free->ref_cnt + 1 == 0) {
+				rte_errno = E2BIG;
+				return NULL;
 			}
+			cnt_free->ref_cnt++;
+			return cnt_free;
 		}
 	}
-	cnt = rte_calloc(__func__, 1, sizeof(*cnt), 0);
-	dcs = rte_calloc(__func__, 1, sizeof(*dcs), 0);
-	if (!dcs || !cnt) {
-		ret = -ENOMEM;
-		goto error_exit;
+	/* Pools which have free counters are at the start. */
+	pool = TAILQ_FIRST(&cont->pool_list);
+	if (pool)
+		cnt_free = TAILQ_FIRST(&pool->counters);
+	if (!cnt_free) {
+		pool = flow_dv_counter_pool_prepare(dev, &cnt_free, batch);
+		if (!pool)
+			return NULL;
 	}
-	ret = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, dcs);
-	if (ret)
-		goto error_exit;
-	struct mlx5_flow_counter tmpl = {
-		.shared = shared,
-		.ref_cnt = 1,
-		.id = id,
-		.dcs = dcs,
-	};
-	tmpl.action = mlx5_glue->dv_create_flow_action_counter(dcs->obj, 0);
-	if (!tmpl.action) {
-		ret = errno;
-		goto error_exit;
+	cnt_free->batch = batch;
+	/* Create a DV counter action only on first use. */
+	if (!cnt_free->action) {
+		uint16_t offset;
+		struct mlx5_devx_obj *dcs;
+
+		if (batch) {
+			offset = cnt_free - &pool->counters_raw[0];
+			dcs = pool->min_dcs;
+		} else {
+			offset = 0;
+			dcs = cnt_free->dcs;
+		}
+		cnt_free->action = mlx5_glue->dv_create_flow_action_counter
+					(dcs->obj, offset);
+		if (!cnt_free->action) {
+			rte_errno = errno;
+			return NULL;
+		}
 	}
-	*cnt = tmpl;
-	LIST_INSERT_HEAD(&priv->flow_counters, cnt, next);
-	return cnt;
-error_exit:
-	rte_free(cnt);
-	rte_free(dcs);
-	rte_errno = -ret;
-	return NULL;
+	/* Update the counter reset values. */
+	if (_flow_dv_query_count(dev, cnt_free, &cnt_free->hits,
+				 &cnt_free->bytes))
+		return NULL;
+	cnt_free->shared = shared;
+	cnt_free->ref_cnt = 1;
+	cnt_free->id = id;
+	TAILQ_REMOVE(&pool->counters, cnt_free, next);
+	if (TAILQ_EMPTY(&pool->counters)) {
+		/* Move the pool to the end of the container pool list. */
+		TAILQ_REMOVE(&cont->pool_list, pool, next);
+		TAILQ_INSERT_TAIL(&cont->pool_list, pool, next);
+	}
+	return cnt_free;
 }
 
 /**
  * Release a flow counter.
  *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
  * @param[in] counter
  *   Pointer to the counter handler.
  */
 static void
-flow_dv_counter_release(struct mlx5_flow_counter *counter)
+flow_dv_counter_release(struct rte_eth_dev *dev __rte_unused,
+			struct mlx5_flow_counter *counter)
 {
-	int ret;
-
 	if (!counter)
 		return;
 	if (--counter->ref_cnt == 0) {
-		ret = mlx5_devx_cmd_flow_counter_free(counter->dcs->obj);
-		if (ret)
-			DRV_LOG(ERR, "Failed to free devx counters, %d", ret);
-		LIST_REMOVE(counter, next);
-		rte_free(counter->dcs);
-		rte_free(counter);
+		struct mlx5_flow_counter_pool *pool =
+				flow_dv_counter_pool_get(counter);
+
+		/* Put the counter in the end - the earliest one. */
+		TAILQ_INSERT_TAIL(&pool->counters, counter, next);
 	}
 }
 
@@ -4103,8 +4470,10 @@ struct field_modify_info modify_tcp[] = {
 				rte_errno = ENOTSUP;
 				goto cnt_err;
 			}
-			flow->counter = flow_dv_counter_new(dev, count->shared,
-							    count->id);
+			flow->counter = flow_dv_counter_alloc(dev,
+							      count->shared,
+							      count->id,
+							      attr->group);
 			if (flow->counter == NULL)
 				goto cnt_err;
 			dev_flow->dv.actions[actions_n++] =
@@ -4770,7 +5139,7 @@ struct field_modify_info modify_tcp[] = {
 		return;
 	flow_dv_remove(dev, flow);
 	if (flow->counter) {
-		flow_dv_counter_release(flow->counter);
+		flow_dv_counter_release(dev, flow->counter);
 		flow->counter = NULL;
 	}
 	if (flow->tag_resource) {
@@ -4815,9 +5184,6 @@ struct field_modify_info modify_tcp[] = {
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_query_count *qc = data;
-	uint64_t pkts = 0;
-	uint64_t bytes = 0;
-	int err;
 
 	if (!priv->config.devx)
 		return rte_flow_error_set(error, ENOTSUP,
@@ -4825,15 +5191,14 @@ struct field_modify_info modify_tcp[] = {
 					  NULL,
 					  "counters are not supported");
 	if (flow->counter) {
-		err = mlx5_devx_cmd_flow_counter_query
-						(flow->counter->dcs,
-						 qc->reset, &pkts, &bytes);
+		uint64_t pkts, bytes;
+		int err = _flow_dv_query_count(dev, flow->counter, &pkts,
+					       &bytes);
+
 		if (err)
-			return rte_flow_error_set
-				(error, err,
-				 RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				 NULL,
-				 "cannot read counters");
+			return rte_flow_error_set(error, -err,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "cannot read counters");
 		qc->hits_set = 1;
 		qc->bytes_set = 1;
 		qc->hits = pkts - flow->counter->hits;
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 2f4c80c..b3395b8 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -124,7 +124,7 @@
 	int ret;
 
 	if (shared) {
-		LIST_FOREACH(cnt, &priv->flow_counters, next) {
+		TAILQ_FOREACH(cnt, &priv->sh->cmng.flow_counters, next) {
 			if (cnt->shared && cnt->id == id) {
 				cnt->ref_cnt++;
 				return cnt;
@@ -144,7 +144,7 @@
 	/* Create counter with Verbs. */
 	ret = flow_verbs_counter_create(dev, cnt);
 	if (!ret) {
-		LIST_INSERT_HEAD(&priv->flow_counters, cnt, next);
+		TAILQ_INSERT_HEAD(&priv->sh->cmng.flow_counters, cnt, next);
 		return cnt;
 	}
 	/* Some error occurred in Verbs library. */
@@ -156,19 +156,24 @@
 /**
  * Release a flow counter.
  *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
  * @param[in] counter
  *   Pointer to the counter handler.
  */
 static void
-flow_verbs_counter_release(struct mlx5_flow_counter *counter)
+flow_verbs_counter_release(struct rte_eth_dev *dev,
+			   struct mlx5_flow_counter *counter)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	if (--counter->ref_cnt == 0) {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
 		claim_zero(mlx5_glue->destroy_counter_set(counter->cs));
 #elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
 		claim_zero(mlx5_glue->destroy_counters(counter->cs));
 #endif
-		LIST_REMOVE(counter, next);
+		TAILQ_REMOVE(&priv->sh->cmng.flow_counters, counter, next);
 		rte_free(counter);
 	}
 }
@@ -1612,7 +1617,7 @@
 		rte_free(dev_flow);
 	}
 	if (flow->counter) {
-		flow_verbs_counter_release(flow->counter);
+		flow_verbs_counter_release(dev, flow->counter);
 		flow->counter = NULL;
 	}
 }
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index d038373..ba5fd06 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -849,6 +849,33 @@
 #endif
 }
 
+static struct mlx5dv_devx_umem *
+mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
+			uint32_t access)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_reg(context, addr, size, access);
+#else
+	(void)context;
+	(void)addr;
+	(void)size;
+	(void)access;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_devx_umem_dereg(struct mlx5dv_devx_umem *dv_devx_umem)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_dereg(dv_devx_umem);
+#else
+	(void)dv_devx_umem;
+	return -ENOTSUP;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -930,4 +957,6 @@
 	.devx_obj_query = mlx5_glue_devx_obj_query,
 	.devx_obj_modify = mlx5_glue_devx_obj_modify,
 	.devx_general_cmd = mlx5_glue_devx_general_cmd,
+	.devx_umem_reg = mlx5_glue_devx_umem_reg,
+	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
 };
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index 433c9ed..18b1ce6 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -61,6 +61,7 @@
 
 #ifndef HAVE_IBV_DEVX_OBJ
 struct mlx5dv_devx_obj;
+struct mlx5dv_devx_umem;
 #endif
 
 #ifndef HAVE_MLX5DV_DR
@@ -209,6 +210,10 @@ struct mlx5_glue {
 	int (*devx_general_cmd)(struct ibv_context *context,
 				const void *in, size_t inlen,
 				void *out, size_t outlen);
+	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
+						  void *addr, size_t size,
+						  uint32_t access);
+	int (*devx_umem_dereg)(struct mlx5dv_devx_umem *dv_devx_umem);
 };
 
 const struct mlx5_glue *mlx5_glue;
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 7482383..e2e538d 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -415,6 +415,14 @@ struct mlx5_modification_cmd {
 				 (((_v) & __mlx5_mask(typ, fld)) << \
 				   __mlx5_dw_bit_off(typ, fld))); \
 	} while (0)
+
+#define MLX5_SET64(typ, p, fld, v) \
+	do { \
+		assert(__mlx5_bit_sz(typ, fld) == 64); \
+		*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = \
+			rte_cpu_to_be_64(v); \
+	} while (0)
+
 #define MLX5_GET(typ, p, fld) \
 	((rte_be_to_cpu_32(*((__be32 *)(p) +\
 	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
@@ -552,10 +560,15 @@ enum {
 
 enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
+	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
 	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
 };
 
+enum {
+	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
+};
+
 /* Flow counters. */
 struct mlx5_ifc_alloc_flow_counter_out_bits {
 	u8         status[0x8];
@@ -570,7 +583,9 @@ struct mlx5_ifc_alloc_flow_counter_in_bits {
 	u8         reserved_at_10[0x10];
 	u8         reserved_at_20[0x10];
 	u8         op_mod[0x10];
-	u8         reserved_at_40[0x40];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_40[0x18];
+	u8         flow_counter_bulk[0x8];
 };
 
 struct mlx5_ifc_dealloc_flow_counter_out_bits {
@@ -607,13 +622,102 @@ struct mlx5_ifc_query_flow_counter_in_bits {
 	u8         reserved_at_10[0x10];
 	u8         reserved_at_20[0x10];
 	u8         op_mod[0x10];
-	u8         reserved_at_40[0x80];
+	u8         reserved_at_40[0x20];
+	u8         mkey[0x20];
+	u8         address[0x40];
 	u8         clear[0x1];
-	u8         reserved_at_c1[0xf];
-	u8         num_of_counters[0x10];
+	u8         dump_to_memory[0x1];
+	u8         num_of_counters[0x1e];
 	u8         flow_counter_id[0x20];
 };
 
+struct mlx5_ifc_mkc_bits {
+	u8         reserved_at_0[0x1];
+	u8         free[0x1];
+	u8         reserved_at_2[0x1];
+	u8         access_mode_4_2[0x3];
+	u8         reserved_at_6[0x7];
+	u8         relaxed_ordering_write[0x1];
+	u8         reserved_at_e[0x1];
+	u8         small_fence_on_rdma_read_response[0x1];
+	u8         umr_en[0x1];
+	u8         a[0x1];
+	u8         rw[0x1];
+	u8         rr[0x1];
+	u8         lw[0x1];
+	u8         lr[0x1];
+	u8         access_mode_1_0[0x2];
+	u8         reserved_at_18[0x8];
+
+	u8         qpn[0x18];
+	u8         mkey_7_0[0x8];
+
+	u8         reserved_at_40[0x20];
+
+	u8         length64[0x1];
+	u8         bsf_en[0x1];
+	u8         sync_umr[0x1];
+	u8         reserved_at_63[0x2];
+	u8         expected_sigerr_count[0x1];
+	u8         reserved_at_66[0x1];
+	u8         en_rinval[0x1];
+	u8         pd[0x18];
+
+	u8         start_addr[0x40];
+
+	u8         len[0x40];
+
+	u8         bsf_octword_size[0x20];
+
+	u8         reserved_at_120[0x80];
+
+	u8         translations_octword_size[0x20];
+
+	u8         reserved_at_1c0[0x1b];
+	u8         log_page_size[0x5];
+
+	u8         reserved_at_1e0[0x20];
+};
+
+struct mlx5_ifc_create_mkey_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x8];
+	u8         mkey_index[0x18];
+
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_mkey_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         reserved_at_40[0x20];
+
+	u8         pg_access[0x1];
+	u8         reserved_at_61[0x1f];
+
+	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
+
+	u8         reserved_at_280[0x80];
+
+	u8         translations_octword_actual_size[0x20];
+
+	u8         mkey_umem_id[0x20];
+
+	u8         mkey_umem_offset[0x40];
+
+	u8         reserved_at_380[0x500];
+
+	u8         klm_pas_mtt[][0x20];
+};
+
 enum {
 	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
 	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP        = 0xc << 1,
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH 2/4] net/mlx5: resize a full counter container
  2019-07-08 14:07 [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
  2019-07-08 14:07 ` [dpdk-dev] [PATCH 1/4] net/mlx5: accelerate DV flow counter transactions Matan Azrad
@ 2019-07-08 14:07 ` Matan Azrad
  2019-07-08 14:07 ` [dpdk-dev] [PATCH 3/4] net/mlx5: accelerate DV flow counter query Matan Azrad
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Matan Azrad @ 2019-07-08 14:07 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko; +Cc: dev

When the counter container has no more space to store counter pools,
try to resize the container to allow more pools to be created.

So, the only limit on the maximum number of counters is the memory.
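
As a rough illustration of the resize step described above, here is a
minimal standalone sketch in plain C. It is not the driver code: the
names struct container, container_resize(), container_add() and
RESIZE_STEP are hypothetical stand-ins, and calloc()/free() replace the
DPDK allocators. It only shows the idea of growing the pool pointer
array by a fixed step and copying the old entries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MLX5_CNT_CONTAINER_RESIZE. */
#define RESIZE_STEP 64

struct pool; /* Opaque here; stands in for a counter pool. */

struct container {
	uint16_t n_valid;    /* Number of slots in use. */
	uint16_t n;          /* Number of slots allocated. */
	struct pool **pools; /* Pool pointer array. */
};

/* Grow the pool array by RESIZE_STEP slots. Returns 0, or -1 on failure. */
static int
container_resize(struct container *cont)
{
	uint32_t new_n = (uint32_t)cont->n + RESIZE_STEP;
	struct pool **new_pools;

	if (new_n > UINT16_MAX)
		return -1;
	new_pools = calloc(new_n, sizeof(*new_pools));
	if (!new_pools)
		return -1;
	if (cont->n) {
		/* Keep the existing pool pointers, then drop the old array. */
		memcpy(new_pools, cont->pools, cont->n * sizeof(*new_pools));
		free(cont->pools);
	}
	cont->pools = new_pools;
	cont->n = (uint16_t)new_n;
	return 0;
}

/* Register a pool, resizing first when every slot is already in use. */
static int
container_add(struct container *cont, struct pool *p)
{
	assert(cont->n_valid <= cont->n);
	if (cont->n == cont->n_valid && container_resize(cont))
		return -1;
	cont->pools[cont->n_valid++] = p;
	return 0;
}
```

With this shape, an empty container (n == 0) takes the same path as a
full one, which is why a separate "prepare" step is no longer needed.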

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 43 +++++++++++++++++++++++------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 9654c3b..3b7a43e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2114,7 +2114,7 @@ struct field_modify_info modify_tcp[] = {
 	return 0;
 }
 
-#define MLX5_CNT_CONTAINER_SIZE 64
+#define MLX5_CNT_CONTAINER_RESIZE 64
 #define MLX5_CNT_CONTAINER(priv, batch) (&(priv)->sh->cmng.ccont[batch])
 
 /**
@@ -2230,7 +2230,7 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
- * Prepare a counter container.
+ * Resize a counter container.
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
@@ -2241,26 +2241,34 @@ struct field_modify_info modify_tcp[] = {
  *   The container pointer on success, otherwise NULL and rte_errno is set.
  */
 static struct mlx5_pools_container *
-flow_dv_container_prepare(struct rte_eth_dev *dev, uint32_t batch)
+flow_dv_container_resize(struct rte_eth_dev *dev, uint32_t batch)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
 	struct mlx5_counter_stats_mem_mng *mem_mng;
-	uint32_t size = MLX5_CNT_CONTAINER_SIZE;
-	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * size;
-
-	cont->pools = rte_calloc(__func__, 1, mem_size, 0);
-	if (!cont->pools) {
+	uint32_t resize = cont->n + MLX5_CNT_CONTAINER_RESIZE;
+	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * resize;
+	struct mlx5_flow_counter_pool **new_pools = rte_calloc(__func__, 1,
+							       mem_size, 0);
+	if (!new_pools) {
 		rte_errno = ENOMEM;
 		return NULL;
 	}
-	mem_mng = flow_dv_create_counter_stat_mem_mng(dev, size);
+	mem_mng = flow_dv_create_counter_stat_mem_mng(dev,
+						    MLX5_CNT_CONTAINER_RESIZE);
 	if (!mem_mng) {
-		rte_free(cont->pools);
+		rte_free(new_pools);
 		return NULL;
 	}
-	cont->n = size;
-	TAILQ_INIT(&cont->pool_list);
+	if (cont->n) {
+		memcpy(new_pools, cont->pools,
+		       cont->n * sizeof(struct mlx5_flow_counter_pool *));
+		rte_free(cont->pools);
+	} else {
+		TAILQ_INIT(&cont->pool_list);
+	}
+	cont->pools = new_pools;
+	cont->n = resize;
 	cont->init_mem_mng = mem_mng;
 	return cont;
 }
@@ -2328,14 +2336,10 @@ struct field_modify_info modify_tcp[] = {
 	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
 	uint32_t size;
 
-	if (!cont->n) {
-		cont = flow_dv_container_prepare(dev, batch);
+	if (cont->n == cont->n_valid) {
+		cont = flow_dv_container_resize(dev, batch);
 		if (!cont)
 			return NULL;
-	} else if (cont->n == cont->n_valid) {
-		DRV_LOG(ERR, "No space in container to allocate a new pool\n");
-		rte_errno = ENOSPC;
-		return NULL;
 	}
 	size = sizeof(*pool) + MLX5_COUNTERS_PER_POOL *
 			sizeof(struct mlx5_flow_counter);
@@ -2345,7 +2349,8 @@ struct field_modify_info modify_tcp[] = {
 		return NULL;
 	}
 	pool->min_dcs = dcs;
-	pool->raw = cont->init_mem_mng->raws + cont->n_valid;
+	pool->raw = cont->init_mem_mng->raws + cont->n_valid %
+			MLX5_CNT_CONTAINER_RESIZE;
 	TAILQ_INIT(&pool->counters);
 	TAILQ_INSERT_TAIL(&cont->pool_list, pool, next);
 	cont->pools[cont->n_valid] = pool;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH 3/4] net/mlx5: accelerate DV flow counter query
  2019-07-08 14:07 [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
  2019-07-08 14:07 ` [dpdk-dev] [PATCH 1/4] net/mlx5: accelerate DV flow counter transactions Matan Azrad
  2019-07-08 14:07 ` [dpdk-dev] [PATCH 2/4] net/mlx5: resize a full counter container Matan Azrad
@ 2019-07-08 14:07 ` Matan Azrad
  2019-07-08 14:07 ` [dpdk-dev] [PATCH 4/4] net/mlx5: allow basic counter management fallback Matan Azrad
  2019-07-16 14:34 ` [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
  4 siblings, 0 replies; 11+ messages in thread
From: Matan Azrad @ 2019-07-08 14:07 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko; +Cc: dev

All the DV counters are cached in the PMD memory and are contained in
pools, which in turn are contained in containers according to the
counter allocation type - batch or single.

Currently, the flow counter query is done synchronously at pool
resolution, meaning that on a user request a FW command is triggered to
read all the counters in the pool.

A new devX feature to asynchronously read a batch of flow counters
makes it possible to accelerate the user query operation.

Using the DPDK host thread, the PMD periodically triggers an
asynchronous query at pool resolution for all the counter pools, and
the FW triggers an interrupt when the values are updated.
In the interrupt handler, the raw counter values of the pool are
replaced using a double buffer algorithm (very fast).
On a user query, the PMD just returns the last queried values from the
PMD cache - no system calls or FW commands are triggered from the user
control thread on the query operation!

More synchronization is added with the host thread:
        Container resize uses a double buffer algorithm.
        Pool growth in a container uses an atomic operation.
        Pool query buffer replacement uses a spinlock.
        Pool minimum devX counter ID uses an atomic operation.
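
The double buffer replacement under a spinlock, as described above, can
be sketched roughly as follows. This is an illustrative standalone
example and not the PMD implementation: struct pool, pool_init(),
pool_query_done() and pool_read() are hypothetical names, and a C11
atomic_flag stands in for rte_spinlock_t. One buffer holds the last
completed snapshot, the other is the one the device fills; the
completion handler only swaps the two pointers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define COUNTERS_PER_POOL 4 /* Kept small for illustration only. */

struct stats {
	uint64_t hits;
	uint64_t bytes;
};

struct raw {
	struct stats data[COUNTERS_PER_POOL];
};

struct pool {
	atomic_flag lock;   /* Stand-in for the pool spinlock. */
	struct raw *raw;    /* Last completed snapshot, read by queries. */
	struct raw *raw_hw; /* Buffer currently being filled by the device. */
};

static void
pool_init(struct pool *p, struct raw *cur, struct raw *hw)
{
	atomic_flag_clear(&p->lock);
	p->raw = cur;
	p->raw_hw = hw;
}

static void
pool_lock(struct pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock,
						 memory_order_acquire))
		; /* Spin until the lock is released. */
}

static void
pool_unlock(struct pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Completion handler: publish the freshly filled buffer by swapping. */
static void
pool_query_done(struct pool *p)
{
	struct raw *tmp;

	pool_lock(p);
	tmp = p->raw;
	p->raw = p->raw_hw;
	p->raw_hw = tmp; /* The old snapshot is reused for the next query. */
	pool_unlock(p);
}

/* User query: read the cached snapshot, no device command involved. */
static struct stats
pool_read(struct pool *p, unsigned int idx)
{
	struct stats s;

	assert(idx < COUNTERS_PER_POOL);
	pool_lock(p);
	s = p->raw->data[idx];
	pool_unlock(p);
	return s;
}
```

The lock is held only for a pointer swap or a small copy, so the user
query path stays short regardless of how long the device takes to fill
the other buffer.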

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/rel_notes/release_19_08.rst |   6 +-
 drivers/net/mlx5/Makefile              |   5 ++
 drivers/net/mlx5/meson.build           |   2 +
 drivers/net/mlx5/mlx5.c                |   9 ++
 drivers/net/mlx5/mlx5.h                |  44 ++++++++--
 drivers/net/mlx5/mlx5_devx_cmds.c      |  48 ++++++++++-
 drivers/net/mlx5/mlx5_ethdev.c         |  85 +++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.c           | 147 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h           |   8 ++
 drivers/net/mlx5/mlx5_flow_dv.c        | 141 ++++++++++++++++++++-----------
 drivers/net/mlx5/mlx5_glue.c           |  62 ++++++++++++++
 drivers/net/mlx5/mlx5_glue.h           |  15 ++++
 12 files changed, 506 insertions(+), 66 deletions(-)

diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index ab5052e..5fb8552 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -190,11 +190,13 @@ New Features
   Added telemetry mode to l3fwd-power application to report
   application level busyness, empty and full polls of rte_eth_rx_burst().
 
-* **Updated Mellanox mlx5 driver.**
+* **Updated Mellanox mlx5 PMD.**
 
    Updated Mellanox mlx5 driver with new features and improvements, including:
 
-   * Added support for match on ICMP/ICMP6's code and type.
+  * Added support for match on ICMP/ICMP6's code and type.
+  * Accelerated creation and destruction of flows with count action.
+  * Accelerated flow counter query.
 
 Removed Items
 -------------
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index b210c80..76d40b1 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -173,6 +173,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum MLX5DV_FLOW_ACTION_COUNTERS_DEVX \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_ASYNC \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_obj_query_async \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_ETHTOOL_LINK_MODE_25G \
 		/usr/include/linux/ethtool.h \
 		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 3eff22e..fabd490 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -122,6 +122,8 @@ if build
 		'mlx5dv_devx_obj_create' ],
 		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
 		'MLX5DV_FLOW_ACTION_COUNTERS_DEVX' ],
+		[ 'HAVE_IBV_DEVX_ASYNC', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_obj_query_async' ],
 		[ 'HAVE_MLX5DV_DR', 'infiniband/mlx5dv.h',
 		'MLX5DV_DR_DOMAIN_TYPE_NIC_RX' ],
 		[ 'HAVE_MLX5DV_DR_ESWITCH', 'infiniband/mlx5dv.h',
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 62be141..a8d824e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -37,6 +37,7 @@
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
+#include <rte_alarm.h>
 
 #include "mlx5.h"
 #include "mlx5_utils.h"
@@ -201,7 +202,15 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_counter_stats_mem_mng *mng;
 	uint8_t i;
 	int j;
+	int retries = 1024;
 
+	rte_errno = 0;
+	while (--retries) {
+		rte_eal_alarm_cancel(mlx5_flow_query_alarm, sh);
+		if (rte_errno != EINPROGRESS)
+			break;
+		rte_pause();
+	}
 	for (i = 0; i < RTE_DIM(sh->cmng.ccont); ++i) {
 		struct mlx5_flow_counter_pool *pool;
 		uint32_t batch = !!(i % 2);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3944b5f..4ce352a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -257,6 +257,7 @@ struct mlx5_drop {
 };
 
 #define MLX5_COUNTERS_PER_POOL 512
+#define MLX5_MAX_PENDING_QUERIES 4
 
 struct mlx5_flow_counter_pool;
 
@@ -283,7 +284,10 @@ struct mlx5_flow_counter {
 		struct mlx5_devx_obj *dcs; /**< Counter Devx object. */
 		struct mlx5_flow_counter_pool *pool; /**< The counter pool. */
 	};
-	uint64_t hits; /**< Reset value of hits packets. */
+	union {
+		uint64_t hits; /**< Reset value of hits packets. */
+		int64_t query_gen; /**< Generation of the last release. */
+	};
 	uint64_t bytes; /**< Reset value of bytes. */
 	void *action; /**< Pointer to the dv action. */
 };
@@ -294,10 +298,17 @@ struct mlx5_flow_counter {
 struct mlx5_flow_counter_pool {
 	TAILQ_ENTRY(mlx5_flow_counter_pool) next;
 	struct mlx5_counters counters; /* Free counter list. */
-	struct mlx5_devx_obj *min_dcs;
-	/* The devx object of the minimum counter ID in the pool. */
-	struct mlx5_counter_stats_raw *raw; /* The counter stats memory raw. */
-	struct mlx5_flow_counter counters_raw[]; /* The counters memory. */
+	union {
+		struct mlx5_devx_obj *min_dcs;
+		rte_atomic64_t a64_dcs;
+	};
+	/* The devx object of the minimum counter ID. */
+	rte_atomic64_t query_gen;
+	uint32_t n_counters: 16; /* Number of devx allocated counters. */
+	rte_spinlock_t sl; /* The pool lock. */
+	struct mlx5_counter_stats_raw *raw;
+	struct mlx5_counter_stats_raw *raw_hw; /* The raw on HW working. */
+	struct mlx5_flow_counter counters_raw[]; /* The pool counters memory. */
 };
 
 struct mlx5_counter_stats_raw;
@@ -322,7 +333,7 @@ struct mlx5_counter_stats_raw {
 
 /* Container structure for counter pools. */
 struct mlx5_pools_container {
-	uint16_t n_valid; /* Number of valid pools. */
+	rte_atomic16_t n_valid; /* Number of valid pools. */
 	uint16_t n; /* Number of pools. */
 	struct mlx5_counter_pools pool_list; /* Counter pool list. */
 	struct mlx5_flow_counter_pool **pools; /* Counter pool array. */
@@ -332,9 +343,16 @@ struct mlx5_pools_container {
 
 /* Counter global management structure. */
 struct mlx5_flow_counter_mng {
-	struct mlx5_pools_container ccont[2];
+	uint8_t mhi[2]; /* master \ host container index. */
+	struct mlx5_pools_container ccont[2 * 2];
+	/* 2 containers for single and for batch for double-buffer. */
 	struct mlx5_counters flow_counters; /* Legacy flow counter list. */
+	uint8_t pending_queries;
+	uint8_t batch;
+	uint16_t pool_index;
+	uint8_t query_thread_on;
 	LIST_HEAD(mem_mngs, mlx5_counter_stats_mem_mng) mem_mngs;
+	LIST_HEAD(stat_raws, mlx5_counter_stats_raw) free_stat_raws;
 };
 
 /* Per port data of shared IB device. */
@@ -408,6 +426,8 @@ struct mlx5_ibv_shared {
 	pthread_mutex_t intr_mutex; /* Interrupt config mutex. */
 	uint32_t intr_cnt; /* Interrupt handler reference counter. */
 	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
+	struct rte_intr_handle intr_handle_devx; /* DEVX interrupt handler. */
+	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
@@ -520,6 +540,7 @@ int mlx5_ibv_device_to_pci_addr(const struct ibv_device *device,
 				struct rte_pci_addr *pci_addr);
 void mlx5_dev_link_status_handler(void *arg);
 void mlx5_dev_interrupt_handler(void *arg);
+void mlx5_dev_interrupt_handler_devx(void *arg);
 void mlx5_dev_interrupt_handler_uninstall(struct rte_eth_dev *dev);
 void mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev);
 int mlx5_set_link_down(struct rte_eth_dev *dev);
@@ -641,6 +662,10 @@ int mlx5_ctrl_flow(struct rte_eth_dev *dev,
 		   struct rte_flow_item_eth *eth_mask);
 int mlx5_flow_create_drop_queue(struct rte_eth_dev *dev);
 void mlx5_flow_delete_drop_queue(struct rte_eth_dev *dev);
+void mlx5_flow_async_pool_query_handle(struct mlx5_ibv_shared *sh,
+				       uint64_t async_id, int status);
+void mlx5_set_query_alarm(struct mlx5_ibv_shared *sh);
+void mlx5_flow_query_alarm(void *arg);
 
 /* mlx5_mp.c */
 void mlx5_mp_req_start_rxtx(struct rte_eth_dev *dev);
@@ -678,9 +703,12 @@ struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
 int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
 				     int clear, uint32_t n_counters,
 				     uint64_t *pkts, uint64_t *bytes,
-				     uint32_t mkey, void *addr);
+				     uint32_t mkey, void *addr,
+				     struct mlx5dv_devx_cmd_comp *cmd_comp,
+				     uint64_t async_id);
 int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
 				 struct mlx5_hca_attr *attr);
 struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
 					     struct mlx5_devx_mkey_attr *attr);
+int mlx5_devx_get_out_command_status(void *out);
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 92f2fc8..28d967a 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -66,14 +66,21 @@ struct mlx5_devx_obj *
  *   The mkey key for batch query.
  *  @param addr
  *    The address in the mkey range for batch query.
+ *  @param cmd_comp
+ *   The completion object for asynchronous batch query.
+ *  @param async_id
+ *    The ID to be returned in the asynchronous batch query response.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
 int
-mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs, int clear,
-				 uint32_t n_counters, uint64_t *pkts,
-				 uint64_t *bytes, uint32_t mkey, void *addr)
+mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				 int clear, uint32_t n_counters,
+				 uint64_t *pkts, uint64_t *bytes,
+				 uint32_t mkey, void *addr,
+				 struct mlx5dv_devx_cmd_comp *cmd_comp,
+				 uint64_t async_id)
 {
 	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
 			MLX5_ST_SZ_BYTES(traffic_counter);
@@ -96,7 +103,13 @@ struct mlx5_devx_obj *
 		MLX5_SET64(query_flow_counter_in, in, address,
 			   (uint64_t)(uintptr_t)addr);
 	}
-	rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out, out_len);
+	if (!cmd_comp)
+		rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out,
+					       out_len);
+	else
+		rc = mlx5_glue->devx_obj_query_async(dcs->obj, in, sizeof(in),
+						     out_len, async_id,
+						     cmd_comp);
 	if (rc) {
 		DRV_LOG(ERR, "Failed to query devx counters with rc %d\n ", rc);
 		rte_errno = rc;
@@ -169,6 +182,33 @@ struct mlx5_devx_obj *
 }
 
 /**
+ * Get status of devx command response.
+ * Mainly used for asynchronous commands.
+ *
+ * @param[in] out
+ *   The out response buffer.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+int
+mlx5_devx_get_out_command_status(void *out)
+{
+	int status;
+
+	if (!out)
+		return -EINVAL;
+	status = MLX5_GET(query_flow_counter_out, out, status);
+	if (status) {
+		int syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
+
+		DRV_LOG(ERR, "Bad devX status %x, syndrome = %x\n", status,
+			syndrome);
+	}
+	return status;
+}
+
+/**
  * Destroy any object allocated by a Devx API.
  *
  * @param[in] obj
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index eeefe4d..004901a 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1433,6 +1433,38 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
+ * Handle DEVX interrupts from the NIC.
+ * This function is probably called from the DPDK host thread.
+ *
+ * @param cb_arg
+ *   Callback argument.
+ */
+void
+mlx5_dev_interrupt_handler_devx(void *cb_arg)
+{
+#ifndef HAVE_IBV_DEVX_ASYNC
+	(void)cb_arg;
+	return;
+#else
+	struct mlx5_ibv_shared *sh = cb_arg;
+	union {
+		struct mlx5dv_devx_async_cmd_hdr cmd_resp;
+		uint8_t buf[MLX5_ST_SZ_BYTES(query_flow_counter_out) +
+			    MLX5_ST_SZ_BYTES(traffic_counter) +
+			    sizeof(struct mlx5dv_devx_async_cmd_hdr)];
+	} out;
+	uint8_t *buf = out.buf + sizeof(out.cmd_resp);
+
+	while (!mlx5_glue->devx_get_async_cmd_comp(sh->devx_comp,
+						   &out.cmd_resp,
+						   sizeof(out.buf)))
+		mlx5_flow_async_pool_query_handle
+			(sh, (uint64_t)out.cmd_resp.wr_id,
+			 mlx5_devx_get_out_command_status(buf));
+#endif /* HAVE_IBV_DEVX_ASYNC */
+}
+
+/**
  * Uninstall shared asynchronous device events handler.
  * This function is implemented to support event sharing
  * between multiple ports of single IB device.
@@ -1464,6 +1496,17 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 				     mlx5_dev_interrupt_handler, sh);
 	sh->intr_handle.fd = 0;
 	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+	if (sh->intr_handle_devx.fd) {
+		rte_intr_callback_unregister(&sh->intr_handle_devx,
+					     mlx5_dev_interrupt_handler_devx,
+					     sh);
+		sh->intr_handle_devx.fd = 0;
+		sh->intr_handle_devx.type = RTE_INTR_HANDLE_UNKNOWN;
+	}
+	if (sh->devx_comp) {
+		mlx5_glue->devx_destroy_cmd_comp(sh->devx_comp);
+		sh->devx_comp = NULL;
+	}
 exit:
 	pthread_mutex_unlock(&sh->intr_mutex);
 }
@@ -1507,17 +1550,49 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	if (ret) {
 		DRV_LOG(INFO, "failed to change file descriptor"
 			      " async event queue");
-		/* Indicate there will be no interrupts. */
-		dev->data->dev_conf.intr_conf.lsc = 0;
-		dev->data->dev_conf.intr_conf.rmv = 0;
-		sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
-		goto exit;
+		goto error;
 	}
 	sh->intr_handle.fd = sh->ctx->async_fd;
 	sh->intr_handle.type = RTE_INTR_HANDLE_EXT;
 	rte_intr_callback_register(&sh->intr_handle,
 				   mlx5_dev_interrupt_handler, sh);
+	if (priv->config.devx) {
+#ifndef HAVE_IBV_DEVX_ASYNC
+		goto error_unregister;
+#else
+		sh->devx_comp = mlx5_glue->devx_create_cmd_comp(sh->ctx);
+		if (sh->devx_comp) {
+			flags = fcntl(sh->devx_comp->fd, F_GETFL);
+			ret = fcntl(sh->devx_comp->fd, F_SETFL,
+				    flags | O_NONBLOCK);
+			if (ret) {
+				DRV_LOG(INFO, "failed to change file descriptor"
+					      " devx async event queue");
+				goto error_unregister;
+			}
+			sh->intr_handle_devx.fd = sh->devx_comp->fd;
+			sh->intr_handle_devx.type = RTE_INTR_HANDLE_EXT;
+			rte_intr_callback_register
+				(&sh->intr_handle_devx,
+				 mlx5_dev_interrupt_handler_devx, sh);
+		} else {
+			DRV_LOG(INFO, "failed to create devx async command "
+				"completion");
+			goto error_unregister;
+		}
+#endif /* HAVE_IBV_DEVX_ASYNC */
+	}
 	sh->intr_cnt++;
+error_unregister:
+	rte_intr_callback_unregister(&sh->intr_handle,
+				     mlx5_dev_interrupt_handler, sh);
+error:
+	/* Indicate there will be no interrupts. */
+	dev->data->dev_conf.intr_conf.lsc = 0;
+	dev->data->dev_conf.intr_conf.rmv = 0;
+	sh->intr_handle.fd = 0;
+	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+	sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 exit:
 	pthread_mutex_unlock(&sh->intr_mutex);
 }
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 534cd93..1c5431d 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -3078,3 +3078,150 @@ struct rte_flow *
 	}
 	return 0;
 }
+
+#define MLX5_POOL_QUERY_FREQ_US 1000000
+
+/**
+ * Set the periodic procedure for triggering asynchronous batch queries for all
+ * the counter pools.
+ *
+ * @param[in] sh
+ *   Pointer to mlx5_ibv_shared object.
+ */
+void
+mlx5_set_query_alarm(struct mlx5_ibv_shared *sh)
+{
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(sh, 0, 0);
+	uint32_t pools_n = rte_atomic16_read(&cont->n_valid);
+	uint32_t us;
+
+	cont = MLX5_CNT_CONTAINER(sh, 1, 0);
+	pools_n += rte_atomic16_read(&cont->n_valid);
+	us = MLX5_POOL_QUERY_FREQ_US / pools_n;
+	DRV_LOG(DEBUG, "Set alarm for %u pools each %u us\n", pools_n, us);
+	if (rte_eal_alarm_set(us, mlx5_flow_query_alarm, sh)) {
+		sh->cmng.query_thread_on = 0;
+		DRV_LOG(ERR, "Cannot reinitialize query alarm\n");
+	} else {
+		sh->cmng.query_thread_on = 1;
+	}
+}
+
+/**
+ * The periodic procedure for triggering asynchronous batch queries for all the
+ * counter pools. This function is probably called by the host thread.
+ *
+ * @param[in] arg
+ *   The parameter for the alarm process.
+ */
+void
+mlx5_flow_query_alarm(void *arg)
+{
+	struct mlx5_ibv_shared *sh = arg;
+	struct mlx5_devx_obj *dcs;
+	uint16_t offset;
+	int ret;
+	uint8_t batch = sh->cmng.batch;
+	uint16_t pool_index = sh->cmng.pool_index;
+	struct mlx5_pools_container *cont;
+	struct mlx5_pools_container *mcont;
+	struct mlx5_flow_counter_pool *pool;
+
+	if (sh->cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
+		goto set_alarm;
+next_container:
+	cont = MLX5_CNT_CONTAINER(sh, batch, 1);
+	mcont = MLX5_CNT_CONTAINER(sh, batch, 0);
+	/* Check if resize was done and need to flip a container. */
+	if (cont != mcont) {
+		if (cont->pools) {
+			/* Clean the old container. */
+			rte_free(cont->pools);
+			memset(cont, 0, sizeof(*cont));
+		}
+		rte_cio_wmb();
+		/* Flip the host container. */
+		sh->cmng.mhi[batch] ^= (uint8_t)2;
+		cont = mcont;
+	}
+	if (!cont->pools) {
+		/* 2 empty containers case is unexpected. */
+		if (unlikely(batch != sh->cmng.batch))
+			goto set_alarm;
+		batch ^= 0x1;
+		pool_index = 0;
+		goto next_container;
+	}
+	pool = cont->pools[pool_index];
+	if (pool->raw_hw)
+		/* There is a pool query in progress. */
+		goto set_alarm;
+	pool->raw_hw =
+		LIST_FIRST(&sh->cmng.free_stat_raws);
+	if (!pool->raw_hw)
+		/* No free counter statistics raw memory. */
+		goto set_alarm;
+	dcs = (struct mlx5_devx_obj *)(uintptr_t)rte_atomic64_read
+							      (&pool->a64_dcs);
+	offset = batch ? 0 : dcs->id % MLX5_COUNTERS_PER_POOL;
+	ret = mlx5_devx_cmd_flow_counter_query(dcs, 0, MLX5_COUNTERS_PER_POOL -
+					       offset, NULL, NULL,
+					       pool->raw_hw->mem_mng->dm->id,
+					       (void *)(uintptr_t)
+					       (pool->raw_hw->data + offset),
+					       sh->devx_comp,
+					       (uint64_t)(uintptr_t)pool);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to trigger asynchronous query for dcs ID"
+			" %d\n", pool->min_dcs->id);
+		pool->raw_hw = NULL;
+		goto set_alarm;
+	}
+	pool->raw_hw->min_dcs_id = dcs->id;
+	LIST_REMOVE(pool->raw_hw, next);
+	sh->cmng.pending_queries++;
+	pool_index++;
+	if (pool_index >= rte_atomic16_read(&cont->n_valid)) {
+		batch ^= 0x1;
+		pool_index = 0;
+	}
+set_alarm:
+	sh->cmng.batch = batch;
+	sh->cmng.pool_index = pool_index;
+	mlx5_set_query_alarm(sh);
+}
+
+/**
+ * Handler for the HW response with ready values from an asynchronous batch
+ * query. This function is probably called by the host thread.
+ *
+ * @param[in] sh
+ *   The pointer to the shared IB device context.
+ * @param[in] async_id
+ *   The Devx async ID.
+ * @param[in] status
+ *   The status of the completion.
+ */
+void
+mlx5_flow_async_pool_query_handle(struct mlx5_ibv_shared *sh,
+				  uint64_t async_id, int status)
+{
+	struct mlx5_flow_counter_pool *pool =
+		(struct mlx5_flow_counter_pool *)(uintptr_t)async_id;
+	struct mlx5_counter_stats_raw *raw_to_free;
+
+	if (unlikely(status)) {
+		raw_to_free = pool->raw_hw;
+	} else {
+		raw_to_free = pool->raw;
+		rte_spinlock_lock(&pool->sl);
+		pool->raw = pool->raw_hw;
+		rte_spinlock_unlock(&pool->sl);
+		rte_atomic64_add(&pool->query_gen, 1);
+		/* Be sure the new raw counters data is updated in memory. */
+		rte_cio_wmb();
+	}
+	LIST_INSERT_HEAD(&sh->cmng.free_stat_raws, raw_to_free, next);
+	pool->raw_hw = NULL;
+	sh->cmng.pending_queries--;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index fbd09d0..0d6f64a 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -21,6 +21,9 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#include <rte_atomic.h>
+#include <rte_alarm.h>
+
 #include "mlx5.h"
 #include "mlx5_prm.h"
 
@@ -409,6 +412,11 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_query_t query;
 };
 
+#define MLX5_CNT_CONTAINER(sh, batch, thread) (&(sh)->cmng.ccont \
+	[(((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
+#define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
+	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
+
 /* mlx5_flow.c */
 
 uint64_t mlx5_flow_hashfields_adjust(struct mlx5_flow *dev_flow, int tunnel,
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 3b7a43e..b4a1463 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2115,7 +2115,6 @@ struct field_modify_info modify_tcp[] = {
 }
 
 #define MLX5_CNT_CONTAINER_RESIZE 64
-#define MLX5_CNT_CONTAINER(priv, batch) (&(priv)->sh->cmng.ccont[batch])
 
 /**
  * Get a pool by a counter.
@@ -2238,39 +2237,53 @@ struct field_modify_info modify_tcp[] = {
  *   Whether the pool is for counter that was allocated by batch command.
  *
  * @return
- *   The container pointer on success, otherwise NULL and rte_errno is set.
+ *   The new container pointer on success, otherwise NULL and rte_errno is set.
  */
 static struct mlx5_pools_container *
 flow_dv_container_resize(struct rte_eth_dev *dev, uint32_t batch)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
+	struct mlx5_pools_container *cont =
+			MLX5_CNT_CONTAINER(priv->sh, batch, 0);
+	struct mlx5_pools_container *new_cont =
+			MLX5_CNT_CONTAINER_UNUSED(priv->sh, batch, 0);
 	struct mlx5_counter_stats_mem_mng *mem_mng;
 	uint32_t resize = cont->n + MLX5_CNT_CONTAINER_RESIZE;
 	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * resize;
-	struct mlx5_flow_counter_pool **new_pools = rte_calloc(__func__, 1,
-							       mem_size, 0);
-	if (!new_pools) {
+	int i;
+
+	if (cont != MLX5_CNT_CONTAINER(priv->sh, batch, 1)) {
+		/* Host thread has not yet detected the last resize. */
+		rte_errno = EAGAIN;
+		return NULL;
+	}
+	new_cont->pools = rte_calloc(__func__, 1, mem_size, 0);
+	if (!new_cont->pools) {
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	if (cont->n)
+		memcpy(new_cont->pools, cont->pools, cont->n *
+		       sizeof(struct mlx5_flow_counter_pool *));
 	mem_mng = flow_dv_create_counter_stat_mem_mng(dev,
-						    MLX5_CNT_CONTAINER_RESIZE);
+		MLX5_CNT_CONTAINER_RESIZE + MLX5_MAX_PENDING_QUERIES);
 	if (!mem_mng) {
-		rte_free(new_pools);
+		rte_free(new_cont->pools);
 		return NULL;
 	}
-	if (cont->n) {
-		memcpy(new_pools, cont->pools,
-		       cont->n * sizeof(struct mlx5_flow_counter_pool *));
-		rte_free(cont->pools);
-	} else {
-		TAILQ_INIT(&cont->pool_list);
-	}
-	cont->pools = new_pools;
-	cont->n = resize;
-	cont->init_mem_mng = mem_mng;
-	return cont;
+	for (i = 0; i < MLX5_MAX_PENDING_QUERIES; ++i)
+		LIST_INSERT_HEAD(&priv->sh->cmng.free_stat_raws,
+				 mem_mng->raws + MLX5_CNT_CONTAINER_RESIZE +
+				 i, next);
+	new_cont->n = resize;
+	rte_atomic16_set(&new_cont->n_valid, rte_atomic16_read(&cont->n_valid));
+	TAILQ_INIT(&new_cont->pool_list);
+	TAILQ_CONCAT(&new_cont->pool_list, &cont->pool_list, next);
+	new_cont->init_mem_mng = mem_mng;
+	rte_cio_wmb();
+	/* Flip the master container. */
+	priv->sh->cmng.mhi[batch] ^= (uint8_t)1;
+	return new_cont;
 }
 
 /**
@@ -2295,22 +2308,22 @@ struct field_modify_info modify_tcp[] = {
 {
 	struct mlx5_flow_counter_pool *pool =
 			flow_dv_counter_pool_get(cnt);
-	uint16_t offset = pool->min_dcs->id % MLX5_COUNTERS_PER_POOL;
-	int ret = mlx5_devx_cmd_flow_counter_query
-		(pool->min_dcs, 0, MLX5_COUNTERS_PER_POOL - offset, NULL,
-		 NULL, pool->raw->mem_mng->dm->id,
-		 (void *)(uintptr_t)(pool->raw->data +
-		 offset));
-
-	if (ret) {
-		DRV_LOG(ERR, "Failed to trigger synchronous"
-			" query for dcs ID %d\n",
-			pool->min_dcs->id);
-		return ret;
+	int offset = cnt - &pool->counters_raw[0];
+
+	rte_spinlock_lock(&pool->sl);
+	/*
+	 * A single counter allocation may get an ID smaller than the
+	 * current minimum while the host thread is reading in parallel.
+	 * In this case the new counter values must be reported as 0.
+	 */
+	if (unlikely(!cnt->batch && cnt->dcs->id < pool->raw->min_dcs_id)) {
+		*pkts = 0;
+		*bytes = 0;
+	} else {
+		*pkts = rte_be_to_cpu_64(pool->raw->data[offset].hits);
+		*bytes = rte_be_to_cpu_64(pool->raw->data[offset].bytes);
 	}
-	offset = cnt - &pool->counters_raw[0];
-	*pkts = rte_be_to_cpu_64(pool->raw->data[offset].hits);
-	*bytes = rte_be_to_cpu_64(pool->raw->data[offset].bytes);
+	rte_spinlock_unlock(&pool->sl);
 	return 0;
 }
 
@@ -2333,10 +2346,12 @@ struct field_modify_info modify_tcp[] = {
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool;
-	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv->sh, batch,
+							       0);
+	int16_t n_valid = rte_atomic16_read(&cont->n_valid);
 	uint32_t size;
 
-	if (cont->n == cont->n_valid) {
+	if (cont->n == n_valid) {
 		cont = flow_dv_container_resize(dev, batch);
 		if (!cont)
 			return NULL;
@@ -2349,12 +2364,21 @@ struct field_modify_info modify_tcp[] = {
 		return NULL;
 	}
 	pool->min_dcs = dcs;
-	pool->raw = cont->init_mem_mng->raws + cont->n_valid  %
-			MLX5_CNT_CONTAINER_RESIZE;
+	pool->raw = cont->init_mem_mng->raws + n_valid %
+						     MLX5_CNT_CONTAINER_RESIZE;
+	pool->raw_hw = NULL;
+	rte_spinlock_init(&pool->sl);
+	/*
+	 * The generation of newly allocated counters in this pool is 0, so a
+	 * pool generation of 2 makes all the counters valid for allocation.
+	 */
+	rte_atomic64_set(&pool->query_gen, 0x2);
 	TAILQ_INIT(&pool->counters);
 	TAILQ_INSERT_TAIL(&cont->pool_list, pool, next);
-	cont->pools[cont->n_valid] = pool;
-	cont->n_valid++;
+	cont->pools[n_valid] = pool;
+	/* Pool initialization must be updated before host thread access. */
+	rte_cio_wmb();
+	rte_atomic16_add(&cont->n_valid, 1);
 	return pool;
 }
 
@@ -2388,8 +2412,8 @@ struct field_modify_info modify_tcp[] = {
 		dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0);
 		if (!dcs)
 			return NULL;
-		pool = flow_dv_find_pool_by_id(MLX5_CNT_CONTAINER(priv, batch),
-					       dcs->id);
+		pool = flow_dv_find_pool_by_id
+			(MLX5_CNT_CONTAINER(priv->sh, batch, 0), dcs->id);
 		if (!pool) {
 			pool = flow_dv_pool_create(dev, dcs, batch);
 			if (!pool) {
@@ -2397,7 +2421,8 @@ struct field_modify_info modify_tcp[] = {
 				return NULL;
 			}
 		} else if (dcs->id < pool->min_dcs->id) {
-			pool->min_dcs->id = dcs->id;
+			rte_atomic64_set(&pool->a64_dcs,
+					 (int64_t)(uintptr_t)dcs);
 		}
 		cnt = &pool->counters_raw[dcs->id % MLX5_COUNTERS_PER_POOL];
 		TAILQ_INSERT_HEAD(&pool->counters, cnt, next);
@@ -2486,8 +2511,13 @@ struct field_modify_info modify_tcp[] = {
 	 * shared counters from the single container.
 	 */
 	uint32_t batch = (group && !shared) ? 1 : 0;
-	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv->sh, batch,
+							       0);
 
+#ifndef HAVE_IBV_DEVX_ASYNC
+	rte_errno = ENOTSUP;
+	return NULL;
+#endif
 	if (!priv->config.devx) {
 		rte_errno = ENOTSUP;
 		return NULL;
@@ -2504,9 +2534,22 @@ struct field_modify_info modify_tcp[] = {
 		}
 	}
 	/* Pools which has a free counters are in the start. */
-	pool = TAILQ_FIRST(&cont->pool_list);
-	if (pool)
+	TAILQ_FOREACH(pool, &cont->pool_list, next) {
+		/*
+		 * The reset values of a free counter must be updated between
+		 * its release and its next allocation, so at least one query
+		 * must complete in that interval. Ensure this by saving the
+		 * query generation at release time.
+		 * The free list is sorted by generation, so if the first
+		 * counter is not updated, none of the others are updated
+		 * either.
+		 */
 		cnt_free = TAILQ_FIRST(&pool->counters);
+		if (cnt_free && cnt_free->query_gen + 1 <
+		    rte_atomic64_read(&pool->query_gen))
+			break;
+		cnt_free = NULL;
+	}
 	if (!cnt_free) {
 		pool = flow_dv_counter_pool_prepare(dev, &cnt_free, batch);
 		if (!pool)
@@ -2539,6 +2582,9 @@ struct field_modify_info modify_tcp[] = {
 	cnt_free->shared = shared;
 	cnt_free->ref_cnt = 1;
 	cnt_free->id = id;
+	if (!priv->sh->cmng.query_thread_on)
+		/* Start the asynchronous batch query by the host thread. */
+		mlx5_set_query_alarm(priv->sh);
 	TAILQ_REMOVE(&pool->counters, cnt_free, next);
 	if (TAILQ_EMPTY(&pool->counters)) {
 		/* Move the pool to the end of the container pool list. */
@@ -2566,8 +2612,9 @@ struct field_modify_info modify_tcp[] = {
 		struct mlx5_flow_counter_pool *pool =
 				flow_dv_counter_pool_get(counter);
 
-		/* Put the counter in the end - the earliest one. */
+		/* Put the counter in the end - the last updated one. */
 		TAILQ_INSERT_TAIL(&pool->counters, counter, next);
+		counter->query_gen = rte_atomic64_read(&pool->query_gen);
 	}
 }
 
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index ba5fd06..942f89d 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -849,6 +849,64 @@
 #endif
 }
 
+static struct mlx5dv_devx_cmd_comp *
+mlx5_glue_devx_create_cmd_comp(struct ibv_context *ctx)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_create_cmd_comp(ctx);
+#else
+	(void)ctx;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_destroy_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	mlx5dv_devx_destroy_cmd_comp(cmd_comp);
+#else
+	(void)cmd_comp;
+	errno = ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_query_async(struct mlx5dv_devx_obj *obj, const void *in,
+			       size_t inlen, size_t outlen, uint64_t wr_id,
+			       struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_obj_query_async(obj, in, inlen, outlen, wr_id,
+					   cmd_comp);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)outlen;
+	(void)wr_id;
+	(void)cmd_comp;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				  struct mlx5dv_devx_async_cmd_hdr *cmd_resp,
+				  size_t cmd_resp_len)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_get_async_cmd_comp(cmd_comp, cmd_resp,
+					      cmd_resp_len);
+#else
+	(void)cmd_comp;
+	(void)cmd_resp;
+	(void)cmd_resp_len;
+	return -ENOTSUP;
+#endif
+}
+
 static struct mlx5dv_devx_umem *
 mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
 			uint32_t access)
@@ -957,6 +1015,10 @@
 	.devx_obj_query = mlx5_glue_devx_obj_query,
 	.devx_obj_modify = mlx5_glue_devx_obj_modify,
 	.devx_general_cmd = mlx5_glue_devx_general_cmd,
+	.devx_create_cmd_comp = mlx5_glue_devx_create_cmd_comp,
+	.devx_destroy_cmd_comp = mlx5_glue_devx_destroy_cmd_comp,
+	.devx_obj_query_async = mlx5_glue_devx_obj_query_async,
+	.devx_get_async_cmd_comp = mlx5_glue_devx_get_async_cmd_comp,
 	.devx_umem_reg = mlx5_glue_devx_umem_reg,
 	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
 };
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index 18b1ce6..9facdb9 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -64,6 +64,11 @@
 struct mlx5dv_devx_umem;
 #endif
 
+#ifndef HAVE_IBV_DEVX_ASYNC
+struct mlx5dv_devx_cmd_comp;
+struct mlx5dv_devx_async_cmd_hdr;
+#endif
+
 #ifndef HAVE_MLX5DV_DR
 enum  mlx5dv_dr_domain_type { unused, };
 struct mlx5dv_dr_domain;
@@ -210,6 +215,16 @@ struct mlx5_glue {
 	int (*devx_general_cmd)(struct ibv_context *context,
 				const void *in, size_t inlen,
 				void *out, size_t outlen);
+	struct mlx5dv_devx_cmd_comp *(*devx_create_cmd_comp)
+					(struct ibv_context *context);
+	void (*devx_destroy_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_obj_query_async)(struct mlx5dv_devx_obj *obj,
+				    const void *in, size_t inlen,
+				    size_t outlen, uint64_t wr_id,
+				    struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_get_async_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				       struct mlx5dv_devx_async_cmd_hdr *resp,
+				       size_t cmd_resp_len);
 	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
 						  void *addr, size_t size,
 						  uint32_t access);
-- 
1.8.3.1



* [dpdk-dev] [PATCH 4/4] net/mlx5: allow basic counter management fallback
  2019-07-08 14:07 [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
                   ` (2 preceding siblings ...)
  2019-07-08 14:07 ` [dpdk-dev] [PATCH 3/4] net/mlx5: accelerate DV flow counter query Matan Azrad
@ 2019-07-08 14:07 ` Matan Azrad
  2019-07-16 14:34 ` [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
  4 siblings, 0 replies; 11+ messages in thread
From: Matan Azrad @ 2019-07-08 14:07 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko; +Cc: dev

If RDMA core does not support the asynchronous DevX commands, fall back
to basic counter management.

In this mode the PMD counter cache is redundant and the host thread does
not update it. Hence, each counter operation goes to the FW and the
acceleration is reduced.
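
The mode selection this patch introduces can be sketched as follows. This is
an illustration only, not the PMD code: the `sketch_caps` structure, its field
names and the `sketch_counter_mode()` helper are hypothetical, and only the
conditions themselves (DevX enabled, `flow_counters_dump` reported by the HCA,
asynchronous DevX commands available at build time) are taken from the diff
below.

```c
#include <stdbool.h>

/* Hypothetical capability summary; the field names are illustrative. */
struct sketch_caps {
	bool devx;               /* DevX is enabled for the device. */
	bool flow_counters_dump; /* HCA reports the flow_counters_dump cap. */
	bool async_cmds;         /* rdma-core offers async DevX commands. */
};

enum sketch_cnt_mode {
	SKETCH_CNT_UNSUPPORTED, /* No DevX: counter allocation fails. */
	SKETCH_CNT_FALLBACK,    /* One FW command per counter operation. */
	SKETCH_CNT_POOLS,       /* Pools with asynchronous batch queries. */
};

/* Pick the counter management mode from the capabilities. */
static inline enum sketch_cnt_mode
sketch_counter_mode(const struct sketch_caps *caps)
{
	if (!caps->devx)
		return SKETCH_CNT_UNSUPPORTED;
	if (!caps->flow_counters_dump || !caps->async_cmds)
		return SKETCH_CNT_FALLBACK;
	return SKETCH_CNT_POOLS;
}
```

In the patch itself the result is a single `counter_fallback` bit in the
private device data, checked at the start of the allocate, release and query
paths.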

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c           |   8 +++
 drivers/net/mlx5/mlx5.h           |   2 +
 drivers/net/mlx5/mlx5_devx_cmds.c |   4 +-
 drivers/net/mlx5/mlx5_flow_dv.c   | 127 ++++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_prm.h       |   4 +-
 5 files changed, 137 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a8d824e..f4ad5d2 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1624,11 +1624,19 @@ struct mlx5_dev_spawn_data {
 	mlx5_link_update(eth_dev, 0);
 #ifdef HAVE_IBV_DEVX_OBJ
 	if (config.devx) {
+		priv->counter_fallback = 0;
 		err = mlx5_devx_cmd_query_hca_attr(sh->ctx, &config.hca_attr);
 		if (err) {
 			err = -err;
 			goto error;
 		}
+		if (!config.hca_attr.flow_counters_dump)
+			priv->counter_fallback = 1;
+#ifndef HAVE_IBV_DEVX_ASYNC
+		priv->counter_fallback = 1;
+#endif
+		if (priv->counter_fallback)
+			DRV_LOG(INFO, "Use fall-back DV counter management\n");
 	}
 #endif
 #ifdef HAVE_MLX5DV_DR_ESWITCH
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4ce352a..2bd2aa6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -168,6 +168,7 @@ struct mlx5_devx_mkey_attr {
 /* HCA attributes. */
 struct mlx5_hca_attr {
 	uint32_t eswitch_manager:1;
+	uint32_t flow_counters_dump:1;
 	uint8_t flow_counter_bulk_alloc_bitmap;
 };
 
@@ -457,6 +458,7 @@ struct mlx5_priv {
 	unsigned int representor:1; /* Device is a port representor. */
 	unsigned int master:1; /* Device is a E-Switch master. */
 	unsigned int dr_shared:1; /* DV/DR data is shared. */
+	unsigned int counter_fallback:1; /* Use counter fallback management. */
 	uint16_t domain_id; /* Switch domain identifier. */
 	uint16_t vport_id; /* Associated VF vport index (if any). */
 	int32_t representor_id; /* Port representor identifier. */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 28d967a..d26d5bc 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -57,7 +57,7 @@ struct mlx5_devx_obj *
  * @param[in] clear
  *   Whether hardware should clear the counters after the query or not.
  * @param[in] n_counters
- *   The counter number to read.
+ *   0 in case of 1 counter to read, otherwise the counter number to read.
  *  @param pkts
  *   The number of packets that matched the flow.
  *  @param bytes
@@ -271,6 +271,8 @@ struct mlx5_devx_obj *
 	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
 	attr->flow_counter_bulk_alloc_bitmap =
 			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
+	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
+					    flow_counters_dump);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index b4a1463..629816e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2117,6 +2117,113 @@ struct field_modify_info modify_tcp[] = {
 #define MLX5_CNT_CONTAINER_RESIZE 64
 
 /**
+ * Get or create a flow counter.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] shared
+ *   Indicate if this counter is shared with other flows.
+ * @param[in] id
+ *   Counter identifier.
+ *
+ * @return
+ *   pointer to flow counter on success, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_flow_counter *
+flow_dv_counter_alloc_fallback(struct rte_eth_dev *dev, uint32_t shared,
+			       uint32_t id)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_counter *cnt = NULL;
+	struct mlx5_devx_obj *dcs = NULL;
+
+	if (!priv->config.devx) {
+		rte_errno = ENOTSUP;
+		return NULL;
+	}
+	if (shared) {
+		TAILQ_FOREACH(cnt, &priv->sh->cmng.flow_counters, next) {
+			if (cnt->shared && cnt->id == id) {
+				cnt->ref_cnt++;
+				return cnt;
+			}
+		}
+	}
+	dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0);
+	if (!dcs)
+		return NULL;
+	cnt = rte_calloc(__func__, 1, sizeof(*cnt), 0);
+	if (!cnt) {
+		claim_zero(mlx5_devx_cmd_destroy(dcs));
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	struct mlx5_flow_counter tmpl = {
+		.shared = shared,
+		.ref_cnt = 1,
+		.id = id,
+		.dcs = dcs,
+	};
+	tmpl.action = mlx5_glue->dv_create_flow_action_counter(dcs->obj, 0);
+	if (!tmpl.action) {
+		claim_zero(mlx5_devx_cmd_destroy(dcs));
+		rte_errno = errno;
+		rte_free(cnt);
+		return NULL;
+	}
+	*cnt = tmpl;
+	TAILQ_INSERT_HEAD(&priv->sh->cmng.flow_counters, cnt, next);
+	return cnt;
+}
+
+/**
+ * Release a flow counter.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] counter
+ *   Pointer to the counter handler.
+ */
+static void
+flow_dv_counter_release_fallback(struct rte_eth_dev *dev,
+				 struct mlx5_flow_counter *counter)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!counter)
+		return;
+	if (--counter->ref_cnt == 0) {
+		TAILQ_REMOVE(&priv->sh->cmng.flow_counters, counter, next);
+		claim_zero(mlx5_devx_cmd_destroy(counter->dcs));
+		rte_free(counter);
+	}
+}
+
+/**
+ * Query a devx flow counter.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] cnt
+ *   Pointer to the flow counter.
+ * @param[out] pkts
+ *   The statistics value of packets.
+ * @param[out] bytes
+ *   The statistics value of bytes.
+ *
+ * @return
+ *   0 on success, otherwise a negative errno value and rte_errno is set.
+ */
+static inline int
+_flow_dv_query_count_fallback(struct rte_eth_dev *dev __rte_unused,
+		     struct mlx5_flow_counter *cnt, uint64_t *pkts,
+		     uint64_t *bytes)
+{
+	return mlx5_devx_cmd_flow_counter_query(cnt->dcs, 0, 0, pkts, bytes,
+						0, NULL, NULL, 0);
+}
+
+/**
  * Get a pool by a counter.
  *
  * @param[in] cnt
@@ -2302,14 +2409,18 @@ struct field_modify_info modify_tcp[] = {
  *   0 on success, otherwise a negative errno value and rte_errno is set.
  */
 static inline int
-_flow_dv_query_count(struct rte_eth_dev *dev __rte_unused,
+_flow_dv_query_count(struct rte_eth_dev *dev,
 		     struct mlx5_flow_counter *cnt, uint64_t *pkts,
 		     uint64_t *bytes)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool =
 			flow_dv_counter_pool_get(cnt);
 	int offset = cnt - &pool->counters_raw[0];
 
+	if (priv->counter_fallback)
+		return _flow_dv_query_count_fallback(dev, cnt, pkts, bytes);
+
 	rte_spinlock_lock(&pool->sl);
 	/*
 	 * The single counters allocation may allocate smaller ID than the
@@ -2514,10 +2625,8 @@ struct field_modify_info modify_tcp[] = {
 	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv->sh, batch,
 							       0);
 
-#ifndef HAVE_IBV_DEVX_ASYNC
-	rte_errno = ENOTSUP;
-	return NULL;
-#endif
+	if (priv->counter_fallback)
+		return flow_dv_counter_alloc_fallback(dev, shared, id);
 	if (!priv->config.devx) {
 		rte_errno = ENOTSUP;
 		return NULL;
@@ -2603,11 +2712,17 @@ struct field_modify_info modify_tcp[] = {
  *   Pointer to the counter handler.
  */
 static void
-flow_dv_counter_release(struct rte_eth_dev *dev __rte_unused,
+flow_dv_counter_release(struct rte_eth_dev *dev,
 			struct mlx5_flow_counter *counter)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	if (!counter)
 		return;
+	if (priv->counter_fallback) {
+		flow_dv_counter_release_fallback(dev, counter);
+		return;
+	}
 	if (--counter->ref_cnt == 0) {
 		struct mlx5_flow_counter_pool *pool =
 				flow_dv_counter_pool_get(counter);
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index e2e538d..b53e6ce 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -916,7 +916,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8 reserved_at_343[0x5];
 	u8 log_max_flow_counter_bulk[0x8];
 	u8 max_flow_counter_15_0[0x10];
-	u8 reserved_at_360[0x3];
+	u8 modify_tis[0x1];
+	u8 flow_counters_dump[0x1];
+	u8 reserved_at_360[0x1];
 	u8 log_max_rq[0x5];
 	u8 reserved_at_368[0x3];
 	u8 log_max_sq[0x5];
-- 
1.8.3.1



* [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement
  2019-07-08 14:07 [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
                   ` (3 preceding siblings ...)
  2019-07-08 14:07 ` [dpdk-dev] [PATCH 4/4] net/mlx5: allow basic counter management fallback Matan Azrad
@ 2019-07-16 14:34 ` Matan Azrad
  2019-07-16 14:34   ` [dpdk-dev] [PATCH 1/4] net/mlx5: accelerate DV flow counter transactions Matan Azrad
                     ` (4 more replies)
  4 siblings, 5 replies; 11+ messages in thread
From: Matan Azrad @ 2019-07-16 14:34 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko; +Cc: dev

New DevX features for allocating and querying flow counters with batch commands make it possible to accelerate flow counter create/destroy/query.

v2:
rebase.

Matan Azrad (4):
  net/mlx5: accelerate DV flow counter transactions
  net/mlx5: resize a full counter container
  net/mlx5: accelerate DV flow counter query
  net/mlx5: allow basic counter management fallback

 doc/guides/rel_notes/release_19_08.rst |   2 +
 drivers/net/mlx5/Makefile              |   7 +-
 drivers/net/mlx5/meson.build           |   4 +-
 drivers/net/mlx5/mlx5.c                | 102 ++++++
 drivers/net/mlx5/mlx5.h                | 145 +++++++-
 drivers/net/mlx5/mlx5_devx_cmds.c      | 225 +++++++++---
 drivers/net/mlx5/mlx5_ethdev.c         |  85 ++++-
 drivers/net/mlx5/mlx5_flow.c           | 147 ++++++++
 drivers/net/mlx5/mlx5_flow.h           |  27 +-
 drivers/net/mlx5/mlx5_flow_dv.c        | 616 ++++++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow_verbs.c     |  15 +-
 drivers/net/mlx5/mlx5_glue.c           |  91 +++++
 drivers/net/mlx5/mlx5_glue.h           |  20 ++
 drivers/net/mlx5/mlx5_prm.h            | 116 ++++++-
 14 files changed, 1463 insertions(+), 139 deletions(-)

-- 
1.8.3.1



* [dpdk-dev] [PATCH 1/4] net/mlx5: accelerate DV flow counter transactions
  2019-07-16 14:34 ` [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
@ 2019-07-16 14:34   ` Matan Azrad
  2019-07-16 14:34   ` [dpdk-dev] [PATCH 2/4] net/mlx5: resize a full counter container Matan Azrad
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Matan Azrad @ 2019-07-16 14:34 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko; +Cc: dev

The DevX interface exposes a new feature to the PMD that allocates a
batch of counters with a single FW command. This can improve the flow
transaction rate for flows with a count action.

Add a new counter pool mechanism to manage HW counters in the PMD.
When a flow with a counter is created, the PMD first tries to find a
free counter in the container's pools; only if there is no free
counter does it allocate a new batch of DevX counters.
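
The lookup order described above can be sketched roughly as follows.
This is a simplified, standalone illustration and not the driver code:
demo_counter, demo_container, demo_pool_add, demo_counter_get and
demo_counter_put are hypothetical names, calloc() stands in for the
DevX batch allocation, and a plain singly linked free list replaces
the TAILQ lists used by the patch.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_COUNTERS_PER_POOL 512 /* Same pool size as the patch. */

/* Hypothetical counter: only the free-list link is modelled. */
struct demo_counter {
	struct demo_counter *next_free;
};

/* Hypothetical container: one free list shared by all of its pools. */
struct demo_container {
	struct demo_counter *free_list; /* Free counters of all pools. */
	unsigned int n_pools;           /* Pools allocated so far. */
};

/*
 * Stand-in for a batch allocation: create one pool and put all of its
 * counters on the free list. As in the patch, pools are never released.
 */
static int
demo_pool_add(struct demo_container *cont)
{
	struct demo_counter *pool;
	size_t i;

	pool = calloc(DEMO_COUNTERS_PER_POOL, sizeof(*pool));
	if (!pool)
		return -1;
	for (i = 0; i < DEMO_COUNTERS_PER_POOL; ++i) {
		pool[i].next_free = cont->free_list;
		cont->free_list = &pool[i];
	}
	cont->n_pools++;
	return 0;
}

/* Take a free counter; allocate a new pool only when none is free. */
static struct demo_counter *
demo_counter_get(struct demo_container *cont)
{
	struct demo_counter *cnt;

	if (!cont->free_list && demo_pool_add(cont))
		return NULL;
	cnt = cont->free_list;
	cont->free_list = cnt->next_free;
	cnt->next_free = NULL;
	return cnt;
}

/* Return a counter to the free list so that a later get can reuse it. */
static void
demo_counter_put(struct demo_container *cont, struct demo_counter *cnt)
{
	cnt->next_free = cont->free_list;
	cont->free_list = cnt;
}
```

In this sketch the expensive allocation runs once per 512 counters, so
the common path is a constant-time list operation.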

Currently, batch counters cannot be supported for group 0 flows, so
create two container types: one that allocates counters one by one
and one that allocates X counters at a time with the batch feature.
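
As a minimal sketch of the container choice, assuming the rule used in
flow_dv_counter_alloc() (group 0 flows and shared counters use the
single container, everything else uses the batch container). The enum
and the function name demo_container_select are hypothetical and exist
only for this illustration.

```c
#include <stdint.h>

/* Hypothetical names for the two container types. */
enum demo_container_type {
	DEMO_CONT_SINGLE = 0, /* Counters allocated one by one. */
	DEMO_CONT_BATCH = 1,  /* Counters allocated by a batch command. */
};

/*
 * Pick the container for a new counter.
 * A shared counter may be used by flows in different groups, including
 * group 0, so it is always taken from the single container.
 */
static enum demo_container_type
demo_container_select(uint16_t group, uint32_t shared)
{
	return (group && !shared) ? DEMO_CONT_BATCH : DEMO_CONT_SINGLE;
}
```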

Allocated counter objects are never released back to the HW, on the
assumption that the number of flows in use stays close to the maximum
number of flows.
Later this can be revisited and a dynamic release mechanism added.

The counters are contained in pools, each pool holding 512 counters.
The pools are contained in counter containers according to the
allocation resolution type: single or batch.
The cached counter statistics are saved as raw data per pool.
The raw data memory for a whole container is allocated in a single
memory allocation and is managed by the counter_stats_mem_mng
structure, which registers all the raw memory to the HW.
Each pool points to one raw data structure.
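
The single allocation can be pictured as three consecutive regions:
the statistics data for all pools, then the array of per-pool raw
descriptors, then the management structure at the very end. The sketch
below only computes the offsets of that layout; the demo_* structures
are simplified, hypothetical stand-ins for the real ones, so the sizes
are illustrative rather than the driver's actual sizes.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_COUNTERS_PER_POOL 512

/* Simplified stand-ins for the structures added by the patch. */
struct demo_stats { uint64_t hits; uint64_t bytes; };
struct demo_raw { void *mem_mng; struct demo_stats *data; };
struct demo_mem_mng { struct demo_raw *raws; };

/* Byte offsets of each region inside the single allocation. */
struct demo_layout {
	size_t data_size; /* Statistics area, the part registered to HW. */
	size_t raws_off;  /* Start of the per-pool raw descriptors. */
	size_t mng_off;   /* Start of the management structure. */
	size_t total;     /* Size of the whole allocation. */
};

/* Compute the layout for raws_n pools. */
static struct demo_layout
demo_layout_compute(size_t raws_n)
{
	struct demo_layout l;

	l.data_size = sizeof(struct demo_stats) *
		      DEMO_COUNTERS_PER_POOL * raws_n;
	l.raws_off = l.data_size;
	l.mng_off = l.raws_off + sizeof(struct demo_raw) * raws_n;
	l.total = l.mng_off + sizeof(struct demo_mem_mng);
	return l;
}
```

Keeping the statistics area first means that only one contiguous range
starting at the base address has to be registered to the HW.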

The query operation works at pool resolution: a single operation
updates the raw data of all the counters in the pool.
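
Once a pool's raw data has been refreshed, reading one counter is just
an indexed load plus a byte-order conversion. The sketch below assumes
that a counter's slot is its ID minus the 512-aligned base of its pool
and that the device writes big-endian values (the patch converts with
rte_be_to_cpu_64()). The demo_* names are hypothetical, and a portable
byte-wise decode is used here instead of the DPDK helper.

```c
#include <stdint.h>

#define DEMO_COUNTERS_PER_POOL 512

/* Raw per-counter record as written by the device: big-endian values. */
struct demo_raw_stats {
	uint8_t hits[8];
	uint8_t bytes[8];
};

/* Decode a 64-bit big-endian value byte by byte. */
static uint64_t
demo_be64_decode(const uint8_t b[8])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; ++i)
		v = (v << 8) | b[i];
	return v;
}

/* First counter ID of the 512-aligned range that contains id. */
static uint32_t
demo_pool_base(uint32_t id)
{
	return (id / DEMO_COUNTERS_PER_POOL) * DEMO_COUNTERS_PER_POOL;
}

/* Read one counter from the already refreshed raw data of its pool. */
static void
demo_counter_read(const struct demo_raw_stats *pool_raw, uint32_t id,
		  uint64_t *pkts, uint64_t *bytes)
{
	const struct demo_raw_stats *s = &pool_raw[id - demo_pool_base(id)];

	*pkts = demo_be64_decode(s->hits);
	*bytes = demo_be64_decode(s->bytes);
}
```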

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/Makefile          |   2 +-
 drivers/net/mlx5/meson.build       |   2 +-
 drivers/net/mlx5/mlx5.c            |  85 +++++++
 drivers/net/mlx5/mlx5.h            | 115 ++++++++-
 drivers/net/mlx5/mlx5_devx_cmds.c  | 185 ++++++++++----
 drivers/net/mlx5/mlx5_flow.h       |  19 --
 drivers/net/mlx5/mlx5_flow_dv.c    | 485 ++++++++++++++++++++++++++++++++-----
 drivers/net/mlx5/mlx5_flow_verbs.c |  15 +-
 drivers/net/mlx5/mlx5_glue.c       |  29 +++
 drivers/net/mlx5/mlx5_glue.h       |   5 +
 drivers/net/mlx5/mlx5_prm.h        | 112 ++++++++-
 11 files changed, 902 insertions(+), 152 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 619e6b6..b210c80 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -8,7 +8,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_pmd_mlx5.a
 LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
 LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
-LIB_GLUE_VERSION = 19.05.0
+LIB_GLUE_VERSION = 19.08.0
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 3eff22e..2c23e44 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -11,7 +11,7 @@ build = true
 
 pmd_dlopen = (get_option('ibverbs_link') == 'dlopen')
 LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
-LIB_GLUE_VERSION = '19.05.0'
+LIB_GLUE_VERSION = '19.08.0'
 LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
 if pmd_dlopen
 	dpdk_conf.set('RTE_IBVERBS_LINK_DLOPEN', 1)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d93f92d..62be141 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -157,6 +157,89 @@ struct mlx5_dev_spawn_data {
 static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 
 /**
+ * Initialize the counters management structure.
+ *
+ * @param[in] sh
+ *   Pointer to the mlx5_ibv_shared object to initialize.
+ */
+static void
+mlx5_flow_counters_mng_init(struct mlx5_ibv_shared *sh)
+{
+	uint8_t i;
+
+	TAILQ_INIT(&sh->cmng.flow_counters);
+	for (i = 0; i < RTE_DIM(sh->cmng.ccont); ++i)
+		TAILQ_INIT(&sh->cmng.ccont[i].pool_list);
+}
+
+/**
+ * Destroy all the resources allocated for a counter memory management.
+ *
+ * @param[in] mng
+ *   Pointer to the memory management structure.
+ */
+static void
+mlx5_flow_destroy_counter_stat_mem_mng(struct mlx5_counter_stats_mem_mng *mng)
+{
+	uint8_t *mem = (uint8_t *)(uintptr_t)mng->raws[0].data;
+
+	LIST_REMOVE(mng, next);
+	claim_zero(mlx5_devx_cmd_destroy(mng->dm));
+	claim_zero(mlx5_glue->devx_umem_dereg(mng->umem));
+	rte_free(mem);
+}
+
+/**
+ * Close and release all the resources of the counters management.
+ *
+ * @param[in] sh
+ *   Pointer to mlx5_ibv_shared object to free.
+ */
+static void
+mlx5_flow_counters_mng_close(struct mlx5_ibv_shared *sh)
+{
+	struct mlx5_counter_stats_mem_mng *mng;
+	uint8_t i;
+	int j;
+
+	for (i = 0; i < RTE_DIM(sh->cmng.ccont); ++i) {
+		struct mlx5_flow_counter_pool *pool;
+		uint32_t batch = !!(i % 2);
+
+		if (!sh->cmng.ccont[i].pools)
+			continue;
+		pool = TAILQ_FIRST(&sh->cmng.ccont[i].pool_list);
+		while (pool) {
+			if (batch) {
+				if (pool->min_dcs)
+					claim_zero
+					(mlx5_devx_cmd_destroy(pool->min_dcs));
+			}
+			for (j = 0; j < MLX5_COUNTERS_PER_POOL; ++j) {
+				if (pool->counters_raw[j].action)
+					claim_zero
+					(mlx5_glue->destroy_flow_action
+					       (pool->counters_raw[j].action));
+				if (!batch && pool->counters_raw[j].dcs)
+					claim_zero(mlx5_devx_cmd_destroy
+						  (pool->counters_raw[j].dcs));
+			}
+			TAILQ_REMOVE(&sh->cmng.ccont[i].pool_list, pool,
+				     next);
+			rte_free(pool);
+			pool = TAILQ_FIRST(&sh->cmng.ccont[i].pool_list);
+		}
+		rte_free(sh->cmng.ccont[i].pools);
+	}
+	mng = LIST_FIRST(&sh->cmng.mem_mngs);
+	while (mng) {
+		mlx5_flow_destroy_counter_stat_mem_mng(mng);
+		mng = LIST_FIRST(&sh->cmng.mem_mngs);
+	}
+	memset(&sh->cmng, 0, sizeof(sh->cmng));
+}
+
+/**
  * Allocate shared IB device context. If there is multiport device the
  * master and representors will share this context, if there is single
  * port dedicated IB device, the context will be used by only given
@@ -260,6 +343,7 @@ struct mlx5_dev_spawn_data {
 		err = rte_errno;
 		goto error;
 	}
+	mlx5_flow_counters_mng_init(sh);
 	LIST_INSERT_HEAD(&mlx5_ibv_list, sh, next);
 exit:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
@@ -314,6 +398,7 @@ struct mlx5_dev_spawn_data {
 	 *  Ensure there is no async event handler installed.
 	 *  Only primary process handles async device events.
 	 **/
+	mlx5_flow_counters_mng_close(sh);
 	assert(!sh->intr_cnt);
 	if (sh->intr_cnt)
 		mlx5_intr_callback_unregister
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5af3f41..3944b5f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -152,15 +152,23 @@ struct mlx5_stats_ctrl {
 	uint64_t imissed_base;
 };
 
-/* devx counter object */
-struct mlx5_devx_counter_set {
-	struct mlx5dv_devx_obj *obj;
-	int id; /* Flow counter ID */
+/* devX creation object */
+struct mlx5_devx_obj {
+	struct mlx5dv_devx_obj *obj; /* The DV object. */
+	int id; /* The object ID. */
+};
+
+struct mlx5_devx_mkey_attr {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t umem_id;
+	uint32_t pd;
 };
 
 /* HCA attributes. */
 struct mlx5_hca_attr {
 	uint32_t eswitch_manager:1;
+	uint8_t flow_counter_bulk_alloc_bitmap;
 };
 
 /* Flow list . */
@@ -248,6 +256,87 @@ struct mlx5_drop {
 	struct mlx5_rxq_ibv *rxq; /* Verbs Rx queue. */
 };
 
+#define MLX5_COUNTERS_PER_POOL 512
+
+struct mlx5_flow_counter_pool;
+
+struct flow_counter_stats {
+	uint64_t hits;
+	uint64_t bytes;
+};
+
+/* Counters information. */
+struct mlx5_flow_counter {
+	TAILQ_ENTRY(mlx5_flow_counter) next;
+	/**< Pointer to the next flow counter structure. */
+	uint32_t shared:1; /**< Share counter ID with other flow rules. */
+	uint32_t batch: 1;
+	/**< Whether the counter was allocated by batch command. */
+	uint32_t ref_cnt:30; /**< Reference counter. */
+	uint32_t id; /**< Counter ID. */
+	union {  /**< Holds the counters for the rule. */
+#if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
+		struct ibv_counter_set *cs;
+#elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
+		struct ibv_counters *cs;
+#endif
+		struct mlx5_devx_obj *dcs; /**< Counter Devx object. */
+		struct mlx5_flow_counter_pool *pool; /**< The counter pool. */
+	};
+	uint64_t hits; /**< Reset value of hits packets. */
+	uint64_t bytes; /**< Reset value of bytes. */
+	void *action; /**< Pointer to the dv action. */
+};
+
+TAILQ_HEAD(mlx5_counters, mlx5_flow_counter);
+
+/* Counter pool structure - query is in pool resolution. */
+struct mlx5_flow_counter_pool {
+	TAILQ_ENTRY(mlx5_flow_counter_pool) next;
+	struct mlx5_counters counters; /* Free counter list. */
+	struct mlx5_devx_obj *min_dcs;
+	/* The devx object of the minimum counter ID in the pool. */
+	struct mlx5_counter_stats_raw *raw; /* The counter stats memory raw. */
+	struct mlx5_flow_counter counters_raw[]; /* The counters memory. */
+};
+
+struct mlx5_counter_stats_raw;
+
+/* Memory management structure for group of counter statistics raws. */
+struct mlx5_counter_stats_mem_mng {
+	LIST_ENTRY(mlx5_counter_stats_mem_mng) next;
+	struct mlx5_counter_stats_raw *raws;
+	struct mlx5_devx_obj *dm;
+	struct mlx5dv_devx_umem *umem;
+};
+
+/* Raw memory structure for the counter statistics values of a pool. */
+struct mlx5_counter_stats_raw {
+	LIST_ENTRY(mlx5_counter_stats_raw) next;
+	int min_dcs_id;
+	struct mlx5_counter_stats_mem_mng *mem_mng;
+	volatile struct flow_counter_stats *data;
+};
+
+TAILQ_HEAD(mlx5_counter_pools, mlx5_flow_counter_pool);
+
+/* Container structure for counter pools. */
+struct mlx5_pools_container {
+	uint16_t n_valid; /* Number of valid pools. */
+	uint16_t n; /* Number of pools. */
+	struct mlx5_counter_pools pool_list; /* Counter pool list. */
+	struct mlx5_flow_counter_pool **pools; /* Counter pool array. */
+	struct mlx5_counter_stats_mem_mng *init_mem_mng;
+	/* Hold the memory management for the next allocated pools raws. */
+};
+
+/* Counter global management structure. */
+struct mlx5_flow_counter_mng {
+	struct mlx5_pools_container ccont[2];
+	struct mlx5_counters flow_counters; /* Legacy flow counter list. */
+	LIST_HEAD(mem_mngs, mlx5_counter_stats_mem_mng) mem_mngs;
+};
+
 /* Per port data of shared IB device. */
 struct mlx5_ibv_shared_port {
 	uint32_t ih_port_id;
@@ -314,6 +403,7 @@ struct mlx5_ibv_shared {
 	LIST_HEAD(jump, mlx5_flow_dv_jump_tbl_resource) jump_tbl;
 	LIST_HEAD(port_id_action_list, mlx5_flow_dv_port_id_action_resource)
 		port_id_action_list; /* List of port ID actions. */
+	struct mlx5_flow_counter_mng cmng; /* Counters management structure. */
 	/* Shared interrupt handler section. */
 	pthread_mutex_t intr_mutex; /* Interrupt config mutex. */
 	uint32_t intr_cnt; /* Interrupt handler reference counter. */
@@ -362,8 +452,6 @@ struct mlx5_priv {
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
 	struct mlx5_flows flows; /* RTE Flow rules. */
 	struct mlx5_flows ctrl_flows; /* Control flow rules. */
-	LIST_HEAD(counters, mlx5_flow_counter) flow_counters;
-	/* Flow counters. */
 	LIST_HEAD(rxq, mlx5_rxq_ctrl) rxqsctrl; /* DPDK Rx queues. */
 	LIST_HEAD(rxqibv, mlx5_rxq_ibv) rxqsibv; /* Verbs Rx queues. */
 	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
@@ -584,12 +672,15 @@ int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 
 /* mlx5_devx_cmds.c */
 
-int mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-				     struct mlx5_devx_counter_set *dcx);
-int mlx5_devx_cmd_flow_counter_free(struct mlx5dv_devx_obj *obj);
-int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_counter_set *dcx,
-				     int clear,
-				     uint64_t *pkts, uint64_t *bytes);
+struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
+						       uint32_t bulk_sz);
+int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
+int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				     int clear, uint32_t n_counters,
+				     uint64_t *pkts, uint64_t *bytes,
+				     uint32_t mkey, void *addr);
 int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
 				 struct mlx5_hca_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+					     struct mlx5_devx_mkey_attr *attr);
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index e5776c4..92f2fc8 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -2,6 +2,8 @@
 /* Copyright 2018 Mellanox Technologies, Ltd */
 
 #include <rte_flow_driver.h>
+#include <rte_malloc.h>
+#include <unistd.h>
 
 #include "mlx5.h"
 #include "mlx5_glue.h"
@@ -14,47 +16,37 @@
  *   ibv contexts returned from mlx5dv_open_device.
  * @param dcs
  *   Pointer to counters properties structure to be filled by the routine.
+ * @param bulk_n_128
+ *   Bulk counter numbers in 128 counters units.
  *
  * @return
- *   0 on success, a negative value otherwise.
+ *   Pointer to counter object on success, NULL otherwise and
+ *   rte_errno is set.
  */
-int mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-				     struct mlx5_devx_counter_set *dcs)
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx, uint32_t bulk_n_128)
 {
+	struct mlx5_devx_obj *dcs = rte_zmalloc("dcs", sizeof(*dcs), 0);
 	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
 	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
-	int status, syndrome;
 
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
 	MLX5_SET(alloc_flow_counter_in, in, opcode,
 		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, bulk_n_128);
 	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
 					      sizeof(in), out, sizeof(out));
-	if (!dcs->obj)
-		return -errno;
-	status = MLX5_GET(query_flow_counter_out, out, status);
-	syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to create devx counters, "
-			"status %x, syndrome %x", status, syndrome);
-		return -1;
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d\n", errno);
+		rte_errno = errno;
+		rte_free(dcs);
+		return NULL;
 	}
-	dcs->id = MLX5_GET(alloc_flow_counter_out,
-			   out, flow_counter_id);
-	return 0;
-}
-
-/**
- * Free flow counters obtained via devx interface.
- *
- * @param[in] obj
- *   devx object that was obtained from mlx5_devx_cmd_fc_alloc.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int mlx5_devx_cmd_flow_counter_free(struct mlx5dv_devx_obj *obj)
-{
-	return mlx5_glue->devx_obj_destroy(obj);
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
 }
 
 /**
@@ -64,49 +56,140 @@ int mlx5_devx_cmd_flow_counter_free(struct mlx5dv_devx_obj *obj)
  *   devx object that was obtained from mlx5_devx_cmd_fc_alloc.
  * @param[in] clear
  *   Whether hardware should clear the counters after the query or not.
+ * @param[in] n_counters
+ *   The counter number to read.
  *  @param pkts
  *   The number of packets that matched the flow.
  *  @param bytes
  *    The number of bytes that matched the flow.
+ *  @param mkey
+ *   The mkey key for batch query.
+ *  @param addr
+ *    The address in the mkey range for batch query.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
 int
-mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_counter_set *dcs,
-				 int clear __rte_unused,
-				 uint64_t *pkts, uint64_t *bytes)
+mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs, int clear,
+				 uint32_t n_counters, uint64_t *pkts,
+				 uint64_t *bytes, uint32_t mkey, void *addr)
 {
-	uint32_t out[MLX5_ST_SZ_BYTES(query_flow_counter_out) +
-		MLX5_ST_SZ_BYTES(traffic_counter)]   = {0};
+	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
+			MLX5_ST_SZ_BYTES(traffic_counter);
+	uint32_t out[out_len];
 	uint32_t in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
 	void *stats;
-	int status, syndrome, rc;
+	int rc;
 
 	MLX5_SET(query_flow_counter_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
 	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
 	MLX5_SET(query_flow_counter_in, in, flow_counter_id, dcs->id);
-	rc = mlx5_glue->devx_obj_query(dcs->obj,
-				       in, sizeof(in), out, sizeof(out));
-	if (rc)
-		return rc;
-	status = MLX5_GET(query_flow_counter_out, out, status);
-	syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query devx counters, "
-			"id %d, status %x, syndrome = %x",
-			status, syndrome, dcs->id);
-		return -1;
+	MLX5_SET(query_flow_counter_in, in, clear, !!clear);
+
+	if (n_counters) {
+		MLX5_SET(query_flow_counter_in, in, num_of_counters,
+			 n_counters);
+		MLX5_SET(query_flow_counter_in, in, dump_to_memory, 1);
+		MLX5_SET(query_flow_counter_in, in, mkey, mkey);
+		MLX5_SET64(query_flow_counter_in, in, address,
+			   (uint64_t)(uintptr_t)addr);
+	}
+	rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out, out_len);
+	if (rc) {
+		DRV_LOG(ERR, "Failed to query devx counters with rc %d\n ", rc);
+		rte_errno = rc;
+		return -rc;
+	}
+	if (!n_counters) {
+		stats = MLX5_ADDR_OF(query_flow_counter_out,
+				     out, flow_statistics);
+		*pkts = MLX5_GET64(traffic_counter, stats, packets);
+		*bytes = MLX5_GET64(traffic_counter, stats, octets);
 	}
-	stats = MLX5_ADDR_OF(query_flow_counter_out,
-			     out, flow_statistics);
-	*pkts = MLX5_GET64(traffic_counter, stats, packets);
-	*bytes = MLX5_GET64(traffic_counter, stats, octets);
 	return 0;
 }
 
 /**
+ * Create a new mkey.
+ *
+ * @param[in] ctx
+ *   ibv contexts returned from mlx5dv_open_device.
+ * @param[in] attr
+ *   Attributes of the requested mkey.
+ *
+ * @return
+ *   Pointer to Devx mkey on success, NULL otherwise and rte_errno
+ *   is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+			  struct mlx5_devx_mkey_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
+	void *mkc;
+	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
+	size_t pgsize;
+	uint32_t translation_size;
+
+	if (!mkey) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	pgsize = sysconf(_SC_PAGESIZE);
+	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
+	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
+		 translation_size);
+	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(mkc, mkc, lw, 0x1);
+	MLX5_SET(mkc, mkc, lr, 0x1);
+	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+	MLX5_SET(mkc, mkc, qpn, 0xffffff);
+	MLX5_SET(mkc, mkc, pd, attr->pd);
+	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
+	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
+	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
+	MLX5_SET64(mkc, mkc, len, attr->size);
+	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
+	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+					       sizeof(out));
+	if (!mkey->obj) {
+		DRV_LOG(ERR, "Can't create mkey - error %d\n", errno);
+		rte_errno = errno;
+		rte_free(mkey);
+		return NULL;
+	}
+	mkey->id = MLX5_GET(create_mkey_out, out, mkey_index);
+	mkey->id = (mkey->id << 8) | (attr->umem_id & 0xFF);
+	return mkey;
+}
+
+/**
+ * Destroy any object allocated by a Devx API.
+ *
+ * @param[in] obj
+ *   Pointer to a general object.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
+{
+	int ret;
+
+	if (!obj)
+		return 0;
+	ret =  mlx5_glue->devx_obj_destroy(obj->obj);
+	rte_free(obj);
+	return ret;
+}
+
+/**
  * Query HCA attributes.
  * Using those attributes we can check on run time if the device
  * is having the required capabilities.
@@ -146,6 +229,8 @@ int mlx5_devx_cmd_flow_counter_free(struct mlx5dv_devx_obj *obj)
 		return -1;
 	}
 	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	attr->flow_counter_bulk_alloc_bitmap =
+			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 72b339e..119bb31 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -359,25 +359,6 @@ struct mlx5_flow {
 	};
 };
 
-/* Counters information. */
-struct mlx5_flow_counter {
-	LIST_ENTRY(mlx5_flow_counter) next; /**< Pointer to the next counter. */
-	uint32_t shared:1; /**< Share counter ID with other flow rules. */
-	uint32_t ref_cnt:31; /**< Reference counter. */
-	uint32_t id; /**< Counter ID. */
-	union {  /**< Holds the counters for the rule. */
-#if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
-		struct ibv_counter_set *cs;
-#elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
-		struct ibv_counters *cs;
-#endif
-		struct mlx5_devx_counter_set *dcs;
-	};
-	uint64_t hits; /**< Number of packets matched by the rule. */
-	uint64_t bytes; /**< Number of bytes matched by the rule. */
-	void *action; /**< Pointer to the dv action. */
-};
-
 /* Flow structure. */
 struct rte_flow {
 	TAILQ_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 3fa624b..da16e48 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -6,6 +6,7 @@
 #include <stdalign.h>
 #include <stdint.h>
 #include <string.h>
+#include <unistd.h>
 
 /* Verbs header. */
 /* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
@@ -2146,8 +2147,344 @@ struct field_modify_info modify_tcp[] = {
 	return 0;
 }
 
+#define MLX5_CNT_CONTAINER_SIZE 64
+#define MLX5_CNT_CONTAINER(priv, batch) (&(priv)->sh->cmng.ccont[batch])
+
+/**
+ * Get a pool by a counter.
+ *
+ * @param[in] cnt
+ *   Pointer to the counter.
+ *
+ * @return
+ *   The counter pool.
+ */
+static struct mlx5_flow_counter_pool *
+flow_dv_counter_pool_get(struct mlx5_flow_counter *cnt)
+{
+	if (!cnt->batch) {
+		cnt -= cnt->dcs->id % MLX5_COUNTERS_PER_POOL;
+		return (struct mlx5_flow_counter_pool *)cnt - 1;
+	}
+	return cnt->pool;
+}
+
+/**
+ * Get a pool by devx counter ID.
+ *
+ * @param[in] cont
+ *   Pointer to the counter container.
+ * @param[in] id
+ *   The counter devx ID.
+ *
+ * @return
+ *   The counter pool pointer if it exists, NULL otherwise.
+ */
+static struct mlx5_flow_counter_pool *
+flow_dv_find_pool_by_id(struct mlx5_pools_container *cont, int id)
+{
+	struct mlx5_flow_counter_pool *pool;
+
+	TAILQ_FOREACH(pool, &cont->pool_list, next) {
+		int base = (pool->min_dcs->id / MLX5_COUNTERS_PER_POOL) *
+				MLX5_COUNTERS_PER_POOL;
+
+		if (id >= base && id < base + MLX5_COUNTERS_PER_POOL)
+			return pool;
+	};
+	return NULL;
+}
+
+/**
+ * Allocate a new memory for the counter values wrapped by all the needed
+ * management.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] raws_n
+ *   The raw memory areas - each one for MLX5_COUNTERS_PER_POOL counters.
+ *
+ * @return
+ *   The new memory management pointer on success, otherwise NULL and rte_errno
+ *   is set.
+ */
+static struct mlx5_counter_stats_mem_mng *
+flow_dv_create_counter_stat_mem_mng(struct rte_eth_dev *dev, int raws_n)
+{
+	struct mlx5_ibv_shared *sh = ((struct mlx5_priv *)
+					(dev->data->dev_private))->sh;
+	struct mlx5dv_pd dv_pd;
+	struct mlx5dv_obj dv_obj;
+	struct mlx5_devx_mkey_attr mkey_attr;
+	struct mlx5_counter_stats_mem_mng *mem_mng;
+	volatile struct flow_counter_stats *raw_data;
+	int size = (sizeof(struct flow_counter_stats) *
+			MLX5_COUNTERS_PER_POOL +
+			sizeof(struct mlx5_counter_stats_raw)) * raws_n +
+			sizeof(struct mlx5_counter_stats_mem_mng);
+	uint8_t *mem = rte_calloc(__func__, 1, size, sysconf(_SC_PAGESIZE));
+	int i;
+
+	if (!mem) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	mem_mng = (struct mlx5_counter_stats_mem_mng *)(mem + size) - 1;
+	size = sizeof(*raw_data) * MLX5_COUNTERS_PER_POOL * raws_n;
+	mem_mng->umem = mlx5_glue->devx_umem_reg(sh->ctx, mem, size,
+						 IBV_ACCESS_LOCAL_WRITE);
+	if (!mem_mng->umem) {
+		rte_errno = errno;
+		rte_free(mem);
+		return NULL;
+	}
+	dv_obj.pd.in = sh->pd;
+	dv_obj.pd.out = &dv_pd;
+	mlx5_glue->dv_init_obj(&dv_obj, MLX5DV_OBJ_PD);
+	mkey_attr.addr = (uintptr_t)mem;
+	mkey_attr.size = size;
+	mkey_attr.umem_id = mem_mng->umem->umem_id;
+	mkey_attr.pd = dv_pd.pdn;
+	mem_mng->dm = mlx5_devx_cmd_mkey_create(sh->ctx, &mkey_attr);
+	if (!mem_mng->dm) {
+		mlx5_glue->devx_umem_dereg(mem_mng->umem);
+		rte_errno = errno;
+		rte_free(mem);
+		return NULL;
+	}
+	mem_mng->raws = (struct mlx5_counter_stats_raw *)(mem + size);
+	raw_data = (volatile struct flow_counter_stats *)mem;
+	for (i = 0; i < raws_n; ++i) {
+		mem_mng->raws[i].mem_mng = mem_mng;
+		mem_mng->raws[i].data = raw_data + i * MLX5_COUNTERS_PER_POOL;
+	}
+	LIST_INSERT_HEAD(&sh->cmng.mem_mngs, mem_mng, next);
+	return mem_mng;
+}
+
+/**
+ * Prepare a counter container.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] batch
+ *   Whether the pool is for counter that was allocated by batch command.
+ *
+ * @return
+ *   The container pointer on success, otherwise NULL and rte_errno is set.
+ */
+static struct mlx5_pools_container *
+flow_dv_container_prepare(struct rte_eth_dev *dev, uint32_t batch)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
+	struct mlx5_counter_stats_mem_mng *mem_mng;
+	uint32_t size = MLX5_CNT_CONTAINER_SIZE;
+	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * size;
+
+	cont->pools = rte_calloc(__func__, 1, mem_size, 0);
+	if (!cont->pools) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	mem_mng = flow_dv_create_counter_stat_mem_mng(dev, size);
+	if (!mem_mng) {
+		rte_free(cont->pools);
+		return NULL;
+	}
+	cont->n = size;
+	TAILQ_INIT(&cont->pool_list);
+	cont->init_mem_mng = mem_mng;
+	return cont;
+}
+
+/**
+ * Query a devx flow counter.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] cnt
+ *   Pointer to the flow counter.
+ * @param[out] pkts
+ *   The statistics value of packets.
+ * @param[out] bytes
+ *   The statistics value of bytes.
+ *
+ * @return
+ *   0 on success, otherwise a negative errno value and rte_errno is set.
+ */
+static inline int
+_flow_dv_query_count(struct rte_eth_dev *dev __rte_unused,
+		     struct mlx5_flow_counter *cnt, uint64_t *pkts,
+		     uint64_t *bytes)
+{
+	struct mlx5_flow_counter_pool *pool =
+			flow_dv_counter_pool_get(cnt);
+	uint16_t offset = pool->min_dcs->id % MLX5_COUNTERS_PER_POOL;
+	int ret = mlx5_devx_cmd_flow_counter_query
+		(pool->min_dcs, 0, MLX5_COUNTERS_PER_POOL - offset, NULL,
+		 NULL, pool->raw->mem_mng->dm->id,
+		 (void *)(uintptr_t)(pool->raw->data +
+		 offset));
+
+	if (ret) {
+		DRV_LOG(ERR, "Failed to trigger synchronous"
+			" query for dcs ID %d\n",
+			pool->min_dcs->id);
+		return ret;
+	}
+	offset = cnt - &pool->counters_raw[0];
+	*pkts = rte_be_to_cpu_64(pool->raw->data[offset].hits);
+	*bytes = rte_be_to_cpu_64(pool->raw->data[offset].bytes);
+	return 0;
+}
+
+/**
+ * Create and initialize a new counter pool.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[out] dcs
+ *   The devX counter handle.
+ * @param[in] batch
+ *   Whether the pool is for counter that was allocated by batch command.
+ *
+ * @return
+ *   A new pool pointer on success, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_flow_counter_pool *
+flow_dv_pool_create(struct rte_eth_dev *dev, struct mlx5_devx_obj *dcs,
+		    uint32_t batch)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_counter_pool *pool;
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
+	uint32_t size;
+
+	if (!cont->n) {
+		cont = flow_dv_container_prepare(dev, batch);
+		if (!cont)
+			return NULL;
+	} else if (cont->n == cont->n_valid) {
+		DRV_LOG(ERR, "No space in container to allocate a new pool\n");
+		rte_errno = ENOSPC;
+		return NULL;
+	}
+	size = sizeof(*pool) + MLX5_COUNTERS_PER_POOL *
+			sizeof(struct mlx5_flow_counter);
+	pool = rte_calloc(__func__, 1, size, 0);
+	if (!pool) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	pool->min_dcs = dcs;
+	pool->raw = cont->init_mem_mng->raws + cont->n_valid;
+	TAILQ_INIT(&pool->counters);
+	TAILQ_INSERT_TAIL(&cont->pool_list, pool, next);
+	cont->pools[cont->n_valid] = pool;
+	cont->n_valid++;
+	return pool;
+}
+
 /**
- * Get or create a flow counter.
+ * Prepare a new counter and/or a new counter pool.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[out] cnt_free
+ *   Where to put the pointer of a new counter.
+ * @param[in] batch
+ *   Whether the pool is for counter that was allocated by batch command.
+ *
+ * @return
+ *   The free counter pool pointer and @p cnt_free is set on success,
+ *   NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_flow_counter_pool *
+flow_dv_counter_pool_prepare(struct rte_eth_dev *dev,
+			     struct mlx5_flow_counter **cnt_free,
+			     uint32_t batch)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_counter_pool *pool;
+	struct mlx5_devx_obj *dcs = NULL;
+	struct mlx5_flow_counter *cnt;
+	uint32_t i;
+
+	if (!batch) {
+		/* bulk_bitmap must be 0 for single counter allocation. */
+		dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0);
+		if (!dcs)
+			return NULL;
+		pool = flow_dv_find_pool_by_id(MLX5_CNT_CONTAINER(priv, batch),
+					       dcs->id);
+		if (!pool) {
+			pool = flow_dv_pool_create(dev, dcs, batch);
+			if (!pool) {
+				mlx5_devx_cmd_destroy(dcs);
+				return NULL;
+			}
+		} else if (dcs->id < pool->min_dcs->id) {
+			pool->min_dcs->id = dcs->id;
+		}
+		cnt = &pool->counters_raw[dcs->id % MLX5_COUNTERS_PER_POOL];
+		TAILQ_INSERT_HEAD(&pool->counters, cnt, next);
+		cnt->dcs = dcs;
+		*cnt_free = cnt;
+		return pool;
+	}
+	/* bulk_bitmap is in 128 counters units. */
+	if (priv->config.hca_attr.flow_counter_bulk_alloc_bitmap & 0x4)
+		dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0x4);
+	if (!dcs) {
+		rte_errno = ENODATA;
+		return NULL;
+	}
+	pool = flow_dv_pool_create(dev, dcs, batch);
+	if (!pool) {
+		mlx5_devx_cmd_destroy(dcs);
+		return NULL;
+	}
+	for (i = 0; i < MLX5_COUNTERS_PER_POOL; ++i) {
+		cnt = &pool->counters_raw[i];
+		cnt->pool = pool;
+		TAILQ_INSERT_HEAD(&pool->counters, cnt, next);
+	}
+	*cnt_free = &pool->counters_raw[0];
+	return pool;
+}
+
+/**
+ * Search for an existing shared counter.
+ *
+ * @param[in] cont
+ *   Pointer to the relevant counter pool container.
+ * @param[in] id
+ *   The shared counter ID to search.
+ *
+ * @return
+ *   NULL if it does not exist, otherwise a pointer to the shared counter.
+ */
+static struct mlx5_flow_counter *
+flow_dv_counter_shared_search(struct mlx5_pools_container *cont,
+			      uint32_t id)
+{
+	static struct mlx5_flow_counter *cnt;
+	struct mlx5_flow_counter_pool *pool;
+	int i;
+
+	TAILQ_FOREACH(pool, &cont->pool_list, next) {
+		for (i = 0; i < MLX5_COUNTERS_PER_POOL; ++i) {
+			cnt = &pool->counters_raw[i];
+			if (cnt->ref_cnt && cnt->shared && cnt->id == id)
+				return cnt;
+		}
+	}
+	return NULL;
+}
+
+/**
+ * Allocate a flow counter.
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
@@ -2155,80 +2492,110 @@ struct field_modify_info modify_tcp[] = {
  *   Indicate if this counter is shared with other flows.
  * @param[in] id
  *   Counter identifier.
+ * @param[in] group
+ *   Counter flow group.
  *
  * @return
  *   pointer to flow counter on success, NULL otherwise and rte_errno is set.
  */
 static struct mlx5_flow_counter *
-flow_dv_counter_new(struct rte_eth_dev *dev, uint32_t shared, uint32_t id)
+flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t shared, uint32_t id,
+		      uint16_t group)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter *cnt = NULL;
-	struct mlx5_devx_counter_set *dcs = NULL;
-	int ret;
+	struct mlx5_flow_counter_pool *pool = NULL;
+	struct mlx5_flow_counter *cnt_free = NULL;
+	/*
+	 * Currently group 0 flow counter cannot be assigned to a flow if it is
+	 * not the first one in the batch counter allocation, so it is better
+	 * to allocate counters one by one for these flows in a separate
+	 * container.
+	 * A counter can be shared between different groups, so shared
+	 * counters must be taken from the single container.
+	 */
+	uint32_t batch = (group && !shared) ? 1 : 0;
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
 
 	if (!priv->config.devx) {
-		ret = -ENOTSUP;
-		goto error_exit;
+		rte_errno = ENOTSUP;
+		return NULL;
 	}
 	if (shared) {
-		LIST_FOREACH(cnt, &priv->flow_counters, next) {
-			if (cnt->shared && cnt->id == id) {
-				cnt->ref_cnt++;
-				return cnt;
+		cnt_free = flow_dv_counter_shared_search(cont, id);
+		if (cnt_free) {
+			if (cnt_free->ref_cnt + 1 == 0) {
+				rte_errno = E2BIG;
+				return NULL;
 			}
+			cnt_free->ref_cnt++;
+			return cnt_free;
 		}
 	}
-	cnt = rte_calloc(__func__, 1, sizeof(*cnt), 0);
-	dcs = rte_calloc(__func__, 1, sizeof(*dcs), 0);
-	if (!dcs || !cnt) {
-		ret = -ENOMEM;
-		goto error_exit;
+	/* Pools which have free counters are at the start. */
+	pool = TAILQ_FIRST(&cont->pool_list);
+	if (pool)
+		cnt_free = TAILQ_FIRST(&pool->counters);
+	if (!cnt_free) {
+		pool = flow_dv_counter_pool_prepare(dev, &cnt_free, batch);
+		if (!pool)
+			return NULL;
 	}
-	ret = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, dcs);
-	if (ret)
-		goto error_exit;
-	struct mlx5_flow_counter tmpl = {
-		.shared = shared,
-		.ref_cnt = 1,
-		.id = id,
-		.dcs = dcs,
-	};
-	tmpl.action = mlx5_glue->dv_create_flow_action_counter(dcs->obj, 0);
-	if (!tmpl.action) {
-		ret = errno;
-		goto error_exit;
+	cnt_free->batch = batch;
+	/* Create a DV counter action only on first use. */
+	if (!cnt_free->action) {
+		uint16_t offset;
+		struct mlx5_devx_obj *dcs;
+
+		if (batch) {
+			offset = cnt_free - &pool->counters_raw[0];
+			dcs = pool->min_dcs;
+		} else {
+			offset = 0;
+			dcs = cnt_free->dcs;
+		}
+		cnt_free->action = mlx5_glue->dv_create_flow_action_counter
+					(dcs->obj, offset);
+		if (!cnt_free->action) {
+			rte_errno = errno;
+			return NULL;
+		}
 	}
-	*cnt = tmpl;
-	LIST_INSERT_HEAD(&priv->flow_counters, cnt, next);
-	return cnt;
-error_exit:
-	rte_free(cnt);
-	rte_free(dcs);
-	rte_errno = -ret;
-	return NULL;
+	/* Update the counter reset values. */
+	if (_flow_dv_query_count(dev, cnt_free, &cnt_free->hits,
+				 &cnt_free->bytes))
+		return NULL;
+	cnt_free->shared = shared;
+	cnt_free->ref_cnt = 1;
+	cnt_free->id = id;
+	TAILQ_REMOVE(&pool->counters, cnt_free, next);
+	if (TAILQ_EMPTY(&pool->counters)) {
+		/* Move the pool to the end of the container pool list. */
+		TAILQ_REMOVE(&cont->pool_list, pool, next);
+		TAILQ_INSERT_TAIL(&cont->pool_list, pool, next);
+	}
+	return cnt_free;
 }
 
 /**
  * Release a flow counter.
  *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
  * @param[in] counter
  *   Pointer to the counter handler.
  */
 static void
-flow_dv_counter_release(struct mlx5_flow_counter *counter)
+flow_dv_counter_release(struct rte_eth_dev *dev __rte_unused,
+			struct mlx5_flow_counter *counter)
 {
-	int ret;
-
 	if (!counter)
 		return;
 	if (--counter->ref_cnt == 0) {
-		ret = mlx5_devx_cmd_flow_counter_free(counter->dcs->obj);
-		if (ret)
-			DRV_LOG(ERR, "Failed to free devx counters, %d", ret);
-		LIST_REMOVE(counter, next);
-		rte_free(counter->dcs);
-		rte_free(counter);
+		struct mlx5_flow_counter_pool *pool =
+				flow_dv_counter_pool_get(counter);
+
+		/* Put the counter at the end - the earliest one. */
+		TAILQ_INSERT_TAIL(&pool->counters, counter, next);
 	}
 }
 
@@ -4217,8 +4584,10 @@ struct field_modify_info modify_tcp[] = {
 				rte_errno = ENOTSUP;
 				goto cnt_err;
 			}
-			flow->counter = flow_dv_counter_new(dev, count->shared,
-							    count->id);
+			flow->counter = flow_dv_counter_alloc(dev,
+							      count->shared,
+							      count->id,
+							      attr->group);
 			if (flow->counter == NULL)
 				goto cnt_err;
 			dev_flow->dv.actions[actions_n++] =
@@ -4891,7 +5260,7 @@ struct field_modify_info modify_tcp[] = {
 		return;
 	flow_dv_remove(dev, flow);
 	if (flow->counter) {
-		flow_dv_counter_release(flow->counter);
+		flow_dv_counter_release(dev, flow->counter);
 		flow->counter = NULL;
 	}
 	if (flow->tag_resource) {
@@ -4936,9 +5305,6 @@ struct field_modify_info modify_tcp[] = {
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_query_count *qc = data;
-	uint64_t pkts = 0;
-	uint64_t bytes = 0;
-	int err;
 
 	if (!priv->config.devx)
 		return rte_flow_error_set(error, ENOTSUP,
@@ -4946,15 +5312,14 @@ struct field_modify_info modify_tcp[] = {
 					  NULL,
 					  "counters are not supported");
 	if (flow->counter) {
-		err = mlx5_devx_cmd_flow_counter_query
-						(flow->counter->dcs,
-						 qc->reset, &pkts, &bytes);
+		uint64_t pkts, bytes;
+		int err = _flow_dv_query_count(dev, flow->counter, &pkts,
+					       &bytes);
+
 		if (err)
-			return rte_flow_error_set
-				(error, err,
-				 RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				 NULL,
-				 "cannot read counters");
+			return rte_flow_error_set(error, -err,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "cannot read counters");
 		qc->hits_set = 1;
 		qc->bytes_set = 1;
 		qc->hits = pkts - flow->counter->hits;
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 2f4c80c..b3395b8 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -124,7 +124,7 @@
 	int ret;
 
 	if (shared) {
-		LIST_FOREACH(cnt, &priv->flow_counters, next) {
+		TAILQ_FOREACH(cnt, &priv->sh->cmng.flow_counters, next) {
 			if (cnt->shared && cnt->id == id) {
 				cnt->ref_cnt++;
 				return cnt;
@@ -144,7 +144,7 @@
 	/* Create counter with Verbs. */
 	ret = flow_verbs_counter_create(dev, cnt);
 	if (!ret) {
-		LIST_INSERT_HEAD(&priv->flow_counters, cnt, next);
+		TAILQ_INSERT_HEAD(&priv->sh->cmng.flow_counters, cnt, next);
 		return cnt;
 	}
 	/* Some error occurred in Verbs library. */
@@ -156,19 +156,24 @@
 /**
  * Release a flow counter.
  *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
  * @param[in] counter
  *   Pointer to the counter handler.
  */
 static void
-flow_verbs_counter_release(struct mlx5_flow_counter *counter)
+flow_verbs_counter_release(struct rte_eth_dev *dev,
+			   struct mlx5_flow_counter *counter)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	if (--counter->ref_cnt == 0) {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
 		claim_zero(mlx5_glue->destroy_counter_set(counter->cs));
 #elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
 		claim_zero(mlx5_glue->destroy_counters(counter->cs));
 #endif
-		LIST_REMOVE(counter, next);
+		TAILQ_REMOVE(&priv->sh->cmng.flow_counters, counter, next);
 		rte_free(counter);
 	}
 }
@@ -1612,7 +1617,7 @@
 		rte_free(dev_flow);
 	}
 	if (flow->counter) {
-		flow_verbs_counter_release(flow->counter);
+		flow_verbs_counter_release(dev, flow->counter);
 		flow->counter = NULL;
 	}
 }
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index d038373..ba5fd06 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -849,6 +849,33 @@
 #endif
 }
 
+static struct mlx5dv_devx_umem *
+mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
+			uint32_t access)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_reg(context, addr, size, access);
+#else
+	(void)context;
+	(void)addr;
+	(void)size;
+	(void)access;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_devx_umem_dereg(struct mlx5dv_devx_umem *dv_devx_umem)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_dereg(dv_devx_umem);
+#else
+	(void)dv_devx_umem;
+	return -ENOTSUP;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -930,4 +957,6 @@
 	.devx_obj_query = mlx5_glue_devx_obj_query,
 	.devx_obj_modify = mlx5_glue_devx_obj_modify,
 	.devx_general_cmd = mlx5_glue_devx_general_cmd,
+	.devx_umem_reg = mlx5_glue_devx_umem_reg,
+	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
 };
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index 433c9ed..18b1ce6 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -61,6 +61,7 @@
 
 #ifndef HAVE_IBV_DEVX_OBJ
 struct mlx5dv_devx_obj;
+struct mlx5dv_devx_umem;
 #endif
 
 #ifndef HAVE_MLX5DV_DR
@@ -209,6 +210,10 @@ struct mlx5_glue {
 	int (*devx_general_cmd)(struct ibv_context *context,
 				const void *in, size_t inlen,
 				void *out, size_t outlen);
+	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
+						  void *addr, size_t size,
+						  uint32_t access);
+	int (*devx_umem_dereg)(struct mlx5dv_devx_umem *dv_devx_umem);
 };
 
 const struct mlx5_glue *mlx5_glue;
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index fe171f1..79f852b 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -415,6 +415,14 @@ struct mlx5_modification_cmd {
 				 (((_v) & __mlx5_mask(typ, fld)) << \
 				   __mlx5_dw_bit_off(typ, fld))); \
 	} while (0)
+
+#define MLX5_SET64(typ, p, fld, v) \
+	do { \
+		assert(__mlx5_bit_sz(typ, fld) == 64); \
+		*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = \
+			rte_cpu_to_be_64(v); \
+	} while (0)
+
 #define MLX5_GET(typ, p, fld) \
 	((rte_be_to_cpu_32(*((__be32 *)(p) +\
 	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
@@ -556,10 +564,15 @@ enum {
 
 enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
+	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
 	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
 };
 
+enum {
+	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
+};
+
 /* Flow counters. */
 struct mlx5_ifc_alloc_flow_counter_out_bits {
 	u8         status[0x8];
@@ -574,7 +587,9 @@ struct mlx5_ifc_alloc_flow_counter_in_bits {
 	u8         reserved_at_10[0x10];
 	u8         reserved_at_20[0x10];
 	u8         op_mod[0x10];
-	u8         reserved_at_40[0x40];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_40[0x18];
+	u8         flow_counter_bulk[0x8];
 };
 
 struct mlx5_ifc_dealloc_flow_counter_out_bits {
@@ -611,13 +626,102 @@ struct mlx5_ifc_query_flow_counter_in_bits {
 	u8         reserved_at_10[0x10];
 	u8         reserved_at_20[0x10];
 	u8         op_mod[0x10];
-	u8         reserved_at_40[0x80];
+	u8         reserved_at_40[0x20];
+	u8         mkey[0x20];
+	u8         address[0x40];
 	u8         clear[0x1];
-	u8         reserved_at_c1[0xf];
-	u8         num_of_counters[0x10];
+	u8         dump_to_memory[0x1];
+	u8         num_of_counters[0x1e];
 	u8         flow_counter_id[0x20];
 };
 
+struct mlx5_ifc_mkc_bits {
+	u8         reserved_at_0[0x1];
+	u8         free[0x1];
+	u8         reserved_at_2[0x1];
+	u8         access_mode_4_2[0x3];
+	u8         reserved_at_6[0x7];
+	u8         relaxed_ordering_write[0x1];
+	u8         reserved_at_e[0x1];
+	u8         small_fence_on_rdma_read_response[0x1];
+	u8         umr_en[0x1];
+	u8         a[0x1];
+	u8         rw[0x1];
+	u8         rr[0x1];
+	u8         lw[0x1];
+	u8         lr[0x1];
+	u8         access_mode_1_0[0x2];
+	u8         reserved_at_18[0x8];
+
+	u8         qpn[0x18];
+	u8         mkey_7_0[0x8];
+
+	u8         reserved_at_40[0x20];
+
+	u8         length64[0x1];
+	u8         bsf_en[0x1];
+	u8         sync_umr[0x1];
+	u8         reserved_at_63[0x2];
+	u8         expected_sigerr_count[0x1];
+	u8         reserved_at_66[0x1];
+	u8         en_rinval[0x1];
+	u8         pd[0x18];
+
+	u8         start_addr[0x40];
+
+	u8         len[0x40];
+
+	u8         bsf_octword_size[0x20];
+
+	u8         reserved_at_120[0x80];
+
+	u8         translations_octword_size[0x20];
+
+	u8         reserved_at_1c0[0x1b];
+	u8         log_page_size[0x5];
+
+	u8         reserved_at_1e0[0x20];
+};
+
+struct mlx5_ifc_create_mkey_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x8];
+	u8         mkey_index[0x18];
+
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_mkey_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         reserved_at_40[0x20];
+
+	u8         pg_access[0x1];
+	u8         reserved_at_61[0x1f];
+
+	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
+
+	u8         reserved_at_280[0x80];
+
+	u8         translations_octword_actual_size[0x20];
+
+	u8         mkey_umem_id[0x20];
+
+	u8         mkey_umem_offset[0x40];
+
+	u8         reserved_at_380[0x500];
+
+	u8         klm_pas_mtt[][0x20];
+};
+
 enum {
 	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
 	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP        = 0xc << 1,
-- 
1.8.3.1



* [dpdk-dev] [PATCH 2/4] net/mlx5: resize a full counter container
  2019-07-16 14:34 ` [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
  2019-07-16 14:34   ` [dpdk-dev] [PATCH 1/4] net/mlx5: accelerate DV flow counter transactions Matan Azrad
@ 2019-07-16 14:34   ` Matan Azrad
  2019-07-16 14:34   ` [dpdk-dev] [PATCH 3/4] net/mlx5: accelerate DV flow counter query Matan Azrad
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Matan Azrad @ 2019-07-16 14:34 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko; +Cc: dev

When the counter container has no space left for more counter pools,
resize the container so that more pools can be created.

As a result, the maximum number of counters is limited only by memory.
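
The resize step described above can be sketched in isolation as follows.
This is a minimal illustration only, not the driver code: it assumes plain
libc allocation instead of rte_calloc()/rte_free(), and the names
container_resize(), container_add() and CNT_CONTAINER_RESIZE are
hypothetical stand-ins for the functions and macro touched by this patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Growth step; assumed to mirror MLX5_CNT_CONTAINER_RESIZE (64). */
#define CNT_CONTAINER_RESIZE 64

struct pool; /* Opaque stand-in for struct mlx5_flow_counter_pool. */

struct container {
	uint16_t n_valid;    /* Number of slots in use. */
	uint16_t n;          /* Number of slots allocated. */
	struct pool **pools; /* Pool pointer array. */
};

/* Grow the pool array by one step; return 0 on success, -1 on failure. */
static int
container_resize(struct container *cont)
{
	uint32_t resize = (uint32_t)cont->n + CNT_CONTAINER_RESIZE;
	struct pool **new_pools;

	if (resize > UINT16_MAX)
		return -1;
	new_pools = calloc(resize, sizeof(*new_pools));
	if (!new_pools)
		return -1;
	/* Keep the existing pool pointers, then drop the old array. */
	if (cont->n)
		memcpy(new_pools, cont->pools, cont->n * sizeof(*new_pools));
	free(cont->pools);
	cont->pools = new_pools;
	cont->n = (uint16_t)resize;
	return 0;
}

/* Store a pool pointer, resizing first when every slot is taken. */
static int
container_add(struct container *cont, struct pool *p)
{
	if (cont->n == cont->n_valid && container_resize(cont))
		return -1;
	cont->pools[cont->n_valid++] = p;
	return 0;
}
```

A zero-initialized container takes the same path on first use, because
n == n_valid == 0 also triggers a resize; this matches the way the patch
folds the old "prepare" case into the resize function.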

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 43 +++++++++++++++++++++++------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index da16e48..693848e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2147,7 +2147,7 @@ struct field_modify_info modify_tcp[] = {
 	return 0;
 }
 
-#define MLX5_CNT_CONTAINER_SIZE 64
+#define MLX5_CNT_CONTAINER_RESIZE 64
 #define MLX5_CNT_CONTAINER(priv, batch) (&(priv)->sh->cmng.ccont[batch])
 
 /**
@@ -2263,7 +2263,7 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
- * Prepare a counter container.
+ * Resize a counter container.
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
@@ -2274,26 +2274,34 @@ struct field_modify_info modify_tcp[] = {
  *   The container pointer on success, otherwise NULL and rte_errno is set.
  */
 static struct mlx5_pools_container *
-flow_dv_container_prepare(struct rte_eth_dev *dev, uint32_t batch)
+flow_dv_container_resize(struct rte_eth_dev *dev, uint32_t batch)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
 	struct mlx5_counter_stats_mem_mng *mem_mng;
-	uint32_t size = MLX5_CNT_CONTAINER_SIZE;
-	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * size;
-
-	cont->pools = rte_calloc(__func__, 1, mem_size, 0);
-	if (!cont->pools) {
+	uint32_t resize = cont->n + MLX5_CNT_CONTAINER_RESIZE;
+	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * resize;
+	struct mlx5_flow_counter_pool **new_pools = rte_calloc(__func__, 1,
+							       mem_size, 0);
+	if (!new_pools) {
 		rte_errno = ENOMEM;
 		return NULL;
 	}
-	mem_mng = flow_dv_create_counter_stat_mem_mng(dev, size);
+	mem_mng = flow_dv_create_counter_stat_mem_mng(dev,
+						    MLX5_CNT_CONTAINER_RESIZE);
 	if (!mem_mng) {
-		rte_free(cont->pools);
+		rte_free(new_pools);
 		return NULL;
 	}
-	cont->n = size;
-	TAILQ_INIT(&cont->pool_list);
+	if (cont->n) {
+		memcpy(new_pools, cont->pools,
+		       cont->n * sizeof(struct mlx5_flow_counter_pool *));
+		rte_free(cont->pools);
+	} else {
+		TAILQ_INIT(&cont->pool_list);
+	}
+	cont->pools = new_pools;
+	cont->n = resize;
 	cont->init_mem_mng = mem_mng;
 	return cont;
 }
@@ -2361,14 +2369,10 @@ struct field_modify_info modify_tcp[] = {
 	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
 	uint32_t size;
 
-	if (!cont->n) {
-		cont = flow_dv_container_prepare(dev, batch);
+	if (cont->n == cont->n_valid) {
+		cont = flow_dv_container_resize(dev, batch);
 		if (!cont)
 			return NULL;
-	} else if (cont->n == cont->n_valid) {
-		DRV_LOG(ERR, "No space in container to allocate a new pool\n");
-		rte_errno = ENOSPC;
-		return NULL;
 	}
 	size = sizeof(*pool) + MLX5_COUNTERS_PER_POOL *
 			sizeof(struct mlx5_flow_counter);
@@ -2378,7 +2382,8 @@ struct field_modify_info modify_tcp[] = {
 		return NULL;
 	}
 	pool->min_dcs = dcs;
-	pool->raw = cont->init_mem_mng->raws + cont->n_valid;
+	pool->raw = cont->init_mem_mng->raws + cont->n_valid  %
+			MLX5_CNT_CONTAINER_RESIZE;
 	TAILQ_INIT(&pool->counters);
 	TAILQ_INSERT_TAIL(&cont->pool_list, pool, next);
 	cont->pools[cont->n_valid] = pool;
-- 
1.8.3.1



* [dpdk-dev] [PATCH 3/4] net/mlx5: accelerate DV flow counter query
  2019-07-16 14:34 ` [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
  2019-07-16 14:34   ` [dpdk-dev] [PATCH 1/4] net/mlx5: accelerate DV flow counter transactions Matan Azrad
  2019-07-16 14:34   ` [dpdk-dev] [PATCH 2/4] net/mlx5: resize a full counter container Matan Azrad
@ 2019-07-16 14:34   ` Matan Azrad
  2019-07-16 14:34   ` [dpdk-dev] [PATCH 4/4] net/mlx5: allow basic counter management fallback Matan Azrad
  2019-07-17  6:50   ` [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Raslan Darawsheh
  4 siblings, 0 replies; 11+ messages in thread
From: Matan Azrad @ 2019-07-16 14:34 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko; +Cc: dev

All the DV counters are cached in the PMD memory and are contained in
pools, which are in turn contained in containers according to the counter
allocation type: batch or single.

Currently, the flow counter query is done synchronously at pool
resolution: on each user request, a FW command is triggered to read all
the counters in the pool.

A new devX feature that reads a batch of flow counters asynchronously
makes it possible to accelerate the user query operation.

Using the DPDK host thread, the PMD periodically triggers an asynchronous
query at pool resolution for all the counter pools, and the FW triggers
an interrupt when the values are updated.
In the interrupt handler, the raw counter data of the pool is replaced
using a double-buffer algorithm (very fast).
In the user query, the PMD just returns the values of the last query from
the PMD cache: no system calls or FW commands are triggered from the
user control thread on a query operation.

More synchronization with the host thread is added:
        Container resize uses a double-buffer algorithm.
        Pool growth in a container uses an atomic operation.
        Pool query buffer replacement uses a spinlock.
        The pool minimum devX counter ID uses an atomic operation.
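
The double-buffer replacement under a spinlock can be sketched as follows.
This is a simplified illustration under stated assumptions, not the driver
code: a C11 atomic_flag stands in for rte_spinlock_t, the buffers are
swapped in place rather than recycled through a free list, and the names
pool_query_done() and pool_read_hits() are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Small for illustration; the patch uses 512 counters per pool. */
#define COUNTERS_PER_POOL 4

struct stats_raw {
	uint64_t hits[COUNTERS_PER_POOL];
};

struct pool {
	atomic_flag lock;         /* Stand-in for the pool spinlock. */
	struct stats_raw *raw;    /* Last completed snapshot, read by queries. */
	struct stats_raw *raw_hw; /* Buffer the device is currently filling. */
};

static void
pool_lock(struct pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock,
						 memory_order_acquire))
		; /* Spin until the flag is released. */
}

static void
pool_unlock(struct pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Completion handler: publish the fresh buffer by swapping the pointers. */
static void
pool_query_done(struct pool *p)
{
	struct stats_raw *tmp;

	pool_lock(p);
	tmp = p->raw;
	p->raw = p->raw_hw;
	p->raw_hw = tmp;
	pool_unlock(p);
}

/* User query: read the cached snapshot only, no device access. */
static uint64_t
pool_read_hits(struct pool *p, unsigned int idx)
{
	uint64_t v;

	pool_lock(p);
	v = p->raw->hits[idx];
	pool_unlock(p);
	return v;
}
```

The lock is held only for a pointer swap or a single read, so the user
query path stays short while the device fills the other buffer.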

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/rel_notes/release_19_08.rst |   2 +
 drivers/net/mlx5/Makefile              |   5 ++
 drivers/net/mlx5/meson.build           |   2 +
 drivers/net/mlx5/mlx5.c                |   9 ++
 drivers/net/mlx5/mlx5.h                |  44 ++++++++--
 drivers/net/mlx5/mlx5_devx_cmds.c      |  48 ++++++++++-
 drivers/net/mlx5/mlx5_ethdev.c         |  85 +++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.c           | 147 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h           |   8 ++
 drivers/net/mlx5/mlx5_flow_dv.c        | 141 ++++++++++++++++++++-----------
 drivers/net/mlx5/mlx5_glue.c           |  62 ++++++++++++++
 drivers/net/mlx5/mlx5_glue.h           |  15 ++++
 12 files changed, 504 insertions(+), 64 deletions(-)

diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index 6fb1a77..034db47 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -114,6 +114,8 @@ New Features
   * Added support for match on ICMP/ICMP6 code and type.
   * Added support for matching on GRE's key and C,K,S present bits.
   * Added support for IP-in-IP tunnel.
+  * Accelerated creation and destruction of flows with count action.
+  * Accelerated flow counter query.
 
 * **Updated Solarflare network PMD.**
 
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index b210c80..76d40b1 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -173,6 +173,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum MLX5DV_FLOW_ACTION_COUNTERS_DEVX \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_ASYNC \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_obj_query_async \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_ETHTOOL_LINK_MODE_25G \
 		/usr/include/linux/ethtool.h \
 		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 2c23e44..ed42641 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -122,6 +122,8 @@ if build
 		'mlx5dv_devx_obj_create' ],
 		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
 		'MLX5DV_FLOW_ACTION_COUNTERS_DEVX' ],
+		[ 'HAVE_IBV_DEVX_ASYNC', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_obj_query_async' ],
 		[ 'HAVE_MLX5DV_DR', 'infiniband/mlx5dv.h',
 		'MLX5DV_DR_DOMAIN_TYPE_NIC_RX' ],
 		[ 'HAVE_MLX5DV_DR_ESWITCH', 'infiniband/mlx5dv.h',
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 62be141..a8d824e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -37,6 +37,7 @@
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
+#include <rte_alarm.h>
 
 #include "mlx5.h"
 #include "mlx5_utils.h"
@@ -201,7 +202,15 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_counter_stats_mem_mng *mng;
 	uint8_t i;
 	int j;
+	int retries = 1024;
 
+	rte_errno = 0;
+	while (--retries) {
+		rte_eal_alarm_cancel(mlx5_flow_query_alarm, sh);
+		if (rte_errno != EINPROGRESS)
+			break;
+		rte_pause();
+	}
 	for (i = 0; i < RTE_DIM(sh->cmng.ccont); ++i) {
 		struct mlx5_flow_counter_pool *pool;
 		uint32_t batch = !!(i % 2);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3944b5f..4ce352a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -257,6 +257,7 @@ struct mlx5_drop {
 };
 
 #define MLX5_COUNTERS_PER_POOL 512
+#define MLX5_MAX_PENDING_QUERIES 4
 
 struct mlx5_flow_counter_pool;
 
@@ -283,7 +284,10 @@ struct mlx5_flow_counter {
 		struct mlx5_devx_obj *dcs; /**< Counter Devx object. */
 		struct mlx5_flow_counter_pool *pool; /**< The counter pool. */
 	};
-	uint64_t hits; /**< Reset value of hits packets. */
+	union {
+		uint64_t hits; /**< Reset value of hits packets. */
+		int64_t query_gen; /**< Generation of the last release. */
+	};
 	uint64_t bytes; /**< Reset value of bytes. */
 	void *action; /**< Pointer to the dv action. */
 };
@@ -294,10 +298,17 @@ struct mlx5_flow_counter {
 struct mlx5_flow_counter_pool {
 	TAILQ_ENTRY(mlx5_flow_counter_pool) next;
 	struct mlx5_counters counters; /* Free counter list. */
-	struct mlx5_devx_obj *min_dcs;
-	/* The devx object of the minimum counter ID in the pool. */
-	struct mlx5_counter_stats_raw *raw; /* The counter stats memory raw. */
-	struct mlx5_flow_counter counters_raw[]; /* The counters memory. */
+	union {
+		struct mlx5_devx_obj *min_dcs;
+		rte_atomic64_t a64_dcs;
+	};
+	/* The devx object of the minimum counter ID. */
+	rte_atomic64_t query_gen;
+	uint32_t n_counters: 16; /* Number of devx allocated counters. */
+	rte_spinlock_t sl; /* The pool lock. */
+	struct mlx5_counter_stats_raw *raw;
+	struct mlx5_counter_stats_raw *raw_hw; /* The raw on HW working. */
+	struct mlx5_flow_counter counters_raw[]; /* The pool counters memory. */
 };
 
 struct mlx5_counter_stats_raw;
@@ -322,7 +333,7 @@ struct mlx5_counter_stats_raw {
 
 /* Container structure for counter pools. */
 struct mlx5_pools_container {
-	uint16_t n_valid; /* Number of valid pools. */
+	rte_atomic16_t n_valid; /* Number of valid pools. */
 	uint16_t n; /* Number of pools. */
 	struct mlx5_counter_pools pool_list; /* Counter pool list. */
 	struct mlx5_flow_counter_pool **pools; /* Counter pool array. */
@@ -332,9 +343,16 @@ struct mlx5_pools_container {
 
 /* Counter global management structure. */
 struct mlx5_flow_counter_mng {
-	struct mlx5_pools_container ccont[2];
+	uint8_t mhi[2]; /* master \ host container index. */
+	struct mlx5_pools_container ccont[2 * 2];
+	/* 2 containers for single and for batch for double-buffer. */
 	struct mlx5_counters flow_counters; /* Legacy flow counter list. */
+	uint8_t pending_queries;
+	uint8_t batch;
+	uint16_t pool_index;
+	uint8_t query_thread_on;
 	LIST_HEAD(mem_mngs, mlx5_counter_stats_mem_mng) mem_mngs;
+	LIST_HEAD(stat_raws, mlx5_counter_stats_raw) free_stat_raws;
 };
 
 /* Per port data of shared IB device. */
@@ -408,6 +426,8 @@ struct mlx5_ibv_shared {
 	pthread_mutex_t intr_mutex; /* Interrupt config mutex. */
 	uint32_t intr_cnt; /* Interrupt handler reference counter. */
 	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
+	struct rte_intr_handle intr_handle_devx; /* DEVX interrupt handler. */
+	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
@@ -520,6 +540,7 @@ int mlx5_ibv_device_to_pci_addr(const struct ibv_device *device,
 				struct rte_pci_addr *pci_addr);
 void mlx5_dev_link_status_handler(void *arg);
 void mlx5_dev_interrupt_handler(void *arg);
+void mlx5_dev_interrupt_handler_devx(void *arg);
 void mlx5_dev_interrupt_handler_uninstall(struct rte_eth_dev *dev);
 void mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev);
 int mlx5_set_link_down(struct rte_eth_dev *dev);
@@ -641,6 +662,10 @@ int mlx5_ctrl_flow(struct rte_eth_dev *dev,
 		   struct rte_flow_item_eth *eth_mask);
 int mlx5_flow_create_drop_queue(struct rte_eth_dev *dev);
 void mlx5_flow_delete_drop_queue(struct rte_eth_dev *dev);
+void mlx5_flow_async_pool_query_handle(struct mlx5_ibv_shared *sh,
+				       uint64_t async_id, int status);
+void mlx5_set_query_alarm(struct mlx5_ibv_shared *sh);
+void mlx5_flow_query_alarm(void *arg);
 
 /* mlx5_mp.c */
 void mlx5_mp_req_start_rxtx(struct rte_eth_dev *dev);
@@ -678,9 +703,12 @@ struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
 int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
 				     int clear, uint32_t n_counters,
 				     uint64_t *pkts, uint64_t *bytes,
-				     uint32_t mkey, void *addr);
+				     uint32_t mkey, void *addr,
+				     struct mlx5dv_devx_cmd_comp *cmd_comp,
+				     uint64_t async_id);
 int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
 				 struct mlx5_hca_attr *attr);
 struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
 					     struct mlx5_devx_mkey_attr *attr);
+int mlx5_devx_get_out_command_status(void *out);
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 92f2fc8..28d967a 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -66,14 +66,21 @@ struct mlx5_devx_obj *
  *   The mkey key for batch query.
  *  @param addr
  *    The address in the mkey range for batch query.
+ *  @param cmd_comp
+ *   The completion object for asynchronous batch query.
+ *  @param async_id
+ *    The ID to be returned in the asynchronous batch query response.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
 int
-mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs, int clear,
-				 uint32_t n_counters, uint64_t *pkts,
-				 uint64_t *bytes, uint32_t mkey, void *addr)
+mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				 int clear, uint32_t n_counters,
+				 uint64_t *pkts, uint64_t *bytes,
+				 uint32_t mkey, void *addr,
+				 struct mlx5dv_devx_cmd_comp *cmd_comp,
+				 uint64_t async_id)
 {
 	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
 			MLX5_ST_SZ_BYTES(traffic_counter);
@@ -96,7 +103,13 @@ struct mlx5_devx_obj *
 		MLX5_SET64(query_flow_counter_in, in, address,
 			   (uint64_t)(uintptr_t)addr);
 	}
-	rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out, out_len);
+	if (!cmd_comp)
+		rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out,
+					       out_len);
+	else
+		rc = mlx5_glue->devx_obj_query_async(dcs->obj, in, sizeof(in),
+						     out_len, async_id,
+						     cmd_comp);
 	if (rc) {
 		DRV_LOG(ERR, "Failed to query devx counters with rc %d\n ", rc);
 		rte_errno = rc;
@@ -169,6 +182,33 @@ struct mlx5_devx_obj *
 }
 
 /**
+ * Get status of devx command response.
+ * Mainly used for asynchronous commands.
+ *
+ * @param[in] out
+ *   The out response buffer.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+int
+mlx5_devx_get_out_command_status(void *out)
+{
+	int status;
+
+	if (!out)
+		return -EINVAL;
+	status = MLX5_GET(query_flow_counter_out, out, status);
+	if (status) {
+		int syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
+
+		DRV_LOG(ERR, "Bad devX status %x, syndrome = %x\n", status,
+			syndrome);
+	}
+	return status;
+}
+
+/**
  * Destroy any object allocated by a Devx API.
  *
  * @param[in] obj
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index eeefe4d..004901a 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1433,6 +1433,38 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
+ * Handle DEVX interrupts from the NIC.
+ * This function is probably called from the DPDK host thread.
+ *
+ * @param cb_arg
+ *   Callback argument.
+ */
+void
+mlx5_dev_interrupt_handler_devx(void *cb_arg)
+{
+#ifndef HAVE_IBV_DEVX_ASYNC
+	(void)cb_arg;
+	return;
+#else
+	struct mlx5_ibv_shared *sh = cb_arg;
+	union {
+		struct mlx5dv_devx_async_cmd_hdr cmd_resp;
+		uint8_t buf[MLX5_ST_SZ_BYTES(query_flow_counter_out) +
+			    MLX5_ST_SZ_BYTES(traffic_counter) +
+			    sizeof(struct mlx5dv_devx_async_cmd_hdr)];
+	} out;
+	uint8_t *buf = out.buf + sizeof(out.cmd_resp);
+
+	while (!mlx5_glue->devx_get_async_cmd_comp(sh->devx_comp,
+						   &out.cmd_resp,
+						   sizeof(out.buf)))
+		mlx5_flow_async_pool_query_handle
+			(sh, (uint64_t)out.cmd_resp.wr_id,
+			 mlx5_devx_get_out_command_status(buf));
+#endif /* HAVE_IBV_DEVX_ASYNC */
+}
+
+/**
  * Uninstall shared asynchronous device events handler.
  * This function is implemented to support event sharing
  * between multiple ports of single IB device.
@@ -1464,6 +1496,17 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 				     mlx5_dev_interrupt_handler, sh);
 	sh->intr_handle.fd = 0;
 	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+	if (sh->intr_handle_devx.fd) {
+		rte_intr_callback_unregister(&sh->intr_handle_devx,
+					     mlx5_dev_interrupt_handler_devx,
+					     sh);
+		sh->intr_handle_devx.fd = 0;
+		sh->intr_handle_devx.type = RTE_INTR_HANDLE_UNKNOWN;
+	}
+	if (sh->devx_comp) {
+		mlx5_glue->devx_destroy_cmd_comp(sh->devx_comp);
+		sh->devx_comp = NULL;
+	}
 exit:
 	pthread_mutex_unlock(&sh->intr_mutex);
 }
@@ -1507,17 +1550,50 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	if (ret) {
 		DRV_LOG(INFO, "failed to change file descriptor"
 			      " async event queue");
-		/* Indicate there will be no interrupts. */
-		dev->data->dev_conf.intr_conf.lsc = 0;
-		dev->data->dev_conf.intr_conf.rmv = 0;
-		sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
-		goto exit;
+		goto error;
 	}
 	sh->intr_handle.fd = sh->ctx->async_fd;
 	sh->intr_handle.type = RTE_INTR_HANDLE_EXT;
 	rte_intr_callback_register(&sh->intr_handle,
 				   mlx5_dev_interrupt_handler, sh);
+	if (priv->config.devx) {
+#ifndef HAVE_IBV_DEVX_ASYNC
+		goto error_unregister;
+#else
+		sh->devx_comp = mlx5_glue->devx_create_cmd_comp(sh->ctx);
+		if (sh->devx_comp) {
+			flags = fcntl(sh->devx_comp->fd, F_GETFL);
+			ret = fcntl(sh->devx_comp->fd, F_SETFL,
+				    flags | O_NONBLOCK);
+			if (ret) {
+				DRV_LOG(INFO, "failed to change file descriptor"
+					      " devx async event queue");
+				goto error_unregister;
+			}
+			sh->intr_handle_devx.fd = sh->devx_comp->fd;
+			sh->intr_handle_devx.type = RTE_INTR_HANDLE_EXT;
+			rte_intr_callback_register
+				(&sh->intr_handle_devx,
+				 mlx5_dev_interrupt_handler_devx, sh);
+		} else {
+			DRV_LOG(INFO, "failed to create devx async command "
+				"completion");
+			goto error_unregister;
+		}
+#endif /* HAVE_IBV_DEVX_ASYNC */
+	}
 	sh->intr_cnt++;
+	goto exit;
+error_unregister:
+	rte_intr_callback_unregister(&sh->intr_handle,
+				     mlx5_dev_interrupt_handler, sh);
+error:
+	/* Indicate there will be no interrupts. */
+	dev->data->dev_conf.intr_conf.lsc = 0;
+	dev->data->dev_conf.intr_conf.rmv = 0;
+	sh->intr_handle.fd = 0;
+	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+	sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 exit:
 	pthread_mutex_unlock(&sh->intr_mutex);
 }
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 4ba34db..7109985 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -3174,3 +3174,150 @@ struct rte_flow *
 	}
 	return 0;
 }
+
+#define MLX5_POOL_QUERY_FREQ_US 1000000
+
+/**
+ * Set the periodic procedure for triggering asynchronous batch queries for all
+ * the counter pools.
+ *
+ * @param[in] sh
+ *   Pointer to mlx5_ibv_shared object.
+ */
+void
+mlx5_set_query_alarm(struct mlx5_ibv_shared *sh)
+{
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(sh, 0, 0);
+	uint32_t pools_n = rte_atomic16_read(&cont->n_valid);
+	uint32_t us;
+
+	cont = MLX5_CNT_CONTAINER(sh, 1, 0);
+	pools_n += rte_atomic16_read(&cont->n_valid);
+	us = MLX5_POOL_QUERY_FREQ_US / pools_n;
+	DRV_LOG(DEBUG, "Set alarm for %u pools each %u us\n", pools_n, us);
+	if (rte_eal_alarm_set(us, mlx5_flow_query_alarm, sh)) {
+		sh->cmng.query_thread_on = 0;
+		DRV_LOG(ERR, "Cannot reinitialize query alarm\n");
+	} else {
+		sh->cmng.query_thread_on = 1;
+	}
+}
+
+/**
+ * The periodic procedure for triggering asynchronous batch queries for all the
+ * counter pools. This function is probably called by the host thread.
+ *
+ * @param[in] arg
+ *   The parameter for the alarm process.
+ */
+void
+mlx5_flow_query_alarm(void *arg)
+{
+	struct mlx5_ibv_shared *sh = arg;
+	struct mlx5_devx_obj *dcs;
+	uint16_t offset;
+	int ret;
+	uint8_t batch = sh->cmng.batch;
+	uint16_t pool_index = sh->cmng.pool_index;
+	struct mlx5_pools_container *cont;
+	struct mlx5_pools_container *mcont;
+	struct mlx5_flow_counter_pool *pool;
+
+	if (sh->cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
+		goto set_alarm;
+next_container:
+	cont = MLX5_CNT_CONTAINER(sh, batch, 1);
+	mcont = MLX5_CNT_CONTAINER(sh, batch, 0);
+	/* Check if a resize was done and the container needs to be flipped. */
+	if (cont != mcont) {
+		if (cont->pools) {
+			/* Clean the old container. */
+			rte_free(cont->pools);
+			memset(cont, 0, sizeof(*cont));
+		}
+		rte_cio_wmb();
+		 /* Flip the host container. */
+		sh->cmng.mhi[batch] ^= (uint8_t)2;
+		cont = mcont;
+	}
+	if (!cont->pools) {
+		/* 2 empty containers case is unexpected. */
+		if (unlikely(batch != sh->cmng.batch))
+			goto set_alarm;
+		batch ^= 0x1;
+		pool_index = 0;
+		goto next_container;
+	}
+	pool = cont->pools[pool_index];
+	if (pool->raw_hw)
+		/* There is a pool query in progress. */
+		goto set_alarm;
+	pool->raw_hw =
+		LIST_FIRST(&sh->cmng.free_stat_raws);
+	if (!pool->raw_hw)
+		/* No free counter statistics raw memory. */
+		goto set_alarm;
+	dcs = (struct mlx5_devx_obj *)(uintptr_t)rte_atomic64_read
+							      (&pool->a64_dcs);
+	offset = batch ? 0 : dcs->id % MLX5_COUNTERS_PER_POOL;
+	ret = mlx5_devx_cmd_flow_counter_query(dcs, 0, MLX5_COUNTERS_PER_POOL -
+					       offset, NULL, NULL,
+					       pool->raw_hw->mem_mng->dm->id,
+					       (void *)(uintptr_t)
+					       (pool->raw_hw->data + offset),
+					       sh->devx_comp,
+					       (uint64_t)(uintptr_t)pool);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to trigger asynchronous query for dcs ID"
+			" %d\n", pool->min_dcs->id);
+		pool->raw_hw = NULL;
+		goto set_alarm;
+	}
+	pool->raw_hw->min_dcs_id = dcs->id;
+	LIST_REMOVE(pool->raw_hw, next);
+	sh->cmng.pending_queries++;
+	pool_index++;
+	if (pool_index >= rte_atomic16_read(&cont->n_valid)) {
+		batch ^= 0x1;
+		pool_index = 0;
+	}
+set_alarm:
+	sh->cmng.batch = batch;
+	sh->cmng.pool_index = pool_index;
+	mlx5_set_query_alarm(sh);
+}
+
+/**
+ * Handler for the HW response reporting that the values of an asynchronous
+ * batch query are ready. This function is probably called by the host thread.
+ *
+ * @param[in] sh
+ *   The pointer to the shared IB device context.
+ * @param[in] async_id
+ *   The Devx async ID.
+ * @param[in] status
+ *   The status of the completion.
+ */
+void
+mlx5_flow_async_pool_query_handle(struct mlx5_ibv_shared *sh,
+				  uint64_t async_id, int status)
+{
+	struct mlx5_flow_counter_pool *pool =
+		(struct mlx5_flow_counter_pool *)(uintptr_t)async_id;
+	struct mlx5_counter_stats_raw *raw_to_free;
+
+	if (unlikely(status)) {
+		raw_to_free = pool->raw_hw;
+	} else {
+		raw_to_free = pool->raw;
+		rte_spinlock_lock(&pool->sl);
+		pool->raw = pool->raw_hw;
+		rte_spinlock_unlock(&pool->sl);
+		rte_atomic64_add(&pool->query_gen, 1);
+		/* Be sure the new raw counters data is updated in memory. */
+		rte_cio_wmb();
+	}
+	LIST_INSERT_HEAD(&sh->cmng.free_stat_raws, raw_to_free, next);
+	pool->raw_hw = NULL;
+	sh->cmng.pending_queries--;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 119bb31..53110ce 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -21,6 +21,9 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#include <rte_atomic.h>
+#include <rte_alarm.h>
+
 #include "mlx5.h"
 #include "mlx5_prm.h"
 
@@ -414,6 +417,11 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_query_t query;
 };
 
+#define MLX5_CNT_CONTAINER(sh, batch, thread) (&(sh)->cmng.ccont \
+	[(((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
+#define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
+	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
+
 /* mlx5_flow.c */
 
 uint64_t mlx5_flow_hashfields_adjust(struct mlx5_flow *dev_flow, int tunnel,
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 693848e..4849bd9 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2148,7 +2148,6 @@ struct field_modify_info modify_tcp[] = {
 }
 
 #define MLX5_CNT_CONTAINER_RESIZE 64
-#define MLX5_CNT_CONTAINER(priv, batch) (&(priv)->sh->cmng.ccont[batch])
 
 /**
  * Get a pool by a counter.
@@ -2271,39 +2270,53 @@ struct field_modify_info modify_tcp[] = {
  *   Whether the pool is for counter that was allocated by batch command.
  *
  * @return
- *   The container pointer on success, otherwise NULL and rte_errno is set.
+ *   The new container pointer on success, otherwise NULL and rte_errno is set.
  */
 static struct mlx5_pools_container *
 flow_dv_container_resize(struct rte_eth_dev *dev, uint32_t batch)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
+	struct mlx5_pools_container *cont =
+			MLX5_CNT_CONTAINER(priv->sh, batch, 0);
+	struct mlx5_pools_container *new_cont =
+			MLX5_CNT_CONTAINER_UNUSED(priv->sh, batch, 0);
 	struct mlx5_counter_stats_mem_mng *mem_mng;
 	uint32_t resize = cont->n + MLX5_CNT_CONTAINER_RESIZE;
 	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * resize;
-	struct mlx5_flow_counter_pool **new_pools = rte_calloc(__func__, 1,
-							       mem_size, 0);
-	if (!new_pools) {
+	int i;
+
+	if (cont != MLX5_CNT_CONTAINER(priv->sh, batch, 1)) {
+		/* The host thread hasn't detected the last resize yet. */
+		rte_errno = EAGAIN;
+		return NULL;
+	}
+	new_cont->pools = rte_calloc(__func__, 1, mem_size, 0);
+	if (!new_cont->pools) {
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	if (cont->n)
+		memcpy(new_cont->pools, cont->pools, cont->n *
+		       sizeof(struct mlx5_flow_counter_pool *));
 	mem_mng = flow_dv_create_counter_stat_mem_mng(dev,
-						    MLX5_CNT_CONTAINER_RESIZE);
+		MLX5_CNT_CONTAINER_RESIZE + MLX5_MAX_PENDING_QUERIES);
 	if (!mem_mng) {
-		rte_free(new_pools);
+		rte_free(new_cont->pools);
 		return NULL;
 	}
-	if (cont->n) {
-		memcpy(new_pools, cont->pools,
-		       cont->n * sizeof(struct mlx5_flow_counter_pool *));
-		rte_free(cont->pools);
-	} else {
-		TAILQ_INIT(&cont->pool_list);
-	}
-	cont->pools = new_pools;
-	cont->n = resize;
-	cont->init_mem_mng = mem_mng;
-	return cont;
+	for (i = 0; i < MLX5_MAX_PENDING_QUERIES; ++i)
+		LIST_INSERT_HEAD(&priv->sh->cmng.free_stat_raws,
+				 mem_mng->raws + MLX5_CNT_CONTAINER_RESIZE +
+				 i, next);
+	new_cont->n = resize;
+	rte_atomic16_set(&new_cont->n_valid, rte_atomic16_read(&cont->n_valid));
+	TAILQ_INIT(&new_cont->pool_list);
+	TAILQ_CONCAT(&new_cont->pool_list, &cont->pool_list, next);
+	new_cont->init_mem_mng = mem_mng;
+	rte_cio_wmb();
+	 /* Flip the master container. */
+	priv->sh->cmng.mhi[batch] ^= (uint8_t)1;
+	return new_cont;
 }
 
 /**
@@ -2328,22 +2341,22 @@ struct field_modify_info modify_tcp[] = {
 {
 	struct mlx5_flow_counter_pool *pool =
 			flow_dv_counter_pool_get(cnt);
-	uint16_t offset = pool->min_dcs->id % MLX5_COUNTERS_PER_POOL;
-	int ret = mlx5_devx_cmd_flow_counter_query
-		(pool->min_dcs, 0, MLX5_COUNTERS_PER_POOL - offset, NULL,
-		 NULL, pool->raw->mem_mng->dm->id,
-		 (void *)(uintptr_t)(pool->raw->data +
-		 offset));
-
-	if (ret) {
-		DRV_LOG(ERR, "Failed to trigger synchronous"
-			" query for dcs ID %d\n",
-			pool->min_dcs->id);
-		return ret;
+	int offset = cnt - &pool->counters_raw[0];
+
+	rte_spinlock_lock(&pool->sl);
+	/*
+	 * A single-counter allocation may get an ID smaller than the pool's
+	 * current minimum while the host thread is reading in parallel.
+	 * In this case the new counter values must be reported as 0.
+	 */
+	if (unlikely(!cnt->batch && cnt->dcs->id < pool->raw->min_dcs_id)) {
+		*pkts = 0;
+		*bytes = 0;
+	} else {
+		*pkts = rte_be_to_cpu_64(pool->raw->data[offset].hits);
+		*bytes = rte_be_to_cpu_64(pool->raw->data[offset].bytes);
 	}
-	offset = cnt - &pool->counters_raw[0];
-	*pkts = rte_be_to_cpu_64(pool->raw->data[offset].hits);
-	*bytes = rte_be_to_cpu_64(pool->raw->data[offset].bytes);
+	rte_spinlock_unlock(&pool->sl);
 	return 0;
 }
 
@@ -2366,10 +2379,12 @@ struct field_modify_info modify_tcp[] = {
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool;
-	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv->sh, batch,
+							       0);
+	int16_t n_valid = rte_atomic16_read(&cont->n_valid);
 	uint32_t size;
 
-	if (cont->n == cont->n_valid) {
+	if (cont->n == n_valid) {
 		cont = flow_dv_container_resize(dev, batch);
 		if (!cont)
 			return NULL;
@@ -2382,12 +2397,21 @@ struct field_modify_info modify_tcp[] = {
 		return NULL;
 	}
 	pool->min_dcs = dcs;
-	pool->raw = cont->init_mem_mng->raws + cont->n_valid  %
-			MLX5_CNT_CONTAINER_RESIZE;
+	pool->raw = cont->init_mem_mng->raws + n_valid %
+						     MLX5_CNT_CONTAINER_RESIZE;
+	pool->raw_hw = NULL;
+	rte_spinlock_init(&pool->sl);
+	/*
+	 * The generation of the newly allocated counters in this pool is 0,
+	 * so a pool generation of 2 makes all of them valid for allocation.
+	 */
+	rte_atomic64_set(&pool->query_gen, 0x2);
 	TAILQ_INIT(&pool->counters);
 	TAILQ_INSERT_TAIL(&cont->pool_list, pool, next);
-	cont->pools[cont->n_valid] = pool;
-	cont->n_valid++;
+	cont->pools[n_valid] = pool;
+	/* Pool initialization must be updated before host thread access. */
+	rte_cio_wmb();
+	rte_atomic16_add(&cont->n_valid, 1);
 	return pool;
 }
 
@@ -2421,8 +2445,8 @@ struct field_modify_info modify_tcp[] = {
 		dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0);
 		if (!dcs)
 			return NULL;
-		pool = flow_dv_find_pool_by_id(MLX5_CNT_CONTAINER(priv, batch),
-					       dcs->id);
+		pool = flow_dv_find_pool_by_id
+			(MLX5_CNT_CONTAINER(priv->sh, batch, 0), dcs->id);
 		if (!pool) {
 			pool = flow_dv_pool_create(dev, dcs, batch);
 			if (!pool) {
@@ -2430,7 +2454,8 @@ struct field_modify_info modify_tcp[] = {
 				return NULL;
 			}
 		} else if (dcs->id < pool->min_dcs->id) {
-			pool->min_dcs->id = dcs->id;
+			rte_atomic64_set(&pool->a64_dcs,
+					 (int64_t)(uintptr_t)dcs);
 		}
 		cnt = &pool->counters_raw[dcs->id % MLX5_COUNTERS_PER_POOL];
 		TAILQ_INSERT_HEAD(&pool->counters, cnt, next);
@@ -2519,8 +2544,13 @@ struct field_modify_info modify_tcp[] = {
 	 * shared counters from the single container.
 	 */
 	uint32_t batch = (group && !shared) ? 1 : 0;
-	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv, batch);
+	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv->sh, batch,
+							       0);
 
+#ifndef HAVE_IBV_DEVX_ASYNC
+	rte_errno = ENOTSUP;
+	return NULL;
+#endif
 	if (!priv->config.devx) {
 		rte_errno = ENOTSUP;
 		return NULL;
@@ -2537,9 +2567,22 @@ struct field_modify_info modify_tcp[] = {
 		}
 	}
 	/* Pools which has a free counters are in the start. */
-	pool = TAILQ_FIRST(&cont->pool_list);
-	if (pool)
+	TAILQ_FOREACH(pool, &cont->pool_list, next) {
+		/*
+		 * The reset values of a free counter must be updated between
+		 * its release and its next allocation, so at least one query
+		 * must complete in between. Ensure this by saving the query
+		 * generation at release time.
+		 * The free list is sorted by generation, so if the first
+		 * counter is not updated yet, none of the others are
+		 * updated either.
+		 */
 		cnt_free = TAILQ_FIRST(&pool->counters);
+		if (cnt_free && cnt_free->query_gen + 1 <
+		    rte_atomic64_read(&pool->query_gen))
+			break;
+		cnt_free = NULL;
+	}
 	if (!cnt_free) {
 		pool = flow_dv_counter_pool_prepare(dev, &cnt_free, batch);
 		if (!pool)
@@ -2572,6 +2615,9 @@ struct field_modify_info modify_tcp[] = {
 	cnt_free->shared = shared;
 	cnt_free->ref_cnt = 1;
 	cnt_free->id = id;
+	if (!priv->sh->cmng.query_thread_on)
+		/* Start the asynchronous batch query by the host thread. */
+		mlx5_set_query_alarm(priv->sh);
 	TAILQ_REMOVE(&pool->counters, cnt_free, next);
 	if (TAILQ_EMPTY(&pool->counters)) {
 		/* Move the pool to the end of the container pool list. */
@@ -2599,8 +2645,9 @@ struct field_modify_info modify_tcp[] = {
 		struct mlx5_flow_counter_pool *pool =
 				flow_dv_counter_pool_get(counter);
 
-		/* Put the counter in the end - the earliest one. */
+		/* Put the counter in the end - the last updated one. */
 		TAILQ_INSERT_TAIL(&pool->counters, counter, next);
+		counter->query_gen = rte_atomic64_read(&pool->query_gen);
 	}
 }
 
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index ba5fd06..942f89d 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -849,6 +849,64 @@
 #endif
 }
 
+static struct mlx5dv_devx_cmd_comp *
+mlx5_glue_devx_create_cmd_comp(struct ibv_context *ctx)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_create_cmd_comp(ctx);
+#else
+	(void)ctx;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_destroy_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	mlx5dv_devx_destroy_cmd_comp(cmd_comp);
+#else
+	(void)cmd_comp;
+	errno = ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_query_async(struct mlx5dv_devx_obj *obj, const void *in,
+			       size_t inlen, size_t outlen, uint64_t wr_id,
+			       struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_obj_query_async(obj, in, inlen, outlen, wr_id,
+					   cmd_comp);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)outlen;
+	(void)wr_id;
+	(void)cmd_comp;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				  struct mlx5dv_devx_async_cmd_hdr *cmd_resp,
+				  size_t cmd_resp_len)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_get_async_cmd_comp(cmd_comp, cmd_resp,
+					      cmd_resp_len);
+#else
+	(void)cmd_comp;
+	(void)cmd_resp;
+	(void)cmd_resp_len;
+	return -ENOTSUP;
+#endif
+}
+
 static struct mlx5dv_devx_umem *
 mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
 			uint32_t access)
@@ -957,6 +1015,10 @@
 	.devx_obj_query = mlx5_glue_devx_obj_query,
 	.devx_obj_modify = mlx5_glue_devx_obj_modify,
 	.devx_general_cmd = mlx5_glue_devx_general_cmd,
+	.devx_create_cmd_comp = mlx5_glue_devx_create_cmd_comp,
+	.devx_destroy_cmd_comp = mlx5_glue_devx_destroy_cmd_comp,
+	.devx_obj_query_async = mlx5_glue_devx_obj_query_async,
+	.devx_get_async_cmd_comp = mlx5_glue_devx_get_async_cmd_comp,
 	.devx_umem_reg = mlx5_glue_devx_umem_reg,
 	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
 };
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index 18b1ce6..9facdb9 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -64,6 +64,11 @@
 struct mlx5dv_devx_umem;
 #endif
 
+#ifndef HAVE_IBV_DEVX_ASYNC
+struct mlx5dv_devx_cmd_comp;
+struct mlx5dv_devx_async_cmd_hdr;
+#endif
+
 #ifndef HAVE_MLX5DV_DR
 enum  mlx5dv_dr_domain_type { unused, };
 struct mlx5dv_dr_domain;
@@ -210,6 +215,16 @@ struct mlx5_glue {
 	int (*devx_general_cmd)(struct ibv_context *context,
 				const void *in, size_t inlen,
 				void *out, size_t outlen);
+	struct mlx5dv_devx_cmd_comp *(*devx_create_cmd_comp)
+					(struct ibv_context *context);
+	void (*devx_destroy_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_obj_query_async)(struct mlx5dv_devx_obj *obj,
+				    const void *in, size_t inlen,
+				    size_t outlen, uint64_t wr_id,
+				    struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_get_async_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				       struct mlx5dv_devx_async_cmd_hdr *resp,
+				       size_t cmd_resp_len);
 	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
 						  void *addr, size_t size,
 						  uint32_t access);
-- 
1.8.3.1



* [dpdk-dev] [PATCH 4/4] net/mlx5: allow basic counter management fallback
  2019-07-16 14:34 ` [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
                     ` (2 preceding siblings ...)
  2019-07-16 14:34   ` [dpdk-dev] [PATCH 3/4] net/mlx5: accelerate DV flow counter query Matan Azrad
@ 2019-07-16 14:34   ` Matan Azrad
  2019-07-17  6:50   ` [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Raslan Darawsheh
  4 siblings, 0 replies; 11+ messages in thread
From: Matan Azrad @ 2019-07-16 14:34 UTC (permalink / raw)
  To: Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko; +Cc: dev

If RDMA core does not support the asynchronous devx commands, fall back to
basic counter management.

In this mode the PMD counter cache is redundant and the host thread does not
update it. Hence, each counter operation goes to the FW and the acceleration
is reduced.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
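
The fallback decision and its effect on a query can be sketched as follows.
This is an illustration only, not driver code: the struct and function names
are hypothetical, firmware is modelled as a plain field, and the real driver
issues a devx command where query_fw() is called.

```c
#include <stdint.h>

/* Hypothetical model of one counter: a firmware value and a cached copy. */
struct counter_sketch {
	uint64_t hw_pkts;     /* value a firmware query would return */
	uint64_t cached_pkts; /* value last written by the host thread */
};

/* Hypothetical model of the per-port private data. */
struct priv_sketch {
	unsigned int counter_fallback:1;
};

/* Number of simulated firmware commands issued so far. */
static unsigned int fw_queries;

/*
 * Mirrors the condition in the mlx5.c hunk below: fall back when async
 * devx commands or the flow_counters_dump capability are missing.
 */
static unsigned int
use_fallback(int have_devx_async, int flow_counters_dump)
{
	return !have_devx_async || !flow_counters_dump;
}

/* Simulated synchronous firmware query. */
static uint64_t
query_fw(const struct counter_sketch *cnt)
{
	fw_queries++;
	return cnt->hw_pkts;
}

/* Fallback: one firmware command per query. Otherwise: read the cache. */
static uint64_t
query_pkts(const struct priv_sketch *priv, const struct counter_sketch *cnt)
{
	if (priv->counter_fallback)
		return query_fw(cnt);
	return cnt->cached_pkts;
}
```

In the accelerated mode a query never touches firmware and may lag the
hardware value until the next asynchronous refresh. In the fallback mode the
value is current, at the cost of one command per query.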
 drivers/net/mlx5/mlx5.c           |   8 +++
 drivers/net/mlx5/mlx5.h           |   2 +
 drivers/net/mlx5/mlx5_devx_cmds.c |   4 +-
 drivers/net/mlx5/mlx5_flow_dv.c   | 127 ++++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_prm.h       |   4 +-
 5 files changed, 137 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a8d824e..f4ad5d2 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1624,11 +1624,19 @@ struct mlx5_dev_spawn_data {
 	mlx5_link_update(eth_dev, 0);
 #ifdef HAVE_IBV_DEVX_OBJ
 	if (config.devx) {
+		priv->counter_fallback = 0;
 		err = mlx5_devx_cmd_query_hca_attr(sh->ctx, &config.hca_attr);
 		if (err) {
 			err = -err;
 			goto error;
 		}
+		if (!config.hca_attr.flow_counters_dump)
+			priv->counter_fallback = 1;
+#ifndef HAVE_IBV_DEVX_ASYNC
+		priv->counter_fallback = 1;
+#endif
+		if (priv->counter_fallback)
+			DRV_LOG(INFO, "Use fall-back DV counter management\n");
 	}
 #endif
 #ifdef HAVE_MLX5DV_DR_ESWITCH
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4ce352a..2bd2aa6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -168,6 +168,7 @@ struct mlx5_devx_mkey_attr {
 /* HCA attributes. */
 struct mlx5_hca_attr {
 	uint32_t eswitch_manager:1;
+	uint32_t flow_counters_dump:1;
 	uint8_t flow_counter_bulk_alloc_bitmap;
 };
 
@@ -457,6 +458,7 @@ struct mlx5_priv {
 	unsigned int representor:1; /* Device is a port representor. */
 	unsigned int master:1; /* Device is a E-Switch master. */
 	unsigned int dr_shared:1; /* DV/DR data is shared. */
+	unsigned int counter_fallback:1; /* Use counter fallback management. */
 	uint16_t domain_id; /* Switch domain identifier. */
 	uint16_t vport_id; /* Associated VF vport index (if any). */
 	int32_t representor_id; /* Port representor identifier. */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 28d967a..d26d5bc 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -57,7 +57,7 @@ struct mlx5_devx_obj *
  * @param[in] clear
  *   Whether hardware should clear the counters after the query or not.
  * @param[in] n_counters
- *   The counter number to read.
+ *   0 to read a single counter, otherwise the number of counters to read.
  *  @param pkts
  *   The number of packets that matched the flow.
  *  @param bytes
@@ -271,6 +271,8 @@ struct mlx5_devx_obj *
 	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
 	attr->flow_counter_bulk_alloc_bitmap =
 			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
+	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
+					    flow_counters_dump);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 4849bd9..1d1ff90 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2150,6 +2150,113 @@ struct field_modify_info modify_tcp[] = {
 #define MLX5_CNT_CONTAINER_RESIZE 64
 
 /**
+ * Get or create a flow counter.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] shared
+ *   Indicate if this counter is shared with other flows.
+ * @param[in] id
+ *   Counter identifier.
+ *
+ * @return
+ *   pointer to flow counter on success, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_flow_counter *
+flow_dv_counter_alloc_fallback(struct rte_eth_dev *dev, uint32_t shared,
+			       uint32_t id)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_counter *cnt = NULL;
+	struct mlx5_devx_obj *dcs = NULL;
+
+	if (!priv->config.devx) {
+		rte_errno = ENOTSUP;
+		return NULL;
+	}
+	if (shared) {
+		TAILQ_FOREACH(cnt, &priv->sh->cmng.flow_counters, next) {
+			if (cnt->shared && cnt->id == id) {
+				cnt->ref_cnt++;
+				return cnt;
+			}
+		}
+	}
+	dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0);
+	if (!dcs)
+		return NULL;
+	cnt = rte_calloc(__func__, 1, sizeof(*cnt), 0);
+	if (!cnt) {
+		claim_zero(mlx5_devx_cmd_destroy(dcs));
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	struct mlx5_flow_counter tmpl = {
+		.shared = shared,
+		.ref_cnt = 1,
+		.id = id,
+		.dcs = dcs,
+	};
+	tmpl.action = mlx5_glue->dv_create_flow_action_counter(dcs->obj, 0);
+	if (!tmpl.action) {
+		claim_zero(mlx5_devx_cmd_destroy(dcs));
+		rte_errno = errno;
+		rte_free(cnt);
+		return NULL;
+	}
+	*cnt = tmpl;
+	TAILQ_INSERT_HEAD(&priv->sh->cmng.flow_counters, cnt, next);
+	return cnt;
+}
+
+/**
+ * Release a flow counter.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] counter
+ *   Pointer to the counter handler.
+ */
+static void
+flow_dv_counter_release_fallback(struct rte_eth_dev *dev,
+				 struct mlx5_flow_counter *counter)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!counter)
+		return;
+	if (--counter->ref_cnt == 0) {
+		TAILQ_REMOVE(&priv->sh->cmng.flow_counters, counter, next);
+		claim_zero(mlx5_devx_cmd_destroy(counter->dcs));
+		rte_free(counter);
+	}
+}
+
+/**
+ * Query a devx flow counter.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] cnt
+ *   Pointer to the flow counter.
+ * @param[out] pkts
+ *   The statistics value of packets.
+ * @param[out] bytes
+ *   The statistics value of bytes.
+ *
+ * @return
+ *   0 on success, otherwise a negative errno value and rte_errno is set.
+ */
+static inline int
+_flow_dv_query_count_fallback(struct rte_eth_dev *dev __rte_unused,
+		     struct mlx5_flow_counter *cnt, uint64_t *pkts,
+		     uint64_t *bytes)
+{
+	return mlx5_devx_cmd_flow_counter_query(cnt->dcs, 0, 0, pkts, bytes,
+						0, NULL, NULL, 0);
+}
+
+/**
  * Get a pool by a counter.
  *
  * @param[in] cnt
@@ -2335,14 +2442,18 @@ struct field_modify_info modify_tcp[] = {
  *   0 on success, otherwise a negative errno value and rte_errno is set.
  */
 static inline int
-_flow_dv_query_count(struct rte_eth_dev *dev __rte_unused,
+_flow_dv_query_count(struct rte_eth_dev *dev,
 		     struct mlx5_flow_counter *cnt, uint64_t *pkts,
 		     uint64_t *bytes)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool =
 			flow_dv_counter_pool_get(cnt);
 	int offset = cnt - &pool->counters_raw[0];
 
+	if (priv->counter_fallback)
+		return _flow_dv_query_count_fallback(dev, cnt, pkts, bytes);
+
 	rte_spinlock_lock(&pool->sl);
 	/*
 	 * The single counters allocation may allocate smaller ID than the
@@ -2547,10 +2658,8 @@ struct field_modify_info modify_tcp[] = {
 	struct mlx5_pools_container *cont = MLX5_CNT_CONTAINER(priv->sh, batch,
 							       0);
 
-#ifndef HAVE_IBV_DEVX_ASYNC
-	rte_errno = ENOTSUP;
-	return NULL;
-#endif
+	if (priv->counter_fallback)
+		return flow_dv_counter_alloc_fallback(dev, shared, id);
 	if (!priv->config.devx) {
 		rte_errno = ENOTSUP;
 		return NULL;
@@ -2636,11 +2745,17 @@ struct field_modify_info modify_tcp[] = {
  *   Pointer to the counter handler.
  */
 static void
-flow_dv_counter_release(struct rte_eth_dev *dev __rte_unused,
+flow_dv_counter_release(struct rte_eth_dev *dev,
 			struct mlx5_flow_counter *counter)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	if (!counter)
 		return;
+	if (priv->counter_fallback) {
+		flow_dv_counter_release_fallback(dev, counter);
+		return;
+	}
 	if (--counter->ref_cnt == 0) {
 		struct mlx5_flow_counter_pool *pool =
 				flow_dv_counter_pool_get(counter);
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 79f852b..95ff29a 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -920,7 +920,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8 reserved_at_343[0x5];
 	u8 log_max_flow_counter_bulk[0x8];
 	u8 max_flow_counter_15_0[0x10];
-	u8 reserved_at_360[0x3];
+	u8 modify_tis[0x1];
+	u8 flow_counters_dump[0x1];
+	u8 reserved_at_360[0x1];
 	u8 log_max_rq[0x5];
 	u8 reserved_at_368[0x3];
 	u8 log_max_sq[0x5];
-- 
1.8.3.1



* Re: [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement
  2019-07-16 14:34 ` [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters mangement Matan Azrad
                     ` (3 preceding siblings ...)
  2019-07-16 14:34   ` [dpdk-dev] [PATCH 4/4] net/mlx5: allow basic counter management fallback Matan Azrad
@ 2019-07-17  6:50   ` Raslan Darawsheh
  4 siblings, 0 replies; 11+ messages in thread
From: Raslan Darawsheh @ 2019-07-17  6:50 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Yongseok Koh, Slava Ovsiienko; +Cc: dev

Hi,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> Sent: Tuesday, July 16, 2019 5:35 PM
> To: Shahaf Shuler <shahafs@mellanox.com>; Yongseok Koh
> <yskoh@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 0/4] net/mlx5: accelerate DV flow counters
> mangement
> 
> New features in devx to query and allocate flow counters by batch
> commands allow to accelerate flow counter create/destroy/query.
> 
> v2:
> rebase.
> 
> Matan Azrad (4):
>   net/mlx5: accelerate DV flow counter transactions
>   net/mlx5: resize a full counter container
>   net/mlx5: accelerate DV flow counter query
>   net/mlx5: allow basic counter management fallback
> 
>  doc/guides/rel_notes/release_19_08.rst |   2 +
>  drivers/net/mlx5/Makefile              |   7 +-
>  drivers/net/mlx5/meson.build           |   4 +-
>  drivers/net/mlx5/mlx5.c                | 102 ++++++
>  drivers/net/mlx5/mlx5.h                | 145 +++++++-
>  drivers/net/mlx5/mlx5_devx_cmds.c      | 225 +++++++++---
>  drivers/net/mlx5/mlx5_ethdev.c         |  85 ++++-
>  drivers/net/mlx5/mlx5_flow.c           | 147 ++++++++
>  drivers/net/mlx5/mlx5_flow.h           |  27 +-
>  drivers/net/mlx5/mlx5_flow_dv.c        | 616
> ++++++++++++++++++++++++++++++---
>  drivers/net/mlx5/mlx5_flow_verbs.c     |  15 +-
>  drivers/net/mlx5/mlx5_glue.c           |  91 +++++
>  drivers/net/mlx5/mlx5_glue.h           |  20 ++
>  drivers/net/mlx5/mlx5_prm.h            | 116 ++++++-
>  14 files changed, 1463 insertions(+), 139 deletions(-)
> 
> --
> 1.8.3.1

Series applied to next-net-mlx,


Kindest regards
Raslan Darawsheh

