All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH rdma-next 00/16] Flow counters support
@ 2017-10-19 14:41 Yishai Hadas
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

Hi Doug,

This series from Raed introduces an API to read via the verbs interface hardware
counters for a specific IB Flow object.

The above enables a user space application to read the real time information on
its various Flow Steering objects and take run time decisions accordingly.

The first 8 patches of the series introduce the generic interface for using
counters then there are 2 patches which enable the flow steering counters on
top of the API.

The last 6 patches are the mlx5 driver implementation of the counters verbs.

We may add in the future on top of the API an option to read also QP, WQ and
other objects' counters. 

The series follows an RFC [1] which introduced the motivation, the expected
usage and the API.

More specifically,
A generic interface with the following functionality is presented:
- An option to know per device how many counter sets it exposes by using some
  extension of the ib_query_device_ex() verb. (i.e. max_counter_sets).
- Per counter set there is an API to get its metadata by using a new verb named
   ib_describe_counter_set(), it gets a counter set id in range of 
   [0,max_counter_sets - 1] and return the counter set information.
   Some of the metadata is IB generic (e.g. counter set type, num counters in the set, etc.)
   while other is some specific driver information (e.g. counters names).
- Introduce counter set object named 'ib_counter_set' and its create/destroy
  verbs. Post a successful creation the counter_set object can  be attached to
  an IB object based on its type. (e.g. Flow, QP, etc.) 
- Introduce the ib_query_counter_set() verb, it enables the user to read the
  hardware counters which are associated with the counter set.
- Introduce a new steering specification from type count 
  (i.e. ib_flow_spec_action_count) which gets a previously created counter_id.
  By creating a flow steering object with a counter, its counters can be read
  by the ib_query_counter_set() verb.

A detailed information on each API and its properties appear in the commit logs
and in the relevant H file.

[1] RFC - https://www.spinics.net/lists/linux-rdma/msg47355.html

Yishai

Boris Pismenny (1):
  IB/mlx5: Pass mlx5_flow_act struct instead of multiple arguments

Raed Salem (15):
  IB/core: Expose max_counter_sets capability
  IB/uverbs: Expose max_counter_sets capability
  IB/core: Introduce counter set describe verb
  IB/uverbs: Add describe counter set support
  IB/core: Introduce counter set object and its create/destroy verbs
  IB/uverbs: Add create/destroy counter set support
  IB/core: Introduce counter set query verb
  IB/uverbs: Add query counter set support
  IB/core: Add support for flow counter set
  IB/uverbs: Add support for flow counter set
  net/mlx5: Export flow counter related API
  net/mlx5: Expand mlx5_fc_query_cached to return absolute counters
    values
  IB/mlx5: Add counter set operations
  IB/mlx5: Add flow counter set support
  IB/mlx5: Expose max_counter_sets capability

 drivers/infiniband/core/uverbs.h                   |   5 +
 drivers/infiniband/core/uverbs_cmd.c               | 314 ++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c              |   4 +
 drivers/infiniband/core/uverbs_std_types.c         |  15 +-
 drivers/infiniband/core/verbs.c                    |  91 +++++-
 drivers/infiniband/hw/mlx5/main.c                  | 364 +++++++++++++++++++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |  31 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h   |   2 -
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |  23 --
 .../net/ethernet/mellanox/mlx5/core/fs_counters.c  |  17 +-
 include/linux/mlx5/fs.h                            |  31 +-
 include/rdma/ib_verbs.h                            |  74 +++++
 include/rdma/uverbs_std_types.h                    |   1 +
 include/uapi/rdma/ib_user_ioctl_verbs.h            |   1 +
 include/uapi/rdma/ib_user_verbs.h                  |  75 ++++-
 include/uapi/rdma/mlx5-abi.h                       |  16 +
 18 files changed, 1016 insertions(+), 55 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 01/16] IB/core: Expose max_counter_sets capability
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 02/16] IB/uverbs: " Yishai Hadas
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

max_counter_sets represents the number of different counter sets
exposed from the device.
A valid counter set id must be in range of [0, max_counter_sets - 1],
it will be used with counter set verbs in downstream patches in
this series.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 include/rdma/ib_verbs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e1d027a..0cc880d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -371,6 +371,7 @@ struct ib_device_attr {
 	u32			raw_packet_caps; /* Use ib_raw_packet_caps enum */
 	struct ib_tm_caps	tm_caps;
 	struct ib_cq_caps       cq_caps;
+	u16			max_counter_sets;
 };
 
 enum ib_mtu {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 02/16] IB/uverbs: Expose max_counter_sets capability
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-10-19 14:41   ` [PATCH rdma-next 01/16] IB/core: Expose max_counter_sets capability Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 03/16] IB/core: Introduce counter set describe verb Yishai Hadas
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

max_counter_sets represents the number of different counter sets
exposed from the device.
A valid counter set id must be in range of [0, max_counter_sets - 1],
it will be used with counter set verbs in downstream patches in
this series.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs_cmd.c | 8 ++++++++
 include/uapi/rdma/ib_user_verbs.h    | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index ce969cc..9c6b795 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3861,6 +3861,14 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 	resp.cq_moderation_caps.max_cq_moderation_period =
 		attr.cq_caps.max_cq_moderation_period;
 	resp.response_length += sizeof(resp.cq_moderation_caps);
+
+	if (ucore->outlen < resp.response_length +
+		sizeof(resp.max_counter_sets) + sizeof(resp.reserved))
+		goto end;
+
+	resp.max_counter_sets = attr.max_counter_sets;
+	resp.response_length += sizeof(resp.max_counter_sets) +
+			 sizeof(resp.reserved);
 end:
 	err = ib_copy_to_udata(ucore, &resp, resp.response_length);
 	return err;
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 7745a0e..b3f952c 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -270,6 +270,8 @@ struct ib_uverbs_ex_query_device_resp {
 	__u32 raw_packet_caps;
 	struct ib_uverbs_tm_caps tm_caps;
 	struct ib_uverbs_cq_caps cq_moderation_caps;
+	__u16 max_counter_sets;
+	__u8 reserved[6];
 };
 
 struct ib_uverbs_query_port {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 03/16] IB/core: Introduce counter set describe verb
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-10-19 14:41   ` [PATCH rdma-next 01/16] IB/core: Expose max_counter_sets capability Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 02/16] IB/uverbs: " Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
       [not found]     ` <1508424118-27205-4-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-10-19 14:41   ` [PATCH rdma-next 04/16] IB/uverbs: Add describe counter set support Yishai Hadas
                     ` (13 subsequent siblings)
  16 siblings, 1 reply; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

In order to work with a counter set object, first need to get various
information on the counter set, hence the describe verb is introduced.

The following info is exposed by the describe verb:
- Type of counter set:
  Which type of verb object this counter set
  could count, for example Flow.
- Number of instances of this counter-set that are available in the
  hardware (i.e. num_of_cs).
- Additional attributes about the counter set, for example is cache
  supported ? if yes, application can choose upon query to read the data from
  the cache.
- Number of counters in the counter set i.e. entries_count.
- An array of null terminated counters names strings within this counter set,
  each name is 64 bytes aligned.
  The user must allocate sufficient buffer size to get this data.
  In case buffer is NULL all other metadata is returned, except of
  the array.
  Based on the returned metadata, user can know the required buffer size
  needed to hold the counters names, which is 64 * entries_count.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c | 20 ++++++++++++++++++++
 include/rdma/ib_verbs.h         | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index d8f1a5d..eda389a 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2290,3 +2290,23 @@ void ib_drain_qp(struct ib_qp *qp)
 		ib_drain_rq(qp);
 }
 EXPORT_SYMBOL(ib_drain_qp);
+
+/**
+ * ib_describe_counter_set - Describes a Counter Set
+ * @device: The device on which to describe a counter set.
+ * @cs_id: the counter set id to be described
+ * @cs_describe_attr: A list of description attributes
+ * required to get the outcome.
+ */
+int ib_describe_counter_set(struct ib_device *device,
+			    u16 cs_id,
+			    struct ib_counter_set_describe_attr *cs_describe_attr)
+{
+	if (!device->describe_counter_set)
+		return -ENOSYS;
+
+	return device->describe_counter_set(device, cs_id,
+					cs_describe_attr,
+					NULL);
+}
+EXPORT_SYMBOL(ib_describe_counter_set);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0cc880d..44f92c3 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2053,6 +2053,30 @@ struct ib_port_pkey_list {
 	struct list_head              pkey_list;
 };
 
+enum ib_counter_set_type {
+	IB_COUNTER_SET_FLOW,
+};
+
+enum ib_counter_set_attributes {
+	/* Is cache supported */
+	IB_COUNTER_SET_ATTR_CACHED = 1 << 0,
+};
+
+#define IB_COUNTER_NAME_LEN 64
+struct ib_counter_set_describe_attr {
+	/* Type that this set refers to, use enum ib_counter_set_type */
+	u8 counted_type;
+	/* Number of instances of this counter-set available in the hardware */
+	u64 num_of_cs;
+	/* Attributes of the set, use enum ib_counter_set_attributes */
+	u32 attributes;
+	/* Number of counters in this set */
+	u8 entries_count;
+	/* Counters_names_buff length */
+	u16 counters_names_len;
+	char *counters_names_buff;
+};
+
 struct ib_device {
 	/* Do not access @dma_device directly from ULP nor from HW drivers. */
 	struct device                *dma_device;
@@ -2308,6 +2332,11 @@ struct ib_device {
 							   struct ib_rwq_ind_table_init_attr *init_attr,
 							   struct ib_udata *udata);
 	int                        (*destroy_rwq_ind_table)(struct ib_rwq_ind_table *wq_ind_table);
+	int	(*describe_counter_set)(struct ib_device *device,
+					u16	cs_id,
+					struct ib_counter_set_describe_attr *cs_describe_attr,
+					struct ib_udata *udata);
+
 	/**
 	 * rdma netdev operation
 	 *
@@ -3615,6 +3644,10 @@ struct ib_rwq_ind_table *ib_create_rwq_ind_table(struct ib_device *device,
 						 wq_ind_table_init_attr);
 int ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *wq_ind_table);
 
+int ib_describe_counter_set(struct ib_device *device,
+			    u16 cs_id,
+			    struct ib_counter_set_describe_attr *cs_describe_attr);
+
 int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
 		 unsigned int *sg_offset, unsigned int page_size);
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 04/16] IB/uverbs: Add describe counter set support
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 03/16] IB/core: Introduce counter set describe verb Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 05/16] IB/core: Introduce counter set object and its create/destroy verbs Yishai Hadas
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

User space application which uses counter set functionality,
is expected first to call counter set describe to get various
information on the counter set so it can be used later on.

The command returns the counter set static information as of
type, number of counters, attributes of the counter set, etc.
In case user asks for the counters names by supplying a buffer
rather than null this data will be returned as well in
case the buffer length is sufficient.
Based on the returned metadata, user can know the required buffer
size needed to hold the counters names, which is 64 * entries_count.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h      |  1 +
 drivers/infiniband/core/uverbs_cmd.c  | 75 +++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c |  1 +
 include/uapi/rdma/ib_user_verbs.h     | 21 +++++++++-
 4 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index deccefb..7a95153 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -307,5 +307,6 @@ struct ib_uverbs_flow_spec {
 IB_UVERBS_DECLARE_EX_CMD(destroy_rwq_ind_table);
 IB_UVERBS_DECLARE_EX_CMD(modify_qp);
 IB_UVERBS_DECLARE_EX_CMD(modify_cq);
+IB_UVERBS_DECLARE_EX_CMD(describe_counter_set);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 9c6b795..24775a5 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3909,3 +3909,78 @@ int ib_uverbs_ex_modify_cq(struct ib_uverbs_file *file,
 
 	return ret;
 }
+
+int ib_uverbs_ex_describe_counter_set(struct ib_uverbs_file *file,
+				      struct ib_device *ib_dev,
+				      struct ib_udata *ucore,
+				      struct ib_udata *uhw)
+{
+	struct ib_uverbs_ex_describe_counter_set	cmd = {};
+	struct ib_uverbs_ex_describe_counter_set_resp	resp = {};
+	struct ib_counter_set_describe_attr cs_describe_attr = {};
+	size_t	required_resp_len;
+	size_t	required_cmd_sz;
+	u16 buff_len;
+	int ret = 0;
+
+	required_cmd_sz = offsetof(typeof(cmd), cs_id) +
+			sizeof(cmd.cs_id);
+	required_resp_len = offsetof(typeof(resp), reserved) +
+			sizeof(resp.reserved);
+
+	if (ucore->inlen < required_cmd_sz)
+		return -EINVAL;
+
+	if (ucore->outlen < required_resp_len)
+		return -ENOSPC;
+
+	if (ucore->inlen > sizeof(cmd) &&
+	    !ib_is_udata_cleared(ucore, sizeof(cmd),
+			ucore->inlen - sizeof(cmd)))
+		return -EOPNOTSUPP;
+
+	ret = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
+	if (ret)
+		return ret;
+
+	if (cmd.comp_mask)
+		return -EOPNOTSUPP;
+
+	if (cmd.counters_names_resp) {
+		/* Prevent memory allocation rather than max expected size */
+		buff_len = min_t(u16, cmd.counters_names_max,
+				 MAX_COUNTER_SET_BUFF_LEN);
+		cs_describe_attr.counters_names_buff =
+				kzalloc(buff_len, GFP_KERNEL);
+		if (!cs_describe_attr.counters_names_buff)
+			return -ENOMEM;
+
+		cs_describe_attr.counters_names_len = buff_len;
+	}
+
+	ret = ib_dev->describe_counter_set(ib_dev, cmd.cs_id,
+					   &cs_describe_attr, uhw);
+	if (ret)
+		goto err;
+
+	if (cmd.counters_names_resp) {
+		if (copy_to_user(u64_to_user_ptr(cmd.counters_names_resp),
+				 cs_describe_attr.counters_names_buff,
+				 cs_describe_attr.entries_count *
+				 IB_COUNTER_NAME_LEN)) {
+			ret = -EFAULT;
+			goto err;
+		}
+	}
+
+	resp.num_of_cs = cs_describe_attr.num_of_cs;
+	resp.attributes = cs_describe_attr.attributes;
+	resp.counted_type = cs_describe_attr.counted_type;
+	resp.entries_count = cs_describe_attr.entries_count;
+	resp.response_length = required_resp_len;
+	ret = ib_copy_to_udata(ucore, &resp, resp.response_length);
+
+err:
+	kfree(cs_describe_attr.counters_names_buff);
+	return ret;
+}
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 381fd9c..51b04bd 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -129,6 +129,7 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_EX_CMD_DESTROY_RWQ_IND_TBL] = ib_uverbs_ex_destroy_rwq_ind_table,
 	[IB_USER_VERBS_EX_CMD_MODIFY_QP]        = ib_uverbs_ex_modify_qp,
 	[IB_USER_VERBS_EX_CMD_MODIFY_CQ]        = ib_uverbs_ex_modify_cq,
+	[IB_USER_VERBS_EX_CMD_DESCRIBE_COUNTER_SET]	= ib_uverbs_ex_describe_counter_set,
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index b3f952c..0bed600 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -101,7 +101,8 @@ enum {
 	IB_USER_VERBS_EX_CMD_DESTROY_WQ,
 	IB_USER_VERBS_EX_CMD_CREATE_RWQ_IND_TBL,
 	IB_USER_VERBS_EX_CMD_DESTROY_RWQ_IND_TBL,
-	IB_USER_VERBS_EX_CMD_MODIFY_CQ
+	IB_USER_VERBS_EX_CMD_MODIFY_CQ,
+	IB_USER_VERBS_EX_CMD_DESCRIBE_COUNTER_SET,
 };
 
 /*
@@ -1168,6 +1169,24 @@ struct ib_uverbs_ex_modify_cq {
 	__u32 comp_mask;
 };
 
+#define MAX_COUNTER_SET_BUFF_LEN 4096
+struct ib_uverbs_ex_describe_counter_set  {
+	__u64 counters_names_resp;
+	__u32 comp_mask;
+	__u16 counters_names_max;
+	__u16 cs_id;
+};
+
+struct ib_uverbs_ex_describe_counter_set_resp {
+	__u64 num_of_cs;
+	__u32 comp_mask;
+	__u32 response_length;
+	__u32 attributes;
+	__u8 counted_type;
+	__u8 entries_count;
+	__u16 reserved;
+};
+
 #define IB_DEVICE_NAME_MAX 64
 
 #endif /* IB_USER_VERBS_H */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 05/16] IB/core: Introduce counter set object and its create/destroy verbs
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 04/16] IB/uverbs: Add describe counter set support Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 06/16] IB/uverbs: Add create/destroy counter set support Yishai Hadas
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Counter set instance is allocated on an IB context and belongs to
that context.
Upon successful creation the counter set hardware resources are
allocated and are ready to be bound to its counted type IB object.
Post the bind operation the hardware related counters could be queried.

Downstream patches in this series will introduce the bind and the
query functionality.

Counter set instance can be de-allocated, upon successful
destruction the related hardware resources are released.

Prior to calling the destroy verb the user must first make sure that
the counter set is not being used by its IB object, in other words not
bound to any of its counted objects.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c | 42 +++++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h         | 15 +++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index eda389a..ea71393 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2310,3 +2310,45 @@ int ib_describe_counter_set(struct ib_device *device,
 					NULL);
 }
 EXPORT_SYMBOL(ib_describe_counter_set);
+
+/**
+ * ib_create_counter_set - Creates a Counter Set
+ * @device: The device on which to create the counter set.
+ * @cs_id: counter set id required to
+ * create the counter set.
+ */
+struct ib_counter_set *ib_create_counter_set(struct ib_device *device,
+					     u16 cs_id)
+{
+	struct ib_counter_set *cs;
+
+	if (!device->create_counter_set)
+		return ERR_PTR(-ENOSYS);
+
+	cs = device->create_counter_set(device, cs_id, NULL);
+	if (!IS_ERR(cs)) {
+		cs->device = device;
+		cs->uobject = NULL;
+		cs->cs_id = cs_id;
+		atomic_set(&cs->usecnt, 0);
+	}
+
+	return cs;
+}
+EXPORT_SYMBOL(ib_create_counter_set);
+
+/**
+ * ib_destroy_counter_set - Destroys the specified counter set.
+ * @cs: The counter set to destroy.
+ **/
+int ib_destroy_counter_set(struct ib_counter_set *cs)
+{
+	if (!cs->device->destroy_counter_set)
+		return -ENOSYS;
+
+	if (atomic_read(&cs->usecnt))
+		return -EBUSY;
+
+	return cs->device->destroy_counter_set(cs);
+}
+EXPORT_SYMBOL(ib_destroy_counter_set);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 44f92c3..1c78db7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2077,6 +2077,14 @@ struct ib_counter_set_describe_attr {
 	char *counters_names_buff;
 };
 
+struct ib_counter_set {
+	struct ib_device	*device;
+	struct ib_uobject	*uobject;
+	u16	cs_id;
+	/* num of objects attached */
+	atomic_t	usecnt;
+};
+
 struct ib_device {
 	/* Do not access @dma_device directly from ULP nor from HW drivers. */
 	struct device                *dma_device;
@@ -2336,6 +2344,10 @@ struct ib_device {
 					u16	cs_id,
 					struct ib_counter_set_describe_attr *cs_describe_attr,
 					struct ib_udata *udata);
+	struct ib_counter_set *	(*create_counter_set)(struct ib_device *device,
+						      u16 cs_id,
+						      struct ib_udata *udata);
+	int	(*destroy_counter_set)(struct ib_counter_set *cs);
 
 	/**
 	 * rdma netdev operation
@@ -3647,6 +3659,9 @@ struct ib_rwq_ind_table *ib_create_rwq_ind_table(struct ib_device *device,
 int ib_describe_counter_set(struct ib_device *device,
 			    u16 cs_id,
 			    struct ib_counter_set_describe_attr *cs_describe_attr);
+struct ib_counter_set *ib_create_counter_set(struct ib_device *device,
+					     u16 cs_id);
+int ib_destroy_counter_set(struct ib_counter_set *cs);
 
 int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
 		 unsigned int *sg_offset, unsigned int page_size);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 06/16] IB/uverbs: Add create/destroy counter set support
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 05/16] IB/core: Introduce counter set object and its create/destroy verbs Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 07/16] IB/core: Introduce counter set query verb Yishai Hadas
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

User space application which uses counter set functionality,
is expected to allocate/release the counter set hardware
resources by calling create/destroy verbs and in turn get
a unique handle that can be used to bind the counter set to
its counted type so the hardware related counters could be
queried.

Downstream patches in this series will introduce the bind and the
query functionality.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h           |   2 +
 drivers/infiniband/core/uverbs_cmd.c       | 117 +++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c      |   2 +
 drivers/infiniband/core/uverbs_std_types.c |  15 +++-
 include/rdma/uverbs_std_types.h            |   1 +
 include/uapi/rdma/ib_user_ioctl_verbs.h    |   1 +
 include/uapi/rdma/ib_user_verbs.h          |  25 ++++++
 7 files changed, 162 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 7a95153..e1e8379 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -308,5 +308,7 @@ struct ib_uverbs_flow_spec {
 IB_UVERBS_DECLARE_EX_CMD(modify_qp);
 IB_UVERBS_DECLARE_EX_CMD(modify_cq);
 IB_UVERBS_DECLARE_EX_CMD(describe_counter_set);
+IB_UVERBS_DECLARE_EX_CMD(create_counter_set);
+IB_UVERBS_DECLARE_EX_CMD(destroy_counter_set);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 24775a5..b648c8f 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3984,3 +3984,120 @@ int ib_uverbs_ex_describe_counter_set(struct ib_uverbs_file *file,
 	kfree(cs_describe_attr.counters_names_buff);
 	return ret;
 }
+
+int ib_uverbs_ex_create_counter_set(struct ib_uverbs_file *file,
+				    struct ib_device *ib_dev,
+				    struct ib_udata *ucore,
+				    struct ib_udata *uhw)
+{
+	struct ib_uverbs_ex_create_counter_set	cmd = {};
+	struct ib_uverbs_ex_create_counter_set_resp	resp = {};
+	struct ib_counter_set	*cs;
+	struct ib_uobject	*uobj;
+	size_t	required_resp_len;
+	size_t	required_cmd_sz;
+	int	ret = 0;
+
+	required_cmd_sz = offsetof(typeof(cmd), reserved) +
+			sizeof(cmd.reserved);
+	required_resp_len = offsetof(typeof(resp), reserved) +
+			sizeof(resp.reserved);
+
+	if (ucore->inlen < required_cmd_sz)
+		return -EINVAL;
+
+	if (ucore->outlen < required_resp_len)
+		return -ENOSPC;
+
+	if (ucore->inlen > sizeof(cmd) &&
+	    !ib_is_udata_cleared(ucore, sizeof(cmd),
+			ucore->inlen - sizeof(cmd)))
+		return -EOPNOTSUPP;
+
+	ret = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
+	if (ret)
+		return ret;
+
+	if (cmd.comp_mask || cmd.reserved)
+		return -EOPNOTSUPP;
+
+	uobj = uobj_alloc(uobj_get_type(counter_set), file->ucontext);
+	if (IS_ERR(uobj))
+		return PTR_ERR(uobj);
+
+	cs = ib_dev->create_counter_set(ib_dev, cmd.cs_id, uhw);
+	if (IS_ERR(cs)) {
+		ret = PTR_ERR(cs);
+		goto err_uobj;
+	}
+	uobj->object = cs;
+	cs->device = ib_dev;
+	cs->uobject = uobj;
+	cs->cs_id = cmd.cs_id;
+	atomic_set(&cs->usecnt, 0);
+	resp.cs_handle = uobj->id;
+	resp.response_length = required_resp_len;
+	ret = ib_copy_to_udata(ucore,
+			       &resp, required_resp_len);
+	if (ret)
+		goto err_copy;
+
+	uobj_alloc_commit(uobj);
+
+	return 0;
+
+err_copy:
+	ib_destroy_counter_set(cs);
+err_uobj:
+	uobj_alloc_abort(uobj);
+	return ret;
+}
+
+int ib_uverbs_ex_destroy_counter_set(struct ib_uverbs_file *file,
+				     struct ib_device *ib_dev,
+				     struct ib_udata *ucore,
+				     struct ib_udata *uhw)
+{
+	struct ib_uverbs_ex_destroy_counter_set	cmd = {};
+	struct ib_uverbs_ex_destroy_counter_set_resp	resp = {};
+	struct ib_uobject	*uobj;
+	size_t	required_resp_len;
+	size_t	required_cmd_sz;
+	int	ret;
+
+	required_cmd_sz = offsetof(typeof(cmd), cs_handle) +
+			sizeof(cmd.cs_handle);
+	required_resp_len = offsetof(typeof(resp), response_length) +
+			sizeof(resp.response_length);
+
+	if (ucore->inlen < required_cmd_sz)
+		return -EINVAL;
+
+	if (ucore->outlen < required_resp_len)
+		return -ENOSPC;
+
+	if (ucore->inlen > sizeof(cmd) &&
+	    !ib_is_udata_cleared(ucore, sizeof(cmd),
+			   ucore->inlen - sizeof(cmd)))
+		return -EOPNOTSUPP;
+
+	ret = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
+	if (ret)
+		return ret;
+
+	if (cmd.comp_mask)
+		return -EOPNOTSUPP;
+
+	uobj  = uobj_get_write(uobj_get_type(counter_set),
+			       cmd.cs_handle,
+			       file->ucontext);
+	if (IS_ERR(uobj))
+		return PTR_ERR(uobj);
+
+	ret = uobj_remove_commit(uobj);
+	if (ret)
+		return ret;
+
+	resp.response_length = required_resp_len;
+	return ib_copy_to_udata(ucore, &resp, resp.response_length);
+}
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 51b04bd..e9e575d 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -130,6 +130,8 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_EX_CMD_MODIFY_QP]        = ib_uverbs_ex_modify_qp,
 	[IB_USER_VERBS_EX_CMD_MODIFY_CQ]        = ib_uverbs_ex_modify_cq,
 	[IB_USER_VERBS_EX_CMD_DESCRIBE_COUNTER_SET]	= ib_uverbs_ex_describe_counter_set,
+	[IB_USER_VERBS_EX_CMD_CREATE_COUNTER_SET]	= ib_uverbs_ex_create_counter_set,
+	[IB_USER_VERBS_EX_CMD_DESTROY_COUNTER_SET]	= ib_uverbs_ex_destroy_counter_set,
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
diff --git a/drivers/infiniband/core/uverbs_std_types.c b/drivers/infiniband/core/uverbs_std_types.c
index b095bce..b49cc75 100644
--- a/drivers/infiniband/core/uverbs_std_types.c
+++ b/drivers/infiniband/core/uverbs_std_types.c
@@ -190,6 +190,15 @@ static int uverbs_free_pd(struct ib_uobject *uobject,
 	return 0;
 }
 
+static int uverbs_free_counter_set(struct ib_uobject *uobject,
+				   enum rdma_remove_reason why)
+{
+	struct ib_counter_set *ib_cs =
+			(struct ib_counter_set *)(uobject->object);
+
+	return ib_destroy_counter_set(ib_cs);
+}
+
 static int uverbs_hot_unplug_completion_event_file(struct ib_uobject_file *uobj_file,
 						   enum rdma_remove_reason why)
 {
@@ -435,6 +444,9 @@ static DECLARE_UVERBS_METHOD(
 		      /* 2 is used in order to free the PD after MRs */
 		      &UVERBS_TYPE_ALLOC_IDR(2, uverbs_free_pd));
 
+DECLARE_UVERBS_OBJECT(uverbs_object_counter_set, UVERBS_OBJECT_COUNTER_SET,
+		      &UVERBS_TYPE_ALLOC_IDR(0, uverbs_free_counter_set));
+
 DECLARE_UVERBS_OBJECT(uverbs_object_device, UVERBS_OBJECT_DEVICE, NULL);
 
 DECLARE_UVERBS_OBJECT_TREE(uverbs_default_objects,
@@ -450,4 +462,5 @@ static DECLARE_UVERBS_METHOD(
 			   &uverbs_object_flow,
 			   &uverbs_object_wq,
 			   &uverbs_object_rwq_ind_table,
-			   &uverbs_object_xrcd);
+			   &uverbs_object_xrcd,
+			   &uverbs_object_counter_set);
diff --git a/include/rdma/uverbs_std_types.h b/include/rdma/uverbs_std_types.h
index 5f8e20b..1690c52 100644
--- a/include/rdma/uverbs_std_types.h
+++ b/include/rdma/uverbs_std_types.h
@@ -51,6 +51,7 @@
 extern const struct uverbs_object_def uverbs_object_pd;
 extern const struct uverbs_object_def uverbs_object_xrcd;
 extern const struct uverbs_object_def uverbs_object_device;
+extern const struct uverbs_object_def uverbs_object_counter_set;
 
 extern const struct uverbs_object_tree_def uverbs_default_objects;
 static inline const struct uverbs_object_tree_def *uverbs_default_get_objects(void)
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 842792e..1ab80ed 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -52,6 +52,7 @@ enum uverbs_default_objects {
 	UVERBS_OBJECT_XRCD,
 	UVERBS_OBJECT_RWQ_IND_TBL,
 	UVERBS_OBJECT_WQ,
+	UVERBS_OBJECT_COUNTER_SET,
 	UVERBS_OBJECT_LAST,
 };
 
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 0bed600..fa15ac0 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -103,6 +103,8 @@ enum {
 	IB_USER_VERBS_EX_CMD_DESTROY_RWQ_IND_TBL,
 	IB_USER_VERBS_EX_CMD_MODIFY_CQ,
 	IB_USER_VERBS_EX_CMD_DESCRIBE_COUNTER_SET,
+	IB_USER_VERBS_EX_CMD_CREATE_COUNTER_SET,
+	IB_USER_VERBS_EX_CMD_DESTROY_COUNTER_SET,
 };
 
 /*
@@ -1187,6 +1189,29 @@ struct ib_uverbs_ex_describe_counter_set_resp {
 	__u16 reserved;
 };
 
+struct ib_uverbs_ex_create_counter_set {
+	__u32 comp_mask;
+	__u16 cs_id;
+	__u16 reserved;
+};
+
+struct ib_uverbs_ex_create_counter_set_resp {
+	__u32 comp_mask;
+	__u32 response_length;
+	__u32 cs_handle;
+	__u32 reserved;
+};
+
+struct ib_uverbs_ex_destroy_counter_set  {
+	__u32 comp_mask;
+	__u32 cs_handle;
+};
+
+struct ib_uverbs_ex_destroy_counter_set_resp {
+	__u32 comp_mask;
+	__u32 response_length;
+};
+
 #define IB_DEVICE_NAME_MAX 64
 
 #endif /* IB_USER_VERBS_H */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 07/16] IB/core: Introduce counter set query verb
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 06/16] IB/uverbs: Add create/destroy counter set support Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
       [not found]     ` <1508424118-27205-8-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-10-19 14:41   ` [PATCH rdma-next 08/16] IB/uverbs: Add query counter set support Yishai Hadas
                     ` (9 subsequent siblings)
  16 siblings, 1 reply; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch adds the query counter set verb, enabling the user to
read the hardware counters which are associated with the counter
set.

The user supplies counter set instance and an output address.
The hardware queries the counter set and writes statistics to the
output address as an array of uint64_t, each entry in the uint64_t
array represents a single counter.

Note:
A counter set must be first attached to an IB object in order to
be queried for its underlay counters, downstream patches will
present bind and query methods for flow counters.
The user has an option as part of the query verb to force reading
the up-to-date hardware values instead of reading some cached
values by using the IB_COUNTER_SET_FORCE_UPDATE flag.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c | 21 +++++++++++++++++++++
 include/rdma/ib_verbs.h         | 17 +++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index ea71393..ffe51fd 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2352,3 +2352,24 @@ int ib_destroy_counter_set(struct ib_counter_set *cs)
 	return cs->device->destroy_counter_set(cs);
 }
 EXPORT_SYMBOL(ib_destroy_counter_set);
+
+/**
+ * ib_query_counter_set - Queries a Counter Set
+ * @cs: The counter set to query.
+ * @cs_query_attr: A list of query attributes
+ * required to get the outcome.
+ */
+int ib_query_counter_set(struct ib_counter_set *cs,
+			 struct ib_counter_set_query_attr *cs_query_attr)
+{
+	if (!cs->device->query_counter_set)
+		return -ENOSYS;
+
+	if (!atomic_read(&cs->usecnt))
+		return -EINVAL;
+
+	return cs->device->query_counter_set(cs,
+				cs_query_attr,
+				NULL);
+}
+EXPORT_SYMBOL(ib_query_counter_set);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1c78db7..bc33bc2 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2085,6 +2085,18 @@ struct ib_counter_set {
 	atomic_t	usecnt;
 };
 
+enum ib_query_counter_set_flags {
+	/* force hardware query instead of cached value */
+	IB_COUNTER_SET_FORCE_UPDATE = 1 << 0,
+};
+
+struct ib_counter_set_query_attr {
+	u32	query_flags; /* Use enum ib_query_counter_set_flags */
+	u64	*out_buff;
+	u32	buff_len;
+	u32	outlen;
+};
+
 struct ib_device {
 	/* Do not access @dma_device directly from ULP nor from HW drivers. */
 	struct device                *dma_device;
@@ -2348,6 +2360,9 @@ struct ib_device {
 						      u16 cs_id,
 						      struct ib_udata *udata);
 	int	(*destroy_counter_set)(struct ib_counter_set *cs);
+	int	(*query_counter_set)(struct ib_counter_set *cs,
+				     struct ib_counter_set_query_attr *cs_query_attr,
+				     struct ib_udata *udata);
 
 	/**
 	 * rdma netdev operation
@@ -3662,6 +3677,8 @@ int ib_describe_counter_set(struct ib_device *device,
 struct ib_counter_set *ib_create_counter_set(struct ib_device *device,
 					     u16 cs_id);
 int ib_destroy_counter_set(struct ib_counter_set *cs);
+int ib_query_counter_set(struct ib_counter_set *cs,
+			 struct ib_counter_set_query_attr *cs_query_attr);
 
 int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
 		 unsigned int *sg_offset, unsigned int page_size);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 08/16] IB/uverbs: Add query counter set support
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 07/16] IB/core: Introduce counter set query verb Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 09/16] IB/core: Add support for flow counter set Yishai Hadas
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch exposes the query counter set verb to user space
applications.
By that verb the user can read the hardware counters which
are associated with the counter set.

The application needs to provide a sufficient memory to
hold the statistics based on the information it read via
the describe counter set API.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h      |  1 +
 drivers/infiniband/core/uverbs_cmd.c  | 82 +++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c |  1 +
 include/uapi/rdma/ib_user_verbs.h     | 14 ++++++
 4 files changed, 98 insertions(+)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index e1e8379..50c4bb4 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -310,5 +310,6 @@ struct ib_uverbs_flow_spec {
 IB_UVERBS_DECLARE_EX_CMD(describe_counter_set);
 IB_UVERBS_DECLARE_EX_CMD(create_counter_set);
 IB_UVERBS_DECLARE_EX_CMD(destroy_counter_set);
+IB_UVERBS_DECLARE_EX_CMD(query_counter_set);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b648c8f..bf2b95a 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -4101,3 +4101,85 @@ int ib_uverbs_ex_destroy_counter_set(struct ib_uverbs_file *file,
 	resp.response_length = required_resp_len;
 	return ib_copy_to_udata(ucore, &resp, resp.response_length);
 }
+
+int ib_uverbs_ex_query_counter_set(struct ib_uverbs_file *file,
+				   struct ib_device *ib_dev,
+				   struct ib_udata *ucore,
+				   struct ib_udata *uhw)
+{
+	struct ib_counter_set_query_attr cs_query_attr = {};
+	struct ib_uverbs_ex_query_counter_set	cmd = {};
+	struct ib_uverbs_ex_query_counter_set_resp	resp = {};
+	struct ib_counter_set	*cs;
+	size_t	required_resp_len;
+	size_t	required_cmd_sz;
+	u32 buff_len;
+	int	ret;
+
+	required_cmd_sz = offsetof(typeof(cmd), comp_mask) +
+			sizeof(cmd.comp_mask);
+	required_resp_len = offsetof(typeof(resp), response_length) +
+			sizeof(resp.response_length);
+
+	if (ucore->inlen < required_cmd_sz)
+		return -EINVAL;
+
+	if (ucore->outlen < required_resp_len)
+		return -ENOSPC;
+
+	if (ucore->inlen > sizeof(cmd) &&
+	    !ib_is_udata_cleared(ucore, sizeof(cmd),
+			   ucore->inlen - sizeof(cmd)))
+		return -EOPNOTSUPP;
+
+	ret = ib_copy_from_udata(&cmd, ucore, min(sizeof(cmd), ucore->inlen));
+	if (ret)
+		return ret;
+
+	if (cmd.comp_mask)
+		return -EOPNOTSUPP;
+
+	cs = uobj_get_obj_read(counter_set,
+			       cmd.cs_handle,
+			       file->ucontext);
+
+	if (!cs)
+		return -EINVAL;
+
+	if (!atomic_read(&cs->usecnt)) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	/* Prevent memory allocation rather than max expected size */
+	buff_len = min_t(u32, cmd.out_buff_len,
+			 MAX_COUNTER_SET_BUFF_LEN);
+	cs_query_attr.buff_len = buff_len;
+	cs_query_attr.out_buff = kzalloc(buff_len, GFP_KERNEL);
+	if (!cs_query_attr.out_buff) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	cs_query_attr.query_flags = cmd.query_attr;
+	ret = ib_dev->query_counter_set(cs, &cs_query_attr, uhw);
+	if (ret)
+		goto err_query;
+
+	if (copy_to_user(u64_to_user_ptr(cmd.out_buff),
+			 cs_query_attr.out_buff,
+			 cs_query_attr.outlen)) {
+		ret = -EFAULT;
+		goto err_query;
+	}
+
+	resp.response_length = required_resp_len;
+	ret = ib_copy_to_udata(ucore,
+			       &resp, required_resp_len);
+
+err_query:
+	kfree(cs_query_attr.out_buff);
+err:
+	uobj_put_obj_read(cs);
+	return ret;
+}
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index e9e575d..d6d056c 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -132,6 +132,7 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_EX_CMD_DESCRIBE_COUNTER_SET]	= ib_uverbs_ex_describe_counter_set,
 	[IB_USER_VERBS_EX_CMD_CREATE_COUNTER_SET]	= ib_uverbs_ex_create_counter_set,
 	[IB_USER_VERBS_EX_CMD_DESTROY_COUNTER_SET]	= ib_uverbs_ex_destroy_counter_set,
+	[IB_USER_VERBS_EX_CMD_QUERY_COUNTER_SET]	= ib_uverbs_ex_query_counter_set,
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index fa15ac0..49d9b2d 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -105,6 +105,7 @@ enum {
 	IB_USER_VERBS_EX_CMD_DESCRIBE_COUNTER_SET,
 	IB_USER_VERBS_EX_CMD_CREATE_COUNTER_SET,
 	IB_USER_VERBS_EX_CMD_DESTROY_COUNTER_SET,
+	IB_USER_VERBS_EX_CMD_QUERY_COUNTER_SET,
 };
 
 /*
@@ -1212,6 +1213,19 @@ struct ib_uverbs_ex_destroy_counter_set_resp {
 	__u32 response_length;
 };
 
+struct ib_uverbs_ex_query_counter_set {
+	__u64 out_buff;
+	__u32 out_buff_len;
+	__u32 query_attr; /* Use enum ib_query_counter_set_flags */
+	__u32 cs_handle;
+	__u32 comp_mask;
+};
+
+struct ib_uverbs_ex_query_counter_set_resp {
+	__u32 comp_mask;
+	__u32 response_length;
+};
+
 #define IB_DEVICE_NAME_MAX 64
 
 #endif /* IB_USER_VERBS_H */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 09/16] IB/core: Add support for flow counter set
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 08/16] IB/uverbs: Add query counter set support Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 10/16] IB/uverbs: " Yishai Hadas
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

A counter set object could be attached to flow on creation
by providing the counter specification action.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c |  8 +++++++-
 include/rdma/ib_verbs.h         | 14 +++++++++++---
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index ffe51fd..8e82249 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1941,6 +1941,8 @@ struct ib_flow *ib_create_flow(struct ib_qp *qp,
 	if (!IS_ERR(flow_id)) {
 		atomic_inc(&qp->usecnt);
 		flow_id->qp = qp;
+		if (flow_id->counter_set)
+			atomic_inc(&flow_id->counter_set->usecnt);
 	}
 	return flow_id;
 }
@@ -1950,10 +1952,14 @@ int ib_destroy_flow(struct ib_flow *flow_id)
 {
 	int err;
 	struct ib_qp *qp = flow_id->qp;
+	struct ib_counter_set *counter_set = flow_id->counter_set;
 
 	err = qp->device->destroy_flow(flow_id);
-	if (!err)
+	if (!err) {
 		atomic_dec(&qp->usecnt);
+		if (counter_set)
+			atomic_dec(&counter_set->usecnt);
+	}
 	return err;
 }
 EXPORT_SYMBOL(ib_destroy_flow);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index bc33bc2..d394e24 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1811,6 +1811,7 @@ enum ib_flow_spec_type {
 	/* Actions */
 	IB_FLOW_SPEC_ACTION_TAG         = 0x1000,
 	IB_FLOW_SPEC_ACTION_DROP        = 0x1001,
+	IB_FLOW_SPEC_ACTION_COUNT       = 0x1002,
 };
 #define IB_FLOW_SPEC_LAYER_MASK	0xF0
 #define IB_FLOW_SPEC_SUPPORT_LAYERS 8
@@ -1944,6 +1945,12 @@ struct ib_flow_spec_action_drop {
 	u16			      size;
 };
 
+struct ib_flow_spec_action_count {
+	enum ib_flow_spec_type type;
+	u16 size;
+	struct ib_counter_set *counter_set;
+};
+
 union ib_flow_spec {
 	struct {
 		u32			type;
@@ -1957,6 +1964,7 @@ struct ib_flow_spec_action_drop {
 	struct ib_flow_spec_tunnel      tunnel;
 	struct ib_flow_spec_action_tag  flow_tag;
 	struct ib_flow_spec_action_drop drop;
+	struct ib_flow_spec_action_count flow_count;
 };
 
 struct ib_flow_attr {
@@ -1975,6 +1983,7 @@ struct ib_flow_attr {
 struct ib_flow {
 	struct ib_qp		*qp;
 	struct ib_uobject	*uobject;
+	struct ib_counter_set	*counter_set;
 };
 
 struct ib_mad_hdr;
@@ -3671,6 +3680,8 @@ struct ib_rwq_ind_table *ib_create_rwq_ind_table(struct ib_device *device,
 						 wq_ind_table_init_attr);
 int ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *wq_ind_table);
 
+int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
+		 unsigned int *sg_offset, unsigned int page_size);
 int ib_describe_counter_set(struct ib_device *device,
 			    u16 cs_id,
 			    struct ib_counter_set_describe_attr *cs_describe_attr);
@@ -3680,9 +3691,6 @@ struct ib_counter_set *ib_create_counter_set(struct ib_device *device,
 int ib_query_counter_set(struct ib_counter_set *cs,
 			 struct ib_counter_set_query_attr *cs_query_attr);
 
-int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
-		 unsigned int *sg_offset, unsigned int page_size);
-
 static inline int
 ib_map_mr_sg_zbva(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
 		  unsigned int *sg_offset, unsigned int page_size)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 10/16] IB/uverbs: Add support for flow counter set
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 09/16] IB/core: Add support for flow counter set Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 11/16] net/mlx5: Export flow counter related API Yishai Hadas
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The struct ib_uverbs_flow_spec_action_count associates
a counter set with the flow.

Use of ib_uverbs_flow_spec_action_count allows the
consumer to associate an existing counter set to
a flow on creation.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h     |  1 +
 drivers/infiniband/core/uverbs_cmd.c | 32 ++++++++++++++++++++++++++++++++
 include/uapi/rdma/ib_user_verbs.h    | 13 +++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 50c4bb4..c87aca9 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -244,6 +244,7 @@ struct ib_uverbs_flow_spec {
 		struct ib_uverbs_flow_spec_ipv6    ipv6;
 		struct ib_uverbs_flow_spec_action_tag	flow_tag;
 		struct ib_uverbs_flow_spec_action_drop	drop;
+		struct ib_uverbs_flow_spec_action_count flow_count;
 	};
 };
 
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index bf2b95a..e05726f 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2732,6 +2732,13 @@ static int kern_spec_to_ib_spec_action(struct ib_uverbs_flow_spec *kern_spec,
 
 		ib_spec->drop.size = sizeof(struct ib_flow_spec_action_drop);
 		break;
+	case IB_FLOW_SPEC_ACTION_COUNT:
+		if (kern_spec->flow_count.size !=
+			sizeof(struct ib_uverbs_flow_spec_action_count))
+			return -EINVAL;
+		ib_spec->flow_count.size =
+				sizeof(struct ib_flow_spec_action_count);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -3269,9 +3276,11 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 	struct ib_uverbs_flow_attr	  *kern_flow_attr;
 	struct ib_flow_attr		  *flow_attr;
 	struct ib_qp			  *qp;
+	struct ib_counter_set	*cs = NULL;
 	int err = 0;
 	void *kern_spec;
 	void *ib_spec;
+	int cs_used = 0;
 	int i;
 
 	if (ucore->inlen < sizeof(cmd))
@@ -3366,6 +3375,25 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 		flow_attr->size +=
 			((union ib_flow_spec *) ib_spec)->size;
 		cmd.flow_attr.size -= ((struct ib_uverbs_flow_spec *)kern_spec)->size;
+		if (((struct ib_uverbs_flow_spec *)kern_spec)->type ==
+				IB_FLOW_SPEC_ACTION_COUNT) {
+			__u32 cs_handle =
+				((struct ib_uverbs_flow_spec *)kern_spec)->flow_count.cs_handle;
+			cs_used++;
+			if (cs_used > 1) {
+				uobj_put_obj_read(cs);
+				err = -EINVAL;
+				goto err_free;
+			}
+			cs = uobj_get_obj_read(counter_set,
+					       cs_handle,
+					       file->ucontext);
+			if (!cs) {
+				err = -EINVAL;
+				goto err_free;
+			}
+			((union ib_flow_spec *)ib_spec)->flow_count.counter_set = cs;
+		}
 		kern_spec += ((struct ib_uverbs_flow_spec *) kern_spec)->size;
 		ib_spec += ((union ib_flow_spec *) ib_spec)->size;
 	}
@@ -3376,6 +3404,10 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 		goto err_free;
 	}
 	flow_id = ib_create_flow(qp, flow_attr, IB_FLOW_DOMAIN_USER);
+
+	if (cs_used)
+		uobj_put_obj_read(cs);
+
 	if (IS_ERR(flow_id)) {
 		err = PTR_ERR(flow_id);
 		goto err_free;
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 49d9b2d..63f79d5 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -987,6 +987,19 @@ struct ib_uverbs_flow_spec_action_drop {
 	};
 };
 
+struct ib_uverbs_flow_spec_action_count {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+	__u32 cs_handle;
+	__u32 reserved1;
+};
+
 struct ib_uverbs_flow_tunnel_filter {
 	__be32 tunnel_id;
 };
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 11/16] net/mlx5: Export flow counter related API
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (9 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 10/16] IB/uverbs: " Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 12/16] net/mlx5: Expand mlx5_fc_query_cached to return absolute counters values Yishai Hadas
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Exports counters API to be used in both IB and EN.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h   |  2 --
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  | 23 ----------------------
 .../net/ethernet/mellanox/mlx5/core/fs_counters.c  |  3 +++
 include/linux/mlx5/fs.h                            | 22 +++++++++++++++++++++
 5 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 881e2e5..58b6f2e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -424,6 +424,7 @@ int mlx5_cmd_fc_query(struct mlx5_core_dev *dev, u32 id,
 	*bytes = MLX5_GET64(traffic_counter, stats, octets);
 	return 0;
 }
+EXPORT_SYMBOL(mlx5_cmd_fc_query);
 
 struct mlx5_cmd_fc_bulk {
 	u32 id;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index 71e2d0f..d413760 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -76,8 +76,6 @@ int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
 
 int mlx5_cmd_fc_alloc(struct mlx5_core_dev *dev, u32 *id);
 int mlx5_cmd_fc_free(struct mlx5_core_dev *dev, u32 id);
-int mlx5_cmd_fc_query(struct mlx5_core_dev *dev, u32 id,
-		      u64 *packets, u64 *bytes);
 
 struct mlx5_cmd_fc_bulk;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index 80f6f3c7..2789a14 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -128,29 +128,6 @@ struct mlx5_flow_table {
 	struct rhltable			fgs_hash;
 };
 
-struct mlx5_fc_cache {
-	u64 packets;
-	u64 bytes;
-	u64 lastuse;
-};
-
-struct mlx5_fc {
-	struct rb_node node;
-	struct list_head list;
-
-	/* last{packets,bytes} members are used when calculating the delta since
-	 * last reading
-	 */
-	u64 lastpackets;
-	u64 lastbytes;
-
-	u32 id;
-	bool deleted;
-	bool aging;
-
-	struct mlx5_fc_cache cache ____cacheline_aligned_in_smp;
-};
-
 struct mlx5_ft_underlay_qp {
 	struct list_head list;
 	u32 qpn;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
index 89d1f865..f679eb0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
@@ -243,6 +243,7 @@ struct mlx5_fc *mlx5_fc_create(struct mlx5_core_dev *dev, bool aging)
 
 	return ERR_PTR(err);
 }
+EXPORT_SYMBOL(mlx5_fc_create);
 
 void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter)
 {
@@ -260,6 +261,7 @@ void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter)
 	mlx5_cmd_fc_free(dev, counter->id);
 	kfree(counter);
 }
+EXPORT_SYMBOL(mlx5_fc_destroy);
 
 int mlx5_init_fc_stats(struct mlx5_core_dev *dev)
 {
@@ -326,6 +328,7 @@ void mlx5_fc_query_cached(struct mlx5_fc *counter,
 	counter->lastbytes = c.bytes;
 	counter->lastpackets = c.packets;
 }
+EXPORT_SYMBOL(mlx5_fc_query_cached);
 
 void mlx5_fc_queue_stats_work(struct mlx5_core_dev *dev,
 			      struct delayed_work *dwork,
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index b25e7ba..a4a715e 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -166,6 +166,28 @@ int mlx5_modify_rule_destination(struct mlx5_flow_handle *handler,
 void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter);
 void mlx5_fc_query_cached(struct mlx5_fc *counter,
 			  u64 *bytes, u64 *packets, u64 *lastuse);
+int mlx5_cmd_fc_query(struct mlx5_core_dev *dev, u32 id,
+		      u64 *packets, u64 *bytes);
+
+struct mlx5_fc_cache {
+	u64 packets;
+	u64 bytes;
+	u64 lastuse;
+};
+
+struct mlx5_fc {
+	struct rb_node node;
+	struct list_head list;
+
+	u64 lastpackets;
+	u64 lastbytes;
+
+	u32 id;
+	bool deleted;
+	bool aging;
+	struct mlx5_fc_cache cache ____cacheline_aligned_in_smp;
+};
+
 int mlx5_fs_add_rx_underlay_qpn(struct mlx5_core_dev *dev, u32 underlay_qpn);
 int mlx5_fs_remove_rx_underlay_qpn(struct mlx5_core_dev *dev, u32 underlay_qpn);
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 12/16] net/mlx5: Expand mlx5_fc_query_cached to return absolute counters values
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (10 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 11/16] net/mlx5: Export flow counter related API Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 13/16] IB/mlx5: Add counter set operations Yishai Hadas
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Currently mlx5_fc_query_cached returns counters values since last call,
expands it to work with query flags, passing MLX5_FLOW_QUERY_CACHED_ABS
flag will makes function to return counters absolute values.

This absolute mode will be used in downstream patches in this series.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c       |  6 ++++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c | 14 ++++++++++----
 include/linux/mlx5/fs.h                               |  8 +++++++-
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 1aa2028..37b3049 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -465,7 +465,8 @@ void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe)
 		list_for_each_entry(flow, &e->flows, encap) {
 			if (flow->flags & MLX5E_TC_FLOW_OFFLOADED) {
 				counter = mlx5_flow_rule_counter(flow->rule);
-				mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);
+				mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse,
+						     MLX5_FLOW_QUERY_CACHED_DIFF);
 				if (time_after((unsigned long)lastuse, nhe->reported_lastuse)) {
 					neigh_used = true;
 					break;
@@ -2133,7 +2134,8 @@ int mlx5e_stats_flower(struct mlx5e_priv *priv,
 	if (!counter)
 		return 0;
 
-	mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);
+	mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse,
+			     MLX5_FLOW_QUERY_CACHED_DIFF);
 
 	tcf_exts_stats_update(f->exts, bytes, packets, lastuse);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
index f679eb0..2a5b2f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
@@ -315,16 +315,22 @@ void mlx5_cleanup_fc_stats(struct mlx5_core_dev *dev)
 }
 
 void mlx5_fc_query_cached(struct mlx5_fc *counter,
-			  u64 *bytes, u64 *packets, u64 *lastuse)
+			  u64 *bytes, u64 *packets, u64 *lastuse,
+			  enum mlx5_flow_query_cached_flags query_flags)
 {
 	struct mlx5_fc_cache c;
 
 	c = counter->cache;
 
-	*bytes = c.bytes - counter->lastbytes;
-	*packets = c.packets - counter->lastpackets;
-	*lastuse = c.lastuse;
+	if (query_flags == MLX5_FLOW_QUERY_CACHED_ABS) {
+		*bytes = c.bytes;
+		*packets = c.packets;
+	} else {
+		*bytes = c.bytes - counter->lastbytes;
+		*packets = c.packets - counter->lastpackets;
+	}
 
+	*lastuse = c.lastuse;
 	counter->lastbytes = c.bytes;
 	counter->lastpackets = c.packets;
 }
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index a4a715e..8e0331c 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -164,8 +164,14 @@ int mlx5_modify_rule_destination(struct mlx5_flow_handle *handler,
 struct mlx5_fc *mlx5_flow_rule_counter(struct mlx5_flow_handle *handler);
 struct mlx5_fc *mlx5_fc_create(struct mlx5_core_dev *dev, bool aging);
 void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter);
+
+enum mlx5_flow_query_cached_flags {
+	MLX5_FLOW_QUERY_CACHED_DIFF = 0,
+	MLX5_FLOW_QUERY_CACHED_ABS = 1,
+};
 void mlx5_fc_query_cached(struct mlx5_fc *counter,
-			  u64 *bytes, u64 *packets, u64 *lastuse);
+			  u64 *bytes, u64 *packets, u64 *lastuse,
+			  enum mlx5_flow_query_cached_flags query_flags);
 int mlx5_cmd_fc_query(struct mlx5_core_dev *dev, u32 id,
 		      u64 *packets, u64 *bytes);
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 13/16] IB/mlx5: Add counter set operations
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (11 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 12/16] net/mlx5: Expand mlx5_fc_query_cached to return absolute counters values Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 14/16] IB/mlx5: Pass mlx5_flow_act struct instead of multiple arguments Yishai Hadas
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch implements the uverbs counter sets API (e.g.
describe,create,destroy,query) by introducing some
internal management structures.

Downstream patches in this series will add the
functionality to support flow counters.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c    | 171 ++++++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  31 +++++++
 include/uapi/rdma/mlx5-abi.h         |  16 ++++
 3 files changed, 217 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 61ce3ca..a5ff217 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3945,6 +3945,158 @@ static void init_delay_drop(struct mlx5_ib_dev *dev)
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 
 	return mlx5_get_vector_affinity(dev->mdev, comp_vector);
+};
+
+static int mlx5_ib_describe_counter_set(struct ib_device *ib_dev,
+					u16	cs_id,
+					struct ib_counter_set_describe_attr *cs_describe_attr,
+					struct ib_udata *udata)
+{
+	struct mlx5_ib_describe_counter_set_resp resp = {};
+	struct ib_counter_set_describe_attr *ib_cs_desc;
+	struct mlx5_ib_dev *dev = to_mdev(ib_dev);
+	struct mlx5_desc_cs_attr *desc_cs;
+	size_t min_resp_len;
+
+	if (udata && udata->inlen > 0 &&
+	    !ib_is_udata_cleared(udata, 0,
+				 udata->inlen))
+		return -EINVAL;
+
+	min_resp_len = offsetof(typeof(resp), reserved) + sizeof(resp.reserved);
+	if (udata && udata->outlen && udata->outlen < min_resp_len)
+		return -EINVAL;
+
+	if (cs_id >= dev->counter_sets.max_counter_sets)
+		return -EINVAL;
+
+	desc_cs = &dev->counter_sets.desc_cs_arr[cs_id];
+	ib_cs_desc = &desc_cs->cs_desc;
+	if (cs_describe_attr->counters_names_buff) {
+		if (cs_describe_attr->counters_names_len <
+				ib_cs_desc->counters_names_len)
+			return -EINVAL;
+
+		desc_cs->fill_counters_names(cs_describe_attr->counters_names_buff);
+	}
+
+	cs_describe_attr->counted_type = ib_cs_desc->counted_type;
+	cs_describe_attr->num_of_cs = ib_cs_desc->num_of_cs;
+	cs_describe_attr->attributes = ib_cs_desc->attributes;
+	cs_describe_attr->entries_count = ib_cs_desc->entries_count;
+	if (udata && udata->outlen) {
+		resp.response_length = offsetof(typeof(resp), response_length) +
+					sizeof(resp.response_length);
+		return ib_copy_to_udata(udata, &resp, resp.response_length);
+	}
+
+	return 0;
+}
+
+static int mlx5_ib_destroy_counter_set(struct ib_counter_set *cs)
+{
+	struct mlx5_ib_cs *mcs = to_mcs(cs);
+
+	kfree(mcs);
+
+	return 0;
+}
+
+static struct ib_counter_set *mlx5_ib_create_counter_set(
+		struct ib_device *device,
+		u16 cs_id,
+		struct ib_udata *udata)
+{
+	struct mlx5_ib_create_counter_set_resp resp = {};
+	struct mlx5_ib_dev *dev;
+	struct mlx5_ib_cs *mcs;
+	size_t min_resp_len;
+	int err;
+
+	if (udata && udata->inlen > 0 &&
+	    !ib_is_udata_cleared(udata, 0,
+				 udata->inlen))
+		return ERR_PTR(-EINVAL);
+
+	min_resp_len = offsetof(typeof(resp), reserved) + sizeof(resp.reserved);
+	if (udata && udata->outlen && udata->outlen < min_resp_len)
+		return ERR_PTR(-EINVAL);
+
+	dev = to_mdev(device);
+	if (cs_id >= dev->counter_sets.max_counter_sets)
+		return ERR_PTR(-EINVAL);
+
+	mcs = kzalloc(sizeof(*mcs), GFP_KERNEL);
+	if (!mcs)
+		return ERR_PTR(-ENOMEM);
+
+	mcs->desc_cs = &dev->counter_sets.desc_cs_arr[cs_id];
+	if (udata && udata->outlen) {
+		resp.response_length = offsetof(typeof(resp), response_length) +
+					sizeof(resp.response_length);
+		err = ib_copy_to_udata(udata, &resp, resp.response_length);
+		if (err)
+			goto err_copy;
+	}
+
+	return &mcs->ibcs;
+
+err_copy:
+	mlx5_ib_destroy_counter_set(&mcs->ibcs);
+	return ERR_PTR(err);
+}
+
+static int mlx5_ib_query_counter_set(struct ib_counter_set *cs,
+				     struct ib_counter_set_query_attr *cs_query_attr,
+				     struct ib_udata *udata)
+{
+	struct mlx5_query_count_attr query_attr = {};
+	struct mlx5_ib_query_counter_set_resp resp = {};
+	struct mlx5_ib_cs *mcs = to_mcs(cs);
+	size_t	required_buff_len;
+	size_t min_resp_len;
+	int ret;
+
+	if (udata && udata->inlen > 0 &&
+	    !ib_is_udata_cleared(udata, 0,
+				 udata->inlen))
+		return -EINVAL;
+
+	min_resp_len = offsetof(typeof(resp), reserved) + sizeof(resp.reserved);
+	if (udata && udata->outlen && udata->outlen < min_resp_len)
+		return -EINVAL;
+
+	required_buff_len = mcs->desc_cs->cs_desc.entries_count * sizeof(u64);
+	if (required_buff_len > cs_query_attr->buff_len)
+		return -EINVAL;
+
+	cs_query_attr->outlen = required_buff_len;
+	query_attr.out = cs_query_attr->out_buff;
+	query_attr.hw_cs_handle = mcs->hw_cs_handle;
+	query_attr.query_flags =
+			cs_query_attr->query_flags;
+	ret = mcs->query_count(cs->device, &query_attr);
+	if (udata && udata->outlen) {
+		resp.response_length = offsetof(typeof(resp), response_length) +
+					sizeof(resp.response_length);
+		ret = ib_copy_to_udata(udata, &resp, resp.response_length);
+	}
+
+	return ret;
+}
+
+static int alloc_ib_counter_sets(struct mlx5_ib_dev *dev)
+{
+	struct mlx5_ib_counter_sets *ib_css = &dev->counter_sets;
+
+	ib_css->max_counter_sets = 0;
+
+	return 0;
+}
+
+static void dealloc_ib_counter_sets(struct mlx5_ib_dev *dev)
+{
+	kfree(dev->counter_sets.desc_cs_arr);
 }
 
 static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
@@ -4028,7 +4180,11 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 		(1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ)	|
 		(1ull << IB_USER_VERBS_EX_CMD_CREATE_QP)	|
 		(1ull << IB_USER_VERBS_EX_CMD_MODIFY_QP)	|
-		(1ull << IB_USER_VERBS_EX_CMD_MODIFY_CQ);
+		(1ull << IB_USER_VERBS_EX_CMD_MODIFY_CQ)	|
+		(1ull << IB_USER_VERBS_EX_CMD_CREATE_COUNTER_SET)	|
+		(1ull << IB_USER_VERBS_EX_CMD_DESTROY_COUNTER_SET)	|
+		(1ull << IB_USER_VERBS_EX_CMD_QUERY_COUNTER_SET)	|
+		(1ull << IB_USER_VERBS_EX_CMD_DESCRIBE_COUNTER_SET);
 
 	dev->ib_dev.query_device	= mlx5_ib_query_device;
 	dev->ib_dev.query_port		= mlx5_ib_query_port;
@@ -4081,6 +4237,10 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.get_vector_affinity	= mlx5_ib_get_vector_affinity;
 	if (MLX5_CAP_GEN(mdev, ipoib_enhanced_offloads))
 		dev->ib_dev.alloc_rdma_netdev	= mlx5_ib_alloc_rdma_netdev;
+	dev->ib_dev.create_counter_set  = mlx5_ib_create_counter_set;
+	dev->ib_dev.destroy_counter_set = mlx5_ib_destroy_counter_set;
+	dev->ib_dev.query_counter_set   = mlx5_ib_query_counter_set;
+	dev->ib_dev.describe_counter_set = mlx5_ib_describe_counter_set;
 
 	if (mlx5_core_is_pf(mdev)) {
 		dev->ib_dev.get_vf_config	= mlx5_ib_get_vf_config;
@@ -4199,6 +4359,10 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 			goto err_delay_drop;
 	}
 
+	err = alloc_ib_counter_sets(dev);
+	if (err)
+		goto err_create_file;
+
 	if ((MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
 	    MLX5_CAP_GEN(mdev, disable_local_lb))
 		mutex_init(&dev->lb_mutex);
@@ -4207,6 +4371,10 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 
 	return dev;
 
+err_create_file:
+	device_remove_file(&dev->ib_dev.dev,
+			   mlx5_class_attributes[i]);
+
 err_delay_drop:
 	cancel_delay_drop(dev);
 	destroy_umrc_res(dev);
@@ -4265,6 +4433,7 @@ static void mlx5_ib_remove(struct mlx5_core_dev *mdev, void *context)
 	if (MLX5_CAP_GEN(dev->mdev, max_qp_cnt))
 		mlx5_ib_dealloc_counters(dev);
 	destroy_umrc_res(dev);
+	dealloc_ib_counter_sets(dev);
 	mlx5_ib_odp_remove_one(dev);
 	destroy_dev_resources(&dev->devr);
 	if (ll == IB_LINK_LAYER_ETHERNET)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 14b1554..d2f9016 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -705,6 +705,35 @@ struct mlx5_ib_delay_drop {
 	struct mlx5_ib_dbg_delay_drop *dbg;
 };
 
+struct mlx5_query_count_attr {
+	void *hw_cs_handle;
+	u64 *out;
+	u32 query_flags;
+};
+
+struct mlx5_desc_cs_attr {
+	struct ib_counter_set_describe_attr cs_desc;
+	void (*fill_counters_names)(char *cs_names_buff);
+};
+
+struct mlx5_ib_cs {
+	struct ib_counter_set ibcs;
+	struct mlx5_desc_cs_attr *desc_cs;
+	void *hw_cs_handle;
+	int (*query_count)(struct ib_device *ibdev,
+			   struct mlx5_query_count_attr *query_attr);
+};
+
+static inline struct mlx5_ib_cs *to_mcs(struct ib_counter_set *ibcs)
+{
+	return container_of(ibcs, struct mlx5_ib_cs, ibcs);
+}
+
+struct mlx5_ib_counter_sets {
+	struct mlx5_desc_cs_attr *desc_cs_arr;
+	u16 max_counter_sets;
+};
+
 struct mlx5_ib_dev {
 	struct ib_device		ib_dev;
 	struct mlx5_core_dev		*mdev;
@@ -748,6 +777,8 @@ struct mlx5_ib_dev {
 	struct mutex		lb_mutex;
 	u32			user_td;
 	u8			umr_fence;
+
+	struct mlx5_ib_counter_sets counter_sets;
 };
 
 static inline struct mlx5_ib_cq *to_mibcq(struct mlx5_core_cq *mcq)
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index 442b46b..6412400 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -366,4 +366,20 @@ struct mlx5_ib_modify_wq {
 	__u32	comp_mask;
 	__u32	reserved;
 };
+
+struct mlx5_ib_describe_counter_set_resp {
+	__u32	response_length;
+	__u32	reserved;
+};
+
+struct mlx5_ib_create_counter_set_resp {
+	__u32	response_length;
+	__u32	reserved;
+};
+
+struct mlx5_ib_query_counter_set_resp {
+	__u32	response_length;
+	__u32	reserved;
+};
+
 #endif /* MLX5_ABI_USER_H */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 14/16] IB/mlx5: Pass mlx5_flow_act struct instead of multiple arguments
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (12 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 13/16] IB/mlx5: Add counter set operations Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 15/16] IB/mlx5: Add flow counter set support Yishai Hadas
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w, Boris Pismenny, Matan Barak,
	Leon Romanovsky

From: Boris Pismenny <borisp-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Group and pass all function arguments of parse_flow_attr call in one
common struct mlx5_flow_act.

This patch passes all the action arguments of parse_flow_attr in one common
struct mlx5_flow_act. It allows us to scale the number of actions without adding
new arguments to the function.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Boris Pismenny <borisp-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index a5ff217..2b434db 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1952,7 +1952,7 @@ static void set_tos(void *outer_c, void *outer_v, u8 mask, u8 val)
 #define IPV6_VERSION 6
 static int parse_flow_attr(struct mlx5_core_dev *mdev, u32 *match_c,
 			   u32 *match_v, const union ib_flow_spec *ib_spec,
-			   u32 *tag_id, bool *is_drop)
+			   struct mlx5_flow_act *action)
 {
 	void *misc_params_c = MLX5_ADDR_OF(fte_match_param, match_c,
 					   misc_parameters);
@@ -2170,13 +2170,13 @@ static int parse_flow_attr(struct mlx5_core_dev *mdev, u32 *match_c,
 		if (ib_spec->flow_tag.tag_id >= BIT(24))
 			return -EINVAL;
 
-		*tag_id = ib_spec->flow_tag.tag_id;
+		action->flow_tag = ib_spec->flow_tag.tag_id;
 		break;
 	case IB_FLOW_SPEC_ACTION_DROP:
 		if (FIELDS_NOT_SUPPORTED(ib_spec->drop,
 					 LAST_DROP_FIELD))
 			return -EOPNOTSUPP;
-		*is_drop = true;
+		action->action |= MLX5_FLOW_CONTEXT_ACTION_DROP;
 		break;
 	default:
 		return -EINVAL;
@@ -2429,13 +2429,11 @@ static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
 {
 	struct mlx5_flow_table	*ft = ft_prio->flow_table;
 	struct mlx5_ib_flow_handler *handler;
-	struct mlx5_flow_act flow_act = {0};
+	struct mlx5_flow_act flow_act = {.flow_tag = MLX5_FS_DEFAULT_FLOW_TAG};
 	struct mlx5_flow_spec *spec;
 	struct mlx5_flow_destination *rule_dst = dst;
 	const void *ib_flow = (const void *)flow_attr + sizeof(*flow_attr);
 	unsigned int spec_index;
-	u32 flow_tag = MLX5_FS_DEFAULT_FLOW_TAG;
-	bool is_drop = false;
 	int err = 0;
 	int dest_num = 1;
 
@@ -2454,7 +2452,7 @@ static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
 	for (spec_index = 0; spec_index < flow_attr->num_of_specs; spec_index++) {
 		err = parse_flow_attr(dev->mdev, spec->match_criteria,
 				      spec->match_value,
-				      ib_flow, &flow_tag, &is_drop);
+				      ib_flow, &flow_act);
 		if (err < 0)
 			goto free;
 
@@ -2465,8 +2463,7 @@ static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
 		set_underlay_qp(dev, spec, underlay_qpn);
 
 	spec->match_criteria_enable = get_match_criteria_enable(spec->match_criteria);
-	if (is_drop) {
-		flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP;
+	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_DROP) {
 		rule_dst = NULL;
 		dest_num = 0;
 	} else {
@@ -2474,15 +2471,14 @@ static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
 		    MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO;
 	}
 
-	if (flow_tag != MLX5_FS_DEFAULT_FLOW_TAG &&
+	if (flow_act.flow_tag != MLX5_FS_DEFAULT_FLOW_TAG &&
 	    (flow_attr->type == IB_FLOW_ATTR_ALL_DEFAULT ||
 	     flow_attr->type == IB_FLOW_ATTR_MC_DEFAULT)) {
 		mlx5_ib_warn(dev, "Flow tag %u and attribute type %x isn't allowed in leftovers\n",
-			     flow_tag, flow_attr->type);
+			     flow_act.flow_tag, flow_attr->type);
 		err = -EINVAL;
 		goto free;
 	}
-	flow_act.flow_tag = flow_tag;
 	handler->rule = mlx5_add_flow_rules(ft, spec,
 					    &flow_act,
 					    rule_dst, dest_num);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 15/16] IB/mlx5: Add flow counter set support
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (13 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 14/16] IB/mlx5: Pass mlx5_flow_act struct instead of multiple arguments Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-19 14:41   ` [PATCH rdma-next 16/16] IB/mlx5: Expose max_counter_sets capability Yishai Hadas
  2017-10-23 16:51   ` [PATCH rdma-next 00/16] Flow counters support Jason Gunthorpe
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Associates a counter set with a flow when IB_FLOW_SPEC_ACTION_COUNT
is part of the flow specifications.

Expose flow counters to the generic counters sets interface.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 178 +++++++++++++++++++++++++++++++++++---
 include/linux/mlx5/fs.h           |   1 +
 2 files changed, 168 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2b434db..46ac67b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1931,6 +1931,16 @@ static void set_tos(void *outer_c, void *outer_v, u8 mask, u8 val)
 	MLX5_SET(fte_match_set_lyr_2_4, outer_v, ip_dscp, val >> 2);
 }
 
+static bool is_counter_set_valid(struct ib_counter_set *cs,
+				 enum ib_counter_set_type type)
+{
+	struct mlx5_ib_cs *mcs;
+
+	mcs = to_mcs(cs);
+
+	return (mcs->desc_cs->cs_desc.counted_type == type);
+}
+
 #define LAST_ETH_FIELD vlan_tag
 #define LAST_IB_FIELD sl
 #define LAST_IPV4_FIELD tos
@@ -1939,6 +1949,7 @@ static void set_tos(void *outer_c, void *outer_v, u8 mask, u8 val)
 #define LAST_TUNNEL_FIELD tunnel_id
 #define LAST_FLOW_TAG_FIELD tag_id
 #define LAST_DROP_FIELD size
+#define LAST_FLOW_COUNTER_FIELD counter_set
 
 /* Field is the last supported field */
 #define FIELDS_NOT_SUPPORTED(filter, field)\
@@ -2178,6 +2189,16 @@ static int parse_flow_attr(struct mlx5_core_dev *mdev, u32 *match_c,
 			return -EOPNOTSUPP;
 		action->action |= MLX5_FLOW_CONTEXT_ACTION_DROP;
 		break;
+	case IB_FLOW_SPEC_ACTION_COUNT:
+		if (FIELDS_NOT_SUPPORTED(ib_spec->flow_count,
+					 LAST_FLOW_COUNTER_FIELD))
+			return -EOPNOTSUPP;
+		action->counter_set = ib_spec->flow_count.counter_set;
+		if (!is_counter_set_valid(action->counter_set,
+					  IB_COUNTER_SET_FLOW))
+			return -EINVAL;
+		action->action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2429,13 +2450,16 @@ static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
 {
 	struct mlx5_flow_table	*ft = ft_prio->flow_table;
 	struct mlx5_ib_flow_handler *handler;
-	struct mlx5_flow_act flow_act = {.flow_tag = MLX5_FS_DEFAULT_FLOW_TAG};
+	struct mlx5_flow_act flow_act = {.flow_tag = MLX5_FS_DEFAULT_FLOW_TAG,
+					 .counter_set = NULL};
 	struct mlx5_flow_spec *spec;
-	struct mlx5_flow_destination *rule_dst = dst;
+	struct mlx5_flow_destination dest_arr[2] = {};
+	struct mlx5_flow_destination *rule_dst = dest_arr;
+	struct mlx5_ib_cs *rcs;
 	const void *ib_flow = (const void *)flow_attr + sizeof(*flow_attr);
 	unsigned int spec_index;
 	int err = 0;
-	int dest_num = 1;
+	int dest_num = 0;
 
 	if (!is_valid_attr(dev->mdev, flow_attr))
 		return ERR_PTR(-EINVAL);
@@ -2448,6 +2472,10 @@ static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
 	}
 
 	INIT_LIST_HEAD(&handler->list);
+	if (dst) {
+		memcpy(&dest_arr[0], dst, sizeof(*dst));
+		dest_num++;
+	}
 
 	for (spec_index = 0; spec_index < flow_attr->num_of_specs; spec_index++) {
 		err = parse_flow_attr(dev->mdev, spec->match_criteria,
@@ -2463,12 +2491,25 @@ static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
 		set_underlay_qp(dev, spec, underlay_qpn);
 
 	spec->match_criteria_enable = get_match_criteria_enable(spec->match_criteria);
-	if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_DROP) {
-		rule_dst = NULL;
-		dest_num = 0;
-	} else {
-		flow_act.action = dst ? MLX5_FLOW_CONTEXT_ACTION_FWD_DEST :
-		    MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO;
+	if (!(flow_act.action & MLX5_FLOW_CONTEXT_ACTION_DROP)) {
+		if (dst) {
+			flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+		} else {
+			flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO;
+			rule_dst = NULL;
+		}
+	}
+
+	if (rule_dst && flow_act.counter_set) {
+		rcs = to_mcs(flow_act.counter_set);
+		dest_arr[dest_num].type =
+				MLX5_FLOW_DESTINATION_TYPE_COUNTER;
+		dest_arr[dest_num].counter =
+				(struct mlx5_fc *)(rcs->hw_cs_handle);
+		dest_num++;
+		handler->ibflow.counter_set =
+				flow_act.counter_set;
+		flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
 	}
 
 	if (flow_act.flow_tag != MLX5_FS_DEFAULT_FLOW_TAG &&
@@ -3989,10 +4030,35 @@ static int mlx5_ib_describe_counter_set(struct ib_device *ib_dev,
 	return 0;
 }
 
+static int query_flow_counter_set(struct ib_device *ibdev,
+				  struct mlx5_query_count_attr *query_attr)
+{
+	struct mlx5_fc *fc = (struct mlx5_fc *)(query_attr->hw_cs_handle);
+	struct mlx5_ib_dev *dev = to_mdev(ibdev);
+	int ret = 0;
+	u64 last_used;
+
+	if (query_attr->query_flags & IB_COUNTER_SET_FORCE_UPDATE) {
+		ret = mlx5_cmd_fc_query(dev->mdev,
+					fc->id,
+					&query_attr->out[0],
+					&query_attr->out[1]);
+	} else {
+		mlx5_fc_query_cached(fc, &query_attr->out[1],
+				     &query_attr->out[0],
+				     &last_used,
+				     MLX5_FLOW_QUERY_CACHED_ABS);
+	}
+	return ret;
+}
+
 static int mlx5_ib_destroy_counter_set(struct ib_counter_set *cs)
 {
+	struct mlx5_ib_dev *dev = to_mdev(cs->device);
 	struct mlx5_ib_cs *mcs = to_mcs(cs);
 
+	mlx5_fc_destroy(dev->mdev,
+			(struct mlx5_fc *)(mcs->hw_cs_handle));
 	kfree(mcs);
 
 	return 0;
@@ -4004,6 +4070,7 @@ static struct ib_counter_set *mlx5_ib_create_counter_set(
 		struct ib_udata *udata)
 {
 	struct mlx5_ib_create_counter_set_resp resp = {};
+	struct mlx5_fc *flow_count;
 	struct mlx5_ib_dev *dev;
 	struct mlx5_ib_cs *mcs;
 	size_t min_resp_len;
@@ -4027,17 +4094,34 @@ static struct ib_counter_set *mlx5_ib_create_counter_set(
 		return ERR_PTR(-ENOMEM);
 
 	mcs->desc_cs = &dev->counter_sets.desc_cs_arr[cs_id];
+	switch (mcs->desc_cs->cs_desc.counted_type) {
+	case IB_COUNTER_SET_FLOW:
+		mcs->query_count = query_flow_counter_set;
+		flow_count = mlx5_fc_create(dev->mdev, true);
+		if (IS_ERR(flow_count)) {
+			kfree(mcs);
+			return ERR_CAST(flow_count);
+		}
+		mcs->hw_cs_handle = (void *)flow_count;
+		break;
+	default:
+		mlx5_ib_dbg(dev, "unsupported counter set type %d\n",
+			    mcs->desc_cs->cs_desc.counted_type);
+		err = -EINVAL;
+		goto err_get_counter;
+	}
+
 	if (udata && udata->outlen) {
 		resp.response_length = offsetof(typeof(resp), response_length) +
 					sizeof(resp.response_length);
 		err = ib_copy_to_udata(udata, &resp, resp.response_length);
 		if (err)
-			goto err_copy;
+			goto err_get_counter;
 	}
 
 	return &mcs->ibcs;
 
-err_copy:
+err_get_counter:
 	mlx5_ib_destroy_counter_set(&mcs->ibcs);
 	return ERR_PTR(err);
 }
@@ -4081,13 +4165,85 @@ static int mlx5_ib_query_counter_set(struct ib_counter_set *cs,
 	return ret;
 }
 
+struct mlx5_ib_flow_counter {
+	size_t offset;
+	const char *name;
+};
+
+#define INIT_COUNTER(_struct, _name)\
+	{ .name = #_name, .offset = MLX5_BYTE_OFF(_struct, _name)}
+
+static const struct mlx5_ib_flow_counter basic_flow_cnts[] = {
+	INIT_COUNTER(traffic_counter, packets),
+	INIT_COUNTER(traffic_counter, octets),
+};
+
+static void fill_counters_names_flow(char *names_buff)
+{
+	u32 basic_flow_size = ARRAY_SIZE(basic_flow_cnts);
+	int i;
+
+	for (i = 0; i < basic_flow_size; i++) {
+		strlcpy(names_buff, basic_flow_cnts[i].name,
+			IB_COUNTER_NAME_LEN - 1);
+		names_buff += IB_COUNTER_NAME_LEN;
+	}
+}
+
+static int mlx5_ib_init_flow_counter_entry(struct mlx5_ib_dev *dev,
+					   struct mlx5_desc_cs_attr *desc_cs)
+{
+	u32 basic_flow_size = ARRAY_SIZE(basic_flow_cnts);
+	struct ib_counter_set_describe_attr *ib_cs_desc;
+
+	ib_cs_desc = &desc_cs->cs_desc;
+	ib_cs_desc->counted_type = IB_COUNTER_SET_FLOW;
+	ib_cs_desc->num_of_cs =
+			1 << MLX5_CAP_GEN(dev->mdev, log_max_flow_counter_bulk);
+	ib_cs_desc->entries_count = basic_flow_size;
+	ib_cs_desc->attributes = IB_COUNTER_SET_ATTR_CACHED;
+	ib_cs_desc->counters_names_len = basic_flow_size * IB_COUNTER_NAME_LEN;
+	desc_cs->fill_counters_names = fill_counters_names_flow;
+
+	return 0;
+}
+
 static int alloc_ib_counter_sets(struct mlx5_ib_dev *dev)
 {
 	struct mlx5_ib_counter_sets *ib_css = &dev->counter_sets;
+	u32 num_of_css = 0;
+	u32 ret;
 
 	ib_css->max_counter_sets = 0;
+	/* flow counters support */
+	ib_css->max_counter_sets +=
+		(mlx5_ib_port_link_layer(&dev->ib_dev, 1)
+				== IB_LINK_LAYER_ETHERNET);
+
+	if (!ib_css->max_counter_sets)
+		return 0;
+
+	/* alloc counter sets array  */
+	ib_css->desc_cs_arr =
+		kcalloc(ib_css->max_counter_sets,
+			sizeof(struct mlx5_desc_cs_attr),
+			GFP_KERNEL);
+	if (!ib_css->desc_cs_arr) {
+		ret = -ENOMEM;
+		goto err_no_cs;
+	}
+
+	ret = mlx5_ib_init_flow_counter_entry(dev,
+					      &ib_css->desc_cs_arr[num_of_css++]);
+	if (ret)
+		goto err_init_flow_counter_entry;
 
 	return 0;
+
+err_init_flow_counter_entry:
+	kfree(ib_css->desc_cs_arr);
+err_no_cs:
+	return ret;
 }
 
 static void dealloc_ib_counter_sets(struct mlx5_ib_dev *dev)
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 8e0331c..9a84527 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -140,6 +140,7 @@ struct mlx5_flow_act {
 	u32 flow_tag;
 	u32 encap_id;
 	u32 modify_id;
+	struct ib_counter_set *counter_set;
 };
 
 #define MLX5_DECLARE_FLOW_ACT(name) \
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH rdma-next 16/16] IB/mlx5: Expose max_counter_sets capability
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (14 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 15/16] IB/mlx5: Add flow counter set support Yishai Hadas
@ 2017-10-19 14:41   ` Yishai Hadas
  2017-10-23 16:51   ` [PATCH rdma-next 00/16] Flow counters support Jason Gunthorpe
  16 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-19 14:41 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

From: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

max_counter_sets represents the number of different counter sets
exposed and makes the upper limit for the counter set id provided
to describe/create counter sets verbs i.e. counter set id must be
in range [0, max_counter_sets - 1].

Counter set id will be used with counter set object in downstream patches.

Signed-off-by: Raed Salem <raeds-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 46ac67b..499d694 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -601,6 +601,9 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 		return -EINVAL;
 
 	memset(props, 0, sizeof(*props));
+	props->max_counter_sets =
+			dev->counter_sets.max_counter_sets;
+
 	err = mlx5_query_system_image_guid(ibdev,
 					   &props->sys_image_guid);
 	if (err)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 03/16] IB/core: Introduce counter set describe verb
       [not found]     ` <1508424118-27205-4-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-10-20 10:44       ` Christopher Lameter
  2017-10-21  0:29         ` Guy Shattah
  0 siblings, 1 reply; 41+ messages in thread
From: Christopher Lameter @ 2017-10-20 10:44 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

On Thu, 19 Oct 2017, Yishai Hadas wrote:

> +#define IB_COUNTER_NAME_LEN 64
> +struct ib_counter_set_describe_attr {
> +	/* Type that this set refers to, use enum ib_counter_set_type */
> +	u8 counted_type;

Gap here of 7 bytes. Maybe move the field to get a more compact
structure?

> +	/* Number of instances of this counter-set available in the hardware */
> +	u64 num_of_cs;

u16 or u32 should be enough right?

> +	/* Attributes of the set, use enum ib_counter_set_attributes */
> +	u32 attributes;
> +	/* Number of counters in this set */
> +	u8 entries_count;

A counter set is limited to 255? Could there be counter sets that have a
thousand or so?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 07/16] IB/core: Introduce counter set query verb
       [not found]     ` <1508424118-27205-8-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-10-20 10:48       ` Christopher Lameter
  2017-10-20 15:40         ` Guy Shattah
  0 siblings, 1 reply; 41+ messages in thread
From: Christopher Lameter @ 2017-10-20 10:48 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

On Thu, 19 Oct 2017, Yishai Hadas wrote:

> A counter set must be first attached to an IB object in order to
> be queried for its underlay counters, downstream patches will
> present bind and query methods for flow counters.
> The user has an option as part of the query verb to force reading
> the up-to-date hardware values instead of reading some cached
> values by using the IB_COUNTER_SET_FORCE_UPDATE flag.

What does "cached" mean in this context? Accurate to the last second? Or
just return what was returned earlier? Semantics look a bit unclear to me.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 07/16] IB/core: Introduce counter set query verb
  2017-10-20 10:48       ` Christopher Lameter
@ 2017-10-20 15:40         ` Guy Shattah
  0 siblings, 0 replies; 41+ messages in thread
From: Guy Shattah @ 2017-10-20 15:40 UTC (permalink / raw)
  To: Christopher Lameter, Yishai Hadas
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny


On Thu, 19 Oct 2017, Christopher Lameter wrote:
On Thu, 19 Oct 2017, Yishai Hadas wrote:

>> A counter set must be first attached to an IB object in order to
>> be queried for its underlay counters, downstream patches will
>> present bind and query methods for flow counters.
>> The user has an option as part of the query verb to force reading
>> the up-to-date hardware values instead of reading some cached
>> values by using the IB_COUNTER_SET_FORCE_UPDATE flag.

>What does "cached" mean in this context? Accurate to the last second? Or
>just return what was returned earlier? Semantics look a bit unclear to me.

When this flag is used - querying the counter does not guarantee accurate value from the hardware.
The driver might return a value that was true several nonsecs. or even several seconds ago.


Explanation:

Querying the hardware is expensive,  when user queries the hardware
the driver could decide to use the same query to query multiple counter-set
rather than a single one ( the one the user asked for)

The counter-sets the user did not ask for - are stored in the driver cache.
Driver may decide to return these value rather than query the hardware again.


Nevertheless,  when using the the query API user can force the driver to 
execute direct hardware query , rather than return cached value.


Guy


    --
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH rdma-next 03/16] IB/core: Introduce counter set describe verb
  2017-10-20 10:44       ` Christopher Lameter
@ 2017-10-21  0:29         ` Guy Shattah
       [not found]           ` <AM6PR0502MB37838B19976EDF1D04C74751BD400-md96bDB8+JV1k1TWM4Wt8cDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Guy Shattah @ 2017-10-21  0:29 UTC (permalink / raw)
  To: Christopher Lameter, Yishai Hadas
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny



> On Thu, 19 Oct 2017, Yishai Hadas wrote:
> 
> > +#define IB_COUNTER_NAME_LEN 64
> > +struct ib_counter_set_describe_attr {
> > +	/* Type that this set refers to, use enum ib_counter_set_type */
> > +	u8 counted_type;
> 
> Gap here of 7 bytes. Maybe move the field to get a more compact structure?
> 
> > +	/* Number of instances of this counter-set available in the hardware
> */
> > +	u64 num_of_cs;
> 
> u16 or u32 should be enough right?

Future API might be added to create user-customizable  sets.
How many potential subsets are for a set of 32 counters? More than 2^64.
We might want to be able to support it.

> 
> > +	/* Attributes of the set, use enum ib_counter_set_attributes */
> > +	u32 attributes;
> > +	/* Number of counters in this set */
> > +	u8 entries_count;
> 
> A counter set is limited to 255? Could there be counter sets that have a
> thousand or so?

A counter set is merely a set. I can't see a reason why would a user want to query a set 
which contains more than 256 counters. All at the same time.
However, I agree that we could modify it to u16 or u32.

[Guy Shattah] 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 03/16] IB/core: Introduce counter set describe verb
       [not found]           ` <AM6PR0502MB37838B19976EDF1D04C74751BD400-md96bDB8+JV1k1TWM4Wt8cDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-10-22 12:00             ` Yishai Hadas
  0 siblings, 0 replies; 41+ messages in thread
From: Yishai Hadas @ 2017-10-22 12:00 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Guy Shattah, Yishai Hadas, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny

On 10/21/2017 3:29 AM, Guy Shattah wrote:
> 
> 
>> On Thu, 19 Oct 2017, Yishai Hadas wrote:
>>
>>> +#define IB_COUNTER_NAME_LEN 64
>>> +struct ib_counter_set_describe_attr {
>>> +	/* Type that this set refers to, use enum ib_counter_set_type */
>>> +	u8 counted_type;
>>
>> Gap here of 7 bytes. Maybe move the field to get a more compact structure?
>>
>>> +	/* Number of instances of this counter-set available in the hardware
>> */
>>> +	u64 num_of_cs;
>>
>> u16 or u32 should be enough right?
> 
> Future API might be added to create user-customizable  sets.
> How many potential subsets are for a set of 32 counters? More than 2^64.
> We might want to be able to support it.
> 
>>
>>> +	/* Attributes of the set, use enum ib_counter_set_attributes */
>>> +	u32 attributes;
>>> +	/* Number of counters in this set */
>>> +	u8 entries_count;
>>
>> A counter set is limited to 255? Could there be counter sets that have a
>> thousand or so?
> 
> A counter set is merely a set. I can't see a reason why would a user want to query a set
> which contains more than 256 counters. All at the same time.
> However, I agree that we could modify it to u16 or u32.

Thanks Christoper for looking at the series.

In V1 will change this field to be u16 and re-order the fields in struct 
ib_counter_set_describe_attr to be more compact as you suggest.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (15 preceding siblings ...)
  2017-10-19 14:41   ` [PATCH rdma-next 16/16] IB/mlx5: Expose max_counter_sets capability Yishai Hadas
@ 2017-10-23 16:51   ` Jason Gunthorpe
       [not found]     ` <20171023165118.GA18097-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  16 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2017-10-23 16:51 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

On Thu, Oct 19, 2017 at 05:41:42PM +0300, Yishai Hadas wrote:

> More specifically,
> A generic interface with the following functionality is presented:
> - An option to know per device how many counter sets it exposes by using some
>   extension of the ib_query_device_ex() verb. (i.e. max_counter_sets).
> - Per counter set there is an API to get its metadata by using a new verb named
>    ib_describe_counter_set(), it gets a counter set id in range of 
>    [0,max_counter_sets - 1] and return the counter set information.
>    Some of the metadata is IB generic (e.g. counter set type, num counters in the set, etc.)
>    while other is some specific driver information (e.g. counters names).
> - Introduce counter set object named 'ib_counter_set' and its create/destroy
>   verbs. Post a successful creation the counter_set object can  be attached to
>   an IB object based on its type. (e.g. Flow, QP, etc.) 
> - Introduce the ib_query_counter_set() verb, it enables the user to read the
>   hardware counters which are associated with the counter set.
> - Introduce a new steering specification from type count 
>   (i.e. ib_flow_spec_action_count) which gets a previously created counter_id.
>   By creating a flow steering object with a counter, its counters can be read
>   by the ib_query_counter_set() verb.

I haven't seen the uapi yet, but I don't like the direction this is
going, this looks much too hard for applications to acutally use.

My suggestion for the verbs interface is:

// Create an object to hold the counters
counters = ibv_create_counters([..]);

// Sample a specific counter linked to a QP in counter index 0
ibv_set_sampling_point_qp(counters, 0, qp, [..])

// Sample a specific counter linked to a flow steering object in counter index 1
ibv_add_sampling_point_flow(counters, 1, flow, [..])

// Stop reading anything associated with index 1
ibv_clear_sampling_point(counters, 1);

// Sample global device PMA counters
ibv_add_sampling_point_device(counters, 2, verbs, port, [..])

// Retreive the counters from hardware or from cache
uint64_t data[2];
ibv_read_counters(counters, data, NELEMS(data), [..]);
printf("QP counter is %u\n", data[0]);

I don't want to expose this goofy idea of a 'counter set' to the user,
that is a some weird device spcecific thing, and every device will be
different. No way the user should have to deal with this.

To implement the above, your driver will have to determine the
hardware specific counter sets to instantiate under the covers, and
map their contents to the indexes the user is asking for with the above list.

If for some reason it cannot satisfy some combination of counters then
it should fail the add_sampling_point call.

There is not much reason to have a complex query API, use the 'setup
and catch failure' approach to detect what features the device
supports.

[..] will have to be something sensible to define the sampling
point, maybe an enum, or maybe a string? Not sure.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]     ` <20171023165118.GA18097-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-10-23 17:00       ` Leon Romanovsky
  2017-10-25 14:58       ` Yishai Hadas
  1 sibling, 0 replies; 41+ messages in thread
From: Leon Romanovsky @ 2017-10-23 17:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 682 bytes --]

On Mon, Oct 23, 2017 at 10:51:18AM -0600, Jason Gunthorpe wrote:
> On Thu, Oct 19, 2017 at 05:41:42PM +0300, Yishai Hadas wrote:

<...>

>
> // Retreive the counters from hardware or from cache
> uint64_t data[2];
> ibv_read_counters(counters, data, NELEMS(data), [..]);
> printf("QP counter is %u\n", data[0]);
>

<..>

> [..] will have to be something sensible to define the sampling
> point, maybe an enum, or maybe a string? Not sure.

Enum

>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]     ` <20171023165118.GA18097-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2017-10-23 17:00       ` Leon Romanovsky
@ 2017-10-25 14:58       ` Yishai Hadas
       [not found]         ` <b003f6e5-d7ce-3775-a1dc-0fd0f507a515-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  1 sibling, 1 reply; 41+ messages in thread
From: Yishai Hadas @ 2017-10-25 14:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w, Alexr-VPRAkNaXOzVWk0Htik3J/w,
	Guy Shattah, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 10/23/2017 7:51 PM, Jason Gunthorpe wrote:
> On Thu, Oct 19, 2017 at 05:41:42PM +0300, Yishai Hadas wrote:
> 
>> More specifically,
>> A generic interface with the following functionality is presented:
>> - An option to know per device how many counter sets it exposes by using some
>>    extension of the ib_query_device_ex() verb. (i.e. max_counter_sets).
>> - Per counter set there is an API to get its metadata by using a new verb named
>>     ib_describe_counter_set(), it gets a counter set id in range of
>>     [0,max_counter_sets - 1] and return the counter set information.
>>     Some of the metadata is IB generic (e.g. counter set type, num counters in the set, etc.)
>>     while other is some specific driver information (e.g. counters names).
>> - Introduce counter set object named 'ib_counter_set' and its create/destroy
>>    verbs. Post a successful creation the counter_set object can  be attached to
>>    an IB object based on its type. (e.g. Flow, QP, etc.)
>> - Introduce the ib_query_counter_set() verb, it enables the user to read the
>>    hardware counters which are associated with the counter set.
>> - Introduce a new steering specification from type count
>>    (i.e. ib_flow_spec_action_count) which gets a previously created counter_id.
>>    By creating a flow steering object with a counter, its counters can be read
>>    by the ib_query_counter_set() verb.
> 
> I haven't seen the uapi yet, but I don't like the direction this is
> going, this looks much too hard for applications to acutally use.
> 

Hi Jason, please go over the patches/uapi, we believe that you'll find 
it quite easy to be used by an application, however we plan to improve 
as of below.

> My suggestion for the verbs interface is:
> 

We sent an RFC few months ago, if we got your feedback by that time we 
could involve it as part of V0. However, we found few of your notes that 
can be applicable even for this kernel cycle and we plan to involve them 
as part of V1, details below.

Expected changes from V0:
-------------------------------
#Capabilities:
The first patch introduced the notion of "max_counter_sets" capability, 
basically we thought to enable few subsets of same counted type (e.g. 
Flow/QP) which will be built by the driver.

To enable your approach to let user controlling its required counters 
per IB counter object we will move to report supported_counted_types 
which will be some enum/flag per counter type. (E.g. Flow, QP, Port, 
etc.) and drop the option to have few counter sets per type that are 
built by the driver.

#The describe API:
1) We'll change the API to get the counted type as an input instead of 
returning it as an output.
2) We'll drop the num_of_cs (number of counter sets) and counted type 
from its output.

#The create API:
We found it better to stay with our suggested API with few changes, 
below notes were considered.

1) There might be a hardware that doesn't support attaching a counter 
object to an existing object, your suggested API to set sampling points 
relay on that ability.

Specifically talking, mlx5 doesn't support attaching a counter object to 
Flow post its creation, that's why we introduced a new counter_spec type 
that introduced an already created IB counter.

2) We prefer using one single system call to create the ib_counter 
object with its required counters instead of creating it with sampling 
points which might involve many system calls.

3) We don't have at the moment a customer use case to remove a sampling 
point, we expect an application to create an IB counter with its 
relevant counters and use it. This ability can be added in the future 
upon demand, on objects that support that by some modify API.

4) Currently the API introduces vendor ordering based on the describe 
output and exposes only the option to read all counters from the given 
counter type. For day one we find it enough as for now there is only one 
counted type that is supported (i.e. Flow) by mlx5 driver and it exposes
only 2 counters. However, as the API is extensible we can extend it with 
comp_mask to get some other input which may control the ordering/partial 
counters/compound objects in the future based on demand.

5) The expected change in V1 for the create API is to get counted type 
as input instead of counter set index, this can be overwritten as 
pointed by the comp_mask flag to be more complex/informative meta data 
in later stages.

#Flow spec API to enable counting:
No change is expected, motivation was mentioned above.

#Query API:
For now no change is expected, it matches also to your approach as it 
just gets an IB counter and a buffer to collect the data.

Let's please get your feedback on above so that we can be ready with V1 
soon.

Thanks,
Yishai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]         ` <b003f6e5-d7ce-3775-a1dc-0fd0f507a515-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2017-10-25 15:17           ` Jason Gunthorpe
       [not found]             ` <20171025151734.GA15557-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2017-10-25 15:17 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Yishai Hadas, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, raeds-VPRAkNaXOzVWk0Htik3J/w,
	majd-VPRAkNaXOzVWk0Htik3J/w, Alexr-VPRAkNaXOzVWk0Htik3J/w,
	Guy Shattah, tzahio-VPRAkNaXOzVWk0Htik3J/w

On Wed, Oct 25, 2017 at 05:58:15PM +0300, Yishai Hadas wrote:

> Hi Jason, please go over the patches/uapi, we believe that you'll find it
> quite easy to be used by an application, however we plan to improve as of
> below.

No, it is not, this proposal relies too much on device-specific
information to make sense in a multi-vendor context. Counters should
not be grouped in a hardware unique way.

But it is still hard to understand what you are talking about, so
maybe if you provide a short and clearer example..

> #The create API:
> We found it better to stay with our suggested API with few changes, below
> notes were considered.
> 
> 1) There might be a hardware that doesn't support attaching a counter object
> to an existing object, your suggested API to set sampling points relay on
> that ability.

That is unfortunate, and kind of lame, but you can still use the
proposed API by re-ordering things. Continue to have
ibv_add_sampling_point_flow, however a NULL flow argument will make
the counters become added when the flow is created:

 counters = ibv_create_counters([..]);
 ibv_add_sampling_point_flow(counters, 1, NULL, [..])
 flow_attr.counters = counters;
 ibv_create_flow(qp, &flow_attr);

> 2) We prefer using one single system call to create the ib_counter object
> with its required counters instead of creating it with sampling points which
> might involve many system calls.

No. More system calls are good for something like this, there is a lot
of possibly future variety here. Systems calls are better than huge
structs and comp mask.

It allows you to ditch the silly describe API by making the creation
side more granular.

> 3) We don't have at the moment a customer use case to remove a sampling
> point, we expect an application to create an IB counter with its relevant
> counters and use it. This ability can be added in the future upon demand, on
> objects that support that by some modify API.

Ok

> 4) Currently the API introduces vendor ordering based on the describe output
> and exposes only the option to read all counters from the given counter
> type. For day one we find it enough as for now there is only one counted
> type that is supported (i.e. Flow) by mlx5 driver and it exposes
> only 2 counters. However, as the API is extensible we can extend it with
> comp_mask to get some other input which may control the ordering/partial
> counters/compound objects in the future based on demand.

I don't understand this remark, why would irderubg ever matter for
counters?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]             ` <20171025151734.GA15557-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-10-27 15:46               ` Guy Shattah
       [not found]                 ` <AM6PR0502MB3783A1186AA0ABDCCD5359AEBD5A0-md96bDB8+JV1k1TWM4Wt8cDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Guy Shattah @ 2017-10-27 15:46 UTC (permalink / raw)
  To: Jason Gunthorpe, Yishai Hadas
  Cc: Yishai Hadas, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

    
On Wed, Oct 25, 2017 at 06:17:15PM +0300, Jason Gunthorpe wrote:
>On Wed, Oct 25, 2017 at 05:58:15PM +0300, Yishai Hadas wrote:
>
>> Hi Jason, please go over the patches/uapi, we believe that you'll find it
>> quite easy to be used by an application, however we plan to improve as of
>> below.
>
>No, it is not, this proposal relies too much on device-specific
>information to make sense in a multi-vendor context. Counters should
>not be grouped in a hardware unique way.

counters are not group in hardware unique but in an intuitive way.
all counters related to flow (ibv_flow) are grouped togeger, and following
the same pattern all counters related to QP (ibv_qp) should be grouped together.

>
>But it is still hard to understand what you are talking about, so
>maybe if you provide a short and clearer example..
>
>> #The create API:
>> We found it better to stay with our suggested API with few changes, below
>> notes were considered.
>> 
>> 1) There might be a hardware that doesn't support attaching a counter object
>> to an existing object, your suggested API to set sampling points relay on
>> that ability.
>
>That is unfortunate, and kind of lame, but you can still use the
>proposed API by re-ordering things. Continue to have
>ibv_add_sampling_point_flow, however a NULL flow argument will make
>the counters become added when the flow is created:
>
> counters = ibv_create_counters([..]);
> ibv_add_sampling_point_flow(counters, 1, NULL, [..])
> flow_attr.counters = counters;
> ibv_create_flow(qp, &flow_attr);

As mentioned earlier - current hardware support attachment of counters
only during the creation of the object (flow/qp/etc).
Your proposal could be useful in the case where a structure of a dynamic
counter set is to be modified. However - since it is not the current case,
there is no benefit in dividing the build of such a counter-set into several steps.
Hence we suggest to supply the creation API with instructions on how to create such a set. 
Having said that - I'm not saying that the partial steps are totally useless. 
These steps could be used as 'syntactic sugar' in order to change some kind
of structured which would be passed to the creation method. 
Hence, methods such as  ibv_add_sampling_point_flow() should either be 
user-space-verbs only, or as macros or totally not there.

>
>> 2) We prefer using one single system call to create the ib_counter object
>> with its required counters instead of creating it with sampling points which
>> might involve many system calls.
>
>No. More system calls are good for something like this, there is a lot
>of possibly future variety here. Systems calls are better than huge
>structs and comp mask.
>
>It allows you to ditch the silly describe API by making the creation
>side more granular.
See response above.

Guy

>
>Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                 ` <AM6PR0502MB3783A1186AA0ABDCCD5359AEBD5A0-md96bDB8+JV1k1TWM4Wt8cDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-10-27 15:59                   ` Jason Gunthorpe
       [not found]                     ` <20171027155955.GA15922-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2017-10-27 15:59 UTC (permalink / raw)
  To: Guy Shattah
  Cc: Yishai Hadas, Yishai Hadas, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Fri, Oct 27, 2017 at 03:46:57PM +0000, Guy Shattah wrote:
> ?   
> On Wed, Oct 25, 2017 at 06:17:15PM +0300, Jason Gunthorpe wrote:
> >On Wed, Oct 25, 2017 at 05:58:15PM +0300, Yishai Hadas wrote:
> >
> >> Hi Jason, please go over the patches/uapi, we believe that you'll find it
> >> quite easy to be used by an application, however we plan to improve as of
> >> below.
> >
> >No, it is not, this proposal relies too much on device-specific
> >information to make sense in a multi-vendor context. Counters should
> >not be grouped in a hardware unique way.
> 
> counters are not group in hardware unique but in an intuitive way.

I don't think you understand my remark.

I do not want to expose the idea of hardware limited groupings of
counters to the user at all.

If the user needs counters 1,2 then the user should request those
counters only. Hardware that needs to sample, say, counter 1,2,3,4 to
get those will have to figure that out internally.

If the API requires a complex query interface just for the user it use
it at all, then it is a total failure in my view. And that seems to be
what is proposed here.

> >That is unfortunate, and kind of lame, but you can still use the
> >proposed API by re-ordering things. Continue to have
> >ibv_add_sampling_point_flow, however a NULL flow argument will make
> >the counters become added when the flow is created:
> >
> >?counters = ibv_create_counters([..]);
> >?ibv_add_sampling_point_flow(counters, 1, NULL, [..])
> >?flow_attr.counters = counters;
> >?ibv_create_flow(qp, &flow_attr);
> 
> As mentioned earlier - current hardware support attachment of counters
> only during the creation of the object (flow/qp/etc).

And as I explained, the above addresses that.

> Your proposal could be useful in the case where a structure of a dynamic
> counter set is to be modified. However - since it is not the current case,
> there is no benefit in dividing the build of such a counter-set into
> several steps.

The purpose is to make the API less dependent on specific
implementation details.

Mellanox drivers may wish to implement ibv_add_sampling_point_* as
some kind of internal bookkeeping in userspace, other drives may wish
to do kernel calls. It depends on how each type of hardware works.

You need to stop looking at the API through a Mellanox only lense.

In any case, having broken down APIs is far easier to use for the user.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                     ` <20171027155955.GA15922-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-10-29 15:21                       ` Alex Rosenbaum
       [not found]                         ` <CAFgAxU-UcRapsoRn3hNUn27xgY370gUJ+WWE4URBq84ufkCXtA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Alex Rosenbaum @ 2017-10-29 15:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Fri, Oct 27, 2017 at 5:59 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Fri, Oct 27, 2017 at 03:46:57PM +0000, Guy Shattah wrote:
>> On Wed, Oct 25, 2017 at 06:17:15PM +0300, Jason Gunthorpe wrote:
>> >On Wed, Oct 25, 2017 at 05:58:15PM +0300, Yishai Hadas wrote:
>>
>> counters are not group in hardware unique but in an intuitive way.
>
> I don't think you understand my remark.
>
> I do not want to expose the idea of hardware limited groupings of
> counters to the user at all.
>
> If the user needs counters 1,2 then the user should request those
> counters only. Hardware that needs to sample, say, counter 1,2,3,4 to
> get those will have to figure that out internally.

Since this is all about measuring verbs objects, it seems sensible for
us to group the counters by that same verbs sets.
But we don't have to go that way.
We can keep a flat list describing all available counters for all verbs objects.
Or maybe you're thinking of some other method for the user to learn
about available counters and their respectful objects?

> If the API requires a complex query interface just for the user it use
> it at all, then it is a total failure in my view. And that seems to be
> what is proposed here.
>
>> >That is unfortunate, and kind of lame, but you can still use the
>> >proposed API by re-ordering things. Continue to have
>> >ibv_add_sampling_point_flow, however a NULL flow argument will make
>> >the counters become added when the flow is created:
>> >
>> >?counters = ibv_create_counters([..]);

to clarify, does this create a single counter collection? what is
referred to as counter_set in the RFC/patches? or multiple counter's'?
I assume the '[..]' is ibv_context.


>> >?ibv_add_sampling_point_flow(counters, 1, NULL, [..])

so do we really need the flow object handle is this API (the 'NULL')?
why not go with a 'ibv_set_sampling_point(counters, 1, [..])' which
covers any/all counter definition for any object?
and when needed add the ibv_attach_sampling_point_qp().

and we need to clear the definition of '[..]'... does user learns what
inputs are valid from a describe API, or some other method (re the
above describe API question)?


>> >?flow_attr.counters = counters;
>> >?ibv_create_flow(qp, &flow_attr);

Works OK for mlx5 HW/driver

>> As mentioned earlier - current hardware support attachment of counters
>> only during the creation of the object (flow/qp/etc).
>
> And as I explained, the above addresses that.
>
>> Your proposal could be useful in the case where a structure of a dynamic
>> counter set is to be modified. However - since it is not the current case,
>> there is no benefit in dividing the build of such a counter-set into
>> several steps.
>
> The purpose is to make the API less dependent on specific
> implementation details.
>
> Mellanox drivers may wish to implement ibv_add_sampling_point_* as
> some kind of internal bookkeeping in userspace, other drives may wish
> to do kernel calls. It depends on how each type of hardware works.

Yes, we definitely need to make sure the API allows this u.s.
bookkeeping and reduce the amount system calls.
So the verbs CMD will allow 1 or more sampling points definitions.
In an OVS use case there can be many 10,000'nd of flows + counters and
this needs to be fast.

> You need to stop looking at the API through a Mellanox only lense.

In our patches we listed all vendor specific counter types in the
vendor driver code, and allow access to this set through generic verbs
in the describe API.
In our api, any vendor could utilize the same api to expose their
vendor specific counters.
In your suggestion, how do vendors add their specific HW dependent counters?

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                         ` <CAFgAxU-UcRapsoRn3hNUn27xgY370gUJ+WWE4URBq84ufkCXtA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-10-29 18:00                           ` Jason Gunthorpe
       [not found]                             ` <20171029180019.GE4488-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2017-10-29 18:00 UTC (permalink / raw)
  To: Alex Rosenbaum
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Sun, Oct 29, 2017 at 05:21:20PM +0200, Alex Rosenbaum wrote:
> On Fri, Oct 27, 2017 at 5:59 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > On Fri, Oct 27, 2017 at 03:46:57PM +0000, Guy Shattah wrote:
> >> On Wed, Oct 25, 2017 at 06:17:15PM +0300, Jason Gunthorpe wrote:
> >> >On Wed, Oct 25, 2017 at 05:58:15PM +0300, Yishai Hadas wrote:
> >>
> >> counters are not group in hardware unique but in an intuitive way.
> >
> > I don't think you understand my remark.
> >
> > I do not want to expose the idea of hardware limited groupings of
> > counters to the user at all.
> >
> > If the user needs counters 1,2 then the user should request those
> > counters only. Hardware that needs to sample, say, counter 1,2,3,4 to
> > get those will have to figure that out internally.
>
> Since this is all about measuring verbs objects, it seems sensible for
> us to group the counters by that same verbs sets.

Sure, but each verb may have lots and of different associated
counters, different hardware may have different sets, some hardware
may only be able to act on certain groupings of counters, etc.


> We can keep a flat list describing all available counters for all verbs objects.
> Or maybe you're thinking of some other method for the user to learn
> about available counters and their respectful objects?

Leon suggested an enum.

If the 'add_sampling_point_x' doesn't work then the counter is not
supported by the driver. Simple.

> >> >?counters = ibv_create_counters([..]);
> 
> to clarify, does this create a single counter collection? what is
> referred to as counter_set in the RFC/patches? or multiple counter's'?
> I assume the '[..]' is ibv_context.

This is a grouping of counters that are returned together when the
'read counter' API is called.

It is really unclear to me how this relates to the original proposed
API (still haven't seen the man pages!)

> >> >?ibv_add_sampling_point_flow(counters, 1, NULL, [..])
> 
> so do we really need the flow object handle is this API (the
> 'NULL')?

I had original drafted it with the idea that counters would be added/removed
to pre-existing verbs objects. This would support a fairly usual 'turn
on performance monitoring/turn off performance monitoring' kind of
application flow, when the counters were being used for monitoring.

I think mlx has designed the chip and was thinking of the API for some
other purpose where the counters always exist.

Having the argument allows both use models if hardware comes
along. Can mlx do the 'add counter' for any counter??

> and we need to clear the definition of '[..]'... does user learns what
> inputs are valid from a describe API, or some other method (re the
> above describe API question)?

Leon suggested a global documented enum, perhaps augmented with enum's
from the DV headers? The original series suggested a string?

> > You need to stop looking at the API through a Mellanox only lense.
> 
> In our patches we listed all vendor specific counter types in the
> vendor driver code, and allow access to this set through generic verbs
> in the describe API.
> In our api, any vendor could utilize the same api to expose their
> vendor specific counters.

Yes, but making the counters general is the trivial part of this
excercise. Making the API that creates any hardware objects and
configures the hardware to record these counters not be unduely tied
to a specific implementation is the tricky bit..

> In your suggestion, how do vendors add their specific HW dependent counters?

dv.h

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                             ` <20171029180019.GE4488-uk2M96/98Pc@public.gmane.org>
@ 2017-11-01  9:32                               ` Alex Rosenbaum
       [not found]                                 ` <CAFgAxU_CMxQ616wBd-vkvyJ905ndLHrabmWtJ-Ye_hgSwWqiag-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Alex Rosenbaum @ 2017-11-01  9:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Sun, Oct 29, 2017 at 8:00 PM, Jason Gunthorpe <jgg-uk2M96/98Pc@public.gmane.org> wrote:
> On Sun, Oct 29, 2017 at 05:21:20PM +0200, Alex Rosenbaum wrote:
>> On Fri, Oct 27, 2017 at 5:59 PM, Jason Gunthorpe wrote:
>> > On Fri, Oct 27, 2017 at 03:46:57PM +0000, Guy Shattah wrote:
>> >> On Wed, Oct 25, 2017 at 06:17:15PM +0300, Jason Gunthorpe wrote:
>> >> >On Wed, Oct 25, 2017 at 05:58:15PM +0300, Yishai Hadas wrote:

> Leon suggested a global documented enum, perhaps augmented with
> enum's from the DV headers? The original series suggested a string?

Yes, the original suggestion was based on strings. but let's go with enum.
We didn't plan on a global set from start (based on strings), but
going with enum in verbs.h definitely makes sense to define some of
these as global scope and additional vendor/DV will always be required
(maybe above some IBV_COUNTER_VENDOR_RANGE value).


> If the 'add_sampling_point_x' doesn't work then the counter is not
> supported by the driver. Simple.

Since we don't want the provider lib to go to Kernel each call, we
need to keep the counter capability (aka: describe) uverbs command, at
least for the provider lib usage to get the values of supported
counters for its device.
I agree that it is not a must for user to use the describe API and
application can test the support as you mentioned above. But it is
always better to expose caps to user like most other verbs resources
do and we'll need in for our mlx library any way.

I would suggest to move it to be part of struct ibv_device_attr_ex {
struct ibv_counter_cap *counter_caps } retrieved from
ibv_query_device_ex() instead of a dedicated API.
Where struct ibv_counter_cap will hold an array of supported
ibv_counter_type enums by the device and maybe some flags per type to
indicate extra capabilities: volatile, cached.
Maybe even a short "pretty" string.


>> >> >?ibv_add_sampling_point_flow(counters, 1, NULL, [..])
>>
>> so do we really need the flow object handle is this API (the
>> 'NULL')?
>
> I had original drafted it with the idea that counters would be added/removed
> to pre-existing verbs objects. This would support a fairly usual 'turn
> on performance monitoring/turn off performance monitoring' kind of
> application flow, when the counters were being used for monitoring.
>
> I think mlx has designed the chip and was thinking of the API for some
> other purpose where the counters always exist.

There are 2 usage models I see which probably caused the misunderstanding here:
1. add/remove a counter during the verbs objects life cycle to measure
during a short period. Main use case will be debug or periodic
auto-tuning of resources. This matches your API design nicely.
2. measure the entire objects activity. OVS offload of flows must
report back 100% of the activity (redirect or drop of all packets).
This requires to create the flow/qp atomically bounded with the
counter. Currently this does not match your suggested API. I guess
that where our disagreement are coming from.
In case #2 we'll have to bind the counter handle to the verbs object
while creating the object: ibv_create_flow(flow, attr{coutner}) or
ibv_create_qp(qp, attr{counter}).


> Having the argument allows both use models if hardware comes along.
yes, agreed. At least I'm not against it, but definitely would like to
see a global scope add API which fits better a flat counter enum
model.
As I mentioned in previously email: ibv_add_sampling_point(counters, 1, [..])


> Can mlx do the 'add counter' for any counter??
Not sure I understood your question exactly.
mlx5 devices has counter-sets which can be bound to a flow object, and
other kind of counter-sets which can be bound to a QP, and some
additional ones too.
We can create both in HW (attached to flow and qp respectfully) and in
driver copy the relevant values to the user's counter object in the
correct location.


These changes will require some time to re-work the code.
Once we agree on the model, we'll go back to a more details arch and
implementation cycle and update on ML.

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                                 ` <CAFgAxU_CMxQ616wBd-vkvyJ905ndLHrabmWtJ-Ye_hgSwWqiag-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-11-01 18:18                                   ` Jason Gunthorpe
       [not found]                                     ` <20171101181807.GJ1030-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2017-11-01 18:18 UTC (permalink / raw)
  To: Alex Rosenbaum
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Wed, Nov 01, 2017 at 11:32:26AM +0200, Alex Rosenbaum wrote:

> Yes, the original suggestion was based on strings. but let's go with enum.
> We didn't plan on a global set from start (based on strings), but
> going with enum in verbs.h definitely makes sense to define some of
> these as global scope and additional vendor/DV will always be required
> (maybe above some IBV_COUNTER_VENDOR_RANGE value).

Well, I would have insisted on some well defined global strings...

> > If the 'add_sampling_point_x' doesn't work then the counter is not
> > supported by the driver. Simple.
> 
> Since we don't want the provider lib to go to Kernel each call, we
> need to keep the counter capability (aka: describe) uverbs command, at
> least for the provider lib usage to get the values of supported
> counters for its device.

How you do that is up to your driver, I don't think we need a fully
complex describe API from the kernel. Just some simple HW specific
bitsets that indicate what level of support the driver and hardware
have. mlx can't have that many permutations, right?

> application can test the support as you mentioned above. But it is
> always better to expose caps to user like most other verbs resources
> do and we'll need in for our mlx library any way.

There are many ways to approach cap testing, we don't always need to
return a huge struct, for instance:

 ibv_query_counter_cap(...)

Is probably much saner.

> I would suggest to move it to be part of struct ibv_device_attr_ex {
> struct ibv_counter_cap *counter_caps } retrieved from
> ibv_query_device_ex() instead of a dedicated API.
> Where struct ibv_counter_cap will hold an array of supported
> ibv_counter_type enums by the device and maybe some flags per type to
> indicate extra capabilities: volatile, cached.
> Maybe even a short "pretty" string.

This might ultimately make sense as a verbs API of some kind, but not
sure we need a kernel call to support it..

> There are 2 usage models I see which probably caused the misunderstanding here:
> 1. add/remove a counter during the verbs objects life cycle to measure
> during a short period. Main use case will be debug or periodic
> auto-tuning of resources. This matches your API design nicely.
> 2. measure the entire objects activity. OVS offload of flows must
> report back 100% of the activity (redirect or drop of all packets).
> This requires to create the flow/qp atomically bounded with the
> counter. Currently this does not match your suggested API. I guess
> that where our disagreement are coming from.

I said to pass NULL for the sampling point object and then pass the
counter object to the create:

> In case #2 we'll have to bind the counter handle to the verbs object
> while creating the object: ibv_create_flow(flow, attr{coutner}) or
> ibv_create_qp(qp, attr{counter}).

Which seems perfectly fine for all uses case.

Having the object argument, even if NULL, lets us scope the counters
to the object under use, so eg we can have more generic counts like
PACKETS_RECIEVED, and if you apply that to a flow group it is the
packets entering the flow group, and if you apply it to a QP it is the
packets entering the QP, etc.

Having QP*FLOW*COUNTS labels seems overly complicated.

It also allows the API to reject confusing request eg,

  ibv_set_sampling_point_qp(counters, 0, NULL, FLOW_DISCARDED);

Should fail, as 'FLOW_DISCARDED' is a counter specific only to flows,
and asking to associate it with a QP is wrong.

It makes it much clearer in the API excatly what thing is being
counted without having to consult a manual.

> > Can mlx do the 'add counter' for any counter??
> Not sure I understood your question exactly.

I mean, if I have an existing QP, can I start counting packets
received?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                                     ` <20171101181807.GJ1030-uk2M96/98Pc@public.gmane.org>
@ 2017-11-01 18:59                                       ` Alex Rosenbaum
       [not found]                                         ` <CAFgAxU-DsOr9T9P6gqvZ9AviE45_34vZ1WUgUehmF-kb2j8JtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Alex Rosenbaum @ 2017-11-01 18:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Wed, Nov 1, 2017 at 8:18 PM, Jason Gunthorpe <jgg-uk2M96/98Pc@public.gmane.org> wrote:
> On Wed, Nov 01, 2017 at 11:32:26AM +0200, Alex Rosenbaum wrote:
>
>> > Can mlx do the 'add counter' for any counter??
>> Not sure I understood your question exactly.
>
> I mean, if I have an existing QP, can I start counting packets
> received?

yes, for QP mlx do support adding/removing the counters on the fly.
we could do this from ibv_modify_qp() to attach the entire defined
counter set or for a single counter value from
ibv_add_sampling_point_qp(counter, idx, qp, [..])
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                                         ` <CAFgAxU-DsOr9T9P6gqvZ9AviE45_34vZ1WUgUehmF-kb2j8JtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-11-01 19:01                                           ` Jason Gunthorpe
       [not found]                                             ` <20171101190119.GL1030-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2017-11-01 19:01 UTC (permalink / raw)
  To: Alex Rosenbaum
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Wed, Nov 01, 2017 at 08:59:25PM +0200, Alex Rosenbaum wrote:
> On Wed, Nov 1, 2017 at 8:18 PM, Jason Gunthorpe <jgg-uk2M96/98Pc@public.gmane.org> wrote:
> > On Wed, Nov 01, 2017 at 11:32:26AM +0200, Alex Rosenbaum wrote:
> >
> >> > Can mlx do the 'add counter' for any counter??
> >> Not sure I understood your question exactly.
> >
> > I mean, if I have an existing QP, can I start counting packets
> > received?
> 
> yes, for QP mlx do support adding/removing the counters on the fly.
> we could do this from ibv_modify_qp() to attach the entire defined
> counter set or for a single counter value from
> ibv_add_sampling_point_qp(counter, idx, qp, [..])

Then I think my proposed API makes the most sense overall since it
captures this use model as well, even if mlx currently does not support
this use model for flow counters.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                                             ` <20171101190119.GL1030-uk2M96/98Pc@public.gmane.org>
@ 2017-11-01 19:46                                               ` Alex Rosenbaum
       [not found]                                                 ` <CAFgAxU9FZQC3JC6sEjB9W3YqHyF5StJ8_=mDQsF0eCWr9010hw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Alex Rosenbaum @ 2017-11-01 19:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

> Then I think my proposed API makes the most sense overall since it
> captures this use model as well, even if mlx currently does not support
> this use model for flow counters.
>
it will force the kernel syscall on each modify, which is something
that will limit the usage at scale.
doing the bookkeeping in user space will require a trigger to commit
this to kernel/HW, which can be the modify() or an additional API.

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                                                 ` <CAFgAxU9FZQC3JC6sEjB9W3YqHyF5StJ8_=mDQsF0eCWr9010hw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-11-01 21:16                                                   ` Jason Gunthorpe
       [not found]                                                     ` <20171101211629.GA18874-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2017-11-01 21:16 UTC (permalink / raw)
  To: Alex Rosenbaum
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Wed, Nov 1, 2017 at 1:46 PM, Alex Rosenbaum <rosenbaumalex-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Then I think my proposed API makes the most sense overall since it
>> captures this use model as well, even if mlx currently does not support
>> this use model for flow counters.

> it will force the kernel syscall on each modify, which is something
> that will limit the usage at scale.

No, it doesn't force that. As we've talked about the driver should
book-keep in user space as required.

> doing the bookkeeping in user space will require a trigger to commit
> this to kernel/HW, which can be the modify() or an additional API.

The trigger to push things to the kernel is something like creating a
flow steering object.

The driver can bundle the create of the counter object, and the send
of the counters to add, in with the syscall for creating the flow
steering object.

Same argument for QP, similar argument for attaching to an existing
QP.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                                                     ` <20171101211629.GA18874-uk2M96/98Pc@public.gmane.org>
@ 2017-11-02  6:50                                                       ` Alex Rosenbaum
       [not found]                                                         ` <CAFgAxU8GNEiyzwHqrYyxs8J7T0TUqmN7JrZukkA0JgYSgY8FoA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Alex Rosenbaum @ 2017-11-02  6:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Wed, Nov 1, 2017 at 11:16 PM, Jason Gunthorpe <jgg-uk2M96/98Pc@public.gmane.org> wrote:
> On Wed, Nov 1, 2017 at 1:46 PM, Alex Rosenbaum <rosenbaumalex-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Then I think my proposed API makes the most sense overall since it
>>> captures this use model as well, even if mlx currently does not support
>>> this use model for flow counters.
>
>> it will force the kernel syscall on each modify, which is something
>> that will limit the usage at scale.
>
> No, it doesn't force that. As we've talked about the driver should
> book-keep in user space as required.
>
>> doing the bookkeeping in user space will require a trigger to commit
>> this to kernel/HW, which can be the modify() or an additional API.
>
> The trigger to push things to the kernel is something like creating a
> flow steering object.

I'm referring to reducing the kernel calls when supporting the on the
fly modification for counter and QP pair updates.
if user wants to add/remove a sampling point on an already operational
QP of flow, calling ibv_add_sampling_point_qp will have to go to
kernel each time, unless we allow multiple add_sampling_point followed
by a ibv_modify_qp(qp, {attr{counter}}) or add some 'flush' flag/API,
or 'more to come' flag until final add_sampling_poing_qp call.

> The driver can bundle the create of the counter object, and the send
> of the counters to add, in with the syscall for creating the flow
> steering object.
>
> Same argument for QP, similar argument for attaching to an existing
> QP.

totally agree about the atomic create of counter with verbs object.
this is well defined with the ibv_create_flow/qp()

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                                                         ` <CAFgAxU8GNEiyzwHqrYyxs8J7T0TUqmN7JrZukkA0JgYSgY8FoA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-11-02 15:38                                                           ` Jason Gunthorpe
       [not found]                                                             ` <20171102153848.GF18874-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2017-11-02 15:38 UTC (permalink / raw)
  To: Alex Rosenbaum
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Thu, Nov 02, 2017 at 08:50:22AM +0200, Alex Rosenbaum wrote:

> by a ibv_modify_qp(qp, {attr{counter}}) or add some 'flush' flag/API,
> or 'more to come' flag until final add_sampling_poing_qp call.

Something along the lines of 'ibv_attach_counters_qp' or 'modify_qp'
(see other comments on overloading modify) is reasonable to me.

So, if every path has a atomic/batch API, do you still want to include
the single counter add capability? It would be fine to me to drop the
object arg, but keep the single counter and differentiated API..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                                                             ` <20171102153848.GF18874-uk2M96/98Pc@public.gmane.org>
@ 2017-11-02 16:11                                                               ` Alex Rosenbaum
       [not found]                                                                 ` <CAFgAxU_Ouzk1bsBpZ==gTetD3OVGVosgeJRvwqErqz5s2utHBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Alex Rosenbaum @ 2017-11-02 16:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Thu, Nov 2, 2017 at 5:38 PM, Jason Gunthorpe <jgg-uk2M96/98Pc@public.gmane.org> wrote:
> On Thu, Nov 02, 2017 at 08:50:22AM +0200, Alex Rosenbaum wrote:
>
> So, if every path has a atomic/batch API, do you still want to include
> the single counter add capability? It would be fine to me to drop the
> object arg, but keep the single counter and differentiated API..

I think single counter add with user space bookkeeping is good and
simple API for applciations.

what do you mean by "differentiated API"?
do you mean keep the ibv_add_xxx_qp() but remove the ibv_qp object
input parameter? same for flow?
if yes, isn't it simpler to do ibv_create_counters_qp() or
ibv_create_counters(QP), and keep a single non differentiated add
sample counter point on a clearly defined ibv_counter
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                                                                 ` <CAFgAxU_Ouzk1bsBpZ==gTetD3OVGVosgeJRvwqErqz5s2utHBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-11-02 16:19                                                                   ` Jason Gunthorpe
       [not found]                                                                     ` <20171102161928.GJ18874-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2017-11-02 16:19 UTC (permalink / raw)
  To: Alex Rosenbaum
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

On Thu, Nov 02, 2017 at 06:11:07PM +0200, Alex Rosenbaum wrote:
> On Thu, Nov 2, 2017 at 5:38 PM, Jason Gunthorpe <jgg-uk2M96/98Pc@public.gmane.org> wrote:
> > On Thu, Nov 02, 2017 at 08:50:22AM +0200, Alex Rosenbaum wrote:
> >
> > So, if every path has a atomic/batch API, do you still want to include
> > the single counter add capability? It would be fine to me to drop the
> > object arg, but keep the single counter and differentiated API..
> 
> I think single counter add with user space bookkeeping is good and
> simple API for applciations.
> 
> what do you mean by "differentiated API"?
> do you mean keep the ibv_add_xxx_qp() but remove the ibv_qp object
> input parameter? same for flow?

Yes

> if yes, isn't it simpler to do ibv_create_counters_qp() or
> ibv_create_counters(QP), and keep a single non differentiated add
> sample counter point on a clearly defined ibv_counter

You want to have the counters object itself linked to only QP or only
flow? Why?

The counter object should be able to aggregate counting
anything so that a single kernel round trip will return all of the
desired counters on any sort of object.

The point of the per-object differentiation is to keep the counter
labling simpler:

 ibv_add_sampling_point(counters, 0, QP, RX_BYTES)
 ibv_add_sampling_point(counters, 1, FLOW, RX_BYTES)
 ibv_add_sampling_point(counters, 2, QP, RX_PACKETS)
 ibv_add_sampling_point(counters, 3, FLOW, RX_PACKETS)

vs

 ibv_add_sampling_point(counters, 0, RX_BYTES_QP)
 ibv_add_sampling_point(counters, 1, RX_BYTES_FLOW)
[..]

Observing that most of the things we will want to count will likely
apply equally to qps, flows and other things. # packets, # bytes,
in/out versions, etc.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH rdma-next 00/16] Flow counters support
       [not found]                                                                     ` <20171102161928.GJ18874-uk2M96/98Pc@public.gmane.org>
@ 2017-12-27 15:59                                                                       ` Alex Rosenbaum
  0 siblings, 0 replies; 41+ messages in thread
From: Alex Rosenbaum @ 2017-12-27 15:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Guy Shattah, Yishai Hadas, Yishai Hadas,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Raed Salem, Majd Dibbiny,
	Alex Rosenbaum, Tzahi Oved

I posted a V2 [1] which is aligned to what we discussed here
Alex

[1] https://marc.info/?l=linux-rdma&m=151438961826612&w=2



On Thu, Nov 2, 2017 at 6:19 PM, Jason Gunthorpe <jgg-uk2M96/98Pc@public.gmane.org> wrote:
> On Thu, Nov 02, 2017 at 06:11:07PM +0200, Alex Rosenbaum wrote:
>> On Thu, Nov 2, 2017 at 5:38 PM, Jason Gunthorpe <jgg-uk2M96/98Pc@public.gmane.org> wrote:
>> > On Thu, Nov 02, 2017 at 08:50:22AM +0200, Alex Rosenbaum wrote:
>> >
>> > So, if every path has a atomic/batch API, do you still want to include
>> > the single counter add capability? It would be fine to me to drop the
>> > object arg, but keep the single counter and differentiated API..
>>
>> I think single counter add with user space bookkeeping is good and
>> simple API for applciations.
>>
>> what do you mean by "differentiated API"?
>> do you mean keep the ibv_add_xxx_qp() but remove the ibv_qp object
>> input parameter? same for flow?
>
> Yes
>
>> if yes, isn't it simpler to do ibv_create_counters_qp() or
>> ibv_create_counters(QP), and keep a single non differentiated add
>> sample counter point on a clearly defined ibv_counter
>
> You want to have the counters object itself linked to only QP or only
> flow? Why?
>
> The counter object should be able to aggregate counting
> anything so that a single kernel round trip will return all of the
> desired counters on any sort of object.
>
> The point of the per-object differentiation is to keep the counter
> labling simpler:
>
>  ibv_add_sampling_point(counters, 0, QP, RX_BYTES)
>  ibv_add_sampling_point(counters, 1, FLOW, RX_BYTES)
>  ibv_add_sampling_point(counters, 2, QP, RX_PACKETS)
>  ibv_add_sampling_point(counters, 3, FLOW, RX_PACKETS)
>
> vs
>
>  ibv_add_sampling_point(counters, 0, RX_BYTES_QP)
>  ibv_add_sampling_point(counters, 1, RX_BYTES_FLOW)
> [..]
>
> Observing that most of the things we will want to count will likely
> apply equally to qps, flows and other things. # packets, # bytes,
> in/out versions, etc.
>
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2017-12-27 15:59 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-19 14:41 [PATCH rdma-next 00/16] Flow counters support Yishai Hadas
     [not found] ` <1508424118-27205-1-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-10-19 14:41   ` [PATCH rdma-next 01/16] IB/core: Expose max_counter_sets capability Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 02/16] IB/uverbs: " Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 03/16] IB/core: Introduce counter set describe verb Yishai Hadas
     [not found]     ` <1508424118-27205-4-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-10-20 10:44       ` Christopher Lameter
2017-10-21  0:29         ` Guy Shattah
     [not found]           ` <AM6PR0502MB37838B19976EDF1D04C74751BD400-md96bDB8+JV1k1TWM4Wt8cDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-10-22 12:00             ` Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 04/16] IB/uverbs: Add describe counter set support Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 05/16] IB/core: Introduce counter set object and its create/destroy verbs Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 06/16] IB/uverbs: Add create/destroy counter set support Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 07/16] IB/core: Introduce counter set query verb Yishai Hadas
     [not found]     ` <1508424118-27205-8-git-send-email-yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-10-20 10:48       ` Christopher Lameter
2017-10-20 15:40         ` Guy Shattah
2017-10-19 14:41   ` [PATCH rdma-next 08/16] IB/uverbs: Add query counter set support Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 09/16] IB/core: Add support for flow counter set Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 10/16] IB/uverbs: " Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 11/16] net/mlx5: Export flow counter related API Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 12/16] net/mlx5: Expand mlx5_fc_query_cached to return absolute counters values Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 13/16] IB/mlx5: Add counter set operations Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 14/16] IB/mlx5: Pass mlx5_flow_act struct instead of multiple arguments Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 15/16] IB/mlx5: Add flow counter set support Yishai Hadas
2017-10-19 14:41   ` [PATCH rdma-next 16/16] IB/mlx5: Expose max_counter_sets capability Yishai Hadas
2017-10-23 16:51   ` [PATCH rdma-next 00/16] Flow counters support Jason Gunthorpe
     [not found]     ` <20171023165118.GA18097-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-10-23 17:00       ` Leon Romanovsky
2017-10-25 14:58       ` Yishai Hadas
     [not found]         ` <b003f6e5-d7ce-3775-a1dc-0fd0f507a515-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2017-10-25 15:17           ` Jason Gunthorpe
     [not found]             ` <20171025151734.GA15557-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-10-27 15:46               ` Guy Shattah
     [not found]                 ` <AM6PR0502MB3783A1186AA0ABDCCD5359AEBD5A0-md96bDB8+JV1k1TWM4Wt8cDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-10-27 15:59                   ` Jason Gunthorpe
     [not found]                     ` <20171027155955.GA15922-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-10-29 15:21                       ` Alex Rosenbaum
     [not found]                         ` <CAFgAxU-UcRapsoRn3hNUn27xgY370gUJ+WWE4URBq84ufkCXtA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-29 18:00                           ` Jason Gunthorpe
     [not found]                             ` <20171029180019.GE4488-uk2M96/98Pc@public.gmane.org>
2017-11-01  9:32                               ` Alex Rosenbaum
     [not found]                                 ` <CAFgAxU_CMxQ616wBd-vkvyJ905ndLHrabmWtJ-Ye_hgSwWqiag-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-01 18:18                                   ` Jason Gunthorpe
     [not found]                                     ` <20171101181807.GJ1030-uk2M96/98Pc@public.gmane.org>
2017-11-01 18:59                                       ` Alex Rosenbaum
     [not found]                                         ` <CAFgAxU-DsOr9T9P6gqvZ9AviE45_34vZ1WUgUehmF-kb2j8JtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-01 19:01                                           ` Jason Gunthorpe
     [not found]                                             ` <20171101190119.GL1030-uk2M96/98Pc@public.gmane.org>
2017-11-01 19:46                                               ` Alex Rosenbaum
     [not found]                                                 ` <CAFgAxU9FZQC3JC6sEjB9W3YqHyF5StJ8_=mDQsF0eCWr9010hw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-01 21:16                                                   ` Jason Gunthorpe
     [not found]                                                     ` <20171101211629.GA18874-uk2M96/98Pc@public.gmane.org>
2017-11-02  6:50                                                       ` Alex Rosenbaum
     [not found]                                                         ` <CAFgAxU8GNEiyzwHqrYyxs8J7T0TUqmN7JrZukkA0JgYSgY8FoA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-02 15:38                                                           ` Jason Gunthorpe
     [not found]                                                             ` <20171102153848.GF18874-uk2M96/98Pc@public.gmane.org>
2017-11-02 16:11                                                               ` Alex Rosenbaum
     [not found]                                                                 ` <CAFgAxU_Ouzk1bsBpZ==gTetD3OVGVosgeJRvwqErqz5s2utHBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-02 16:19                                                                   ` Jason Gunthorpe
     [not found]                                                                     ` <20171102161928.GJ18874-uk2M96/98Pc@public.gmane.org>
2017-12-27 15:59                                                                       ` Alex Rosenbaum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.