* [PATCH rdma-next v3 0/4] ODP information and statistics
@ 2019-10-16  6:23 Leon Romanovsky
  2019-10-16  6:23 ` [PATCH rdma-next v3 1/4] IB/mlx5: Introduce ODP diagnostic counters Leon Romanovsky
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Leon Romanovsky @ 2019-10-16  6:23 UTC
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi

From: Leon Romanovsky <leonro@mellanox.com>

Changelog:
 v3:
 * Implement batch count of "invalidations".
 * The "prefetched" counter is dropped; it will be split into two separate
   counters (RQ/SQ) later, once Jason's ODP rework is finished.
 v2: https://lore.kernel.org/linux-rdma/20191006155139.30632-1-leon@kernel.org
 * Umems can disappear during the rereg flow, and we do not lock them
   while using the counters (uverbs prevents this by holding the
   uobject write lock), which exposes possible races. Move the
   counters into mlx5_ib_mr to avoid these races (related patches
   - #1, #3 & #4).
 * Fix page invalidation counting (Patch #1).
 * Make the code more elegant by defining a fill function type and using it
   within res_get_common_{dumpit, doit} (Patch #2).
 * Put an ODP implicit indicator in the mlx5 reg MR operation to mark
   when a given MR is registered as implicit ODP, and use that
   indication when dumping the ODP type.
 * Since the counters have been moved to mlx5_ib_mr, the ODP stats are now
   filled by a function internal to the mlx5 driver. Remove the fill_odp_stats
   device operation for the reason mentioned above.
 v1: https://lore.kernel.org/linux-rdma/20190830081612.2611-1-leon@kernel.org
 * Dropped the umem patch, because it doesn't follow our IB model, where
   UMEM is a driver object and not an ib_core object (Jason).
 * Removed the ODP type indicator from ib_umem_odp; it is not needed after
   commit fd7dbf035edc ("RDMA/odp: Make it clearer when a umem is an implicit ODP umem")
 * Since umems are not part of the core MR (for the reason mentioned
   above), there is no way to access the ODP type as was previously done via nldev
   (old patch #3). Instead, patch #4 adds an mlx5 implementation of fill_res_entry
   and dumps the ODP type as part of the driver table entry, as driver details.
 * Counter types are now atomic64_t instead of u64.
 v0: https://lore.kernel.org/linux-rdma/20190807103403.8102-1-leon@kernel.org

-----------------------------------------------------------------------------
Hi,

This series from Erez exposes ODP type information (explicit,
implicit) and statistics through the netlink interface.

Thanks

Erez Alfasi (4):
  IB/mlx5: Introduce ODP diagnostic counters
  RDMA/nldev: Allow different fill function per resource
  RDMA/mlx5: Return ODP type per MR
  RDMA/nldev: Provide MR statistics

 drivers/infiniband/core/device.c      |  1 +
 drivers/infiniband/core/nldev.c       | 98 ++++++++++++++++++++-------
 drivers/infiniband/hw/mlx5/Makefile   |  2 +-
 drivers/infiniband/hw/mlx5/main.c     |  3 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  9 +++
 drivers/infiniband/hw/mlx5/odp.c      | 17 +++++
 drivers/infiniband/hw/mlx5/restrack.c | 90 ++++++++++++++++++++++++
 include/rdma/ib_verbs.h               | 12 ++++
 include/rdma/restrack.h               |  6 ++
 9 files changed, 212 insertions(+), 26 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/restrack.c

--
2.20.1



* [PATCH rdma-next v3 1/4] IB/mlx5: Introduce ODP diagnostic counters
  2019-10-16  6:23 [PATCH rdma-next v3 0/4] ODP information and statistics Leon Romanovsky
@ 2019-10-16  6:23 ` Leon Romanovsky
  2019-10-16  6:23 ` [PATCH rdma-next v3 2/4] RDMA/nldev: Allow different fill function per resource Leon Romanovsky
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2019-10-16  6:23 UTC
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi

From: Erez Alfasi <ereza@mellanox.com>

Introduce ODP diagnostic counters and count the following
per MR within IB/mlx5 driver:
 1) Page faults:
	Total number of faulted pages.
 2) Page invalidations:
	Total number of pages invalidated by the OS during all
	invalidation events. The translations may no longer be
	valid due to either non-present pages or mapping changes.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  4 ++++
 drivers/infiniband/hw/mlx5/odp.c     | 15 +++++++++++++++
 include/rdma/ib_verbs.h              |  5 +++++
 3 files changed, 24 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index bf30d53d94dc..5aae05ebf64b 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -585,6 +585,9 @@ struct mlx5_ib_dm {
 					  IB_ACCESS_REMOTE_READ   |\
 					  IB_ZERO_BASED)

+#define mlx5_update_odp_stats(mr, counter_name, value)		\
+	atomic64_add(value, &((mr)->odp_stats.counter_name))
+
 struct mlx5_ib_mr {
 	struct ib_mr		ibmr;
 	void			*descs;
@@ -622,6 +625,7 @@ struct mlx5_ib_mr {
 	wait_queue_head_t       q_leaf_free;
 	struct mlx5_async_work  cb_work;
 	atomic_t		num_pending_prefetch;
+	struct ib_odp_counters	odp_stats;
 };

 static inline bool is_odp_mr(struct mlx5_ib_mr *mr)
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 95cf0249b015..3601c6ad96f9 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -224,6 +224,7 @@ void mlx5_ib_invalidate_range(struct ib_umem_odp *umem_odp, unsigned long start,
 	const u64 umr_block_mask = (MLX5_UMR_MTT_ALIGNMENT /
 				    sizeof(struct mlx5_mtt)) - 1;
 	u64 idx = 0, blk_start_idx = 0;
+	u64 invalidations = 0;
 	int in_block = 0;
 	u64 addr;

@@ -261,6 +262,9 @@ void mlx5_ib_invalidate_range(struct ib_umem_odp *umem_odp, unsigned long start,
 				blk_start_idx = idx;
 				in_block = 1;
 			}
+
+			/* Count page invalidations */
+			invalidations += idx - blk_start_idx + 1;
 		} else {
 			u64 umr_offset = idx & umr_block_mask;

@@ -279,6 +283,9 @@ void mlx5_ib_invalidate_range(struct ib_umem_odp *umem_odp, unsigned long start,
 				   MLX5_IB_UPD_XLT_ZAP |
 				   MLX5_IB_UPD_XLT_ATOMIC);
 	mutex_unlock(&umem_odp->umem_mutex);
+
+	mlx5_update_odp_stats(mr, invalidations, invalidations);
+
 	/*
 	 * We are now sure that the device will not access the
 	 * memory. We can safely unmap it, and mark it as dirty if
@@ -287,6 +294,7 @@ void mlx5_ib_invalidate_range(struct ib_umem_odp *umem_odp, unsigned long start,

 	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);

+
 	if (unlikely(!umem_odp->npages && mr->parent &&
 		     !umem_odp->dying)) {
 		WRITE_ONCE(umem_odp->dying, 1);
@@ -801,6 +809,13 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 		if (ret < 0)
 			goto srcu_unlock;

+		/*
+		 * When prefetching a page, page fault is generated
+		 * in order to bring the page to the main memory.
+		 * In the current flow, page faults are being counted.
+		 */
+		mlx5_update_odp_stats(mr, faults, ret);
+
 		npages += ret;
 		ret = 0;
 		break;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6a47ba85c54c..26600dfb345d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2218,6 +2218,11 @@ struct rdma_netdev_alloc_params {
 				      struct net_device *netdev, void *param);
 };

+struct ib_odp_counters {
+	atomic64_t faults;
+	atomic64_t invalidations;
+};
+
 struct ib_counters {
 	struct ib_device	*device;
 	struct ib_uobject	*uobject;
--
2.20.1
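
The per-MR counters in patch 1/4 can be illustrated outside the kernel. What follows is only a minimal userspace sketch, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic64_t and atomic64_add(). The names odp_counters_sketch, mr_sketch, update_odp_stats, odp_counters_init, read_faults and read_invalidations are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct ib_odp_counters (hypothetical name). */
struct odp_counters_sketch {
	atomic_uint_fast64_t faults;
	atomic_uint_fast64_t invalidations;
};

/* Stand-in for the part of mlx5_ib_mr that owns the counters. */
struct mr_sketch {
	struct odp_counters_sketch odp_stats;
};

/*
 * Same shape as mlx5_update_odp_stats(): the counter is selected by
 * field name at compile time and updated with a single atomic add,
 * so the update itself needs no lock.
 */
#define update_odp_stats(mr, counter_name, value) \
	((void)atomic_fetch_add(&((mr)->odp_stats.counter_name), (value)))

/* Zero both counters before first use. */
static void odp_counters_init(struct mr_sketch *mr)
{
	assert(mr != NULL);
	atomic_init(&mr->odp_stats.faults, 0);
	atomic_init(&mr->odp_stats.invalidations, 0);
}

/* Read side, analogous to atomic64_read() in the stat dump path. */
static uint64_t read_faults(struct mr_sketch *mr)
{
	assert(mr != NULL);
	return (uint64_t)atomic_load(&mr->odp_stats.faults);
}

static uint64_t read_invalidations(struct mr_sketch *mr)
{
	assert(mr != NULL);
	return (uint64_t)atomic_load(&mr->odp_stats.invalidations);
}
```

Keeping the counters in the MR object, and not in the umem, is what the v2 changelog entry motivates: the MR outlives the umem during rereg, so the counters stay valid for any reader that holds the MR.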



* [PATCH rdma-next v3 2/4] RDMA/nldev: Allow different fill function per resource
  2019-10-16  6:23 [PATCH rdma-next v3 0/4] ODP information and statistics Leon Romanovsky
  2019-10-16  6:23 ` [PATCH rdma-next v3 1/4] IB/mlx5: Introduce ODP diagnostic counters Leon Romanovsky
@ 2019-10-16  6:23 ` Leon Romanovsky
  2019-10-16  6:23 ` [PATCH rdma-next v3 3/4] RDMA/mlx5: Return ODP type per MR Leon Romanovsky
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2019-10-16  6:23 UTC
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi

From: Erez Alfasi <ereza@mellanox.com>

So far, res_get_common_{dumpit, doit} used the default
resource fill function, which was defined as part of the
nldev_fill_res_entry fill_entries.

Adding a fill function pointer as an argument allows us to use
a different fill function in case we want to dump different
values than the 'rdma resource' flow does, while still using the
same existing general resource dumping flow.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/nldev.c | 44 +++++++++++++++++----------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 0ebe95c79ae0..01851467914a 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -42,6 +42,9 @@
 #include "cma_priv.h"
 #include "restrack.h"

+typedef int (*res_fill_func_t)(struct sk_buff*, bool,
+			       struct rdma_restrack_entry*, uint32_t);
+
 /*
  * Sort array elements by the netlink attribute name
  */
@@ -1128,8 +1131,6 @@ static int nldev_res_get_dumpit(struct sk_buff *skb,
 }

 struct nldev_fill_res_entry {
-	int (*fill_res_func)(struct sk_buff *msg, bool has_cap_net_admin,
-			     struct rdma_restrack_entry *res, u32 port);
 	enum rdma_nldev_attr nldev_attr;
 	enum rdma_nldev_command nldev_cmd;
 	u8 flags;
@@ -1143,21 +1144,18 @@ enum nldev_res_flags {

 static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
 	[RDMA_RESTRACK_QP] = {
-		.fill_res_func = fill_res_qp_entry,
 		.nldev_cmd = RDMA_NLDEV_CMD_RES_QP_GET,
 		.nldev_attr = RDMA_NLDEV_ATTR_RES_QP,
 		.entry = RDMA_NLDEV_ATTR_RES_QP_ENTRY,
 		.id = RDMA_NLDEV_ATTR_RES_LQPN,
 	},
 	[RDMA_RESTRACK_CM_ID] = {
-		.fill_res_func = fill_res_cm_id_entry,
 		.nldev_cmd = RDMA_NLDEV_CMD_RES_CM_ID_GET,
 		.nldev_attr = RDMA_NLDEV_ATTR_RES_CM_ID,
 		.entry = RDMA_NLDEV_ATTR_RES_CM_ID_ENTRY,
 		.id = RDMA_NLDEV_ATTR_RES_CM_IDN,
 	},
 	[RDMA_RESTRACK_CQ] = {
-		.fill_res_func = fill_res_cq_entry,
 		.nldev_cmd = RDMA_NLDEV_CMD_RES_CQ_GET,
 		.nldev_attr = RDMA_NLDEV_ATTR_RES_CQ,
 		.flags = NLDEV_PER_DEV,
@@ -1165,7 +1163,6 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
 		.id = RDMA_NLDEV_ATTR_RES_CQN,
 	},
 	[RDMA_RESTRACK_MR] = {
-		.fill_res_func = fill_res_mr_entry,
 		.nldev_cmd = RDMA_NLDEV_CMD_RES_MR_GET,
 		.nldev_attr = RDMA_NLDEV_ATTR_RES_MR,
 		.flags = NLDEV_PER_DEV,
@@ -1173,7 +1170,6 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
 		.id = RDMA_NLDEV_ATTR_RES_MRN,
 	},
 	[RDMA_RESTRACK_PD] = {
-		.fill_res_func = fill_res_pd_entry,
 		.nldev_cmd = RDMA_NLDEV_CMD_RES_PD_GET,
 		.nldev_attr = RDMA_NLDEV_ATTR_RES_PD,
 		.flags = NLDEV_PER_DEV,
@@ -1181,7 +1177,6 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
 		.id = RDMA_NLDEV_ATTR_RES_PDN,
 	},
 	[RDMA_RESTRACK_COUNTER] = {
-		.fill_res_func = fill_res_counter_entry,
 		.nldev_cmd = RDMA_NLDEV_CMD_STAT_GET,
 		.nldev_attr = RDMA_NLDEV_ATTR_STAT_COUNTER,
 		.entry = RDMA_NLDEV_ATTR_STAT_COUNTER_ENTRY,
@@ -1191,7 +1186,8 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {

 static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 			       struct netlink_ext_ack *extack,
-			       enum rdma_restrack_type res_type)
+			       enum rdma_restrack_type res_type,
+			       res_fill_func_t fill_func)
 {
 	const struct nldev_fill_res_entry *fe = &fill_entries[res_type];
 	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
@@ -1249,7 +1245,9 @@ static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 	}

 	has_cap_net_admin = netlink_capable(skb, CAP_NET_ADMIN);
-	ret = fe->fill_res_func(msg, has_cap_net_admin, res, port);
+
+	ret = fill_func(msg, has_cap_net_admin, res, port);
+
 	rdma_restrack_put(res);
 	if (ret)
 		goto err_free;
@@ -1269,7 +1267,8 @@ static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,

 static int res_get_common_dumpit(struct sk_buff *skb,
 				 struct netlink_callback *cb,
-				 enum rdma_restrack_type res_type)
+				 enum rdma_restrack_type res_type,
+				 res_fill_func_t fill_func)
 {
 	const struct nldev_fill_res_entry *fe = &fill_entries[res_type];
 	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
@@ -1354,7 +1353,8 @@ static int res_get_common_dumpit(struct sk_buff *skb,
 			goto msg_full;
 		}

-		ret = fe->fill_res_func(skb, has_cap_net_admin, res, port);
+		ret = fill_func(skb, has_cap_net_admin, res, port);
+
 		rdma_restrack_put(res);

 		if (ret) {
@@ -1397,17 +1397,19 @@ next:		idx++;
 	return ret;
 }

-#define RES_GET_FUNCS(name, type)                                              \
-	static int nldev_res_get_##name##_dumpit(struct sk_buff *skb,          \
+#define RES_GET_FUNCS(name, type)					       \
+	static int nldev_res_get_##name##_dumpit(struct sk_buff *skb,	       \
 						 struct netlink_callback *cb)  \
-	{                                                                      \
-		return res_get_common_dumpit(skb, cb, type);                   \
-	}                                                                      \
-	static int nldev_res_get_##name##_doit(struct sk_buff *skb,            \
-					       struct nlmsghdr *nlh,           \
+	{								       \
+		return res_get_common_dumpit(skb, cb, type,		       \
+					     fill_res_##name##_entry);	       \
+	}								       \
+	static int nldev_res_get_##name##_doit(struct sk_buff *skb,	       \
+					       struct nlmsghdr *nlh,	       \
 					       struct netlink_ext_ack *extack) \
-	{                                                                      \
-		return res_get_common_doit(skb, nlh, extack, type);            \
+	{								       \
+		return res_get_common_doit(skb, nlh, extack, type,	       \
+					   fill_res_##name##_entry);	       \
 	}

 RES_GET_FUNCS(qp, RDMA_RESTRACK_QP);
--
2.20.1
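
The refactoring in patch 2/4 is the usual "pass the varying step as a function pointer" pattern. Below is a minimal userspace sketch of that pattern, not the nldev code itself. All names (entry_sketch, msg_sketch, fill_func_sketch_t, fill_res_sketch, fill_stat_sketch, get_common_sketch) are hypothetical, and the struct fields are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rdma_restrack_entry. */
struct entry_sketch {
	unsigned int id;
	unsigned int faults;
};

/* Hypothetical stand-in for the sk_buff being filled. */
struct msg_sketch {
	unsigned int value;
};

/* Plays the role of res_fill_func_t: one signature for every filler. */
typedef int (*fill_func_sketch_t)(struct msg_sketch *msg,
				  const struct entry_sketch *e);

/* "Resource" view: report the entry id. */
static int fill_res_sketch(struct msg_sketch *msg,
			   const struct entry_sketch *e)
{
	msg->value = e->id;
	return 0;
}

/* "Statistics" view: report a counter for the same entry. */
static int fill_stat_sketch(struct msg_sketch *msg,
			    const struct entry_sketch *e)
{
	msg->value = e->faults;
	return 0;
}

/*
 * Common flow shared by both views: validation happens once here,
 * and only the fill step is chosen by the caller.
 */
static int get_common_sketch(struct msg_sketch *msg,
			     const struct entry_sketch *e,
			     fill_func_sketch_t fill_func)
{
	assert(fill_func != NULL);
	if (!msg || !e)
		return -1;
	return fill_func(msg, e);
}
```

With the pointer passed per call and not stored per resource type, the same resource type (MR in this series) can be dumped through two different fillers, which a single fill_res_func slot in the table could not express.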



* [PATCH rdma-next v3 3/4] RDMA/mlx5: Return ODP type per MR
  2019-10-16  6:23 [PATCH rdma-next v3 0/4] ODP information and statistics Leon Romanovsky
  2019-10-16  6:23 ` [PATCH rdma-next v3 1/4] IB/mlx5: Introduce ODP diagnostic counters Leon Romanovsky
  2019-10-16  6:23 ` [PATCH rdma-next v3 2/4] RDMA/nldev: Allow different fill function per resource Leon Romanovsky
@ 2019-10-16  6:23 ` Leon Romanovsky
  2019-10-16  6:23 ` [PATCH rdma-next v3 4/4] RDMA/nldev: Provide MR statistics Leon Romanovsky
  2019-10-22 18:53 ` [PATCH rdma-next v3 0/4] ODP information and statistics Jason Gunthorpe
  4 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2019-10-16  6:23 UTC
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi

From: Erez Alfasi <ereza@mellanox.com>

Provide an ODP explicit/implicit type as part
of 'rdma -dd resource show mr' dump.

For example:
~$: rdma -dd resource show mr
dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000
pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit

For non-ODP MRs, we won't print "drv_odp ..." at all.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/nldev.c       | 13 ++++++++
 drivers/infiniband/hw/mlx5/Makefile   |  2 +-
 drivers/infiniband/hw/mlx5/main.c     |  1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  3 ++
 drivers/infiniband/hw/mlx5/odp.c      |  2 ++
 drivers/infiniband/hw/mlx5/restrack.c | 48 +++++++++++++++++++++++++++
 include/rdma/restrack.h               |  3 ++
 7 files changed, 71 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/mlx5/restrack.c

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 01851467914a..a38e7f5166fc 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -183,6 +183,19 @@ static int _rdma_nl_put_driver_u64(struct sk_buff *msg, const char *name,
 	return 0;
 }

+int rdma_nl_put_driver_string(struct sk_buff *msg, const char *name,
+			      const char *str)
+{
+	if (put_driver_name_print_type(msg, name,
+				       RDMA_NLDEV_PRINT_TYPE_UNSPEC))
+		return -EMSGSIZE;
+	if (nla_put_string(msg, RDMA_NLDEV_ATTR_DRIVER_STRING, str))
+		return -EMSGSIZE;
+
+	return 0;
+}
+EXPORT_SYMBOL(rdma_nl_put_driver_string);
+
 int rdma_nl_put_driver_u32(struct sk_buff *msg, const char *name, u32 value)
 {
 	return _rdma_nl_put_driver_u32(msg, name, RDMA_NLDEV_PRINT_TYPE_UNSPEC,
diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile
index 9924be8384d8..d0a043ccbe58 100644
--- a/drivers/infiniband/hw/mlx5/Makefile
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -3,7 +3,7 @@ obj-$(CONFIG_MLX5_INFINIBAND)	+= mlx5_ib.o

 mlx5_ib-y :=	main.o cq.o doorbell.o qp.o mem.o srq_cmd.o \
 		srq.o mr.o ah.o mad.o gsi.o ib_virt.o cmd.o \
-		cong.o
+		cong.o restrack.o
 mlx5_ib-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += odp.o
 mlx5_ib-$(CONFIG_MLX5_ESWITCH) += ib_rep.o
 mlx5_ib-$(CONFIG_INFINIBAND_USER_ACCESS) += devx.o
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b95c2b05f682..3c3c19129cdd 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -6269,6 +6269,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.disassociate_ucontext = mlx5_ib_disassociate_ucontext,
 	.drain_rq = mlx5_ib_drain_rq,
 	.drain_sq = mlx5_ib_drain_sq,
+	.fill_res_entry = mlx5_ib_fill_res_entry,
 	.get_dev_fw_str = get_dev_fw_str,
 	.get_dma_mr = mlx5_ib_get_dma_mr,
 	.get_link_layer = mlx5_ib_port_link_layer,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 5aae05ebf64b..a0ca1ef16e4e 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -626,6 +626,7 @@ struct mlx5_ib_mr {
 	struct mlx5_async_work  cb_work;
 	atomic_t		num_pending_prefetch;
 	struct ib_odp_counters	odp_stats;
+	bool			is_odp_implicit;
 };

 static inline bool is_odp_mr(struct mlx5_ib_mr *mr)
@@ -1339,6 +1340,8 @@ struct mlx5_core_dev *mlx5_ib_get_native_port_mdev(struct mlx5_ib_dev *dev,
 						   u8 *native_port_num);
 void mlx5_ib_put_native_port_mdev(struct mlx5_ib_dev *dev,
 				  u8 port_num);
+int mlx5_ib_fill_res_entry(struct sk_buff *msg,
+			   struct rdma_restrack_entry *res);

 #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
 int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user);
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 3601c6ad96f9..2ab6e44aeaae 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -543,6 +543,8 @@ struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd,
 	atomic_set(&imr->num_leaf_free, 0);
 	atomic_set(&imr->num_pending_prefetch, 0);

+	imr->is_odp_implicit = true;
+
 	return imr;
 }

diff --git a/drivers/infiniband/hw/mlx5/restrack.c b/drivers/infiniband/hw/mlx5/restrack.c
new file mode 100644
index 000000000000..065049f52b83
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/restrack.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2019, Mellanox Technologies inc.  All rights reserved.
+ */
+
+#include <uapi/rdma/rdma_netlink.h>
+#include <rdma/ib_umem_odp.h>
+#include <rdma/restrack.h>
+#include "mlx5_ib.h"
+
+static int fill_res_mr_entry(struct sk_buff *msg,
+			     struct rdma_restrack_entry *res)
+{
+	struct ib_mr *ibmr = container_of(res, struct ib_mr, res);
+	struct mlx5_ib_mr *mr = to_mmr(ibmr);
+	struct nlattr *table_attr;
+
+	if (!(mr->access_flags & IB_ACCESS_ON_DEMAND))
+		return 0;
+
+	table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_DRIVER);
+	if (!table_attr)
+		goto err;
+
+	if (mr->is_odp_implicit) {
+		if (rdma_nl_put_driver_string(msg, "odp", "implicit"))
+			goto err;
+	} else {
+		if (rdma_nl_put_driver_string(msg, "odp", "explicit"))
+			goto err;
+	}
+
+	nla_nest_end(msg, table_attr);
+	return 0;
+
+err:
+	nla_nest_cancel(msg, table_attr);
+	return -EMSGSIZE;
+}
+
+int mlx5_ib_fill_res_entry(struct sk_buff *msg,
+			   struct rdma_restrack_entry *res)
+{
+	if (res->type == RDMA_RESTRACK_MR)
+		return fill_res_mr_entry(msg, res);
+
+	return 0;
+}
diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
index 83df1ec6664e..fe9b3c507a9c 100644
--- a/include/rdma/restrack.h
+++ b/include/rdma/restrack.h
@@ -156,6 +156,9 @@ int rdma_nl_put_driver_u32_hex(struct sk_buff *msg, const char *name,
 int rdma_nl_put_driver_u64(struct sk_buff *msg, const char *name, u64 value);
 int rdma_nl_put_driver_u64_hex(struct sk_buff *msg, const char *name,
 			       u64 value);
+int rdma_nl_put_driver_string(struct sk_buff *msg, const char *name,
+			      const char *str);
+
 struct rdma_restrack_entry *rdma_restrack_get_byid(struct ib_device *dev,
 						   enum rdma_restrack_type type,
 						   u32 id);
--
2.20.1
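
The type selection done by fill_res_mr_entry() in patch 3/4 reduces to a small decision: nothing for non-ODP MRs, otherwise "implicit" or "explicit". The following is a minimal userspace sketch of that decision only, leaving out all netlink handling. The names mr_type_sketch, ACCESS_ON_DEMAND_SKETCH and odp_type_string are hypothetical, and the flag value is an assumption, not the real IB_ACCESS_ON_DEMAND value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for IB_ACCESS_ON_DEMAND. */
#define ACCESS_ON_DEMAND_SKETCH (1u << 0)

/* Stand-in for the two mlx5_ib_mr fields the decision depends on. */
struct mr_type_sketch {
	unsigned int access_flags;
	bool is_odp_implicit;
};

/*
 * Returns NULL for non-ODP MRs (nothing is printed for them),
 * otherwise the string that would follow "drv_odp" in the dump.
 */
static const char *odp_type_string(const struct mr_type_sketch *mr)
{
	assert(mr != NULL);
	if (!(mr->access_flags & ACCESS_ON_DEMAND_SKETCH))
		return NULL;
	return mr->is_odp_implicit ? "implicit" : "explicit";
}
```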



* [PATCH rdma-next v3 4/4] RDMA/nldev: Provide MR statistics
  2019-10-16  6:23 [PATCH rdma-next v3 0/4] ODP information and statistics Leon Romanovsky
                   ` (2 preceding siblings ...)
  2019-10-16  6:23 ` [PATCH rdma-next v3 3/4] RDMA/mlx5: Return ODP type per MR Leon Romanovsky
@ 2019-10-16  6:23 ` Leon Romanovsky
  2019-10-22 18:41   ` Jason Gunthorpe
  2019-10-22 18:53 ` [PATCH rdma-next v3 0/4] ODP information and statistics Jason Gunthorpe
  4 siblings, 1 reply; 7+ messages in thread
From: Leon Romanovsky @ 2019-10-16  6:23 UTC
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi

From: Erez Alfasi <ereza@mellanox.com>

Add an RDMA nldev netlink interface for dumping MR
statistics information.

Output example:
ereza@dev~$: ./ibv_rc_pingpong -o -P -s 500000000
  local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::

ereza@dev~$: rdma stat show mr
dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/device.c      |  1 +
 drivers/infiniband/core/nldev.c       | 41 +++++++++++++++++++++++---
 drivers/infiniband/hw/mlx5/main.c     |  2 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  2 ++
 drivers/infiniband/hw/mlx5/restrack.c | 42 +++++++++++++++++++++++++++
 include/rdma/ib_verbs.h               |  7 +++++
 include/rdma/restrack.h               |  3 ++
 7 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index a667636f74bf..2e53aa25f0c7 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2606,6 +2606,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, drain_sq);
 	SET_DEVICE_OP(dev_ops, enable_driver);
 	SET_DEVICE_OP(dev_ops, fill_res_entry);
+	SET_DEVICE_OP(dev_ops, fill_stat_entry);
 	SET_DEVICE_OP(dev_ops, get_dev_fw_str);
 	SET_DEVICE_OP(dev_ops, get_dma_mr);
 	SET_DEVICE_OP(dev_ops, get_hw_stats);
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index a38e7f5166fc..5e056d5e5be3 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -453,6 +453,14 @@ static bool fill_res_entry(struct ib_device *dev, struct sk_buff *msg,
 	return dev->ops.fill_res_entry(msg, res);
 }

+static bool fill_stat_entry(struct ib_device *dev, struct sk_buff *msg,
+			    struct rdma_restrack_entry *res)
+{
+	if (!dev->ops.fill_stat_entry)
+		return false;
+	return dev->ops.fill_stat_entry(msg, res);
+}
+
 static int fill_res_qp_entry(struct sk_buff *msg, bool has_cap_net_admin,
 			     struct rdma_restrack_entry *res, uint32_t port)
 {
@@ -750,8 +758,8 @@ static int fill_stat_counter_qps(struct sk_buff *msg,
 	return ret;
 }

-static int fill_stat_hwcounter_entry(struct sk_buff *msg,
-				     const char *name, u64 value)
+int fill_stat_hwcounter_entry(struct sk_buff *msg,
+			      const char *name, u64 value)
 {
 	struct nlattr *entry_attr;

@@ -773,6 +781,25 @@ static int fill_stat_hwcounter_entry(struct sk_buff *msg,
 	nla_nest_cancel(msg, entry_attr);
 	return -EMSGSIZE;
 }
+EXPORT_SYMBOL(fill_stat_hwcounter_entry);
+
+static int fill_stat_mr_entry(struct sk_buff *msg, bool has_cap_net_admin,
+			      struct rdma_restrack_entry *res, uint32_t port)
+{
+	struct ib_mr *mr = container_of(res, struct ib_mr, res);
+	struct ib_device *dev = mr->pd->device;
+
+	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_MRN, res->id))
+		goto err;
+
+	if (fill_stat_entry(dev, msg, res))
+		goto err;
+
+	return 0;
+
+err:
+	return -EMSGSIZE;
+}

 static int fill_stat_counter_hwcounters(struct sk_buff *msg,
 					struct rdma_counter *counter)
@@ -2009,7 +2036,10 @@ static int nldev_stat_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 	case RDMA_NLDEV_ATTR_RES_QP:
 		ret = stat_get_doit_qp(skb, nlh, extack, tb);
 		break;
-
+	case RDMA_NLDEV_ATTR_RES_MR:
+		ret = res_get_common_doit(skb, nlh, extack, RDMA_RESTRACK_MR,
+					  fill_stat_mr_entry);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -2033,7 +2063,10 @@ static int nldev_stat_get_dumpit(struct sk_buff *skb,
 	case RDMA_NLDEV_ATTR_RES_QP:
 		ret = nldev_res_get_counter_dumpit(skb, cb);
 		break;
-
+	case RDMA_NLDEV_ATTR_RES_MR:
+		ret = res_get_common_dumpit(skb, cb, RDMA_RESTRACK_MR,
+					    fill_stat_mr_entry);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 3c3c19129cdd..fa23c8e7043b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -67,6 +67,7 @@
 #include <rdma/uverbs_std_types.h>
 #include <rdma/mlx5_user_ioctl_verbs.h>
 #include <rdma/mlx5_user_ioctl_cmds.h>
+#include <rdma/ib_umem_odp.h>

 #define UVERBS_MODULE_NAME mlx5_ib
 #include <rdma/uverbs_named_ioctl.h>
@@ -6270,6 +6271,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.drain_rq = mlx5_ib_drain_rq,
 	.drain_sq = mlx5_ib_drain_sq,
 	.fill_res_entry = mlx5_ib_fill_res_entry,
+	.fill_stat_entry = mlx5_ib_fill_stat_entry,
 	.get_dev_fw_str = get_dev_fw_str,
 	.get_dma_mr = mlx5_ib_get_dma_mr,
 	.get_link_layer = mlx5_ib_port_link_layer,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index a0ca1ef16e4e..e9bdb48cf1d3 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1342,6 +1342,8 @@ void mlx5_ib_put_native_port_mdev(struct mlx5_ib_dev *dev,
 				  u8 port_num);
 int mlx5_ib_fill_res_entry(struct sk_buff *msg,
 			   struct rdma_restrack_entry *res);
+int mlx5_ib_fill_stat_entry(struct sk_buff *msg,
+			    struct rdma_restrack_entry *res);

 #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
 int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user);
diff --git a/drivers/infiniband/hw/mlx5/restrack.c b/drivers/infiniband/hw/mlx5/restrack.c
index 065049f52b83..2b1f916570f9 100644
--- a/drivers/infiniband/hw/mlx5/restrack.c
+++ b/drivers/infiniband/hw/mlx5/restrack.c
@@ -8,6 +8,39 @@
 #include <rdma/restrack.h>
 #include "mlx5_ib.h"

+static int fill_stat_mr_entry(struct sk_buff *msg,
+			      struct rdma_restrack_entry *res)
+{
+	struct ib_mr *ibmr = container_of(res, struct ib_mr, res);
+	struct mlx5_ib_mr *mr = to_mmr(ibmr);
+	struct nlattr *table_attr;
+
+	if (!(mr->access_flags & IB_ACCESS_ON_DEMAND))
+		return 0;
+
+	table_attr = nla_nest_start(msg,
+				    RDMA_NLDEV_ATTR_STAT_HWCOUNTERS);
+
+	if (!table_attr)
+		goto err;
+
+	if (fill_stat_hwcounter_entry(msg, "page_faults",
+				      atomic64_read(&mr->odp_stats.faults)))
+		goto err_table;
+	if (fill_stat_hwcounter_entry(
+		    msg, "page_invalidations",
+		    atomic64_read(&mr->odp_stats.invalidations)))
+		goto err_table;
+
+	nla_nest_end(msg, table_attr);
+	return 0;
+
+err_table:
+	nla_nest_cancel(msg, table_attr);
+err:
+	return -EMSGSIZE;
+}
+
 static int fill_res_mr_entry(struct sk_buff *msg,
 			     struct rdma_restrack_entry *res)
 {
@@ -46,3 +79,12 @@ int mlx5_ib_fill_res_entry(struct sk_buff *msg,

 	return 0;
 }
+
+int mlx5_ib_fill_stat_entry(struct sk_buff *msg,
+			    struct rdma_restrack_entry *res)
+{
+	if (res->type == RDMA_RESTRACK_MR)
+		return fill_stat_mr_entry(msg, res);
+
+	return 0;
+}
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 26600dfb345d..aa8306f9ad83 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2568,6 +2568,13 @@ struct ib_device_ops {
 	 */
 	int (*counter_update_stats)(struct rdma_counter *counter);

+	/**
+	 * Allows rdma drivers to add their own restrack attributes
+	 * dumped via 'rdma stat' iproute2 command.
+	 */
+	int (*fill_stat_entry)(struct sk_buff *msg,
+			       struct rdma_restrack_entry *entry);
+
 	DECLARE_RDMA_OBJ_SIZE(ib_ah);
 	DECLARE_RDMA_OBJ_SIZE(ib_cq);
 	DECLARE_RDMA_OBJ_SIZE(ib_pd);
diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
index fe9b3c507a9c..90fdbe8a24a6 100644
--- a/include/rdma/restrack.h
+++ b/include/rdma/restrack.h
@@ -162,4 +162,7 @@ int rdma_nl_put_driver_string(struct sk_buff *msg, const char *name,
 struct rdma_restrack_entry *rdma_restrack_get_byid(struct ib_device *dev,
 						   enum rdma_restrack_type type,
 						   u32 id);
+int fill_stat_hwcounter_entry(struct sk_buff *msg,
+			      const char *name, u64 value);
+
 #endif /* _RDMA_RESTRACK_H_ */
--
2.20.1
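
Patch 4/4 combines two small patterns: the core treats the driver callback as optional, and the driver callback dispatches on the resource type. Here is a minimal userspace sketch of both, assuming a much simplified ops table. The names res_type_sketch, res_sketch, dev_ops_sketch, core_fill_stat_entry and driver_fill_stat_entry are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

enum res_type_sketch { RES_QP_SKETCH, RES_MR_SKETCH };

struct res_sketch {
	enum res_type_sketch type;
	int filled; /* set when the driver callback adds attributes */
};

/* Hypothetical stand-in for the ops table in struct ib_device. */
struct dev_ops_sketch {
	int (*fill_stat_entry)(struct res_sketch *res); /* may be NULL */
};

/*
 * Core side: a missing callback is not an error, the driver simply
 * has nothing to add for this entry.
 */
static int core_fill_stat_entry(const struct dev_ops_sketch *ops,
				struct res_sketch *res)
{
	assert(ops != NULL && res != NULL);
	if (!ops->fill_stat_entry)
		return 0;
	return ops->fill_stat_entry(res);
}

/* Driver side: handle MRs and silently skip other resource types. */
static int driver_fill_stat_entry(struct res_sketch *res)
{
	if (res->type == RES_MR_SKETCH)
		res->filled = 1;
	return 0;
}
```

Returning success for unhandled types keeps the dump going for drivers and resources that have no statistics to report.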



* Re: [PATCH rdma-next v3 4/4] RDMA/nldev: Provide MR statistics
  2019-10-16  6:23 ` [PATCH rdma-next v3 4/4] RDMA/nldev: Provide MR statistics Leon Romanovsky
@ 2019-10-22 18:41   ` Jason Gunthorpe
  0 siblings, 0 replies; 7+ messages in thread
From: Jason Gunthorpe @ 2019-10-22 18:41 UTC
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Erez Alfasi

On Wed, Oct 16, 2019 at 09:23:08AM +0300, Leon Romanovsky wrote:

> @@ -773,6 +781,25 @@ static int fill_stat_hwcounter_entry(struct sk_buff *msg,
>  	nla_nest_cancel(msg, entry_attr);
>  	return -EMSGSIZE;
>  }
> +EXPORT_SYMBOL(fill_stat_hwcounter_entry);

This is not a good name for a global symbol, I changed it to
rdma_nl_stat_hwcounter_entry

Jason


* Re: [PATCH rdma-next v3 0/4] ODP information and statistics
  2019-10-16  6:23 [PATCH rdma-next v3 0/4] ODP information and statistics Leon Romanovsky
                   ` (3 preceding siblings ...)
  2019-10-16  6:23 ` [PATCH rdma-next v3 4/4] RDMA/nldev: Provide MR statistics Leon Romanovsky
@ 2019-10-22 18:53 ` Jason Gunthorpe
  4 siblings, 0 replies; 7+ messages in thread
From: Jason Gunthorpe @ 2019-10-22 18:53 UTC
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Erez Alfasi

On Wed, Oct 16, 2019 at 09:23:04AM +0300, Leon Romanovsky wrote:
> Erez Alfasi (4):
>   IB/mlx5: Introduce ODP diagnostic counters
>   RDMA/nldev: Allow different fill function per resource
>   RDMA/mlx5: Return ODP type per MR
>   RDMA/nldev: Provide MR statistics

applied to for-next, thanks

Jason

