Linux-RDMA Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH rdma-next v1 0/4] ODP information and statistics
@ 2019-08-30  8:16 Leon Romanovsky
  2019-08-30  8:16 ` [PATCH rdma-next v1 1/4] IB/mlx5: Introduce ODP diagnostic counters Leon Romanovsky
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Leon Romanovsky @ 2019-08-30  8:16 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi

From: Leon Romanovsky <leonro@mellanox.com>

Changelog:
 v1:
 * Dropped umem patch, because it doesn't follow our IB model, where
   UMEM is driver object and ib_core object (Jason).
 * Removed the ODP type indicator from ib_umem_odp not needed after
   commit fd7dbf035edc ("RDMA/odp: Make it clearer when a umem is an implicit ODP umem")
 * Since umems are not part of core MR (from the reason mentioned
   above) there is no way to access the odp type as was previously done via nldev
   (old patch #3). Instead, patch #4 is adding mlx5 implementation for fill_res_entry
   and dumping ODP type as part of the driver table entry, as its driver details.
 * Counter types are now atomic64_t instead of u64.
 v0: https://lore.kernel.org/linux-rdma/20190807103403.8102-1-leon@kernel.org

-----------------------------------------------------------------------------
Hi,

This series from Erez refactors exposes ODP type information (explicit,
implicit) and statistics through netlink interface.

Thanks

Erez Alfasi (4):
  IB/mlx5: Introduce ODP diagnostic counters
  RDMA/nldev: Allow different fill function per resource
  RDMA/nldev: Provide MR statistics
  RDMA/mlx5: Return ODP type per MR

 drivers/infiniband/core/device.c      |   1 +
 drivers/infiniband/core/nldev.c       | 109 ++++++++++++++++++++++----
 drivers/infiniband/hw/mlx5/Makefile   |   2 +-
 drivers/infiniband/hw/mlx5/main.c     |  17 ++++
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |   2 +
 drivers/infiniband/hw/mlx5/odp.c      |  18 +++++
 drivers/infiniband/hw/mlx5/restrack.c |  48 ++++++++++++
 include/rdma/ib_umem_odp.h            |  14 ++++
 include/rdma/ib_verbs.h               |   9 +++
 include/rdma/restrack.h               |   3 +
 10 files changed, 207 insertions(+), 16 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/restrack.c

--
2.20.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH rdma-next v1 1/4] IB/mlx5: Introduce ODP diagnostic counters
  2019-08-30  8:16 [PATCH rdma-next v1 0/4] ODP information and statistics Leon Romanovsky
@ 2019-08-30  8:16 ` Leon Romanovsky
  2019-09-09  8:45   ` Jason Gunthorpe
  2019-08-30  8:16 ` [PATCH rdma-next v1 2/4] RDMA/nldev: Allow different fill function per resource Leon Romanovsky
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2019-08-30  8:16 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi

From: Erez Alfasi <ereza@mellanox.com>

Introduce ODP diagnostic counters and count the following
per MR within IB/mlx5 driver:
 1) Page faults:
	Total number of faulted pages.
 2) Page invalidations:
	Total number of pages invalidated by the OS during all
	invalidation events. The translations can be no longer
	valid due to either non-present pages or mapping changes.
 3) Prefetched pages:
	When prefetching a page, page fault is generated
	in order to bring the page to the main memory.
	The prefetched pages counter will be updated
	during a page fault flow only if it was derived
	from prefetching operation.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/odp.c | 18 ++++++++++++++++++
 include/rdma/ib_umem_odp.h       | 14 ++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 905936423a03..b7c8a49ac753 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -287,6 +287,10 @@ void mlx5_ib_invalidate_range(struct ib_umem_odp *umem_odp, unsigned long start,
 
 	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
 
+	/* Count page invalidations */
+	ib_update_odp_stats(umem_odp, invalidations,
+			    ib_umem_odp_num_pages(umem_odp));
+
 	if (unlikely(!umem_odp->npages && mr->parent &&
 		     !umem_odp->dying)) {
 		WRITE_ONCE(umem_odp->dying, 1);
@@ -801,6 +805,20 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
 		if (ret < 0)
 			goto srcu_unlock;
 
+		/*
+		 * When prefetching a page, page fault is generated
+		 * in order to bring the page to the main memory.
+		 * In the current flow, page faults are being counted.
+		 * Prefetched pages counter will be updated as well
+		 * only if the current page fault flow was derived
+		 * from prefetching flow.
+		 */
+		ib_update_odp_stats(to_ib_umem_odp(mr->umem), faults, ret);
+
+		if (prefetch)
+			ib_update_odp_stats(to_ib_umem_odp(mr->umem),
+					    prefetched, ret);
+
 		npages += ret;
 		ret = 0;
 		break;
diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index b37c674b7fe6..3359e34516da 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -37,6 +37,12 @@
 #include <rdma/ib_verbs.h>
 #include <linux/interval_tree.h>
 
+struct ib_odp_counters {
+	atomic64_t faults;
+	atomic64_t invalidations;
+	atomic64_t prefetched;
+};
+
 struct ib_umem_odp {
 	struct ib_umem umem;
 	struct ib_ucontext_per_mm *per_mm;
@@ -62,6 +68,11 @@ struct ib_umem_odp {
 	struct mutex		umem_mutex;
 	void			*private; /* for the HW driver to use. */
 
+	/*
+	 * ODP diagnostic counters.
+	 */
+	struct ib_odp_counters odp_stats;
+
 	int notifiers_seq;
 	int notifiers_count;
 	int npages;
@@ -106,6 +117,9 @@ static inline size_t ib_umem_odp_num_pages(struct ib_umem_odp *umem_odp)
 	       umem_odp->page_shift;
 }
 
+#define ib_update_odp_stats(umem_odp, counter_name, value)		    \
+	atomic64_add(value, &(umem_odp->odp_stats.counter_name))
+
 /*
  * The lower 2 bits of the DMA address signal the R/W permissions for
  * the entry. To upgrade the permissions, provide the appropriate
-- 
2.20.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH rdma-next v1 2/4] RDMA/nldev: Allow different fill function per resource
  2019-08-30  8:16 [PATCH rdma-next v1 0/4] ODP information and statistics Leon Romanovsky
  2019-08-30  8:16 ` [PATCH rdma-next v1 1/4] IB/mlx5: Introduce ODP diagnostic counters Leon Romanovsky
@ 2019-08-30  8:16 ` Leon Romanovsky
  2019-09-09  8:48   ` Jason Gunthorpe
  2019-08-30  8:16 ` [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics Leon Romanovsky
  2019-08-30  8:16 ` [PATCH rdma-next v1 4/4] RDMA/mlx5: Return ODP type per MR Leon Romanovsky
  3 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2019-08-30  8:16 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi

From: Erez Alfasi <ereza@mellanox.com>

So far res_get_common_{dumpit, doit} was using the default
resource fill function which was defined as part of the
nldev_fill_res_entry fill_entries.

Add a fill function pointer as an argument allows us to use
different fill function in case we want to dump different
values then 'rdma resource' flow do, but still use the same
existing general resources dumping flow.

If a NULL value is passed, it will be using the default
fill function that was defined in 'fill_entries' for a
given resource type.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/nldev.c | 42 +++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index cc08218f1ef7..47f7fe5432db 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1181,7 +1181,10 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
 
 static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 			       struct netlink_ext_ack *extack,
-			       enum rdma_restrack_type res_type)
+			       enum rdma_restrack_type res_type,
+			       int (*fill_func)(struct sk_buff*, bool,
+						struct rdma_restrack_entry*,
+						uint32_t))
 {
 	const struct nldev_fill_res_entry *fe = &fill_entries[res_type];
 	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
@@ -1244,7 +1247,12 @@ static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 	}
 
 	has_cap_net_admin = netlink_capable(skb, CAP_NET_ADMIN);
-	ret = fe->fill_res_func(msg, has_cap_net_admin, res, port);
+
+	if (fill_func)
+		ret = fill_func(msg, has_cap_net_admin, res, port);
+	else
+		ret = fe->fill_res_func(msg, has_cap_net_admin, res, port);
+
 	rdma_restrack_put(res);
 	if (ret)
 		goto err_free;
@@ -1264,7 +1272,10 @@ static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 static int res_get_common_dumpit(struct sk_buff *skb,
 				 struct netlink_callback *cb,
-				 enum rdma_restrack_type res_type)
+				 enum rdma_restrack_type res_type,
+				 int (*fill_func)(struct sk_buff*, bool,
+						  struct rdma_restrack_entry*,
+						  uint32_t))
 {
 	const struct nldev_fill_res_entry *fe = &fill_entries[res_type];
 	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
@@ -1352,7 +1363,12 @@ static int res_get_common_dumpit(struct sk_buff *skb,
 			goto msg_full;
 		}
 
-		ret = fe->fill_res_func(skb, has_cap_net_admin, res, port);
+		if (fill_func)
+			ret = fill_func(skb, has_cap_net_admin, res, port);
+		else
+			ret = fe->fill_res_func(skb, has_cap_net_admin,
+						res, port);
+
 		rdma_restrack_put(res);
 
 		if (ret) {
@@ -1395,17 +1411,17 @@ next:		idx++;
 	return ret;
 }
 
-#define RES_GET_FUNCS(name, type)                                              \
-	static int nldev_res_get_##name##_dumpit(struct sk_buff *skb,          \
+#define RES_GET_FUNCS(name, type)					       \
+	static int nldev_res_get_##name##_dumpit(struct sk_buff *skb,	       \
 						 struct netlink_callback *cb)  \
-	{                                                                      \
-		return res_get_common_dumpit(skb, cb, type);                   \
-	}                                                                      \
-	static int nldev_res_get_##name##_doit(struct sk_buff *skb,            \
-					       struct nlmsghdr *nlh,           \
+	{								       \
+		return res_get_common_dumpit(skb, cb, type, NULL);	       \
+	}								       \
+	static int nldev_res_get_##name##_doit(struct sk_buff *skb,	       \
+					       struct nlmsghdr *nlh,	       \
 					       struct netlink_ext_ack *extack) \
-	{                                                                      \
-		return res_get_common_doit(skb, nlh, extack, type);            \
+	{								       \
+		return res_get_common_doit(skb, nlh, extack, type, NULL);      \
 	}
 
 RES_GET_FUNCS(qp, RDMA_RESTRACK_QP);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics
  2019-08-30  8:16 [PATCH rdma-next v1 0/4] ODP information and statistics Leon Romanovsky
  2019-08-30  8:16 ` [PATCH rdma-next v1 1/4] IB/mlx5: Introduce ODP diagnostic counters Leon Romanovsky
  2019-08-30  8:16 ` [PATCH rdma-next v1 2/4] RDMA/nldev: Allow different fill function per resource Leon Romanovsky
@ 2019-08-30  8:16 ` Leon Romanovsky
  2019-08-30 10:18   ` Parav Pandit
  2019-09-09  8:51   ` Jason Gunthorpe
  2019-08-30  8:16 ` [PATCH rdma-next v1 4/4] RDMA/mlx5: Return ODP type per MR Leon Romanovsky
  3 siblings, 2 replies; 16+ messages in thread
From: Leon Romanovsky @ 2019-08-30  8:16 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi

From: Erez Alfasi <ereza@mellanox.com>

Add RDMA nldev netlink interface for dumping MR
statistics information.

Output example:
ereza@dev~$: ./ibv_rc_pingpong -o -P -s 500000000
  local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::

ereza@dev~$: rdma stat show mr
dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0
prefetched_pages 122071

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/device.c  |  1 +
 drivers/infiniband/core/nldev.c   | 54 +++++++++++++++++++++++++++++--
 drivers/infiniband/hw/mlx5/main.c | 16 +++++++++
 include/rdma/ib_verbs.h           |  9 ++++++
 4 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 99c4a55545cf..34a9e37c5c61 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2610,6 +2610,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, get_dma_mr);
 	SET_DEVICE_OP(dev_ops, get_hw_stats);
 	SET_DEVICE_OP(dev_ops, get_link_layer);
+	SET_DEVICE_OP(dev_ops, fill_odp_stats);
 	SET_DEVICE_OP(dev_ops, get_netdev);
 	SET_DEVICE_OP(dev_ops, get_port_immutable);
 	SET_DEVICE_OP(dev_ops, get_vector_affinity);
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 47f7fe5432db..47fee3d68cb9 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -37,6 +37,7 @@
 #include <net/netlink.h>
 #include <rdma/rdma_cm.h>
 #include <rdma/rdma_netlink.h>
+#include <rdma/ib_umem_odp.h>
 
 #include "core_priv.h"
 #include "cma_priv.h"
@@ -748,6 +749,49 @@ static int fill_stat_hwcounter_entry(struct sk_buff *msg,
 	return -EMSGSIZE;
 }
 
+static int fill_stat_mr_entry(struct sk_buff *msg, bool has_cap_net_admin,
+			      struct rdma_restrack_entry *res, uint32_t port)
+{
+	struct ib_mr *mr = container_of(res, struct ib_mr, res);
+	struct ib_device *dev = mr->pd->device;
+	struct ib_odp_counters odp_stats;
+	struct nlattr *table_attr;
+
+	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_MRN, res->id))
+		goto err;
+
+	if (!dev->ops.fill_odp_stats)
+		return 0;
+
+	if (!dev->ops.fill_odp_stats(mr, &odp_stats))
+		return 0;
+
+	table_attr = nla_nest_start(msg,
+				    RDMA_NLDEV_ATTR_STAT_HWCOUNTERS);
+
+	if (!table_attr)
+		return -EMSGSIZE;
+
+	if (fill_stat_hwcounter_entry(msg, "page_faults",
+				      (u64)atomic64_read(&odp_stats.faults)))
+		goto err_table;
+	if (fill_stat_hwcounter_entry(
+		    msg, "page_invalidations",
+		    (u64)atomic64_read(&odp_stats.invalidations)))
+		goto err_table;
+	if (fill_stat_hwcounter_entry(msg, "prefetched_pages",
+				      (u64)atomic64_read(&odp_stats.prefetched)))
+		goto err_table;
+
+	nla_nest_end(msg, table_attr);
+	return 0;
+
+err_table:
+	nla_nest_cancel(msg, table_attr);
+err:
+	return -EMSGSIZE;
+}
+
 static int fill_stat_counter_hwcounters(struct sk_buff *msg,
 					struct rdma_counter *counter)
 {
@@ -2008,7 +2052,10 @@ static int nldev_stat_get_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 	case RDMA_NLDEV_ATTR_RES_QP:
 		ret = stat_get_doit_qp(skb, nlh, extack, tb);
 		break;
-
+	case RDMA_NLDEV_ATTR_RES_MR:
+		ret = res_get_common_doit(skb, nlh, extack, RDMA_RESTRACK_MR,
+					  fill_stat_mr_entry);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -2032,7 +2079,10 @@ static int nldev_stat_get_dumpit(struct sk_buff *skb,
 	case RDMA_NLDEV_ATTR_RES_QP:
 		ret = nldev_res_get_counter_dumpit(skb, cb);
 		break;
-
+	case RDMA_NLDEV_ATTR_RES_MR:
+		ret = res_get_common_dumpit(skb, cb, RDMA_RESTRACK_MR,
+					    fill_stat_mr_entry);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 07aecba16019..05095fda03cc 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -67,6 +67,7 @@
 #include <rdma/uverbs_std_types.h>
 #include <rdma/mlx5_user_ioctl_verbs.h>
 #include <rdma/mlx5_user_ioctl_cmds.h>
+#include <rdma/ib_umem_odp.h>
 
 #define UVERBS_MODULE_NAME mlx5_ib
 #include <rdma/uverbs_named_ioctl.h>
@@ -121,6 +122,20 @@ struct mlx5_ib_dev *mlx5_ib_get_ibdev_from_mpi(struct mlx5_ib_multiport_info *mp
 	return dev;
 }
 
+static bool mlx5_ib_fill_odp_stats(struct ib_mr *ibmr,
+				   struct ib_odp_counters *cnt)
+{
+	struct mlx5_ib_mr *mr = to_mmr(ibmr);
+
+	if (!is_odp_mr(mr))
+		return false;
+
+	memcpy(cnt, &to_ib_umem_odp(mr->umem)->odp_stats,
+	       sizeof(struct ib_odp_counters));
+
+	return true;
+}
+
 static enum rdma_link_layer
 mlx5_port_type_cap_to_rdma_ll(int port_type_cap)
 {
@@ -6316,6 +6331,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.get_dev_fw_str = get_dev_fw_str,
 	.get_dma_mr = mlx5_ib_get_dma_mr,
 	.get_link_layer = mlx5_ib_port_link_layer,
+	.fill_odp_stats = mlx5_ib_fill_odp_stats,
 	.map_mr_sg = mlx5_ib_map_mr_sg,
 	.map_mr_sg_pi = mlx5_ib_map_mr_sg_pi,
 	.mmap = mlx5_ib_mmap,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index de5bc352f473..48d6513b3b59 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -72,6 +72,7 @@
 #define IB_FW_VERSION_NAME_MAX	ETHTOOL_FWVERS_LEN
 
 struct ib_umem_odp;
+struct ib_odp_counters;
 
 extern struct workqueue_struct *ib_wq;
 extern struct workqueue_struct *ib_comp_wq;
@@ -2566,6 +2567,14 @@ struct ib_device_ops {
 	 */
 	int (*counter_update_stats)(struct rdma_counter *counter);
 
+	/**
+	 * fill_odp_stats - Fill MR ODP stats into a given
+	 * ib_odp_counters struct.
+	 * Return value - true in case counters has been filled,
+	 * false otherwise (if its non-ODP registered MR for example).
+	 */
+	bool (*fill_odp_stats)(struct ib_mr *mr, struct ib_odp_counters *cnt);
+
 	DECLARE_RDMA_OBJ_SIZE(ib_ah);
 	DECLARE_RDMA_OBJ_SIZE(ib_cq);
 	DECLARE_RDMA_OBJ_SIZE(ib_pd);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH rdma-next v1 4/4] RDMA/mlx5: Return ODP type per MR
  2019-08-30  8:16 [PATCH rdma-next v1 0/4] ODP information and statistics Leon Romanovsky
                   ` (2 preceding siblings ...)
  2019-08-30  8:16 ` [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics Leon Romanovsky
@ 2019-08-30  8:16 ` Leon Romanovsky
  2019-09-09  8:53   ` Jason Gunthorpe
  3 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2019-08-30  8:16 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi

From: Erez Alfasi <ereza@mellanox.com>

Provide an ODP explicit/implicit type as part
of 'rdma -dd resource show mr' dump.

For example:
~$: rdma -dd resource show mr
dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000
pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit

For non-ODP MRs, we won't print "drv_odp ..." at all.

Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/nldev.c       | 13 ++++++++
 drivers/infiniband/hw/mlx5/Makefile   |  2 +-
 drivers/infiniband/hw/mlx5/main.c     |  1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  2 ++
 drivers/infiniband/hw/mlx5/restrack.c | 48 +++++++++++++++++++++++++++
 include/rdma/restrack.h               |  3 ++
 6 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/mlx5/restrack.c

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 47fee3d68cb9..82ed8e339d8f 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -181,6 +181,19 @@ static int _rdma_nl_put_driver_u64(struct sk_buff *msg, const char *name,
 	return 0;
 }
 
+int rdma_nl_put_driver_string(struct sk_buff *msg, const char *name,
+			      const char *str)
+{
+	if (put_driver_name_print_type(msg, name,
+				       RDMA_NLDEV_PRINT_TYPE_UNSPEC))
+		return -EMSGSIZE;
+	if (nla_put_string(msg, RDMA_NLDEV_ATTR_DRIVER_STRING, str))
+		return -EMSGSIZE;
+
+	return 0;
+}
+EXPORT_SYMBOL(rdma_nl_put_driver_string);
+
 int rdma_nl_put_driver_u32(struct sk_buff *msg, const char *name, u32 value)
 {
 	return _rdma_nl_put_driver_u32(msg, name, RDMA_NLDEV_PRINT_TYPE_UNSPEC,
diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile
index 9924be8384d8..d0a043ccbe58 100644
--- a/drivers/infiniband/hw/mlx5/Makefile
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -3,7 +3,7 @@ obj-$(CONFIG_MLX5_INFINIBAND)	+= mlx5_ib.o
 
 mlx5_ib-y :=	main.o cq.o doorbell.o qp.o mem.o srq_cmd.o \
 		srq.o mr.o ah.o mad.o gsi.o ib_virt.o cmd.o \
-		cong.o
+		cong.o restrack.o
 mlx5_ib-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += odp.o
 mlx5_ib-$(CONFIG_MLX5_ESWITCH) += ib_rep.o
 mlx5_ib-$(CONFIG_INFINIBAND_USER_ACCESS) += devx.o
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 05095fda03cc..084145757549 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -6328,6 +6328,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.disassociate_ucontext = mlx5_ib_disassociate_ucontext,
 	.drain_rq = mlx5_ib_drain_rq,
 	.drain_sq = mlx5_ib_drain_sq,
+	.fill_res_entry = mlx5_ib_fill_res_entry,
 	.get_dev_fw_str = get_dev_fw_str,
 	.get_dma_mr = mlx5_ib_get_dma_mr,
 	.get_link_layer = mlx5_ib_port_link_layer,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 6abfbf3a69b7..c67ede31d7dc 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1334,6 +1334,8 @@ struct mlx5_core_dev *mlx5_ib_get_native_port_mdev(struct mlx5_ib_dev *dev,
 						   u8 *native_port_num);
 void mlx5_ib_put_native_port_mdev(struct mlx5_ib_dev *dev,
 				  u8 port_num);
+int mlx5_ib_fill_res_entry(struct sk_buff *msg,
+			   struct rdma_restrack_entry *res);
 
 #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
 int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user);
diff --git a/drivers/infiniband/hw/mlx5/restrack.c b/drivers/infiniband/hw/mlx5/restrack.c
new file mode 100644
index 000000000000..427011d20250
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/restrack.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2019, Mellanox Technologies inc.  All rights reserved.
+ */
+
+#include <uapi/rdma/rdma_netlink.h>
+#include <rdma/ib_umem_odp.h>
+#include <rdma/restrack.h>
+#include "mlx5_ib.h"
+
+static int fill_res_mr_entry(struct sk_buff *msg,
+			     struct rdma_restrack_entry *res)
+{
+	struct ib_mr *ibmr = container_of(res, struct ib_mr, res);
+	struct mlx5_ib_mr *mr = to_mmr(ibmr);
+	struct nlattr *table_attr;
+
+	if (!is_odp_mr(mr))
+		return 0;
+
+	table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_DRIVER);
+	if (!table_attr)
+		goto err;
+
+	if (to_ib_umem_odp(mr->umem)->is_implicit_odp) {
+		if (rdma_nl_put_driver_string(msg, "odp", "implicit"))
+			goto err;
+	} else {
+		if (rdma_nl_put_driver_string(msg, "odp", "explicit"))
+			goto err;
+	}
+
+	nla_nest_end(msg, table_attr);
+	return 0;
+
+err:
+	nla_nest_cancel(msg, table_attr);
+	return -EMSGSIZE;
+}
+
+int mlx5_ib_fill_res_entry(struct sk_buff *msg,
+			   struct rdma_restrack_entry *res)
+{
+	if (res->type == RDMA_RESTRACK_MR)
+		return fill_res_mr_entry(msg, res);
+
+	return 0;
+}
diff --git a/include/rdma/restrack.h b/include/rdma/restrack.h
index b0fc6b26bdf5..cb779b311703 100644
--- a/include/rdma/restrack.h
+++ b/include/rdma/restrack.h
@@ -157,6 +157,9 @@ int rdma_nl_put_driver_u32_hex(struct sk_buff *msg, const char *name,
 int rdma_nl_put_driver_u64(struct sk_buff *msg, const char *name, u64 value);
 int rdma_nl_put_driver_u64_hex(struct sk_buff *msg, const char *name,
 			       u64 value);
+int rdma_nl_put_driver_string(struct sk_buff *msg, const char *name,
+			      const char *str);
+
 struct rdma_restrack_entry *rdma_restrack_get_byid(struct ib_device *dev,
 						   enum rdma_restrack_type type,
 						   u32 id);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics
  2019-08-30  8:16 ` [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics Leon Romanovsky
@ 2019-08-30 10:18   ` Parav Pandit
  2019-08-30 11:12     ` Leon Romanovsky
  2019-09-09  8:51   ` Jason Gunthorpe
  1 sibling, 1 reply; 16+ messages in thread
From: Parav Pandit @ 2019-08-30 10:18 UTC (permalink / raw)
  To: Leon Romanovsky, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Erez Alfasi



> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Leon Romanovsky
> Sent: Friday, August 30, 2019 1:46 PM
> To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe
> <jgg@mellanox.com>
> Cc: Leon Romanovsky <leonro@mellanox.com>; RDMA mailing list <linux-
> rdma@vger.kernel.org>; Erez Alfasi <ereza@mellanox.com>
> Subject: [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics
> 
> From: Erez Alfasi <ereza@mellanox.com>
> 
> Add RDMA nldev netlink interface for dumping MR statistics information.
> 
> Output example:
> ereza@dev~$: ./ibv_rc_pingpong -o -P -s 500000000
>   local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::
> 
> ereza@dev~$: rdma stat show mr
> dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0
> prefetched_pages 122071
> 
> Signed-off-by: Erez Alfasi <ereza@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/core/device.c  |  1 +
>  drivers/infiniband/core/nldev.c   | 54 +++++++++++++++++++++++++++++--
>  drivers/infiniband/hw/mlx5/main.c | 16 +++++++++
>  include/rdma/ib_verbs.h           |  9 ++++++
>  4 files changed, 78 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/device.c
> b/drivers/infiniband/core/device.c
> index 99c4a55545cf..34a9e37c5c61 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -2610,6 +2610,7 @@ void ib_set_device_ops(struct ib_device *dev,
> const struct ib_device_ops *ops)
>  	SET_DEVICE_OP(dev_ops, get_dma_mr);
>  	SET_DEVICE_OP(dev_ops, get_hw_stats);
>  	SET_DEVICE_OP(dev_ops, get_link_layer);
> +	SET_DEVICE_OP(dev_ops, fill_odp_stats);
>  	SET_DEVICE_OP(dev_ops, get_netdev);
>  	SET_DEVICE_OP(dev_ops, get_port_immutable);
>  	SET_DEVICE_OP(dev_ops, get_vector_affinity); diff --git
> a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index
> 47f7fe5432db..47fee3d68cb9 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -37,6 +37,7 @@
>  #include <net/netlink.h>
>  #include <rdma/rdma_cm.h>
>  #include <rdma/rdma_netlink.h>
> +#include <rdma/ib_umem_odp.h>
> 
>  #include "core_priv.h"
>  #include "cma_priv.h"
> @@ -748,6 +749,49 @@ static int fill_stat_hwcounter_entry(struct sk_buff
> *msg,
>  	return -EMSGSIZE;
>  }
> 
> +static int fill_stat_mr_entry(struct sk_buff *msg, bool has_cap_net_admin,
> +			      struct rdma_restrack_entry *res, uint32_t port) {
> +	struct ib_mr *mr = container_of(res, struct ib_mr, res);
> +	struct ib_device *dev = mr->pd->device;
> +	struct ib_odp_counters odp_stats;
> +	struct nlattr *table_attr;
> +
> +	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_MRN, res->id))
> +		goto err;
> +
> +	if (!dev->ops.fill_odp_stats)
> +		return 0;
> +
> +	if (!dev->ops.fill_odp_stats(mr, &odp_stats))
> +		return 0;
> +
> +	table_attr = nla_nest_start(msg,
> +				    RDMA_NLDEV_ATTR_STAT_HWCOUNTERS);
> +
> +	if (!table_attr)
> +		return -EMSGSIZE;
> +
> +	if (fill_stat_hwcounter_entry(msg, "page_faults",
> +				      (u64)atomic64_read(&odp_stats.faults)))
> +		goto err_table;
> +	if (fill_stat_hwcounter_entry(
> +		    msg, "page_invalidations",
> +		    (u64)atomic64_read(&odp_stats.invalidations)))
> +		goto err_table;
> +	if (fill_stat_hwcounter_entry(msg, "prefetched_pages",
> +
> (u64)atomic64_read(&odp_stats.prefetched)))
> +		goto err_table;
> +
> +	nla_nest_end(msg, table_attr);
> +	return 0;
> +
> +err_table:
> +	nla_nest_cancel(msg, table_attr);
> +err:
> +	return -EMSGSIZE;
> +}
> +
>  static int fill_stat_counter_hwcounters(struct sk_buff *msg,
>  					struct rdma_counter *counter)
>  {
> @@ -2008,7 +2052,10 @@ static int nldev_stat_get_doit(struct sk_buff *skb,
> struct nlmsghdr *nlh,
>  	case RDMA_NLDEV_ATTR_RES_QP:
>  		ret = stat_get_doit_qp(skb, nlh, extack, tb);
>  		break;
> -
> +	case RDMA_NLDEV_ATTR_RES_MR:
> +		ret = res_get_common_doit(skb, nlh, extack,
> RDMA_RESTRACK_MR,
> +					  fill_stat_mr_entry);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
> @@ -2032,7 +2079,10 @@ static int nldev_stat_get_dumpit(struct sk_buff
> *skb,
>  	case RDMA_NLDEV_ATTR_RES_QP:
>  		ret = nldev_res_get_counter_dumpit(skb, cb);
>  		break;
> -
> +	case RDMA_NLDEV_ATTR_RES_MR:
> +		ret = res_get_common_dumpit(skb, cb,
> RDMA_RESTRACK_MR,
> +					    fill_stat_mr_entry);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
> diff --git a/drivers/infiniband/hw/mlx5/main.c
> b/drivers/infiniband/hw/mlx5/main.c
> index 07aecba16019..05095fda03cc 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -67,6 +67,7 @@
>  #include <rdma/uverbs_std_types.h>
>  #include <rdma/mlx5_user_ioctl_verbs.h>  #include
> <rdma/mlx5_user_ioctl_cmds.h>
> +#include <rdma/ib_umem_odp.h>
> 
>  #define UVERBS_MODULE_NAME mlx5_ib
>  #include <rdma/uverbs_named_ioctl.h>
> @@ -121,6 +122,20 @@ struct mlx5_ib_dev
> *mlx5_ib_get_ibdev_from_mpi(struct mlx5_ib_multiport_info *mp
>  	return dev;
>  }
> 
> +static bool mlx5_ib_fill_odp_stats(struct ib_mr *ibmr,
> +				   struct ib_odp_counters *cnt)
> +{
> +	struct mlx5_ib_mr *mr = to_mmr(ibmr);
> +
> +	if (!is_odp_mr(mr))
> +		return false;
> +
> +	memcpy(cnt, &to_ib_umem_odp(mr->umem)->odp_stats,
> +	       sizeof(struct ib_odp_counters));
> +
> +	return true;
> +}
> +
>  static enum rdma_link_layer
>  mlx5_port_type_cap_to_rdma_ll(int port_type_cap)  { @@ -6316,6 +6331,7
> @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
>  	.get_dev_fw_str = get_dev_fw_str,
>  	.get_dma_mr = mlx5_ib_get_dma_mr,
>  	.get_link_layer = mlx5_ib_port_link_layer,
> +	.fill_odp_stats = mlx5_ib_fill_odp_stats,
>  	.map_mr_sg = mlx5_ib_map_mr_sg,
>  	.map_mr_sg_pi = mlx5_ib_map_mr_sg_pi,
>  	.mmap = mlx5_ib_mmap,
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
> de5bc352f473..48d6513b3b59 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -72,6 +72,7 @@
>  #define IB_FW_VERSION_NAME_MAX	ETHTOOL_FWVERS_LEN
> 
>  struct ib_umem_odp;
> +struct ib_odp_counters;
> 
>  extern struct workqueue_struct *ib_wq;
>  extern struct workqueue_struct *ib_comp_wq; @@ -2566,6 +2567,14 @@
> struct ib_device_ops {
>  	 */
>  	int (*counter_update_stats)(struct rdma_counter *counter);
> 
> +	/**
> +	 * fill_odp_stats - Fill MR ODP stats into a given
> +	 * ib_odp_counters struct.
> +	 * Return value - true in case counters has been filled,
> +	 * false otherwise (if its non-ODP registered MR for example).
> +	 */
> +	bool (*fill_odp_stats)(struct ib_mr *mr, struct ib_odp_counters
> *cnt);
> +
Requesting ODP stats on non-ODP MR is an error.
Instead of returning bool, please return int = -EINVAL as an error for non ODP MRs.

>  	DECLARE_RDMA_OBJ_SIZE(ib_ah);
>  	DECLARE_RDMA_OBJ_SIZE(ib_cq);
>  	DECLARE_RDMA_OBJ_SIZE(ib_pd);
> --
> 2.20.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics
  2019-08-30 10:18   ` Parav Pandit
@ 2019-08-30 11:12     ` Leon Romanovsky
  2019-08-30 12:06       ` Parav Pandit
  0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2019-08-30 11:12 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Erez Alfasi

On Fri, Aug 30, 2019 at 10:18:35AM +0000, Parav Pandit wrote:
>
>
> > -----Original Message-----
> > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > owner@vger.kernel.org> On Behalf Of Leon Romanovsky
> > Sent: Friday, August 30, 2019 1:46 PM
> > To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe
> > <jgg@mellanox.com>
> > Cc: Leon Romanovsky <leonro@mellanox.com>; RDMA mailing list <linux-
> > rdma@vger.kernel.org>; Erez Alfasi <ereza@mellanox.com>
> > Subject: [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics
> >
> > From: Erez Alfasi <ereza@mellanox.com>
> >
> > Add RDMA nldev netlink interface for dumping MR statistics information.
> >
> > Output example:
> > ereza@dev~$: ./ibv_rc_pingpong -o -P -s 500000000
> >   local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::
> >
> > ereza@dev~$: rdma stat show mr
> > dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0
> > prefetched_pages 122071
> >
> > Signed-off-by: Erez Alfasi <ereza@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > ---

<...>

> > struct ib_device_ops {
> >  	 */
> >  	int (*counter_update_stats)(struct rdma_counter *counter);
> >
> > +	/**
> > +	 * fill_odp_stats - Fill MR ODP stats into a given
> > +	 * ib_odp_counters struct.
> > +	 * Return value - true in case counters has been filled,
> > +	 * false otherwise (if its non-ODP registered MR for example).
> > +	 */
> > +	bool (*fill_odp_stats)(struct ib_mr *mr, struct ib_odp_counters
> > *cnt);
> > +
> Requesting ODP stats on non-ODP MR is an error.
> Instead of returning bool, please return int = -EINVAL as an error for non ODP MRs.

We don't want to add extra checks in the drivers for something that is
supposed to be checked by upper layer. If caller asks for ODP
statistics, he will check that MR is ODP backed. Especially if the need
to have ODP MR is embedded into the function name.

Thanks

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics
  2019-08-30 11:12     ` Leon Romanovsky
@ 2019-08-30 12:06       ` Parav Pandit
  0 siblings, 0 replies; 16+ messages in thread
From: Parav Pandit @ 2019-08-30 12:06 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Erez Alfasi



> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Friday, August 30, 2019 4:43 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe
> <jgg@mellanox.com>; RDMA mailing list <linux-rdma@vger.kernel.org>; Erez
> Alfasi <ereza@mellanox.com>
> Subject: Re: [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics
> 
> On Fri, Aug 30, 2019 at 10:18:35AM +0000, Parav Pandit wrote:
> >
> >
> > > -----Original Message-----
> > > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > > owner@vger.kernel.org> On Behalf Of Leon Romanovsky
> > > Sent: Friday, August 30, 2019 1:46 PM
> > > To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe
> > > <jgg@mellanox.com>
> > > Cc: Leon Romanovsky <leonro@mellanox.com>; RDMA mailing list <linux-
> > > rdma@vger.kernel.org>; Erez Alfasi <ereza@mellanox.com>
> > > Subject: [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics
> > >
> > > From: Erez Alfasi <ereza@mellanox.com>
> > >
> > > Add RDMA nldev netlink interface for dumping MR statistics information.
> > >
> > > Output example:
> > > ereza@dev~$: ./ibv_rc_pingpong -o -P -s 500000000
> > >   local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::
> > >
> > > ereza@dev~$: rdma stat show mr
> > > dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0
> > > prefetched_pages 122071
> > >
> > > Signed-off-by: Erez Alfasi <ereza@mellanox.com>
> > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > > ---
> 
> <...>
> 
> > > struct ib_device_ops {
> > >  	 */
> > >  	int (*counter_update_stats)(struct rdma_counter *counter);
> > >
> > > +	/**
> > > +	 * fill_odp_stats - Fill MR ODP stats into a given
> > > +	 * ib_odp_counters struct.
> > > +	 * Return value - true in case counters has been filled,
> > > +	 * false otherwise (if its non-ODP registered MR for example).
> > > +	 */
> > > +	bool (*fill_odp_stats)(struct ib_mr *mr, struct ib_odp_counters
> > > *cnt);
> > > +
> > Requesting ODP stats on non-ODP MR is an error.
> > Instead of returning bool, please return int = -EINVAL as an error for non ODP
> MRs.
> 
> We don't want to add extra checks in the drivers for something that is supposed
> to be checked by upper layer. If caller asks for ODP statistics, he will check that
> MR is ODP backed. Especially if the need to have ODP MR is embedded into the
> function name.
> 
That make sense. In that case return type should be void.
Comment says - "false otherwise (if its non-ODP registered MR for example)."

And with that logic,
Below code should be in upper layer instead of mlx5/main.c

+	if (!is_odp_mr(mr))
+		return false;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH rdma-next v1 1/4] IB/mlx5: Introduce ODP diagnostic counters
  2019-08-30  8:16 ` [PATCH rdma-next v1 1/4] IB/mlx5: Introduce ODP diagnostic counters Leon Romanovsky
@ 2019-09-09  8:45   ` Jason Gunthorpe
  2019-09-09  9:40     ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2019-09-09  8:45 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Erez Alfasi

On Fri, Aug 30, 2019 at 11:16:09AM +0300, Leon Romanovsky wrote:
> From: Erez Alfasi <ereza@mellanox.com>
> 
> Introduce ODP diagnostic counters and count the following
> per MR within IB/mlx5 driver:
>  1) Page faults:
> 	Total number of faulted pages.
>  2) Page invalidations:
> 	Total number of pages invalidated by the OS during all
> 	invalidation events. The translations can be no longer
> 	valid due to either non-present pages or mapping changes.
>  3) Prefetched pages:
> 	When prefetching a page, page fault is generated
> 	in order to bring the page to the main memory.
> 	The prefetched pages counter will be updated
> 	during a page fault flow only if it was derived
> 	from prefetching operation.
> 
> Signed-off-by: Erez Alfasi <ereza@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/hw/mlx5/odp.c | 18 ++++++++++++++++++
>  include/rdma/ib_umem_odp.h       | 14 ++++++++++++++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> index 905936423a03..b7c8a49ac753 100644
> +++ b/drivers/infiniband/hw/mlx5/odp.c
> @@ -287,6 +287,10 @@ void mlx5_ib_invalidate_range(struct ib_umem_odp *umem_odp, unsigned long start,
>  
>  	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
>  
> +	/* Count page invalidations */
> +	ib_update_odp_stats(umem_odp, invalidations,
> +			    ib_umem_odp_num_pages(umem_odp));

This doesn't seem right, it should only count the number of pages that
were actually removed from the mapping

> diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
> index b37c674b7fe6..3359e34516da 100644
> +++ b/include/rdma/ib_umem_odp.h
> @@ -37,6 +37,12 @@
>  #include <rdma/ib_verbs.h>
>  #include <linux/interval_tree.h>
>  
> +struct ib_odp_counters {
> +	atomic64_t faults;
> +	atomic64_t invalidations;
> +	atomic64_t prefetched;
> +};
> +
>  struct ib_umem_odp {
>  	struct ib_umem umem;
>  	struct ib_ucontext_per_mm *per_mm;
> @@ -62,6 +68,11 @@ struct ib_umem_odp {
>  	struct mutex		umem_mutex;
>  	void			*private; /* for the HW driver to use. */
>  
> +	/*
> +	 * ODP diagnostic counters.
> +	 */
> +	struct ib_odp_counters odp_stats;

This really belongs in the MR not the umem

>  	int notifiers_seq;
>  	int notifiers_count;
>  	int npages;
> @@ -106,6 +117,9 @@ static inline size_t ib_umem_odp_num_pages(struct ib_umem_odp *umem_odp)
>  	       umem_odp->page_shift;
>  }
>  
> +#define ib_update_odp_stats(umem_odp, counter_name, value)		    \
> +	atomic64_add(value, &(umem_odp->odp_stats.counter_name))

Missing brackets in a macro

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH rdma-next v1 2/4] RDMA/nldev: Allow different fill function per resource
  2019-08-30  8:16 ` [PATCH rdma-next v1 2/4] RDMA/nldev: Allow different fill function per resource Leon Romanovsky
@ 2019-09-09  8:48   ` Jason Gunthorpe
  2019-09-09  9:46     ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2019-09-09  8:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Erez Alfasi

On Fri, Aug 30, 2019 at 11:16:10AM +0300, Leon Romanovsky wrote:
> From: Erez Alfasi <ereza@mellanox.com>
> 
> So far res_get_common_{dumpit, doit} was using the default
> resource fill function which was defined as part of the
> nldev_fill_res_entry fill_entries.
> 
> Add a fill function pointer as an argument allows us to use
> different fill function in case we want to dump different
> values then 'rdma resource' flow do, but still use the same
> existing general resources dumping flow.
> 
> If a NULL value is passed, it will be using the default
> fill function that was defined in 'fill_entries' for a
> given resource type.
> 
> Signed-off-by: Erez Alfasi <ereza@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/core/nldev.c | 42 +++++++++++++++++++++++----------
>  1 file changed, 29 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index cc08218f1ef7..47f7fe5432db 100644
> +++ b/drivers/infiniband/core/nldev.c
> @@ -1181,7 +1181,10 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
>  
>  static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
>  			       struct netlink_ext_ack *extack,
> -			       enum rdma_restrack_type res_type)
> +			       enum rdma_restrack_type res_type,
> +			       int (*fill_func)(struct sk_buff*, bool,
> +						struct rdma_restrack_entry*,
> +						uint32_t))

Use a typedef?

>  {
>  	const struct nldev_fill_res_entry *fe = &fill_entries[res_type];
>  	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
> @@ -1244,7 +1247,12 @@ static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
>  	}
>  
>  	has_cap_net_admin = netlink_capable(skb, CAP_NET_ADMIN);
> -	ret = fe->fill_res_func(msg, has_cap_net_admin, res, port);
> +
> +	if (fill_func)
> +		ret = fill_func(msg, has_cap_net_admin, res, port);
> +	else
> +		ret = fe->fill_res_func(msg, has_cap_net_admin, res, port);

Weird to now be choosing between two function pointers

> -#define RES_GET_FUNCS(name, type)                                              \
> -	static int nldev_res_get_##name##_dumpit(struct sk_buff *skb,          \
> +#define RES_GET_FUNCS(name, type)					       \
> +	static int nldev_res_get_##name##_dumpit(struct sk_buff *skb,	       \
>  						 struct netlink_callback *cb)  \
> -	{                                                                      \
> -		return res_get_common_dumpit(skb, cb, type);                   \
> -	}                                                                      \
> -	static int nldev_res_get_##name##_doit(struct sk_buff *skb,            \
> -					       struct nlmsghdr *nlh,           \
> +	{								       \
> +		return res_get_common_dumpit(skb, cb, type, NULL);	       \
> +	}								       \
> +	static int nldev_res_get_##name##_doit(struct sk_buff *skb,	       \
> +					       struct nlmsghdr *nlh,	       \
>  					       struct netlink_ext_ack *extack) \
> -	{                                                                      \
> -		return res_get_common_doit(skb, nlh, extack, type);            \
> +	{								       \
> +		return res_get_common_doit(skb, nlh, extack, type, NULL);      \
>  	}

ie the NULL should be fill_entries[type]->fill_res_func?

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics
  2019-08-30  8:16 ` [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics Leon Romanovsky
  2019-08-30 10:18   ` Parav Pandit
@ 2019-09-09  8:51   ` Jason Gunthorpe
  2019-09-09 10:00     ` Leon Romanovsky
  1 sibling, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2019-09-09  8:51 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Erez Alfasi

On Fri, Aug 30, 2019 at 11:16:11AM +0300, Leon Romanovsky wrote:
> From: Erez Alfasi <ereza@mellanox.com>
> 
> Add RDMA nldev netlink interface for dumping MR
> statistics information.
> 
> Output example:
> ereza@dev~$: ./ibv_rc_pingpong -o -P -s 500000000
>   local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::
> 
> ereza@dev~$: rdma stat show mr
> dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0
> prefetched_pages 122071
> 
> Signed-off-by: Erez Alfasi <ereza@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/core/device.c  |  1 +
>  drivers/infiniband/core/nldev.c   | 54 +++++++++++++++++++++++++++++--
>  drivers/infiniband/hw/mlx5/main.c | 16 +++++++++
>  include/rdma/ib_verbs.h           |  9 ++++++
>  4 files changed, 78 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 99c4a55545cf..34a9e37c5c61 100644
> +++ b/drivers/infiniband/core/device.c
> @@ -2610,6 +2610,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
>  	SET_DEVICE_OP(dev_ops, get_dma_mr);
>  	SET_DEVICE_OP(dev_ops, get_hw_stats);
>  	SET_DEVICE_OP(dev_ops, get_link_layer);
> +	SET_DEVICE_OP(dev_ops, fill_odp_stats);
>  	SET_DEVICE_OP(dev_ops, get_netdev);
>  	SET_DEVICE_OP(dev_ops, get_port_immutable);
>  	SET_DEVICE_OP(dev_ops, get_vector_affinity);

I'm now curious what motivated placing the line here, apparently
randomly in a sorted list?

> +static int fill_stat_mr_entry(struct sk_buff *msg, bool has_cap_net_admin,
> +			      struct rdma_restrack_entry *res, uint32_t port)
> +{
> +	struct ib_mr *mr = container_of(res, struct ib_mr, res);
> +	struct ib_device *dev = mr->pd->device;
> +	struct ib_odp_counters odp_stats;
> +	struct nlattr *table_attr;
> +
> +	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_MRN, res->id))
> +		goto err;
> +
> +	if (!dev->ops.fill_odp_stats)
> +		return 0;
> +
> +	if (!dev->ops.fill_odp_stats(mr, &odp_stats))
> +		return 0;

As Parav says this seems to be wrong for !ODP MRs. Can we even detect
!ODP at this point?

> +
> +	table_attr = nla_nest_start(msg,
> +				    RDMA_NLDEV_ATTR_STAT_HWCOUNTERS);
> +
> +	if (!table_attr)
> +		return -EMSGSIZE;
> +
> +	if (fill_stat_hwcounter_entry(msg, "page_faults",
> +				      (u64)atomic64_read(&odp_stats.faults)))

Why the cast?


> +static bool mlx5_ib_fill_odp_stats(struct ib_mr *ibmr,
> +				   struct ib_odp_counters *cnt)
> +{
> +	struct mlx5_ib_mr *mr = to_mmr(ibmr);
> +
> +	if (!is_odp_mr(mr))
> +		return false;
> +
> +	memcpy(cnt, &to_ib_umem_odp(mr->umem)->odp_stats,
> +	       sizeof(struct ib_odp_counters));

Can't memcpy atomic64, have to open code a copy using atomic64_read.

> @@ -6316,6 +6331,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
>  	.get_dev_fw_str = get_dev_fw_str,
>  	.get_dma_mr = mlx5_ib_get_dma_mr,
>  	.get_link_layer = mlx5_ib_port_link_layer,
> +	.fill_odp_stats = mlx5_ib_fill_odp_stats,

Randomly again..

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH rdma-next v1 4/4] RDMA/mlx5: Return ODP type per MR
  2019-08-30  8:16 ` [PATCH rdma-next v1 4/4] RDMA/mlx5: Return ODP type per MR Leon Romanovsky
@ 2019-09-09  8:53   ` Jason Gunthorpe
  2019-09-09 10:01     ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2019-09-09  8:53 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Erez Alfasi

On Fri, Aug 30, 2019 at 11:16:12AM +0300, Leon Romanovsky wrote:

> +static int fill_res_mr_entry(struct sk_buff *msg,
> +			     struct rdma_restrack_entry *res)
> +{
> +	struct ib_mr *ibmr = container_of(res, struct ib_mr, res);
> +	struct mlx5_ib_mr *mr = to_mmr(ibmr);
> +	struct nlattr *table_attr;
> +
> +	if (!is_odp_mr(mr))
> +		return 0;
> +
> +	table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_DRIVER);
> +	if (!table_attr)
> +		goto err;
> +
> +	if (to_ib_umem_odp(mr->umem)->is_implicit_odp) {
> +		if (rdma_nl_put_driver_string(msg, "odp", "implicit"))
> +			goto err;
> +	} else {
> +		if (rdma_nl_put_driver_string(msg, "odp", "explicit"))
> +			goto err;
> +	}
> +
> +	nla_nest_end(msg, table_attr);
> +	return 0;
> +
> +err:
> +	nla_nest_cancel(msg, table_attr);
> +	return -EMSGSIZE;
> +}

If we are having a custom fill function why not fill the ODP stats
here instead of adding that callback and memcopies/etc

rdma_nl_put_odp_stats(...)

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH rdma-next v1 1/4] IB/mlx5: Introduce ODP diagnostic counters
  2019-09-09  8:45   ` Jason Gunthorpe
@ 2019-09-09  9:40     ` Leon Romanovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2019-09-09  9:40 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, RDMA mailing list, Erez Alfasi

On Mon, Sep 09, 2019 at 08:45:36AM +0000, Jason Gunthorpe wrote:
> On Fri, Aug 30, 2019 at 11:16:09AM +0300, Leon Romanovsky wrote:
> > From: Erez Alfasi <ereza@mellanox.com>
> >
> > Introduce ODP diagnostic counters and count the following
> > per MR within IB/mlx5 driver:
> >  1) Page faults:
> > 	Total number of faulted pages.
> >  2) Page invalidations:
> > 	Total number of pages invalidated by the OS during all
> > 	invalidation events. The translations can be no longer
> > 	valid due to either non-present pages or mapping changes.
> >  3) Prefetched pages:
> > 	When prefetching a page, page fault is generated
> > 	in order to bring the page to the main memory.
> > 	The prefetched pages counter will be updated
> > 	during a page fault flow only if it was derived
> > 	from prefetching operation.
> >
> > Signed-off-by: Erez Alfasi <ereza@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/hw/mlx5/odp.c | 18 ++++++++++++++++++
> >  include/rdma/ib_umem_odp.h       | 14 ++++++++++++++
> >  2 files changed, 32 insertions(+)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> > index 905936423a03..b7c8a49ac753 100644
> > +++ b/drivers/infiniband/hw/mlx5/odp.c
> > @@ -287,6 +287,10 @@ void mlx5_ib_invalidate_range(struct ib_umem_odp *umem_odp, unsigned long start,
> >
> >  	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
> >
> > +	/* Count page invalidations */
> > +	ib_update_odp_stats(umem_odp, invalidations,
> > +			    ib_umem_odp_num_pages(umem_odp));
>
> This doesn't seem right, it should only count the number of pages that
> were actually removed from the mapping

Absolutely, It is my mistake, I think that I mislead Erez back then.

>
> > diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
> > index b37c674b7fe6..3359e34516da 100644
> > +++ b/include/rdma/ib_umem_odp.h
> > @@ -37,6 +37,12 @@
> >  #include <rdma/ib_verbs.h>
> >  #include <linux/interval_tree.h>
> >
> > +struct ib_odp_counters {
> > +	atomic64_t faults;
> > +	atomic64_t invalidations;
> > +	atomic64_t prefetched;
> > +};
> > +
> >  struct ib_umem_odp {
> >  	struct ib_umem umem;
> >  	struct ib_ucontext_per_mm *per_mm;
> > @@ -62,6 +68,11 @@ struct ib_umem_odp {
> >  	struct mutex		umem_mutex;
> >  	void			*private; /* for the HW driver to use. */
> >
> > +	/*
> > +	 * ODP diagnostic counters.
> > +	 */
> > +	struct ib_odp_counters odp_stats;
>
> This really belongs in the MR not the umem

Bu why? We have umem_odp, all our statistics calculations are done
in umem_odp level.

>
> >  	int notifiers_seq;
> >  	int notifiers_count;
> >  	int npages;
> > @@ -106,6 +117,9 @@ static inline size_t ib_umem_odp_num_pages(struct ib_umem_odp *umem_odp)
> >  	       umem_odp->page_shift;
> >  }
> >
> > +#define ib_update_odp_stats(umem_odp, counter_name, value)		    \
> > +	atomic64_add(value, &(umem_odp->odp_stats.counter_name))
>
> Missing brackets in a macro

I am doubt about that, because it is one to one replace and not "real" macro.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH rdma-next v1 2/4] RDMA/nldev: Allow different fill function per resource
  2019-09-09  8:48   ` Jason Gunthorpe
@ 2019-09-09  9:46     ` Leon Romanovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2019-09-09  9:46 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, RDMA mailing list, Erez Alfasi

On Mon, Sep 09, 2019 at 08:48:09AM +0000, Jason Gunthorpe wrote:
> On Fri, Aug 30, 2019 at 11:16:10AM +0300, Leon Romanovsky wrote:
> > From: Erez Alfasi <ereza@mellanox.com>
> >
> > So far res_get_common_{dumpit, doit} was using the default
> > resource fill function which was defined as part of the
> > nldev_fill_res_entry fill_entries.
> >
> > Add a fill function pointer as an argument allows us to use
> > different fill function in case we want to dump different
> > values then 'rdma resource' flow do, but still use the same
> > existing general resources dumping flow.
> >
> > If a NULL value is passed, it will be using the default
> > fill function that was defined in 'fill_entries' for a
> > given resource type.
> >
> > Signed-off-by: Erez Alfasi <ereza@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/core/nldev.c | 42 +++++++++++++++++++++++----------
> >  1 file changed, 29 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> > index cc08218f1ef7..47f7fe5432db 100644
> > +++ b/drivers/infiniband/core/nldev.c
> > @@ -1181,7 +1181,10 @@ static const struct nldev_fill_res_entry fill_entries[RDMA_RESTRACK_MAX] = {
> >
> >  static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
> >  			       struct netlink_ext_ack *extack,
> > -			       enum rdma_restrack_type res_type)
> > +			       enum rdma_restrack_type res_type,
> > +			       int (*fill_func)(struct sk_buff*, bool,
> > +						struct rdma_restrack_entry*,
> > +						uint32_t))
>
> Use a typedef?

I'll take a look on that, but it is not fully clear to me what are the
gains here.

>
> >  {
> >  	const struct nldev_fill_res_entry *fe = &fill_entries[res_type];
> >  	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
> > @@ -1244,7 +1247,12 @@ static int res_get_common_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
> >  	}
> >
> >  	has_cap_net_admin = netlink_capable(skb, CAP_NET_ADMIN);
> > -	ret = fe->fill_res_func(msg, has_cap_net_admin, res, port);
> > +
> > +	if (fill_func)
> > +		ret = fill_func(msg, has_cap_net_admin, res, port);
> > +	else
> > +		ret = fe->fill_res_func(msg, has_cap_net_admin, res, port);
>
> Weird to now be choosing between two function pointers

I didn't like this either, but we didn't find an easy solution to do
chains of fill functions. In our case, we are requesting COUNTER object
which will call to MR object to fill statistic.

>
> > -#define RES_GET_FUNCS(name, type)                                              \
> > -	static int nldev_res_get_##name##_dumpit(struct sk_buff *skb,          \
> > +#define RES_GET_FUNCS(name, type)					       \
> > +	static int nldev_res_get_##name##_dumpit(struct sk_buff *skb,	       \
> >  						 struct netlink_callback *cb)  \
> > -	{                                                                      \
> > -		return res_get_common_dumpit(skb, cb, type);                   \
> > -	}                                                                      \
> > -	static int nldev_res_get_##name##_doit(struct sk_buff *skb,            \
> > -					       struct nlmsghdr *nlh,           \
> > +	{								       \
> > +		return res_get_common_dumpit(skb, cb, type, NULL);	       \
> > +	}								       \
> > +	static int nldev_res_get_##name##_doit(struct sk_buff *skb,	       \
> > +					       struct nlmsghdr *nlh,	       \
> >  					       struct netlink_ext_ack *extack) \
> > -	{                                                                      \
> > -		return res_get_common_doit(skb, nlh, extack, type);            \
> > +	{								       \
> > +		return res_get_common_doit(skb, nlh, extack, type, NULL);      \
> >  	}
>
> ie the NULL should be fill_entries[type]->fill_res_func?

The "if (fill_func) " above will do the trick.

>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics
  2019-09-09  8:51   ` Jason Gunthorpe
@ 2019-09-09 10:00     ` Leon Romanovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2019-09-09 10:00 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, RDMA mailing list, Erez Alfasi

On Mon, Sep 09, 2019 at 08:51:50AM +0000, Jason Gunthorpe wrote:
> On Fri, Aug 30, 2019 at 11:16:11AM +0300, Leon Romanovsky wrote:
> > From: Erez Alfasi <ereza@mellanox.com>
> >
> > Add RDMA nldev netlink interface for dumping MR
> > statistics information.
> >
> > Output example:
> > ereza@dev~$: ./ibv_rc_pingpong -o -P -s 500000000
> >   local address:  LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID ::
> >
> > ereza@dev~$: rdma stat show mr
> > dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0
> > prefetched_pages 122071
> >
> > Signed-off-by: Erez Alfasi <ereza@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/core/device.c  |  1 +
> >  drivers/infiniband/core/nldev.c   | 54 +++++++++++++++++++++++++++++--
> >  drivers/infiniband/hw/mlx5/main.c | 16 +++++++++
> >  include/rdma/ib_verbs.h           |  9 ++++++
> >  4 files changed, 78 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > index 99c4a55545cf..34a9e37c5c61 100644
> > +++ b/drivers/infiniband/core/device.c
> > @@ -2610,6 +2610,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
> >  	SET_DEVICE_OP(dev_ops, get_dma_mr);
> >  	SET_DEVICE_OP(dev_ops, get_hw_stats);
> >  	SET_DEVICE_OP(dev_ops, get_link_layer);
> > +	SET_DEVICE_OP(dev_ops, fill_odp_stats);
> >  	SET_DEVICE_OP(dev_ops, get_netdev);
> >  	SET_DEVICE_OP(dev_ops, get_port_immutable);
> >  	SET_DEVICE_OP(dev_ops, get_vector_affinity);
>
> I'm now curious what motivated placing the line here, apparently
> randomly in a sorted list?

desire to be different and express yourself?

>
> > +static int fill_stat_mr_entry(struct sk_buff *msg, bool has_cap_net_admin,
> > +			      struct rdma_restrack_entry *res, uint32_t port)
> > +{
> > +	struct ib_mr *mr = container_of(res, struct ib_mr, res);
> > +	struct ib_device *dev = mr->pd->device;
> > +	struct ib_odp_counters odp_stats;
> > +	struct nlattr *table_attr;
> > +
> > +	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_MRN, res->id))
> > +		goto err;
> > +
> > +	if (!dev->ops.fill_odp_stats)
> > +		return 0;
> > +
> > +	if (!dev->ops.fill_odp_stats(mr, &odp_stats))
> > +		return 0;
>
> As Parav says this seems to be wrong for !ODP MRs. Can we even detect
> !ODP at this point?

ODP is decided on UMEM level which you said should be driver property
and it means that driver should distinguish between odp/not-odp.

>
> > +
> > +	table_attr = nla_nest_start(msg,
> > +				    RDMA_NLDEV_ATTR_STAT_HWCOUNTERS);
> > +
> > +	if (!table_attr)
> > +		return -EMSGSIZE;
> > +
> > +	if (fill_stat_hwcounter_entry(msg, "page_faults",
> > +				      (u64)atomic64_read(&odp_stats.faults)))
>
> Why the cast?

atomic64_read returns s64 and not u64, I didn't see need to investigate
if s64 == u64 in all architectures.
>
>
> > +static bool mlx5_ib_fill_odp_stats(struct ib_mr *ibmr,
> > +				   struct ib_odp_counters *cnt)
> > +{
> > +	struct mlx5_ib_mr *mr = to_mmr(ibmr);
> > +
> > +	if (!is_odp_mr(mr))
> > +		return false;
> > +
> > +	memcpy(cnt, &to_ib_umem_odp(mr->umem)->odp_stats,
> > +	       sizeof(struct ib_odp_counters));
>
> Can't memcpy atomic64, have to open code a copy using atomic64_read.

Right

>
> > @@ -6316,6 +6331,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
> >  	.get_dev_fw_str = get_dev_fw_str,
> >  	.get_dma_mr = mlx5_ib_get_dma_mr,
> >  	.get_link_layer = mlx5_ib_port_link_layer,
> > +	.fill_odp_stats = mlx5_ib_fill_odp_stats,
>
> Randomly again..
>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH rdma-next v1 4/4] RDMA/mlx5: Return ODP type per MR
  2019-09-09  8:53   ` Jason Gunthorpe
@ 2019-09-09 10:01     ` Leon Romanovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2019-09-09 10:01 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, RDMA mailing list, Erez Alfasi

On Mon, Sep 09, 2019 at 08:53:57AM +0000, Jason Gunthorpe wrote:
> On Fri, Aug 30, 2019 at 11:16:12AM +0300, Leon Romanovsky wrote:
>
> > +static int fill_res_mr_entry(struct sk_buff *msg,
> > +			     struct rdma_restrack_entry *res)
> > +{
> > +	struct ib_mr *ibmr = container_of(res, struct ib_mr, res);
> > +	struct mlx5_ib_mr *mr = to_mmr(ibmr);
> > +	struct nlattr *table_attr;
> > +
> > +	if (!is_odp_mr(mr))
> > +		return 0;
> > +
> > +	table_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_DRIVER);
> > +	if (!table_attr)
> > +		goto err;
> > +
> > +	if (to_ib_umem_odp(mr->umem)->is_implicit_odp) {
> > +		if (rdma_nl_put_driver_string(msg, "odp", "implicit"))
> > +			goto err;
> > +	} else {
> > +		if (rdma_nl_put_driver_string(msg, "odp", "explicit"))
> > +			goto err;
> > +	}
> > +
> > +	nla_nest_end(msg, table_attr);
> > +	return 0;
> > +
> > +err:
> > +	nla_nest_cancel(msg, table_attr);
> > +	return -EMSGSIZE;
> > +}
>
> If we are having a custom fill function why not fill the ODP stats
> here instead of adding that callback and memcopies/etc

It makes sense.

Thanks

>
> rdma_nl_put_odp_stats(...)
>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, back to index

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-30  8:16 [PATCH rdma-next v1 0/4] ODP information and statistics Leon Romanovsky
2019-08-30  8:16 ` [PATCH rdma-next v1 1/4] IB/mlx5: Introduce ODP diagnostic counters Leon Romanovsky
2019-09-09  8:45   ` Jason Gunthorpe
2019-09-09  9:40     ` Leon Romanovsky
2019-08-30  8:16 ` [PATCH rdma-next v1 2/4] RDMA/nldev: Allow different fill function per resource Leon Romanovsky
2019-09-09  8:48   ` Jason Gunthorpe
2019-09-09  9:46     ` Leon Romanovsky
2019-08-30  8:16 ` [PATCH rdma-next v1 3/4] RDMA/nldev: Provide MR statistics Leon Romanovsky
2019-08-30 10:18   ` Parav Pandit
2019-08-30 11:12     ` Leon Romanovsky
2019-08-30 12:06       ` Parav Pandit
2019-09-09  8:51   ` Jason Gunthorpe
2019-09-09 10:00     ` Leon Romanovsky
2019-08-30  8:16 ` [PATCH rdma-next v1 4/4] RDMA/mlx5: Return ODP type per MR Leon Romanovsky
2019-09-09  8:53   ` Jason Gunthorpe
2019-09-09 10:01     ` Leon Romanovsky

Linux-RDMA Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rdma/0 linux-rdma/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rdma linux-rdma/ https://lore.kernel.org/linux-rdma \
		linux-rdma@vger.kernel.org linux-rdma@archiver.kernel.org
	public-inbox-index linux-rdma


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rdma


AGPL code for this site: git clone https://public-inbox.org/ public-inbox