netdev.vger.kernel.org archive mirror
* [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA
@ 2019-06-25 20:57 Saeed Mahameed
  2019-06-25 20:57 ` [for-next V2 01/10] linux/dim: Move logic to dim.h Saeed Mahameed
                   ` (11 more replies)
  0 siblings, 12 replies; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Saeed Mahameed

Hi Dave, Doug & Jason

This series makes DIM (Dynamically-tuned Interrupt Moderation)
generic, so that it can serve both netdev and RDMA use cases.

From Tal and Yamin:

The first 7 patches provide the necessary refactoring of the current
net_dim library; this affects the net drivers that use the API.

The last 3 patches provide the RDMA implementation for DIM.
These patches are not included in this pull request; they are posted
for review visibility only and will be handled by the rdma tree later
on in this kernel release.

For more information please see tag log below.

Once we are all happy with the series, please pull to net-next and
rdma-next trees.

v1 for reference: 
(https://marc.info/?l=linux-netdev&m=155977708016030&w=2)

Changes since v1:
- added per ib device configuration knob for rdma-dim (Sagi)
- add NL directives for user-space / rdma tool to configure rdma dim (Sagi/Leon)
- use one header file for DIM implementations (Leon)
- various point changes in the rdma dim related code in the IB core (Leon)
- removed the RDMA specific patches from this pull request

Thanks,
Saeed.

---
The following changes since commit cd6c84d8f0cdc911df435bb075ba22ce3c605b07:

  Linux 5.2-rc2 (2019-05-26 16:49:19 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/blk-dim-v2

for you to fetch changes up to 398c2b05bbee21cc172dfff017c0351d4d14e04c:

  linux/dim: Add completions count to dim_sample (2019-06-25 13:46:40 -0700)

----------------------------------------------------------------
Generic DIM

From: Tal Gilboa and Yamin Friedman

Implement net DIM over a generic DIM library, add RDMA DIM

dim.h lib exposes an implementation of the DIM algorithm for
dynamically-tuned interrupt moderation for networking interfaces.

We want similar functionality for other protocols, which might need to
optimize interrupts differently. The main motivation here is DIM for the
NVMf storage protocol.

The current DIM implementation prioritizes reducing interrupt overhead over
latency. Also, in order to reduce DIM's own overhead, the algorithm might
take some time to identify that it needs to change profiles. While this is
acceptable for networking, it might not work well in other scenarios.

Here we propose a new structure for DIM. The idea is to allow slightly
modified functionality without the risk of breaking Net DIM behavior for
netdev. We verified that there are no degradations in current DIM behavior
with the modified solution.

Suggested solution:
- Common logic is implemented in lib/dim/dim.c
- Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
  the common logic in dim.c
- Any new DIM logic will be implemented in "lib/dim/new_dim.c".
  This new implementation will expose modified versions of profiles,
  dim_step() and dim_decision().
- DIM API is declared in include/linux/dim.h for all implementations.

Pros for this solution are:
- Zero impact on existing net_dim implementation and usage
- Relatively more code reuse (compared to two separate solutions)
- Increased extensibility
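
To illustrate the layering described above, here is a minimal userspace
sketch; it is not code from this series. The shared state and dim_turn()
follow the helpers in the patches below (with the "net" prefix already
dropped). The parametrized dim_step_bounded(), the new_dim_step() wrapper
and the value of NEW_DIM_PARAMS_NUM_PROFILES are assumptions made up for
the example.

```c
#include <assert.h>
#include <stdint.h>

/* Tuning state shared by all implementations (fields follow struct net_dim). */
struct dim {
	uint8_t profile_ix;
	uint8_t tune_state;
	uint8_t steps_right;
	uint8_t steps_left;
	uint8_t tired;
};

enum { DIM_PARKING_ON_TOP, DIM_PARKING_TIRED, DIM_GOING_RIGHT, DIM_GOING_LEFT };
enum { DIM_STEPPED, DIM_TOO_TIRED, DIM_ON_EDGE };

/* Common logic: reverse the search direction and restart its step count. */
static inline void dim_turn(struct dim *dim)
{
	switch (dim->tune_state) {
	case DIM_PARKING_ON_TOP:
	case DIM_PARKING_TIRED:
		break;
	case DIM_GOING_RIGHT:
		dim->tune_state = DIM_GOING_LEFT;
		dim->steps_left = 0;
		break;
	case DIM_GOING_LEFT:
		dim->tune_state = DIM_GOING_RIGHT;
		dim->steps_right = 0;
		break;
	}
}

/* Hypothetical helper: one step through a profile table of a given size. */
static inline int dim_step_bounded(struct dim *dim, uint8_t num_profiles)
{
	if (dim->tired == num_profiles * 2)
		return DIM_TOO_TIRED;

	switch (dim->tune_state) {
	case DIM_PARKING_ON_TOP:
	case DIM_PARKING_TIRED:
		break;
	case DIM_GOING_RIGHT:
		if (dim->profile_ix == num_profiles - 1)
			return DIM_ON_EDGE;
		dim->profile_ix++;
		dim->steps_right++;
		break;
	case DIM_GOING_LEFT:
		if (dim->profile_ix == 0)
			return DIM_ON_EDGE;
		dim->profile_ix--;
		dim->steps_left++;
		break;
	}

	dim->tired++;
	return DIM_STEPPED;
}

#define NET_DIM_PARAMS_NUM_PROFILES 5	/* value used by net_dim.h */
#define NEW_DIM_PARAMS_NUM_PROFILES 9	/* hypothetical, for illustration */

/* Each implementation supplies its own step over its own profile table. */
static inline int net_dim_step(struct dim *dim)
{
	return dim_step_bounded(dim, NET_DIM_PARAMS_NUM_PROFILES);
}

static inline int new_dim_step(struct dim *dim)
{
	return dim_step_bounded(dim, NEW_DIM_PARAMS_NUM_PROFILES);
}

/* Usage: walk right to the edge of the net profile table, then turn. */
static inline void dim_layering_example(void)
{
	struct dim d = { .tune_state = DIM_GOING_RIGHT };
	int res = DIM_STEPPED;

	while (res == DIM_STEPPED)
		res = net_dim_step(&d);
	assert(res == DIM_ON_EDGE && d.profile_ix == 4);

	dim_turn(&d);
	assert(d.tune_state == DIM_GOING_LEFT);
}
```

Only the per-implementation wrappers differ here; the turn logic and the
state struct are reused unchanged, which is the kind of reuse the list
above is aiming for.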

----------------------------------------------------------------
Tal Gilboa (6):
      linux/dim: Move logic to dim.h
      linux/dim: Remove "net" prefix from internal DIM members
      linux/dim: Rename externally exposed macros
      linux/dim: Rename net_dim_sample() to net_dim_update_sample()
      linux/dim: Rename externally used net_dim members
      linux/dim: Move implementation to .c files

Yamin Friedman (1):
      linux/dim: Add completions count to dim_sample

 MAINTAINERS                                        |   3 +-
 drivers/net/ethernet/broadcom/Kconfig              |   1 +
 drivers/net/ethernet/broadcom/bcmsysport.c         |  20 +-
 drivers/net/ethernet/broadcom/bcmsysport.h         |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  12 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h          |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c  |   6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c      |   9 +-
 drivers/net/ethernet/broadcom/genet/bcmgenet.c     |  18 +-
 drivers/net/ethernet/broadcom/genet/bcmgenet.h     |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig    |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  10 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  14 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  10 +-
 include/linux/dim.h                                | 366 ++++++++++++++++++
 include/linux/net_dim.h                            | 418 ---------------------
 lib/Kconfig                                        |   8 +
 lib/Makefile                                       |   1 +
 lib/dim/Makefile                                   |   9 +
 lib/dim/dim.c                                      |  83 ++++
 lib/dim/net_dim.c                                  | 190 ++++++++++
 23 files changed, 728 insertions(+), 489 deletions(-)
 create mode 100644 include/linux/dim.h
 delete mode 100644 include/linux/net_dim.h
 create mode 100644 lib/dim/Makefile
 create mode 100644 lib/dim/dim.c
 create mode 100644 lib/dim/net_dim.c

* [for-next V2 01/10] linux/dim: Move logic to dim.h
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
@ 2019-06-25 20:57 ` Saeed Mahameed
  2019-06-25 21:53   ` Sagi Grimberg
  2019-06-25 20:57 ` [for-next V2 02/10] linux/dim: Remove "net" prefix from internal DIM members Saeed Mahameed
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Saeed Mahameed

From: Tal Gilboa <talgi@mellanox.com>

In preparation for supporting more implementations of the DIM
algorithm, I'm moving what would become common logic to a common
library. Downstream DIM implementations will use the common lib
for their implementation.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 MAINTAINERS             |   1 +
 include/linux/dim.h     | 153 ++++++++++++++++++++++++++++++++++++++++
 include/linux/net_dim.h | 148 +-------------------------------------
 3 files changed, 156 insertions(+), 146 deletions(-)
 create mode 100644 include/linux/dim.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 429c6c624861..5d4b852d9d39 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5589,6 +5589,7 @@ DYNAMIC INTERRUPT MODERATION
 M:	Tal Gilboa <talgi@mellanox.com>
 S:	Maintained
 F:	include/linux/net_dim.h
+F:	include/linux/dim.h
 
 DZ DECSTATION DZ11 SERIAL DRIVER
 M:	"Maciej W. Rozycki" <macro@linux-mips.org>
diff --git a/include/linux/dim.h b/include/linux/dim.h
new file mode 100644
index 000000000000..67d7ca40f3dd
--- /dev/null
+++ b/include/linux/dim.h
@@ -0,0 +1,153 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef DIM_H
+#define DIM_H
+
+#include <linux/module.h>
+
+#define NET_DIM_NEVENTS 64
+
+/* more than 10% difference */
+#define IS_SIGNIFICANT_DIFF(val, ref) \
+	(((100UL * abs((val) - (ref))) / (ref)) > 10)
+#define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) \
+& (BIT_ULL(bits) - 1))
+
+struct net_dim_cq_moder {
+	u16 usec;
+	u16 pkts;
+	u8 cq_period_mode;
+};
+
+struct net_dim_sample {
+	ktime_t time;
+	u32 pkt_ctr;
+	u32 byte_ctr;
+	u16 event_ctr;
+};
+
+struct net_dim_stats {
+	int ppms; /* packets per msec */
+	int bpms; /* bytes per msec */
+	int epms; /* events per msec */
+};
+
+struct net_dim { /* Dynamic Interrupt Moderation */
+	u8 state;
+	struct net_dim_stats prev_stats;
+	struct net_dim_sample start_sample;
+	struct work_struct work;
+	u8 profile_ix;
+	u8 mode;
+	u8 tune_state;
+	u8 steps_right;
+	u8 steps_left;
+	u8 tired;
+};
+
+enum {
+	NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
+	NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
+	NET_DIM_CQ_PERIOD_NUM_MODES
+};
+
+enum {
+	NET_DIM_START_MEASURE,
+	NET_DIM_MEASURE_IN_PROGRESS,
+	NET_DIM_APPLY_NEW_PROFILE,
+};
+
+enum {
+	NET_DIM_PARKING_ON_TOP,
+	NET_DIM_PARKING_TIRED,
+	NET_DIM_GOING_RIGHT,
+	NET_DIM_GOING_LEFT,
+};
+
+enum {
+	NET_DIM_STATS_WORSE,
+	NET_DIM_STATS_SAME,
+	NET_DIM_STATS_BETTER,
+};
+
+enum {
+	NET_DIM_STEPPED,
+	NET_DIM_TOO_TIRED,
+	NET_DIM_ON_EDGE,
+};
+
+static inline bool net_dim_on_top(struct net_dim *net_dim)
+{
+	switch (net_dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+	case NET_DIM_PARKING_TIRED:
+		return true;
+	case NET_DIM_GOING_RIGHT:
+		return (net_dim->steps_left > 1) && (net_dim->steps_right == 1);
+	default: /* NET_DIM_GOING_LEFT */
+		return (net_dim->steps_right > 1) && (net_dim->steps_left == 1);
+	}
+}
+
+static inline void net_dim_turn(struct net_dim *net_dim)
+{
+	switch (net_dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+	case NET_DIM_PARKING_TIRED:
+		break;
+	case NET_DIM_GOING_RIGHT:
+		net_dim->tune_state = NET_DIM_GOING_LEFT;
+		net_dim->steps_left = 0;
+		break;
+	case NET_DIM_GOING_LEFT:
+		net_dim->tune_state = NET_DIM_GOING_RIGHT;
+		net_dim->steps_right = 0;
+		break;
+	}
+}
+
+static inline void net_dim_park_on_top(struct net_dim *net_dim)
+{
+	net_dim->steps_right  = 0;
+	net_dim->steps_left   = 0;
+	net_dim->tired        = 0;
+	net_dim->tune_state   = NET_DIM_PARKING_ON_TOP;
+}
+
+static inline void net_dim_park_tired(struct net_dim *net_dim)
+{
+	net_dim->steps_right  = 0;
+	net_dim->steps_left   = 0;
+	net_dim->tune_state   = NET_DIM_PARKING_TIRED;
+}
+
+static inline void
+net_dim_sample(u16 event_ctr, u64 packets, u64 bytes, struct net_dim_sample *s)
+{
+	s->time	     = ktime_get();
+	s->pkt_ctr   = packets;
+	s->byte_ctr  = bytes;
+	s->event_ctr = event_ctr;
+}
+
+static inline void
+net_dim_calc_stats(struct net_dim_sample *start, struct net_dim_sample *end,
+		   struct net_dim_stats *curr_stats)
+{
+	/* u32 holds up to 71 minutes, should be enough */
+	u32 delta_us = ktime_us_delta(end->time, start->time);
+	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
+	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
+			     start->byte_ctr);
+
+	if (!delta_us)
+		return;
+
+	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
+	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
+	curr_stats->epms = DIV_ROUND_UP(NET_DIM_NEVENTS * USEC_PER_MSEC,
+					delta_us);
+}
+
+#endif /* DIM_H */
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index fd458389f7d1..373cda74b167 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -35,73 +35,10 @@
 #define NET_DIM_H
 
 #include <linux/module.h>
-
-struct net_dim_cq_moder {
-	u16 usec;
-	u16 pkts;
-	u8 cq_period_mode;
-};
-
-struct net_dim_sample {
-	ktime_t time;
-	u32     pkt_ctr;
-	u32     byte_ctr;
-	u16     event_ctr;
-};
-
-struct net_dim_stats {
-	int ppms; /* packets per msec */
-	int bpms; /* bytes per msec */
-	int epms; /* events per msec */
-};
-
-struct net_dim { /* Adaptive Moderation */
-	u8                                      state;
-	struct net_dim_stats                    prev_stats;
-	struct net_dim_sample                   start_sample;
-	struct work_struct                      work;
-	u8                                      profile_ix;
-	u8                                      mode;
-	u8                                      tune_state;
-	u8                                      steps_right;
-	u8                                      steps_left;
-	u8                                      tired;
-};
-
-enum {
-	NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
-	NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
-	NET_DIM_CQ_PERIOD_NUM_MODES
-};
-
-/* Adaptive moderation logic */
-enum {
-	NET_DIM_START_MEASURE,
-	NET_DIM_MEASURE_IN_PROGRESS,
-	NET_DIM_APPLY_NEW_PROFILE,
-};
-
-enum {
-	NET_DIM_PARKING_ON_TOP,
-	NET_DIM_PARKING_TIRED,
-	NET_DIM_GOING_RIGHT,
-	NET_DIM_GOING_LEFT,
-};
-
-enum {
-	NET_DIM_STATS_WORSE,
-	NET_DIM_STATS_SAME,
-	NET_DIM_STATS_BETTER,
-};
-
-enum {
-	NET_DIM_STEPPED,
-	NET_DIM_TOO_TIRED,
-	NET_DIM_ON_EDGE,
-};
+#include <linux/dim.h>
 
 #define NET_DIM_PARAMS_NUM_PROFILES 5
-/* Adaptive moderation profiles */
+/* Netdev dynamic interrupt moderation profiles */
 #define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
 #define NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE 128
 #define NET_DIM_DEF_PROFILE_CQE 1
@@ -188,36 +125,6 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
 	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
 }
 
-static inline bool net_dim_on_top(struct net_dim *dim)
-{
-	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
-		return true;
-	case NET_DIM_GOING_RIGHT:
-		return (dim->steps_left > 1) && (dim->steps_right == 1);
-	default: /* NET_DIM_GOING_LEFT */
-		return (dim->steps_right > 1) && (dim->steps_left == 1);
-	}
-}
-
-static inline void net_dim_turn(struct net_dim *dim)
-{
-	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
-		break;
-	case NET_DIM_GOING_RIGHT:
-		dim->tune_state = NET_DIM_GOING_LEFT;
-		dim->steps_left = 0;
-		break;
-	case NET_DIM_GOING_LEFT:
-		dim->tune_state = NET_DIM_GOING_RIGHT;
-		dim->steps_right = 0;
-		break;
-	}
-}
-
 static inline int net_dim_step(struct net_dim *dim)
 {
 	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
@@ -245,21 +152,6 @@ static inline int net_dim_step(struct net_dim *dim)
 	return NET_DIM_STEPPED;
 }
 
-static inline void net_dim_park_on_top(struct net_dim *dim)
-{
-	dim->steps_right  = 0;
-	dim->steps_left   = 0;
-	dim->tired        = 0;
-	dim->tune_state   = NET_DIM_PARKING_ON_TOP;
-}
-
-static inline void net_dim_park_tired(struct net_dim *dim)
-{
-	dim->steps_right  = 0;
-	dim->steps_left   = 0;
-	dim->tune_state   = NET_DIM_PARKING_TIRED;
-}
-
 static inline void net_dim_exit_parking(struct net_dim *dim)
 {
 	dim->tune_state = dim->profile_ix ? NET_DIM_GOING_LEFT :
@@ -267,9 +159,6 @@ static inline void net_dim_exit_parking(struct net_dim *dim)
 	net_dim_step(dim);
 }
 
-#define IS_SIGNIFICANT_DIFF(val, ref) \
-	(((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
-
 static inline int net_dim_stats_compare(struct net_dim_stats *curr,
 					struct net_dim_stats *prev)
 {
@@ -351,39 +240,6 @@ static inline bool net_dim_decision(struct net_dim_stats *curr_stats,
 	return dim->profile_ix != prev_ix;
 }
 
-static inline void net_dim_sample(u16 event_ctr,
-				  u64 packets,
-				  u64 bytes,
-				  struct net_dim_sample *s)
-{
-	s->time	     = ktime_get();
-	s->pkt_ctr   = packets;
-	s->byte_ctr  = bytes;
-	s->event_ctr = event_ctr;
-}
-
-#define NET_DIM_NEVENTS 64
-#define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
-
-static inline void net_dim_calc_stats(struct net_dim_sample *start,
-				      struct net_dim_sample *end,
-				      struct net_dim_stats *curr_stats)
-{
-	/* u32 holds up to 71 minutes, should be enough */
-	u32 delta_us = ktime_us_delta(end->time, start->time);
-	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
-	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
-			     start->byte_ctr);
-
-	if (!delta_us)
-		return;
-
-	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
-	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
-	curr_stats->epms = DIV_ROUND_UP(NET_DIM_NEVENTS * USEC_PER_MSEC,
-					delta_us);
-}
-
 static inline void net_dim(struct net_dim *dim,
 			   struct net_dim_sample end_sample)
 {
-- 
2.21.0
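
For readers following the arithmetic moved into dim.h, the following is a
userspace sketch of what BIT_GAP() and net_dim_calc_stats() compute. It is
illustrative only and not part of the patch. The BIT_GAP() expression is
copied from the diff above; the stand-in definitions of BIT_ULL(),
DIV_ROUND_UP() and USEC_PER_MSEC, the time_us field (replacing ktime_t),
the shortened names and the int return value are assumptions made so the
example builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for kernel helpers (assumptions for this sketch). */
#define BIT_ULL(n)         (1ULL << (n))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define USEC_PER_MSEC      1000ULL
#define NEVENTS            64	/* mirrors NET_DIM_NEVENTS */

/* Same expression as BIT_GAP() in the patch: the distance between two
 * readings of a free-running counter that is only `bits` wide, which
 * stays correct across a single wraparound.
 */
#define BIT_GAP(bits, end, start) \
	((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))

struct sample {
	uint64_t time_us;	/* stands in for the ktime_t timestamp */
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
};

struct stats {
	int ppms;	/* packets per msec */
	int bpms;	/* bytes per msec */
	int epms;	/* events per msec */
};

/* Returns 0 on success and -1 if no time elapsed. The kernel helper
 * returns void and leaves the stats untouched in that case.
 */
static inline int calc_stats(const struct sample *start,
			     const struct sample *end, struct stats *out)
{
	uint32_t delta_us = (uint32_t)(end->time_us - start->time_us);
	uint32_t npkts = (uint32_t)BIT_GAP(32, end->pkt_ctr, start->pkt_ctr);
	uint32_t nbytes = (uint32_t)BIT_GAP(32, end->byte_ctr,
					    start->byte_ctr);

	if (!delta_us)
		return -1;

	out->ppms = (int)DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
	out->bpms = (int)DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
	out->epms = (int)DIV_ROUND_UP(NEVENTS * USEC_PER_MSEC, delta_us);
	return 0;
}

/* Usage: the packet counter wraps between the two samples, yet the gap
 * is still the 32 packets that were actually counted.
 */
static inline void calc_stats_example(void)
{
	struct sample s0 = { .time_us = 0, .pkt_ctr = 0xFFFFFFF0u,
			     .byte_ctr = 1000 };
	struct sample s1 = { .time_us = 100, .pkt_ctr = 0x10u,
			     .byte_ctr = 49000 };
	struct stats st = { 0 };
	int rc = calc_stats(&s0, &s1, &st);

	(void)rc;
	assert(rc == 0);
	assert(st.ppms == 320);
}
```

With 32 packets over 100 usec the sketch reports 320 packets per msec.
Note that epms is always derived from the fixed event count per
measurement window, not from the event_ctr delta.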


* [for-next V2 02/10] linux/dim: Remove "net" prefix from internal DIM members
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
  2019-06-25 20:57 ` [for-next V2 01/10] linux/dim: Move logic to dim.h Saeed Mahameed
@ 2019-06-25 20:57 ` Saeed Mahameed
  2019-06-25 21:53   ` Sagi Grimberg
  2019-06-25 20:57 ` [for-next V2 03/10] linux/dim: Rename externally exposed macros Saeed Mahameed
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Saeed Mahameed

From: Tal Gilboa <talgi@mellanox.com>

Rename only the functions and structs that aren't used by external code.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/dim.h     | 86 ++++++++++++++++++++--------------------
 include/linux/net_dim.h | 87 ++++++++++++++++++++---------------------
 2 files changed, 86 insertions(+), 87 deletions(-)

diff --git a/include/linux/dim.h b/include/linux/dim.h
index 67d7ca40f3dd..6ee991681d62 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -6,7 +6,7 @@
 
 #include <linux/module.h>
 
-#define NET_DIM_NEVENTS 64
+#define DIM_NEVENTS 64
 
 /* more than 10% difference */
 #define IS_SIGNIFICANT_DIFF(val, ref) \
@@ -27,7 +27,7 @@ struct net_dim_sample {
 	u16 event_ctr;
 };
 
-struct net_dim_stats {
+struct dim_stats {
 	int ppms; /* packets per msec */
 	int bpms; /* bytes per msec */
 	int epms; /* events per msec */
@@ -35,7 +35,7 @@ struct net_dim_stats {
 
 struct net_dim { /* Dynamic Interrupt Moderation */
 	u8 state;
-	struct net_dim_stats prev_stats;
+	struct dim_stats prev_stats;
 	struct net_dim_sample start_sample;
 	struct work_struct work;
 	u8 profile_ix;
@@ -59,67 +59,67 @@ enum {
 };
 
 enum {
-	NET_DIM_PARKING_ON_TOP,
-	NET_DIM_PARKING_TIRED,
-	NET_DIM_GOING_RIGHT,
-	NET_DIM_GOING_LEFT,
+	DIM_PARKING_ON_TOP,
+	DIM_PARKING_TIRED,
+	DIM_GOING_RIGHT,
+	DIM_GOING_LEFT,
 };
 
 enum {
-	NET_DIM_STATS_WORSE,
-	NET_DIM_STATS_SAME,
-	NET_DIM_STATS_BETTER,
+	DIM_STATS_WORSE,
+	DIM_STATS_SAME,
+	DIM_STATS_BETTER,
 };
 
 enum {
-	NET_DIM_STEPPED,
-	NET_DIM_TOO_TIRED,
-	NET_DIM_ON_EDGE,
+	DIM_STEPPED,
+	DIM_TOO_TIRED,
+	DIM_ON_EDGE,
 };
 
-static inline bool net_dim_on_top(struct net_dim *net_dim)
+static inline bool dim_on_top(struct net_dim *dim)
 {
-	switch (net_dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
 		return true;
-	case NET_DIM_GOING_RIGHT:
-		return (net_dim->steps_left > 1) && (net_dim->steps_right == 1);
-	default: /* NET_DIM_GOING_LEFT */
-		return (net_dim->steps_right > 1) && (net_dim->steps_left == 1);
+	case DIM_GOING_RIGHT:
+		return (dim->steps_left > 1) && (dim->steps_right == 1);
+	default: /* DIM_GOING_LEFT */
+		return (dim->steps_right > 1) && (dim->steps_left == 1);
 	}
 }
 
-static inline void net_dim_turn(struct net_dim *net_dim)
+static inline void dim_turn(struct net_dim *dim)
 {
-	switch (net_dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
 		break;
-	case NET_DIM_GOING_RIGHT:
-		net_dim->tune_state = NET_DIM_GOING_LEFT;
-		net_dim->steps_left = 0;
+	case DIM_GOING_RIGHT:
+		dim->tune_state = DIM_GOING_LEFT;
+		dim->steps_left = 0;
 		break;
-	case NET_DIM_GOING_LEFT:
-		net_dim->tune_state = NET_DIM_GOING_RIGHT;
-		net_dim->steps_right = 0;
+	case DIM_GOING_LEFT:
+		dim->tune_state = DIM_GOING_RIGHT;
+		dim->steps_right = 0;
 		break;
 	}
 }
 
-static inline void net_dim_park_on_top(struct net_dim *net_dim)
+static inline void dim_park_on_top(struct net_dim *dim)
 {
-	net_dim->steps_right  = 0;
-	net_dim->steps_left   = 0;
-	net_dim->tired        = 0;
-	net_dim->tune_state   = NET_DIM_PARKING_ON_TOP;
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tired        = 0;
+	dim->tune_state   = DIM_PARKING_ON_TOP;
 }
 
-static inline void net_dim_park_tired(struct net_dim *net_dim)
+static inline void dim_park_tired(struct net_dim *dim)
 {
-	net_dim->steps_right  = 0;
-	net_dim->steps_left   = 0;
-	net_dim->tune_state   = NET_DIM_PARKING_TIRED;
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tune_state   = DIM_PARKING_TIRED;
 }
 
 static inline void
@@ -132,8 +132,8 @@ net_dim_sample(u16 event_ctr, u64 packets, u64 bytes, struct net_dim_sample *s)
 }
 
 static inline void
-net_dim_calc_stats(struct net_dim_sample *start, struct net_dim_sample *end,
-		   struct net_dim_stats *curr_stats)
+dim_calc_stats(struct net_dim_sample *start, struct net_dim_sample *end,
+	       struct dim_stats *curr_stats)
 {
 	/* u32 holds up to 71 minutes, should be enough */
 	u32 delta_us = ktime_us_delta(end->time, start->time);
@@ -146,7 +146,7 @@ net_dim_calc_stats(struct net_dim_sample *start, struct net_dim_sample *end,
 
 	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
 	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
-	curr_stats->epms = DIV_ROUND_UP(NET_DIM_NEVENTS * USEC_PER_MSEC,
+	curr_stats->epms = DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC,
 					delta_us);
 }
 
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index 373cda74b167..f89fa4fdfb46 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -128,67 +128,67 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
 static inline int net_dim_step(struct net_dim *dim)
 {
 	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
-		return NET_DIM_TOO_TIRED;
+		return DIM_TOO_TIRED;
 
 	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
 		break;
-	case NET_DIM_GOING_RIGHT:
+	case DIM_GOING_RIGHT:
 		if (dim->profile_ix == (NET_DIM_PARAMS_NUM_PROFILES - 1))
-			return NET_DIM_ON_EDGE;
+			return DIM_ON_EDGE;
 		dim->profile_ix++;
 		dim->steps_right++;
 		break;
-	case NET_DIM_GOING_LEFT:
+	case DIM_GOING_LEFT:
 		if (dim->profile_ix == 0)
-			return NET_DIM_ON_EDGE;
+			return DIM_ON_EDGE;
 		dim->profile_ix--;
 		dim->steps_left++;
 		break;
 	}
 
 	dim->tired++;
-	return NET_DIM_STEPPED;
+	return DIM_STEPPED;
 }
 
 static inline void net_dim_exit_parking(struct net_dim *dim)
 {
-	dim->tune_state = dim->profile_ix ? NET_DIM_GOING_LEFT :
-					  NET_DIM_GOING_RIGHT;
+	dim->tune_state = dim->profile_ix ? DIM_GOING_LEFT :
+					  DIM_GOING_RIGHT;
 	net_dim_step(dim);
 }
 
-static inline int net_dim_stats_compare(struct net_dim_stats *curr,
-					struct net_dim_stats *prev)
+static inline int net_dim_stats_compare(struct dim_stats *curr,
+					struct dim_stats *prev)
 {
 	if (!prev->bpms)
-		return curr->bpms ? NET_DIM_STATS_BETTER :
-				    NET_DIM_STATS_SAME;
+		return curr->bpms ? DIM_STATS_BETTER :
+				    DIM_STATS_SAME;
 
 	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
-		return (curr->bpms > prev->bpms) ? NET_DIM_STATS_BETTER :
-						   NET_DIM_STATS_WORSE;
+		return (curr->bpms > prev->bpms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
 
 	if (!prev->ppms)
-		return curr->ppms ? NET_DIM_STATS_BETTER :
-				    NET_DIM_STATS_SAME;
+		return curr->ppms ? DIM_STATS_BETTER :
+				    DIM_STATS_SAME;
 
 	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
-		return (curr->ppms > prev->ppms) ? NET_DIM_STATS_BETTER :
-						   NET_DIM_STATS_WORSE;
+		return (curr->ppms > prev->ppms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
 
 	if (!prev->epms)
-		return NET_DIM_STATS_SAME;
+		return DIM_STATS_SAME;
 
 	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
-		return (curr->epms < prev->epms) ? NET_DIM_STATS_BETTER :
-						   NET_DIM_STATS_WORSE;
+		return (curr->epms < prev->epms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
 
-	return NET_DIM_STATS_SAME;
+	return DIM_STATS_SAME;
 }
 
-static inline bool net_dim_decision(struct net_dim_stats *curr_stats,
+static inline bool net_dim_decision(struct dim_stats *curr_stats,
 				    struct net_dim *dim)
 {
 	int prev_state = dim->tune_state;
@@ -197,44 +197,44 @@ static inline bool net_dim_decision(struct net_dim_stats *curr_stats,
 	int step_res;
 
 	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
+	case DIM_PARKING_ON_TOP:
 		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
-		if (stats_res != NET_DIM_STATS_SAME)
+		if (stats_res != DIM_STATS_SAME)
 			net_dim_exit_parking(dim);
 		break;
 
-	case NET_DIM_PARKING_TIRED:
+	case DIM_PARKING_TIRED:
 		dim->tired--;
 		if (!dim->tired)
 			net_dim_exit_parking(dim);
 		break;
 
-	case NET_DIM_GOING_RIGHT:
-	case NET_DIM_GOING_LEFT:
+	case DIM_GOING_RIGHT:
+	case DIM_GOING_LEFT:
 		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
-		if (stats_res != NET_DIM_STATS_BETTER)
-			net_dim_turn(dim);
+		if (stats_res != DIM_STATS_BETTER)
+			dim_turn(dim);
 
-		if (net_dim_on_top(dim)) {
-			net_dim_park_on_top(dim);
+		if (dim_on_top(dim)) {
+			dim_park_on_top(dim);
 			break;
 		}
 
 		step_res = net_dim_step(dim);
 		switch (step_res) {
-		case NET_DIM_ON_EDGE:
-			net_dim_park_on_top(dim);
+		case DIM_ON_EDGE:
+			dim_park_on_top(dim);
 			break;
-		case NET_DIM_TOO_TIRED:
-			net_dim_park_tired(dim);
+		case DIM_TOO_TIRED:
+			dim_park_tired(dim);
 			break;
 		}
 
 		break;
 	}
 
-	if ((prev_state      != NET_DIM_PARKING_ON_TOP) ||
-	    (dim->tune_state != NET_DIM_PARKING_ON_TOP))
+	if (prev_state != DIM_PARKING_ON_TOP ||
+	    dim->tune_state != DIM_PARKING_ON_TOP)
 		dim->prev_stats = *curr_stats;
 
 	return dim->profile_ix != prev_ix;
@@ -243,7 +243,7 @@ static inline bool net_dim_decision(struct net_dim_stats *curr_stats,
 static inline void net_dim(struct net_dim *dim,
 			   struct net_dim_sample end_sample)
 {
-	struct net_dim_stats curr_stats;
+	struct dim_stats curr_stats;
 	u16 nevents;
 
 	switch (dim->state) {
@@ -251,10 +251,9 @@ static inline void net_dim(struct net_dim *dim,
 		nevents = BIT_GAP(BITS_PER_TYPE(u16),
 				  end_sample.event_ctr,
 				  dim->start_sample.event_ctr);
-		if (nevents < NET_DIM_NEVENTS)
+		if (nevents < DIM_NEVENTS)
 			break;
-		net_dim_calc_stats(&dim->start_sample, &end_sample,
-				   &curr_stats);
+		dim_calc_stats(&dim->start_sample, &end_sample, &curr_stats);
 		if (net_dim_decision(&curr_stats, dim)) {
 			dim->state = NET_DIM_APPLY_NEW_PROFILE;
 			schedule_work(&dim->work);
-- 
2.21.0
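
As a reading aid for the comparison logic touched by this rename, below is
a userspace sketch of how net_dim_stats_compare() ranks two measurements.
It is illustrative only and not part of the patch. IS_SIGNIFICANT_DIFF()
is copied from dim.h; the shortened names (stats_compare, STATS_*) are
assumptions for the example, and abs() comes from <stdlib.h> here instead
of the kernel macro.

```c
#include <assert.h>
#include <stdlib.h>

/* more than 10% difference (copied from dim.h) */
#define IS_SIGNIFICANT_DIFF(val, ref) \
	(((100UL * abs((val) - (ref))) / (ref)) > 10)

enum { STATS_WORSE, STATS_SAME, STATS_BETTER };

struct stats {
	int ppms;	/* packets per msec */
	int bpms;	/* bytes per msec */
	int epms;	/* events per msec */
};

/* Priority order: bytes/msec first, then packets/msec, then events/msec.
 * More bytes or packets is better; fewer events (interrupts) is better.
 */
static inline int stats_compare(const struct stats *curr,
				const struct stats *prev)
{
	if (!prev->bpms)
		return curr->bpms ? STATS_BETTER : STATS_SAME;

	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
		return (curr->bpms > prev->bpms) ? STATS_BETTER : STATS_WORSE;

	if (!prev->ppms)
		return curr->ppms ? STATS_BETTER : STATS_SAME;

	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
		return (curr->ppms > prev->ppms) ? STATS_BETTER : STATS_WORSE;

	if (!prev->epms)
		return STATS_SAME;

	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
		return (curr->epms < prev->epms) ? STATS_BETTER : STATS_WORSE;

	return STATS_SAME;
}

/* Usage: equal throughput, but fewer events per msec counts as better. */
static inline void stats_compare_example(void)
{
	struct stats prev = { .ppms = 100, .bpms = 1000, .epms = 50 };
	struct stats curr = { .ppms = 100, .bpms = 1000, .epms = 40 };

	assert(stats_compare(&curr, &prev) == STATS_BETTER);
}
```

Because the macro uses integer division, a difference only counts as
significant once it reaches 11% of the reference value; for example,
1109 against 1000 bytes per msec is still treated as the same.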


* [for-next V2 03/10] linux/dim: Rename externally exposed macros
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
  2019-06-25 20:57 ` [for-next V2 01/10] linux/dim: Move logic to dim.h Saeed Mahameed
  2019-06-25 20:57 ` [for-next V2 02/10] linux/dim: Remove "net" prefix from internal DIM members Saeed Mahameed
@ 2019-06-25 20:57 ` Saeed Mahameed
  2019-06-25 21:54   ` Sagi Grimberg
  2019-06-25 20:57 ` [for-next V2 04/10] linux/dim: Rename net_dim_sample() to net_dim_update_sample() Saeed Mahameed
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Saeed Mahameed

From: Tal Gilboa <talgi@mellanox.com>

Rename the macros that are in use by external drivers.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c     |  4 ++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c      |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c  |  2 +-
 drivers/net/ethernet/broadcom/genet/bcmgenet.c |  4 ++--
 .../net/ethernet/mellanox/mlx5/core/en_dim.c   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c  | 10 +++++-----
 include/linux/dim.h                            | 12 ++++++------
 include/linux/net_dim.h                        | 18 +++++++++---------
 8 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index c623896e3ccb..b5e2f9d2cb71 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1099,7 +1099,7 @@ static void bcm_sysport_dim_work(struct work_struct *work)
 			net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	bcm_sysport_set_rx_coalesce(priv, cur_profile.usec, cur_profile.pkts);
-	dim->state = NET_DIM_START_MEASURE;
+	dim->state = DIM_START_MEASURE;
 }
 
 /* RX and misc interrupt routine */
@@ -1440,7 +1440,7 @@ static void bcm_sysport_init_dim(struct bcm_sysport_priv *priv,
 	struct bcm_sysport_net_dim *dim = &priv->dim;
 
 	INIT_WORK(&dim->dim.work, cb);
-	dim->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+	dim->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 	dim->event_ctr = 0;
 	dim->packets = 0;
 	dim->bytes = 0;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8314c00d7537..49de873043c0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7810,7 +7810,7 @@ static void bnxt_enable_napi(struct bnxt *bp)
 
 		if (bp->bnapi[i]->rx_ring) {
 			INIT_WORK(&cpr->dim.work, bnxt_dim_work);
-			cpr->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+			cpr->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 		}
 		napi_enable(&bp->bnapi[i]->napi);
 	}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
index afa97c8bb081..16a4588709d1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
@@ -28,5 +28,5 @@ void bnxt_dim_work(struct work_struct *work)
 	cpr->rx_ring_coal.coal_bufs = cur_moder.pkts;
 
 	bnxt_hwrm_set_ring_coal(bnapi->bp, bnapi);
-	dim->state = NET_DIM_START_MEASURE;
+	dim->state = DIM_START_MEASURE;
 }
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 374b9ff05c88..5286a46ecfb0 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1928,7 +1928,7 @@ static void bcmgenet_dim_work(struct work_struct *work)
 			net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	bcmgenet_set_rx_coalesce(ring, cur_profile.usec, cur_profile.pkts);
-	dim->state = NET_DIM_START_MEASURE;
+	dim->state = DIM_START_MEASURE;
 }
 
 /* Assign skb to RX DMA descriptor. */
@@ -2085,7 +2085,7 @@ static void bcmgenet_init_dim(struct bcmgenet_rx_ring *ring,
 	struct bcmgenet_net_dim *dim = &ring->dim;
 
 	INIT_WORK(&dim->dim.work, cb);
-	dim->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+	dim->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 	dim->event_ctr = 0;
 	dim->packets = 0;
 	dim->bytes = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index d67adf70a97b..a80303add7c0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -38,7 +38,7 @@ mlx5e_complete_dim_work(struct net_dim *dim, struct net_dim_cq_moder moder,
 			struct mlx5_core_dev *mdev, struct mlx5_core_cq *mcq)
 {
 	mlx5_core_modify_cq_moderation(mdev, mcq, moder.usec, moder.pkts);
-	dim->state = NET_DIM_START_MEASURE;
+	dim->state = DIM_START_MEASURE;
 }
 
 void mlx5e_rx_dim_work(struct work_struct *work)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 457cc39423f2..5b89e992e482 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -584,11 +584,11 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 
 	switch (params->rx_cq_moderation.cq_period_mode) {
 	case MLX5_CQ_PERIOD_MODE_START_FROM_CQE:
-		rq->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE;
+		rq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_CQE;
 		break;
 	case MLX5_CQ_PERIOD_MODE_START_FROM_EQE:
 	default:
-		rq->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+		rq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 	}
 
 	rq->page_cache.head = 0;
@@ -2151,7 +2151,7 @@ static void mlx5e_build_ico_cq_param(struct mlx5e_priv *priv,
 
 	mlx5e_build_common_cq_param(priv, param);
 
-	param->cq_period_mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+	param->cq_period_mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 }
 
 static void mlx5e_build_icosq_param(struct mlx5e_priv *priv,
@@ -4440,8 +4440,8 @@ static struct net_dim_cq_moder mlx5e_get_def_rx_moderation(u8 cq_period_mode)
 static u8 mlx5_to_net_dim_cq_period_mode(u8 cq_period_mode)
 {
 	return cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE ?
-		NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
-		NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+		DIM_CQ_PERIOD_MODE_START_FROM_CQE :
+		DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 }
 
 void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
diff --git a/include/linux/dim.h b/include/linux/dim.h
index 6ee991681d62..989dbbdf9d45 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -47,15 +47,15 @@ struct net_dim { /* Dynamic Interrupt Moderation */
 };
 
 enum {
-	NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
-	NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
-	NET_DIM_CQ_PERIOD_NUM_MODES
+	DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
+	DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
+	DIM_CQ_PERIOD_NUM_MODES
 };
 
 enum {
-	NET_DIM_START_MEASURE,
-	NET_DIM_MEASURE_IN_PROGRESS,
-	NET_DIM_APPLY_NEW_PROFILE,
+	DIM_START_MEASURE,
+	DIM_MEASURE_IN_PROGRESS,
+	DIM_APPLY_NEW_PROFILE,
 };
 
 enum {
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index f89fa4fdfb46..e0c97f824dd0 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -78,13 +78,13 @@
 }
 
 static const struct net_dim_cq_moder
-rx_profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
+rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_RX_EQE_PROFILES,
 	NET_DIM_RX_CQE_PROFILES,
 };
 
 static const struct net_dim_cq_moder
-tx_profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
+tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_TX_EQE_PROFILES,
 	NET_DIM_TX_CQE_PROFILES,
 };
@@ -101,7 +101,7 @@ net_dim_get_rx_moderation(u8 cq_period_mode, int ix)
 static inline struct net_dim_cq_moder
 net_dim_get_def_rx_moderation(u8 cq_period_mode)
 {
-	u8 profile_ix = cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
+	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
 			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
 
 	return net_dim_get_rx_moderation(cq_period_mode, profile_ix);
@@ -119,7 +119,7 @@ net_dim_get_tx_moderation(u8 cq_period_mode, int ix)
 static inline struct net_dim_cq_moder
 net_dim_get_def_tx_moderation(u8 cq_period_mode)
 {
-	u8 profile_ix = cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
+	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
 			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
 
 	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
@@ -247,7 +247,7 @@ static inline void net_dim(struct net_dim *dim,
 	u16 nevents;
 
 	switch (dim->state) {
-	case NET_DIM_MEASURE_IN_PROGRESS:
+	case DIM_MEASURE_IN_PROGRESS:
 		nevents = BIT_GAP(BITS_PER_TYPE(u16),
 				  end_sample.event_ctr,
 				  dim->start_sample.event_ctr);
@@ -255,17 +255,17 @@ static inline void net_dim(struct net_dim *dim,
 			break;
 		dim_calc_stats(&dim->start_sample, &end_sample, &curr_stats);
 		if (net_dim_decision(&curr_stats, dim)) {
-			dim->state = NET_DIM_APPLY_NEW_PROFILE;
+			dim->state = DIM_APPLY_NEW_PROFILE;
 			schedule_work(&dim->work);
 			break;
 		}
 		/* fall through */
-	case NET_DIM_START_MEASURE:
+	case DIM_START_MEASURE:
 		net_dim_sample(end_sample.event_ctr, end_sample.pkt_ctr, end_sample.byte_ctr,
 			       &dim->start_sample);
-		dim->state = NET_DIM_MEASURE_IN_PROGRESS;
+		dim->state = DIM_MEASURE_IN_PROGRESS;
 		break;
-	case NET_DIM_APPLY_NEW_PROFILE:
+	case DIM_APPLY_NEW_PROFILE:
 		break;
 	}
 }
-- 
2.21.0
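
For readers following the rename, here is a minimal userspace sketch of the
three-state cycle that net_dim() drives with the renamed DIM_* states. It is
an illustration under assumptions, not the kernel code: struct example_dim,
example_dim_step(), the want_change flag (standing in for the result of
net_dim_decision()) and the EXAMPLE_NEVENTS threshold are hypothetical, and
BIT_ULL() is redefined locally. Only BIT_GAP() and the state names are taken
from the patches.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))	/* local stand-in for the kernel macro */
#define BIT_GAP(bits, end, start) \
	((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))

enum {
	DIM_START_MEASURE,
	DIM_MEASURE_IN_PROGRESS,
	DIM_APPLY_NEW_PROFILE,
};

#define EXAMPLE_NEVENTS 64	/* hypothetical: events needed per measurement */

struct example_dim {		/* hypothetical, reduced stand-in for the dim struct */
	uint8_t  state;
	uint16_t start_event_ctr;
};

/* Returns 1 when the caller should schedule the "apply new profile" work. */
static int example_dim_step(struct example_dim *dim, uint16_t event_ctr,
			    int want_change)
{
	uint16_t nevents;

	assert(dim);
	switch (dim->state) {
	case DIM_MEASURE_IN_PROGRESS:
		/* BIT_GAP keeps the difference correct across u16 wraparound */
		nevents = (uint16_t)BIT_GAP(16, event_ctr, dim->start_event_ctr);
		if (nevents < EXAMPLE_NEVENTS)
			break;
		if (want_change) {
			dim->state = DIM_APPLY_NEW_PROFILE;
			return 1;
		}
		/* fall through */
	case DIM_START_MEASURE:
		dim->start_event_ctr = event_ctr;
		dim->state = DIM_MEASURE_IN_PROGRESS;
		break;
	case DIM_APPLY_NEW_PROFILE:
		/* waiting for the work item to reset state to DIM_START_MEASURE */
		break;
	}
	return 0;
}
```

In the drivers touched above, the work handler is what moves the state back
to DIM_START_MEASURE once the new moderation values have been programmed.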



* [for-next V2 04/10] linux/dim: Rename net_dim_sample() to net_dim_update_sample()
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
                   ` (2 preceding siblings ...)
  2019-06-25 20:57 ` [for-next V2 03/10] linux/dim: Rename externally exposed macros Saeed Mahameed
@ 2019-06-25 20:57 ` Saeed Mahameed
  2019-06-25 21:55   ` Sagi Grimberg
  2019-06-25 20:57 ` [for-next V2 05/10] linux/dim: Rename externally used net_dim members Saeed Mahameed
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Saeed Mahameed

From: Tal Gilboa <talgi@mellanox.com>

Rename net_dim_sample() to net_dim_update_sample() to avoid confusion
between the function and the similarly named struct net_dim_sample.
This is in preparation for removing the 'net' prefix from dim members.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
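A minimal userspace sketch of the naming issue, for illustration only. C keeps
struct tags and function names in separate namespaces, so a function named
net_dim_sample() next to struct net_dim_sample compiles but is hard to read;
the new name says what the helper does. The fixed-width types and the use of
time() in place of ktime_get() are assumptions made so the sketch builds
outside the kernel; the field names follow the hunks below.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Userspace stand-in for the kernel struct; the kernel uses ktime_t here. */
struct net_dim_sample {
	time_t   time;
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
};

/* Snapshot the driver's running counters into *s, with a timestamp. */
static inline void
net_dim_update_sample(uint16_t event_ctr, uint64_t packets, uint64_t bytes,
		      struct net_dim_sample *s)
{
	assert(s);
	s->time      = time(NULL);		/* kernel code: ktime_get() */
	s->pkt_ctr   = (uint32_t)packets;	/* counters are truncated to u32 */
	s->byte_ctr  = (uint32_t)bytes;
	s->event_ctr = event_ctr;
}
```

A poll routine would call this once per poll and pass the filled sample to
net_dim(), as the driver hunks below do.
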
 drivers/net/ethernet/broadcom/bcmsysport.c        | 4 ++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 8 ++++----
 drivers/net/ethernet/broadcom/genet/bcmgenet.c    | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 6 ++----
 include/linux/dim.h                               | 3 ++-
 include/linux/net_dim.h                           | 4 ++--
 6 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index b5e2f9d2cb71..faaf8ade15e5 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1019,8 +1019,8 @@ static int bcm_sysport_poll(struct napi_struct *napi, int budget)
 	}
 
 	if (priv->dim.use_dim) {
-		net_dim_sample(priv->dim.event_ctr, priv->dim.packets,
-			       priv->dim.bytes, &dim_sample);
+		net_dim_update_sample(priv->dim.event_ctr, priv->dim.packets,
+				      priv->dim.bytes, &dim_sample);
 		net_dim(&priv->dim.dim, dim_sample);
 	}
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 49de873043c0..eaec949c367a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2130,10 +2130,10 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
 	if (bp->flags & BNXT_FLAG_DIM) {
 		struct net_dim_sample dim_sample;
 
-		net_dim_sample(cpr->event_ctr,
-			       cpr->rx_packets,
-			       cpr->rx_bytes,
-			       &dim_sample);
+		net_dim_update_sample(cpr->event_ctr,
+				      cpr->rx_packets,
+				      cpr->rx_bytes,
+				      &dim_sample);
 		net_dim(&cpr->dim, dim_sample);
 	}
 	return work_done;
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 5286a46ecfb0..297ae786ffed 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1909,8 +1909,8 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 	}
 
 	if (ring->dim.use_dim) {
-		net_dim_sample(ring->dim.event_ctr, ring->dim.packets,
-			       ring->dim.bytes, &dim_sample);
+		net_dim_update_sample(ring->dim.event_ctr, ring->dim.packets,
+				      ring->dim.bytes, &dim_sample);
 		net_dim(&ring->dim.dim, dim_sample);
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index f9862bf75491..07432e6428cf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -53,8 +53,7 @@ static void mlx5e_handle_tx_dim(struct mlx5e_txqsq *sq)
 	if (unlikely(!test_bit(MLX5E_SQ_STATE_AM, &sq->state)))
 		return;
 
-	net_dim_sample(sq->cq.event_ctr, stats->packets, stats->bytes,
-		       &dim_sample);
+	net_dim_update_sample(sq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
 	net_dim(&sq->dim, dim_sample);
 }
 
@@ -66,8 +65,7 @@ static void mlx5e_handle_rx_dim(struct mlx5e_rq *rq)
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_AM, &rq->state)))
 		return;
 
-	net_dim_sample(rq->cq.event_ctr, stats->packets, stats->bytes,
-		       &dim_sample);
+	net_dim_update_sample(rq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
 	net_dim(&rq->dim, dim_sample);
 }
 
diff --git a/include/linux/dim.h b/include/linux/dim.h
index 989dbbdf9d45..f0f20ed25497 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -123,7 +123,8 @@ static inline void dim_park_tired(struct net_dim *dim)
 }
 
 static inline void
-net_dim_sample(u16 event_ctr, u64 packets, u64 bytes, struct net_dim_sample *s)
+net_dim_update_sample(u16 event_ctr, u64 packets, u64 bytes,
+		      struct net_dim_sample *s)
 {
 	s->time	     = ktime_get();
 	s->pkt_ctr   = packets;
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index e0c97f824dd0..d4b40adc7fa1 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -261,8 +261,8 @@ static inline void net_dim(struct net_dim *dim,
 		}
 		/* fall through */
 	case DIM_START_MEASURE:
-		net_dim_sample(end_sample.event_ctr, end_sample.pkt_ctr, end_sample.byte_ctr,
-			       &dim->start_sample);
+		net_dim_update_sample(end_sample.event_ctr, end_sample.pkt_ctr,
+				      end_sample.byte_ctr, &dim->start_sample);
 		dim->state = DIM_MEASURE_IN_PROGRESS;
 		break;
 	case DIM_APPLY_NEW_PROFILE:
-- 
2.21.0



* [for-next V2 05/10] linux/dim: Rename externally used net_dim members
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
                   ` (3 preceding siblings ...)
  2019-06-25 20:57 ` [for-next V2 04/10] linux/dim: Rename net_dim_sample() to net_dim_update_sample() Saeed Mahameed
@ 2019-06-25 20:57 ` Saeed Mahameed
  2019-06-25 21:57   ` Sagi Grimberg
  2019-06-25 20:57 ` [for-next V2 06/10] linux/dim: Move implementation to .c files Saeed Mahameed
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Saeed Mahameed

From: Tal Gilboa <talgi@mellanox.com>

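Remove the 'net' prefix from the functions and structs used by external
drivers: struct net_dim, struct net_dim_sample and struct net_dim_cq_moder
become struct dim, struct dim_sample and struct dim_cq_moder, and
net_dim_update_sample() becomes dim_update_sample().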

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
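A minimal userspace sketch of the work-handler pattern the drivers below use
with the renamed struct dim, for illustration only. The container_of() macro
and struct work_struct are simplified local stand-ins, struct dim is reduced
to a few of its members, and struct example_ring and
example_ring_from_work() are hypothetical names that do not exist in any
driver.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_struct {		/* stand-in; the real one lives in the kernel */
	int pending;
};

struct dim {			/* reduced: only a few of the real members */
	unsigned char state;
	struct work_struct work;
	unsigned char profile_ix;
	unsigned char mode;
};

struct example_ring {		/* hypothetical per-ring driver structure */
	unsigned int rx_coalesce_usecs;
	struct dim dim;
};

/* Walk from the embedded work item back to the ring that owns it. */
static struct example_ring *example_ring_from_work(struct work_struct *work)
{
	struct dim *dim;

	assert(work != NULL);
	dim = container_of(work, struct dim, work);
	return container_of(dim, struct example_ring, dim);
}
```

Because the member is still called dim, only the type name changes at each
call site, which is why most hunks below are one-line substitutions.
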
 drivers/net/ethernet/broadcom/bcmsysport.c    | 16 +++++-----
 drivers/net/ethernet/broadcom/bcmsysport.h    |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 10 +++----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  2 +-
 .../net/ethernet/broadcom/bnxt/bnxt_debugfs.c |  4 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c |  5 ++--
 .../net/ethernet/broadcom/genet/bcmgenet.c    | 14 ++++-----
 .../net/ethernet/broadcom/genet/bcmgenet.h    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  8 ++---
 .../net/ethernet/mellanox/mlx5/core/en_dim.c  | 10 +++----
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  4 +--
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 12 ++++----
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c |  8 ++---
 include/linux/dim.h                           | 21 +++++++------
 include/linux/net_dim.h                       | 30 +++++++++----------
 15 files changed, 73 insertions(+), 75 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index faaf8ade15e5..c1247b2948ff 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -612,7 +612,7 @@ static int bcm_sysport_set_coalesce(struct net_device *dev,
 				    struct ethtool_coalesce *ec)
 {
 	struct bcm_sysport_priv *priv = netdev_priv(dev);
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 	unsigned int i;
 
@@ -995,7 +995,7 @@ static int bcm_sysport_poll(struct napi_struct *napi, int budget)
 {
 	struct bcm_sysport_priv *priv =
 		container_of(napi, struct bcm_sysport_priv, napi);
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 	unsigned int work_done = 0;
 
 	work_done = bcm_sysport_desc_rx(priv, budget);
@@ -1019,8 +1019,8 @@ static int bcm_sysport_poll(struct napi_struct *napi, int budget)
 	}
 
 	if (priv->dim.use_dim) {
-		net_dim_update_sample(priv->dim.event_ctr, priv->dim.packets,
-				      priv->dim.bytes, &dim_sample);
+		dim_update_sample(priv->dim.event_ctr, priv->dim.packets,
+				  priv->dim.bytes, &dim_sample);
 		net_dim(&priv->dim.dim, dim_sample);
 	}
 
@@ -1090,13 +1090,13 @@ static void bcm_sysport_resume_from_wol(struct bcm_sysport_priv *priv)
 
 static void bcm_sysport_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct bcm_sysport_net_dim *ndim =
 			container_of(dim, struct bcm_sysport_net_dim, dim);
 	struct bcm_sysport_priv *priv =
 			container_of(ndim, struct bcm_sysport_priv, dim);
-	struct net_dim_cq_moder cur_profile =
-			net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+	struct dim_cq_moder cur_profile = net_dim_get_rx_moderation(dim->mode,
+								    dim->profile_ix);
 
 	bcm_sysport_set_rx_coalesce(priv, cur_profile.usec, cur_profile.pkts);
 	dim->state = DIM_START_MEASURE;
@@ -1449,7 +1449,7 @@ static void bcm_sysport_init_dim(struct bcm_sysport_priv *priv,
 static void bcm_sysport_init_rx_coalesce(struct bcm_sysport_priv *priv)
 {
 	struct bcm_sysport_net_dim *dim = &priv->dim;
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 
 	usecs = priv->rx_coalesce_usecs;
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h b/drivers/net/ethernet/broadcom/bcmsysport.h
index 6f3141c86436..cbe6d559d964 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -705,7 +705,7 @@ struct bcm_sysport_net_dim {
 	u16			event_ctr;
 	unsigned long		packets;
 	unsigned long		bytes;
-	struct net_dim		dim;
+	struct dim		dim;
 };
 
 /* Software view of the TX ring */
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index eaec949c367a..c54668004600 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2128,12 +2128,12 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
 		}
 	}
 	if (bp->flags & BNXT_FLAG_DIM) {
-		struct net_dim_sample dim_sample;
+		struct dim_sample dim_sample;
 
-		net_dim_update_sample(cpr->event_ctr,
-				      cpr->rx_packets,
-				      cpr->rx_bytes,
-				      &dim_sample);
+		dim_update_sample(cpr->event_ctr,
+				  cpr->rx_packets,
+				  cpr->rx_bytes,
+				  &dim_sample);
 		net_dim(&cpr->dim, dim_sample);
 	}
 	return work_done;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index eca36dd6b751..a552c5539cc9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -809,7 +809,7 @@ struct bnxt_cp_ring_info {
 	u64			rx_bytes;
 	u64			event_ctr;
 
-	struct net_dim		dim;
+	struct dim		dim;
 
 	union {
 		struct tx_cmp	*cp_desc_ring[MAX_CP_PAGES];
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
index 94e208e9789f..3d1d53fbb135 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
@@ -21,7 +21,7 @@ static ssize_t debugfs_dim_read(struct file *filep,
 				char __user *buffer,
 				size_t count, loff_t *ppos)
 {
-	struct net_dim *dim = filep->private_data;
+	struct dim *dim = filep->private_data;
 	int len;
 	char *buf;
 
@@ -61,7 +61,7 @@ static const struct file_operations debugfs_dim_fops = {
 	.read = debugfs_dim_read,
 };
 
-static struct dentry *debugfs_dim_ring_init(struct net_dim *dim, int ring_idx,
+static struct dentry *debugfs_dim_ring_init(struct dim *dim, int ring_idx,
 					    struct dentry *dd)
 {
 	static char qname[16];
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
index 16a4588709d1..11605f9fa61e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
@@ -13,15 +13,14 @@
 
 void bnxt_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim,
-					   work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct bnxt_cp_ring_info *cpr = container_of(dim,
 						     struct bnxt_cp_ring_info,
 						     dim);
 	struct bnxt_napi *bnapi = container_of(cpr,
 					       struct bnxt_napi,
 					       cp_ring);
-	struct net_dim_cq_moder cur_moder =
+	struct dim_cq_moder cur_moder =
 		net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	cpr->rx_ring_coal.coal_ticks = cur_moder.usec;
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 297ae786ffed..b7f8f4f1088f 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -643,7 +643,7 @@ static void bcmgenet_set_rx_coalesce(struct bcmgenet_rx_ring *ring,
 static void bcmgenet_set_ring_rx_coalesce(struct bcmgenet_rx_ring *ring,
 					  struct ethtool_coalesce *ec)
 {
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 
 	ring->rx_coalesce_usecs = ec->rx_coalesce_usecs;
@@ -1898,7 +1898,7 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 {
 	struct bcmgenet_rx_ring *ring = container_of(napi,
 			struct bcmgenet_rx_ring, napi);
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 	unsigned int work_done;
 
 	work_done = bcmgenet_desc_rx(ring, budget);
@@ -1909,8 +1909,8 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 	}
 
 	if (ring->dim.use_dim) {
-		net_dim_update_sample(ring->dim.event_ctr, ring->dim.packets,
-				      ring->dim.bytes, &dim_sample);
+		dim_update_sample(ring->dim.event_ctr, ring->dim.packets,
+				  ring->dim.bytes, &dim_sample);
 		net_dim(&ring->dim.dim, dim_sample);
 	}
 
@@ -1919,12 +1919,12 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 
 static void bcmgenet_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct bcmgenet_net_dim *ndim =
 			container_of(dim, struct bcmgenet_net_dim, dim);
 	struct bcmgenet_rx_ring *ring =
 			container_of(ndim, struct bcmgenet_rx_ring, dim);
-	struct net_dim_cq_moder cur_profile =
+	struct dim_cq_moder cur_profile =
 			net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	bcmgenet_set_rx_coalesce(ring, cur_profile.usec, cur_profile.pkts);
@@ -2094,7 +2094,7 @@ static void bcmgenet_init_dim(struct bcmgenet_rx_ring *ring,
 static void bcmgenet_init_rx_coalesce(struct bcmgenet_rx_ring *ring)
 {
 	struct bcmgenet_net_dim *dim = &ring->dim;
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 
 	usecs = ring->rx_coalesce_usecs;
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 14b49612aa86..6e418d9c3706 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -581,7 +581,7 @@ struct bcmgenet_net_dim {
 	u16		event_ctr;
 	unsigned long	packets;
 	unsigned long	bytes;
-	struct net_dim	dim;
+	struct dim	dim;
 };
 
 struct bcmgenet_rx_ring {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 3a183d690e23..11efd6e4bdc3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -238,9 +238,9 @@ struct mlx5e_params {
 	u16 num_channels;
 	u8  num_tc;
 	bool rx_cqe_compress_def;
-	struct net_dim_cq_moder rx_cq_moderation;
-	struct net_dim_cq_moder tx_cq_moderation;
 	bool tunneled_offload_en;
+	struct dim_cq_moder rx_cq_moderation;
+	struct dim_cq_moder tx_cq_moderation;
 	bool lro_en;
 	u8  tx_min_inline_mode;
 	bool vlan_strip_disable;
@@ -356,7 +356,7 @@ struct mlx5e_txqsq {
 	/* dirtied @completion */
 	u16                        cc;
 	u32                        dma_fifo_cc;
-	struct net_dim             dim; /* Adaptive Moderation */
+	struct dim                 dim; /* Adaptive Moderation */
 
 	/* dirtied @xmit */
 	u16                        pc ____cacheline_aligned_in_smp;
@@ -595,7 +595,7 @@ struct mlx5e_rq {
 	int                    ix;
 	unsigned int           hw_mtu;
 
-	struct net_dim         dim; /* Dynamic Interrupt Moderation */
+	struct dim         dim; /* Dynamic Interrupt Moderation */
 
 	/* XDP */
 	struct bpf_prog       *xdp_prog;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index a80303add7c0..ba3c1be9f2d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -34,7 +34,7 @@
 #include "en.h"
 
 static void
-mlx5e_complete_dim_work(struct net_dim *dim, struct net_dim_cq_moder moder,
+mlx5e_complete_dim_work(struct dim *dim, struct dim_cq_moder moder,
 			struct mlx5_core_dev *mdev, struct mlx5_core_cq *mcq)
 {
 	mlx5_core_modify_cq_moderation(mdev, mcq, moder.usec, moder.pkts);
@@ -43,9 +43,9 @@ mlx5e_complete_dim_work(struct net_dim *dim, struct net_dim_cq_moder moder,
 
 void mlx5e_rx_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct mlx5e_rq *rq = container_of(dim, struct mlx5e_rq, dim);
-	struct net_dim_cq_moder cur_moder =
+	struct dim_cq_moder cur_moder =
 		net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	mlx5e_complete_dim_work(dim, cur_moder, rq->mdev, &rq->cq.mcq);
@@ -53,9 +53,9 @@ void mlx5e_rx_dim_work(struct work_struct *work)
 
 void mlx5e_tx_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct mlx5e_txqsq *sq = container_of(dim, struct mlx5e_txqsq, dim);
-	struct net_dim_cq_moder cur_moder =
+	struct dim_cq_moder cur_moder =
 		net_dim_get_tx_moderation(dim->mode, dim->profile_ix);
 
 	mlx5e_complete_dim_work(dim, cur_moder, sq->cq.mdev, &sq->cq.mcq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index dd764e0471f2..c853b657739c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -466,7 +466,7 @@ static int mlx5e_set_channels(struct net_device *dev,
 int mlx5e_ethtool_get_coalesce(struct mlx5e_priv *priv,
 			       struct ethtool_coalesce *coal)
 {
-	struct net_dim_cq_moder *rx_moder, *tx_moder;
+	struct dim_cq_moder *rx_moder, *tx_moder;
 
 	if (!MLX5_CAP_GEN(priv->mdev, cq_moderation))
 		return -EOPNOTSUPP;
@@ -521,7 +521,7 @@ mlx5e_set_priv_channels_coalesce(struct mlx5e_priv *priv, struct ethtool_coalesc
 int mlx5e_ethtool_set_coalesce(struct mlx5e_priv *priv,
 			       struct ethtool_coalesce *coal)
 {
-	struct net_dim_cq_moder *rx_moder, *tx_moder;
+	struct dim_cq_moder *rx_moder, *tx_moder;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	struct mlx5e_channels new_channels = {};
 	int err = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5b89e992e482..9705101c0235 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1569,7 +1569,7 @@ static void mlx5e_destroy_cq(struct mlx5e_cq *cq)
 }
 
 static int mlx5e_open_cq(struct mlx5e_channel *c,
-			 struct net_dim_cq_moder moder,
+			 struct dim_cq_moder moder,
 			 struct mlx5e_cq_param *param,
 			 struct mlx5e_cq *cq)
 {
@@ -1774,7 +1774,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 			      struct mlx5e_channel **cp)
 {
 	int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
-	struct net_dim_cq_moder icocq_moder = {0, 0};
+	struct dim_cq_moder icocq_moder = {0, 0};
 	struct net_device *netdev = priv->netdev;
 	struct mlx5e_channel *c;
 	unsigned int irq;
@@ -4411,9 +4411,9 @@ static bool slow_pci_heuristic(struct mlx5_core_dev *mdev)
 		link_speed > MLX5E_SLOW_PCI_RATIO * pci_bw;
 }
 
-static struct net_dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
+static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
 {
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 
 	moder.cq_period_mode = cq_period_mode;
 	moder.pkts = MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS;
@@ -4424,9 +4424,9 @@ static struct net_dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
 	return moder;
 }
 
-static struct net_dim_cq_moder mlx5e_get_def_rx_moderation(u8 cq_period_mode)
+static struct dim_cq_moder mlx5e_get_def_rx_moderation(u8 cq_period_mode)
 {
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 
 	moder.cq_period_mode = cq_period_mode;
 	moder.pkts = MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 07432e6428cf..e6c434efbd46 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -48,24 +48,24 @@ static inline bool mlx5e_channel_no_affinity_change(struct mlx5e_channel *c)
 static void mlx5e_handle_tx_dim(struct mlx5e_txqsq *sq)
 {
 	struct mlx5e_sq_stats *stats = sq->stats;
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 
 	if (unlikely(!test_bit(MLX5E_SQ_STATE_AM, &sq->state)))
 		return;
 
-	net_dim_update_sample(sq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
+	dim_update_sample(sq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
 	net_dim(&sq->dim, dim_sample);
 }
 
 static void mlx5e_handle_rx_dim(struct mlx5e_rq *rq)
 {
 	struct mlx5e_rq_stats *stats = rq->stats;
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_AM, &rq->state)))
 		return;
 
-	net_dim_update_sample(rq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
+	dim_update_sample(rq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
 	net_dim(&rq->dim, dim_sample);
 }
 
diff --git a/include/linux/dim.h b/include/linux/dim.h
index f0f20ed25497..60e5074a7cc0 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -14,13 +14,13 @@
 #define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) \
 & (BIT_ULL(bits) - 1))
 
-struct net_dim_cq_moder {
+struct dim_cq_moder {
 	u16 usec;
 	u16 pkts;
 	u8 cq_period_mode;
 };
 
-struct net_dim_sample {
+struct dim_sample {
 	ktime_t time;
 	u32 pkt_ctr;
 	u32 byte_ctr;
@@ -33,10 +33,10 @@ struct dim_stats {
 	int epms; /* events per msec */
 };
 
-struct net_dim { /* Dynamic Interrupt Moderation */
+struct dim { /* Dynamic Interrupt Moderation */
 	u8 state;
 	struct dim_stats prev_stats;
-	struct net_dim_sample start_sample;
+	struct dim_sample start_sample;
 	struct work_struct work;
 	u8 profile_ix;
 	u8 mode;
@@ -77,7 +77,7 @@ enum {
 	DIM_ON_EDGE,
 };
 
-static inline bool dim_on_top(struct net_dim *dim)
+static inline bool dim_on_top(struct dim *dim)
 {
 	switch (dim->tune_state) {
 	case DIM_PARKING_ON_TOP:
@@ -90,7 +90,7 @@ static inline bool dim_on_top(struct net_dim *dim)
 	}
 }
 
-static inline void dim_turn(struct net_dim *dim)
+static inline void dim_turn(struct dim *dim)
 {
 	switch (dim->tune_state) {
 	case DIM_PARKING_ON_TOP:
@@ -107,7 +107,7 @@ static inline void dim_turn(struct net_dim *dim)
 	}
 }
 
-static inline void dim_park_on_top(struct net_dim *dim)
+static inline void dim_park_on_top(struct dim *dim)
 {
 	dim->steps_right  = 0;
 	dim->steps_left   = 0;
@@ -115,7 +115,7 @@ static inline void dim_park_on_top(struct net_dim *dim)
 	dim->tune_state   = DIM_PARKING_ON_TOP;
 }
 
-static inline void dim_park_tired(struct net_dim *dim)
+static inline void dim_park_tired(struct dim *dim)
 {
 	dim->steps_right  = 0;
 	dim->steps_left   = 0;
@@ -123,8 +123,7 @@ static inline void dim_park_tired(struct net_dim *dim)
 }
 
 static inline void
-net_dim_update_sample(u16 event_ctr, u64 packets, u64 bytes,
-		      struct net_dim_sample *s)
+dim_update_sample(u16 event_ctr, u64 packets, u64 bytes, struct dim_sample *s)
 {
 	s->time	     = ktime_get();
 	s->pkt_ctr   = packets;
@@ -133,7 +132,7 @@ net_dim_update_sample(u16 event_ctr, u64 packets, u64 bytes,
 }
 
 static inline void
-dim_calc_stats(struct net_dim_sample *start, struct net_dim_sample *end,
+dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 	       struct dim_stats *curr_stats)
 {
 	/* u32 holds up to 71 minutes, should be enough */
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index d4b40adc7fa1..4e009ec193ef 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -77,28 +77,28 @@
 	{64, 32}   \
 }
 
-static const struct net_dim_cq_moder
+static const struct dim_cq_moder
 rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_RX_EQE_PROFILES,
 	NET_DIM_RX_CQE_PROFILES,
 };
 
-static const struct net_dim_cq_moder
+static const struct dim_cq_moder
 tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_TX_EQE_PROFILES,
 	NET_DIM_TX_CQE_PROFILES,
 };
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_rx_moderation(u8 cq_period_mode, int ix)
 {
-	struct net_dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
+	struct dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
 
 	cq_moder.cq_period_mode = cq_period_mode;
 	return cq_moder;
 }
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_def_rx_moderation(u8 cq_period_mode)
 {
 	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
@@ -107,16 +107,16 @@ net_dim_get_def_rx_moderation(u8 cq_period_mode)
 	return net_dim_get_rx_moderation(cq_period_mode, profile_ix);
 }
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_tx_moderation(u8 cq_period_mode, int ix)
 {
-	struct net_dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
+	struct dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
 
 	cq_moder.cq_period_mode = cq_period_mode;
 	return cq_moder;
 }
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_def_tx_moderation(u8 cq_period_mode)
 {
 	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
@@ -125,7 +125,7 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
 	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
 }
 
-static inline int net_dim_step(struct net_dim *dim)
+static inline int net_dim_step(struct dim *dim)
 {
 	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
 		return DIM_TOO_TIRED;
@@ -152,7 +152,7 @@ static inline int net_dim_step(struct net_dim *dim)
 	return DIM_STEPPED;
 }
 
-static inline void net_dim_exit_parking(struct net_dim *dim)
+static inline void net_dim_exit_parking(struct dim *dim)
 {
 	dim->tune_state = dim->profile_ix ? DIM_GOING_LEFT :
 					  DIM_GOING_RIGHT;
@@ -189,7 +189,7 @@ static inline int net_dim_stats_compare(struct dim_stats *curr,
 }
 
 static inline bool net_dim_decision(struct dim_stats *curr_stats,
-				    struct net_dim *dim)
+				    struct dim *dim)
 {
 	int prev_state = dim->tune_state;
 	int prev_ix = dim->profile_ix;
@@ -240,8 +240,8 @@ static inline bool net_dim_decision(struct dim_stats *curr_stats,
 	return dim->profile_ix != prev_ix;
 }
 
-static inline void net_dim(struct net_dim *dim,
-			   struct net_dim_sample end_sample)
+static inline void net_dim(struct dim *dim,
+			   struct dim_sample end_sample)
 {
 	struct dim_stats curr_stats;
 	u16 nevents;
@@ -261,8 +261,8 @@ static inline void net_dim(struct net_dim *dim,
 		}
 		/* fall through */
 	case DIM_START_MEASURE:
-		net_dim_update_sample(end_sample.event_ctr, end_sample.pkt_ctr,
-				      end_sample.byte_ctr, &dim->start_sample);
+		dim_update_sample(end_sample.event_ctr, end_sample.pkt_ctr,
+				  end_sample.byte_ctr, &dim->start_sample);
 		dim->state = DIM_MEASURE_IN_PROGRESS;
 		break;
 	case DIM_APPLY_NEW_PROFILE:
-- 
2.21.0



* [for-next V2 06/10] linux/dim: Move implementation to .c files
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
                   ` (4 preceding siblings ...)
  2019-06-25 20:57 ` [for-next V2 05/10] linux/dim: Rename externally used net_dim members Saeed Mahameed
@ 2019-06-25 20:57 ` Saeed Mahameed
  2019-07-02 16:15   ` Geert Uytterhoeven
  2019-06-25 20:57 ` [for-next V2 07/10] linux/dim: Add completions count to dim_sample Saeed Mahameed
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Saeed Mahameed

From: Tal Gilboa <talgi@mellanox.com>

Move all logic from dim.h and net_dim.h to dim.c and net_dim.c.
This is structurally cleaner and allows exposing only the externally
used functions.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
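Note for reviewers: the comparison logic that moves into lib/dim/net_dim.c
can be hard to follow from the diff alone, so here is a minimal user-space
sketch of it. It is not the kernel code: the names stats_compare_model and
struct dim_stats_model are made up for illustration, and rates are assumed
to be non-negative. The macro body and the order of checks (bytes, then
packets, then events, where fewer events counts as better) follow
net_dim_stats_compare() in this patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Verdicts, in the same order as the enum in dim.h. */
enum { DIM_STATS_WORSE, DIM_STATS_SAME, DIM_STATS_BETTER };

/* Hypothetical user-space stand-in for struct dim_stats. */
struct dim_stats_model {
	int ppms; /* packets per msec */
	int bpms; /* bytes per msec */
	int epms; /* events per msec */
};

/* Same expression as in the patch: strictly more than 10% difference. */
#define IS_SIGNIFICANT_DIFF(val, ref) \
	(((100UL * abs((val) - (ref))) / (ref)) > 10)

/* Model of net_dim_stats_compare(); assumes non-negative rates. */
static int stats_compare_model(const struct dim_stats_model *curr,
			       const struct dim_stats_model *prev)
{
	assert(curr && prev);

	/* Byte rate has the highest priority. */
	if (!prev->bpms)
		return curr->bpms ? DIM_STATS_BETTER : DIM_STATS_SAME;
	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
		return (curr->bpms > prev->bpms) ? DIM_STATS_BETTER :
						   DIM_STATS_WORSE;

	/* Packet rate is consulted only if the byte rate is roughly equal. */
	if (!prev->ppms)
		return curr->ppms ? DIM_STATS_BETTER : DIM_STATS_SAME;
	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
		return (curr->ppms > prev->ppms) ? DIM_STATS_BETTER :
						   DIM_STATS_WORSE;

	/* Event rate comes last; a lower event rate is the better outcome. */
	if (!prev->epms)
		return DIM_STATS_SAME;
	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
		return (curr->epms < prev->epms) ? DIM_STATS_BETTER :
						   DIM_STATS_WORSE;

	return DIM_STATS_SAME;
}
```

With a previous byte rate of 100, a current rate of 110 is exactly 10% and
is therefore not significant, while 111 is.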
 MAINTAINERS                                   |   2 +-
 drivers/net/ethernet/broadcom/Kconfig         |   1 +
 drivers/net/ethernet/broadcom/bcmsysport.h    |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   2 +-
 .../net/ethernet/broadcom/bnxt/bnxt_debugfs.c |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c |   2 +-
 .../net/ethernet/broadcom/genet/bcmgenet.h    |   2 +-
 .../net/ethernet/mellanox/mlx5/core/Kconfig   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_dim.c  |   2 +-
 include/linux/dim.h                           | 319 ++++++++++++++----
 include/linux/net_dim.h                       | 273 ---------------
 lib/Kconfig                                   |   8 +
 lib/Makefile                                  |   1 +
 lib/dim/Makefile                              |   9 +
 lib/dim/dim.c                                 |  74 ++++
 lib/dim/net_dim.c                             | 190 +++++++++++
 17 files changed, 547 insertions(+), 345 deletions(-)
 delete mode 100644 include/linux/net_dim.h
 create mode 100644 lib/dim/Makefile
 create mode 100644 lib/dim/dim.c
 create mode 100644 lib/dim/net_dim.c
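
A second note, on dim_calc_stats(), which this patch moves to lib/dim/dim.c:
the sketch below is a user-space model of how BIT_GAP() keeps the rate
calculation correct across a 32-bit counter wrap. The names sample_model,
stats_model and calc_stats_model are hypothetical, and the timestamp is
assumed to be a plain microsecond count instead of ktime_t. The macros are
re-declared locally so the snippet builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_MSEC 1000ULL
#define DIM_NEVENTS 64
#define BIT_ULL(n) (1ULL << (n))
/* Gap between two counters of width 'bits', tolerant of wrap-around. */
#define BIT_GAP(bits, end, start) \
	((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for struct dim_sample (time in usec, not ktime_t). */
struct sample_model {
	uint64_t time_us;
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
};

/* Hypothetical stand-in for struct dim_stats. */
struct stats_model {
	int ppms;
	int bpms;
	int epms;
};

/* Model of dim_calc_stats(): per-msec rates between two samples. */
static void calc_stats_model(const struct sample_model *start,
			     const struct sample_model *end,
			     struct stats_model *out)
{
	uint32_t delta_us, npkts, nbytes;

	assert(start && end && out);

	delta_us = (uint32_t)(end->time_us - start->time_us);
	npkts = (uint32_t)BIT_GAP(32, end->pkt_ctr, start->pkt_ctr);
	nbytes = (uint32_t)BIT_GAP(32, end->byte_ctr, start->byte_ctr);

	/* As in the patch: leave the output untouched if no time has passed. */
	if (!delta_us)
		return;

	out->ppms = (int)DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
	out->bpms = (int)DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
	out->epms = (int)DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC, delta_us);
}
```

For example, a packet counter going from 0xFFFFFF00 to 0x100 over 2000 usec
is treated as 512 packets, not as a huge negative delta.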

diff --git a/MAINTAINERS b/MAINTAINERS
index 5d4b852d9d39..f78dd16195e3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5588,8 +5588,8 @@ F:	include/linux/dynamic_debug.h
 DYNAMIC INTERRUPT MODERATION
 M:	Tal Gilboa <talgi@mellanox.com>
 S:	Maintained
-F:	include/linux/net_dim.h
 F:	include/linux/dim.h
+F:	lib/dim/
 
 DZ DECSTATION DZ11 SERIAL DRIVER
 M:	"Maciej W. Rozycki" <macro@linux-mips.org>
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index b123509d385f..2e4a8c7237ef 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -8,6 +8,7 @@ config NET_VENDOR_BROADCOM
 	default y
 	depends on (SSB_POSSIBLE && HAS_DMA) || PCI || BCM63XX || \
 		   SIBYTE_SB1xxx_SOC
+	select DIMLIB
 	---help---
 	  If you have a network (Ethernet) chipset belonging to this class,
 	  say Y.
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h b/drivers/net/ethernet/broadcom/bcmsysport.h
index cbe6d559d964..f6677a02d811 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -14,7 +14,7 @@
 #include <linux/bitmap.h>
 #include <linux/ethtool.h>
 #include <linux/if_vlan.h>
-#include <linux/net_dim.h>
+#include <linux/dim.h>
 
 /* Receive/transmit descriptor format */
 #define DESC_ADDR_HI_STATUS_LEN	0x00
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index a552c5539cc9..54c01705f3bd 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -23,7 +23,7 @@
 #include <net/devlink.h>
 #include <net/dst_metadata.h>
 #include <net/xdp.h>
-#include <linux/net_dim.h>
+#include <linux/dim.h>
 
 struct tx_bd {
 	__le32 tx_bd_len_flags_type;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
index 3d1d53fbb135..61393f351a77 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
@@ -11,7 +11,7 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 #include "bnxt_hsi.h"
-#include <linux/net_dim.h>
+#include <linux/dim.h>
 #include "bnxt.h"
 #include "bnxt_debugfs.h"
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
index 11605f9fa61e..6f6576dc417a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
@@ -7,7 +7,7 @@
  * the Free Software Foundation.
  */
 
-#include <linux/net_dim.h>
+#include <linux/dim.h>
 #include "bnxt_hsi.h"
 #include "bnxt.h"
 
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 6e418d9c3706..b2f05e47dc65 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -16,7 +16,7 @@
 #include <linux/mii.h>
 #include <linux/if_vlan.h>
 #include <linux/phy.h>
-#include <linux/net_dim.h>
+#include <linux/dim.h>
 
 /* total number of Buffer Descriptors, same for Rx/Tx */
 #define TOTAL_DESC				256
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 2391e3cfb56b..7845aa5bf6be 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -34,6 +34,7 @@ config MLX5_CORE_EN
 	depends on NETDEVICES && ETHERNET && INET && PCI && MLX5_CORE
 	depends on IPV6=y || IPV6=n || MLX5_CORE=m
 	select PAGE_POOL
+	select DIMLIB
 	default n
 	---help---
 	  Ethernet support in Mellanox Technologies ConnectX-4 NIC.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 11efd6e4bdc3..abf42d3aabe9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -48,7 +48,7 @@
 #include <linux/rhashtable.h>
 #include <net/switchdev.h>
 #include <net/xdp.h>
-#include <linux/net_dim.h>
+#include <linux/dim.h>
 #include <linux/bits.h>
 #include "wq.h"
 #include "mlx5_core.h"
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index ba3c1be9f2d3..ca9cfbf57d8f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -30,7 +30,7 @@
  * SOFTWARE.
  */
 
-#include <linux/net_dim.h>
+#include <linux/dim.h>
 #include "en.h"
 
 static void
diff --git a/include/linux/dim.h b/include/linux/dim.h
index 60e5074a7cc0..f48ede3e0322 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -6,20 +6,49 @@
 
 #include <linux/module.h>
 
+/**
+ * Number of events between DIM iterations.
+ * Moderates how often the algorithm runs.
+ */
 #define DIM_NEVENTS 64
 
-/* more than 10% difference */
+/**
+ * Does the difference between values justify taking an action?
+ * We consider 10% difference as significant.
+ */
 #define IS_SIGNIFICANT_DIFF(val, ref) \
 	(((100UL * abs((val) - (ref))) / (ref)) > 10)
+
+/**
+ * Calculate the gap between two values.
+ * Take wrap-around and variable size into consideration.
+ */
 #define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) \
-& (BIT_ULL(bits) - 1))
+		& (BIT_ULL(bits) - 1))
 
+/**
+ * Structure for CQ moderation values.
+ * Used for communications between DIM and its consumer.
+ *
+ * @usec: CQ timer suggestion (by DIM)
+ * @pkts: CQ packet counter suggestion (by DIM)
+ * @cq_period_mode: CQ period count mode (from CQE/EQE)
+ */
 struct dim_cq_moder {
 	u16 usec;
 	u16 pkts;
 	u8 cq_period_mode;
 };
 
+/**
+ * Structure for DIM sample data.
+ * Used for communications between DIM and its consumer.
+ *
+ * @time: Sample timestamp
+ * @pkt_ctr: Number of packets
+ * @byte_ctr: Number of bytes
+ * @event_ctr: Number of events
+ */
 struct dim_sample {
 	ktime_t time;
 	u32 pkt_ctr;
@@ -27,13 +56,36 @@ struct dim_sample {
 	u16 event_ctr;
 };
 
+/**
+ * Structure for DIM stats.
+ * Used for holding current measured rates.
+ *
+ * @ppms: Packets per msec
+ * @bpms: Bytes per msec
+ * @epms: Events per msec
+ */
 struct dim_stats {
-	int ppms; /* packets per msec */
-	int bpms; /* bytes per msec */
-	int epms; /* events per msec */
+	int ppms;
+	int bpms;
+	int epms;
 };
 
-struct dim { /* Dynamic Interrupt Moderation */
+/**
+ * Main structure for dynamic interrupt moderation (DIM).
+ * Used for holding all information about a specific DIM instance.
+ *
+ * @state: Algorithm state (see below)
+ * @prev_stats: Measured rates from previous iteration (for comparison)
+ * @start_sample: Sampled data at start of current iteration
+ * @work: Work to perform on action required
+ * @profile_ix: Current moderation profile
+ * @mode: CQ period count mode
+ * @tune_state: Algorithm tuning state (see below)
+ * @steps_right: Number of steps taken towards higher moderation
+ * @steps_left: Number of steps taken towards lower moderation
+ * @tired: Parking depth counter
+ */
+struct dim {
 	u8 state;
 	struct dim_stats prev_stats;
 	struct dim_sample start_sample;
@@ -46,18 +98,49 @@ struct dim { /* Dynamic Interrupt Moderation */
 	u8 tired;
 };
 
+/**
+ * enum dim_cq_period_mode
+ *
+ * These are the modes for CQ period count.
+ *
+ * @DIM_CQ_PERIOD_MODE_START_FROM_EQE: Start counting from EQE
+ * @DIM_CQ_PERIOD_MODE_START_FROM_CQE: Start counting from CQE (implies timer reset)
+ * @DIM_CQ_PERIOD_NUM_MODES: Number of modes
+ */
 enum {
 	DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
 	DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
 	DIM_CQ_PERIOD_NUM_MODES
 };
 
+/**
+ * enum dim_state
+ *
+ * These are the DIM algorithm states.
+ * These will determine if the algorithm is in a valid state to start an iteration.
+ *
+ * @DIM_START_MEASURE: This is the first iteration (also after applying a new profile)
+ * @DIM_MEASURE_IN_PROGRESS: Algorithm is already in progress - check whether
+ * an action is needed
+ * @DIM_APPLY_NEW_PROFILE: DIM consumer is currently applying a profile - no need to measure
+ */
 enum {
 	DIM_START_MEASURE,
 	DIM_MEASURE_IN_PROGRESS,
 	DIM_APPLY_NEW_PROFILE,
 };
 
+/**
+ * enum dim_tune_state
+ *
+ * These are the DIM algorithm tune states.
+ * These will determine which action the algorithm should perform.
+ *
+ * @DIM_PARKING_ON_TOP: Algorithm found a local top point - exit on significant difference
+ * @DIM_PARKING_TIRED: Algorithm found a deep top point - don't exit if tired > 0
+ * @DIM_GOING_RIGHT: Algorithm is currently trying higher moderation levels
+ * @DIM_GOING_LEFT: Algorithm is currently trying lower moderation levels
+ */
 enum {
 	DIM_PARKING_ON_TOP,
 	DIM_PARKING_TIRED,
@@ -65,63 +148,95 @@ enum {
 	DIM_GOING_LEFT,
 };
 
+/**
+ * enum dim_stats_state
+ *
+ * These are the DIM algorithm statistics states.
+ * These will determine the verdict of current iteration.
+ *
+ * @DIM_STATS_WORSE: Current iteration shows worse performance than before
+ * @DIM_STATS_SAME: Current iteration shows same performance as before
+ * @DIM_STATS_BETTER: Current iteration shows better performance than before
+ */
 enum {
 	DIM_STATS_WORSE,
 	DIM_STATS_SAME,
 	DIM_STATS_BETTER,
 };
 
+/**
+ * enum dim_step_result
+ *
+ * These are the DIM algorithm step results.
+ * These describe the result of a step.
+ *
+ * @DIM_STEPPED: Performed a regular step
+ * @DIM_TOO_TIRED: Same kind of step was done multiple times - should go to
+ * tired parking
+ * @DIM_ON_EDGE: Stepped to the most left/right profile
+ */
 enum {
 	DIM_STEPPED,
 	DIM_TOO_TIRED,
 	DIM_ON_EDGE,
 };
 
-static inline bool dim_on_top(struct dim *dim)
-{
-	switch (dim->tune_state) {
-	case DIM_PARKING_ON_TOP:
-	case DIM_PARKING_TIRED:
-		return true;
-	case DIM_GOING_RIGHT:
-		return (dim->steps_left > 1) && (dim->steps_right == 1);
-	default: /* DIM_GOING_LEFT */
-		return (dim->steps_right > 1) && (dim->steps_left == 1);
-	}
-}
+/**
+ *	dim_on_top - check if current state is a good place to stop (top location)
+ *	@dim: DIM context
+ *
+ * Check if current profile is a good place to park at.
+ * This reduces the frequency of DIM checks, as we assume we probably
+ * shouldn't change profiles unless the traffic pattern has changed.
+ */
+bool dim_on_top(struct dim *dim);
 
-static inline void dim_turn(struct dim *dim)
-{
-	switch (dim->tune_state) {
-	case DIM_PARKING_ON_TOP:
-	case DIM_PARKING_TIRED:
-		break;
-	case DIM_GOING_RIGHT:
-		dim->tune_state = DIM_GOING_LEFT;
-		dim->steps_left = 0;
-		break;
-	case DIM_GOING_LEFT:
-		dim->tune_state = DIM_GOING_RIGHT;
-		dim->steps_right = 0;
-		break;
-	}
-}
+/**
+ *	dim_turn - change profile altering direction
+ *	@dim: DIM context
+ *
+ * Go left if we were going right and vice-versa.
+ * Do nothing if currently parking.
+ */
+void dim_turn(struct dim *dim);
 
-static inline void dim_park_on_top(struct dim *dim)
-{
-	dim->steps_right  = 0;
-	dim->steps_left   = 0;
-	dim->tired        = 0;
-	dim->tune_state   = DIM_PARKING_ON_TOP;
-}
+/**
+ *	dim_park_on_top - enter a parking state on a top location
+ *	@dim: DIM context
+ *
+ * Enter parking state.
+ * Clear all movement history.
+ */
+void dim_park_on_top(struct dim *dim);
 
-static inline void dim_park_tired(struct dim *dim)
-{
-	dim->steps_right  = 0;
-	dim->steps_left   = 0;
-	dim->tune_state   = DIM_PARKING_TIRED;
-}
+/**
+ *	dim_park_tired - enter a tired parking state
+ *	@dim: DIM context
+ *
+ * Enter parking state.
+ * Clear all movement history and reduce the frequency of DIM checks.
+ */
+void dim_park_tired(struct dim *dim);
+
+/**
+ *	dim_calc_stats - calculate the difference between two samples
+ *	@start: start sample
+ *	@end: end sample
+ *	@curr_stats: delta between samples
+ *
+ * Calculate the delta between two samples (in data rates).
+ * Takes into consideration counter wrap-around.
+ */
+void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
+		    struct dim_stats *curr_stats);
 
+/**
+ *	dim_update_sample - set a sample's fields with given values
+ *	@event_ctr: number of events to set
+ *	@packets: number of packets to set
+ *	@bytes: number of bytes to set
+ *	@s: DIM sample
+ */
 static inline void
 dim_update_sample(u16 event_ctr, u64 packets, u64 bytes, struct dim_sample *s)
 {
@@ -131,23 +246,99 @@ dim_update_sample(u16 event_ctr, u64 packets, u64 bytes, struct dim_sample *s)
 	s->event_ctr = event_ctr;
 }
 
-static inline void
-dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
-	       struct dim_stats *curr_stats)
-{
-	/* u32 holds up to 71 minutes, should be enough */
-	u32 delta_us = ktime_us_delta(end->time, start->time);
-	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
-	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
-			     start->byte_ctr);
-
-	if (!delta_us)
-		return;
-
-	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
-	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
-	curr_stats->epms = DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC,
-					delta_us);
+/* Net DIM */
+
+/*
+ * Net DIM profiles:
+ *        There is a different set of profiles for each CQ period mode.
+ *        There is a different set of profiles for RX/TX CQs.
+ *        Each profile set must have NET_DIM_PARAMS_NUM_PROFILES entries.
+ */
+#define NET_DIM_PARAMS_NUM_PROFILES 5
+#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
+#define NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE 128
+#define NET_DIM_DEF_PROFILE_CQE 1
+#define NET_DIM_DEF_PROFILE_EQE 1
+
+#define NET_DIM_RX_EQE_PROFILES { \
+	{1,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{8,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{64,  NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{128, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{256, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
 }
 
+#define NET_DIM_RX_CQE_PROFILES { \
+	{2,  256},             \
+	{8,  128},             \
+	{16, 64},              \
+	{32, 64},              \
+	{64, 64}               \
+}
+
+#define NET_DIM_TX_EQE_PROFILES { \
+	{1,   NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE},  \
+	{8,   NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE},  \
+	{32,  NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE},  \
+	{64,  NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE},  \
+	{128, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE}   \
+}
+
+#define NET_DIM_TX_CQE_PROFILES { \
+	{5,  128},  \
+	{8,  64},  \
+	{16, 32},  \
+	{32, 32},  \
+	{64, 32}   \
+}
+
+static const struct dim_cq_moder
+rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
+	NET_DIM_RX_EQE_PROFILES,
+	NET_DIM_RX_CQE_PROFILES,
+};
+
+static const struct dim_cq_moder
+tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
+	NET_DIM_TX_EQE_PROFILES,
+	NET_DIM_TX_CQE_PROFILES,
+};
+
+/**
+ *	net_dim_get_rx_moderation - provide a CQ moderation object for the given RX profile
+ *	@cq_period_mode: CQ period mode
+ *	@ix: Profile index
+ */
+struct dim_cq_moder net_dim_get_rx_moderation(u8 cq_period_mode, int ix);
+
+/**
+ *	net_dim_get_def_rx_moderation - provide the default RX moderation
+ *	@cq_period_mode: CQ period mode
+ */
+struct dim_cq_moder net_dim_get_def_rx_moderation(u8 cq_period_mode);
+
+/**
+ *	net_dim_get_tx_moderation - provide a CQ moderation object for the given TX profile
+ *	@cq_period_mode: CQ period mode
+ *	@ix: Profile index
+ */
+struct dim_cq_moder net_dim_get_tx_moderation(u8 cq_period_mode, int ix);
+
+/**
+ *	net_dim_get_def_tx_moderation - provide the default TX moderation
+ *	@cq_period_mode: CQ period mode
+ */
+struct dim_cq_moder net_dim_get_def_tx_moderation(u8 cq_period_mode);
+
+/**
+ *	net_dim - main DIM algorithm entry point
+ *	@dim: DIM instance information
+ *	@end_sample: Current data measurement
+ *
+ * Called by the consumer.
+ * This is the main logic of the algorithm, where data is processed in order to decide on next
+ * required action.
+ */
+void net_dim(struct dim *dim, struct dim_sample end_sample);
+
 #endif /* DIM_H */
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
deleted file mode 100644
index 4e009ec193ef..000000000000
--- a/include/linux/net_dim.h
+++ /dev/null
@@ -1,273 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2017-2018, Broadcom Limited. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#ifndef NET_DIM_H
-#define NET_DIM_H
-
-#include <linux/module.h>
-#include <linux/dim.h>
-
-#define NET_DIM_PARAMS_NUM_PROFILES 5
-/* Netdev dynamic interrupt moderation profiles */
-#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE 128
-#define NET_DIM_DEF_PROFILE_CQE 1
-#define NET_DIM_DEF_PROFILE_EQE 1
-
-/* All profiles sizes must be NET_PARAMS_DIM_NUM_PROFILES */
-#define NET_DIM_RX_EQE_PROFILES { \
-	{1,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{8,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{64,  NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{128, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{256, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-}
-
-#define NET_DIM_RX_CQE_PROFILES { \
-	{2,  256},             \
-	{8,  128},             \
-	{16, 64},              \
-	{32, 64},              \
-	{64, 64}               \
-}
-
-#define NET_DIM_TX_EQE_PROFILES { \
-	{1,   NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE},  \
-	{8,   NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE},  \
-	{32,  NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE},  \
-	{64,  NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE},  \
-	{128, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE}   \
-}
-
-#define NET_DIM_TX_CQE_PROFILES { \
-	{5,  128},  \
-	{8,  64},  \
-	{16, 32},  \
-	{32, 32},  \
-	{64, 32}   \
-}
-
-static const struct dim_cq_moder
-rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
-	NET_DIM_RX_EQE_PROFILES,
-	NET_DIM_RX_CQE_PROFILES,
-};
-
-static const struct dim_cq_moder
-tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
-	NET_DIM_TX_EQE_PROFILES,
-	NET_DIM_TX_CQE_PROFILES,
-};
-
-static inline struct dim_cq_moder
-net_dim_get_rx_moderation(u8 cq_period_mode, int ix)
-{
-	struct dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
-
-	cq_moder.cq_period_mode = cq_period_mode;
-	return cq_moder;
-}
-
-static inline struct dim_cq_moder
-net_dim_get_def_rx_moderation(u8 cq_period_mode)
-{
-	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
-			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
-
-	return net_dim_get_rx_moderation(cq_period_mode, profile_ix);
-}
-
-static inline struct dim_cq_moder
-net_dim_get_tx_moderation(u8 cq_period_mode, int ix)
-{
-	struct dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
-
-	cq_moder.cq_period_mode = cq_period_mode;
-	return cq_moder;
-}
-
-static inline struct dim_cq_moder
-net_dim_get_def_tx_moderation(u8 cq_period_mode)
-{
-	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
-			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
-
-	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
-}
-
-static inline int net_dim_step(struct dim *dim)
-{
-	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
-		return DIM_TOO_TIRED;
-
-	switch (dim->tune_state) {
-	case DIM_PARKING_ON_TOP:
-	case DIM_PARKING_TIRED:
-		break;
-	case DIM_GOING_RIGHT:
-		if (dim->profile_ix == (NET_DIM_PARAMS_NUM_PROFILES - 1))
-			return DIM_ON_EDGE;
-		dim->profile_ix++;
-		dim->steps_right++;
-		break;
-	case DIM_GOING_LEFT:
-		if (dim->profile_ix == 0)
-			return DIM_ON_EDGE;
-		dim->profile_ix--;
-		dim->steps_left++;
-		break;
-	}
-
-	dim->tired++;
-	return DIM_STEPPED;
-}
-
-static inline void net_dim_exit_parking(struct dim *dim)
-{
-	dim->tune_state = dim->profile_ix ? DIM_GOING_LEFT :
-					  DIM_GOING_RIGHT;
-	net_dim_step(dim);
-}
-
-static inline int net_dim_stats_compare(struct dim_stats *curr,
-					struct dim_stats *prev)
-{
-	if (!prev->bpms)
-		return curr->bpms ? DIM_STATS_BETTER :
-				    DIM_STATS_SAME;
-
-	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
-		return (curr->bpms > prev->bpms) ? DIM_STATS_BETTER :
-						   DIM_STATS_WORSE;
-
-	if (!prev->ppms)
-		return curr->ppms ? DIM_STATS_BETTER :
-				    DIM_STATS_SAME;
-
-	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
-		return (curr->ppms > prev->ppms) ? DIM_STATS_BETTER :
-						   DIM_STATS_WORSE;
-
-	if (!prev->epms)
-		return DIM_STATS_SAME;
-
-	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
-		return (curr->epms < prev->epms) ? DIM_STATS_BETTER :
-						   DIM_STATS_WORSE;
-
-	return DIM_STATS_SAME;
-}
-
-static inline bool net_dim_decision(struct dim_stats *curr_stats,
-				    struct dim *dim)
-{
-	int prev_state = dim->tune_state;
-	int prev_ix = dim->profile_ix;
-	int stats_res;
-	int step_res;
-
-	switch (dim->tune_state) {
-	case DIM_PARKING_ON_TOP:
-		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
-		if (stats_res != DIM_STATS_SAME)
-			net_dim_exit_parking(dim);
-		break;
-
-	case DIM_PARKING_TIRED:
-		dim->tired--;
-		if (!dim->tired)
-			net_dim_exit_parking(dim);
-		break;
-
-	case DIM_GOING_RIGHT:
-	case DIM_GOING_LEFT:
-		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
-		if (stats_res != DIM_STATS_BETTER)
-			dim_turn(dim);
-
-		if (dim_on_top(dim)) {
-			dim_park_on_top(dim);
-			break;
-		}
-
-		step_res = net_dim_step(dim);
-		switch (step_res) {
-		case DIM_ON_EDGE:
-			dim_park_on_top(dim);
-			break;
-		case DIM_TOO_TIRED:
-			dim_park_tired(dim);
-			break;
-		}
-
-		break;
-	}
-
-	if (prev_state != DIM_PARKING_ON_TOP ||
-	    dim->tune_state != DIM_PARKING_ON_TOP)
-		dim->prev_stats = *curr_stats;
-
-	return dim->profile_ix != prev_ix;
-}
-
-static inline void net_dim(struct dim *dim,
-			   struct dim_sample end_sample)
-{
-	struct dim_stats curr_stats;
-	u16 nevents;
-
-	switch (dim->state) {
-	case DIM_MEASURE_IN_PROGRESS:
-		nevents = BIT_GAP(BITS_PER_TYPE(u16),
-				  end_sample.event_ctr,
-				  dim->start_sample.event_ctr);
-		if (nevents < DIM_NEVENTS)
-			break;
-		dim_calc_stats(&dim->start_sample, &end_sample, &curr_stats);
-		if (net_dim_decision(&curr_stats, dim)) {
-			dim->state = DIM_APPLY_NEW_PROFILE;
-			schedule_work(&dim->work);
-			break;
-		}
-		/* fall through */
-	case DIM_START_MEASURE:
-		dim_update_sample(end_sample.event_ctr, end_sample.pkt_ctr,
-				  end_sample.byte_ctr, &dim->start_sample);
-		dim->state = DIM_MEASURE_IN_PROGRESS;
-		break;
-	case DIM_APPLY_NEW_PROFILE:
-		break;
-	}
-}
-
-#endif /* NET_DIM_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index 90623a0e1942..78ddb9526b62 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -562,6 +562,14 @@ config SIGNATURE
 	  Digital signature verification. Currently only RSA is supported.
 	  Implementation is done using GnuPG MPI library
 
+config DIMLIB
+	bool "DIM library"
+	default y
+	help
+	  Dynamic Interrupt Moderation library.
+	  Implements an algorithm for dynamically changing CQ moderation values
+	  according to run time performance.
+
 #
 # libfdt files, only selected if needed.
 #
diff --git a/lib/Makefile b/lib/Makefile
index fb7697031a79..dcb558c7554d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -202,6 +202,7 @@ obj-$(CONFIG_GLOB) += glob.o
 obj-$(CONFIG_GLOB_SELFTEST) += globtest.o
 
 obj-$(CONFIG_MPILIB) += mpi/
+obj-$(CONFIG_DIMLIB) += dim/
 obj-$(CONFIG_SIGNATURE) += digsig.o
 
 lib-$(CONFIG_CLZ_TAB) += clz_tab.o
diff --git a/lib/dim/Makefile b/lib/dim/Makefile
new file mode 100644
index 000000000000..160afe288df0
--- /dev/null
+++ b/lib/dim/Makefile
@@ -0,0 +1,9 @@
+#
+# DIM Dynamic Interrupt Moderation library
+#
+
+obj-$(CONFIG_DIMLIB) = net_dim.o
+
+net_dim-y = \
+	dim.o		\
+	net_dim.o
diff --git a/lib/dim/dim.c b/lib/dim/dim.c
new file mode 100644
index 000000000000..17d5236759bd
--- /dev/null
+++ b/lib/dim/dim.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2019, Mellanox Technologies inc.  All rights reserved.
+ */
+
+#include <linux/dim.h>
+
+bool dim_on_top(struct dim *dim)
+{
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
+		return true;
+	case DIM_GOING_RIGHT:
+		return (dim->steps_left > 1) && (dim->steps_right == 1);
+	default: /* DIM_GOING_LEFT */
+		return (dim->steps_right > 1) && (dim->steps_left == 1);
+	}
+}
+EXPORT_SYMBOL(dim_on_top);
+
+void dim_turn(struct dim *dim)
+{
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
+		break;
+	case DIM_GOING_RIGHT:
+		dim->tune_state = DIM_GOING_LEFT;
+		dim->steps_left = 0;
+		break;
+	case DIM_GOING_LEFT:
+		dim->tune_state = DIM_GOING_RIGHT;
+		dim->steps_right = 0;
+		break;
+	}
+}
+EXPORT_SYMBOL(dim_turn);
+
+void dim_park_on_top(struct dim *dim)
+{
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tired        = 0;
+	dim->tune_state   = DIM_PARKING_ON_TOP;
+}
+EXPORT_SYMBOL(dim_park_on_top);
+
+void dim_park_tired(struct dim *dim)
+{
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tune_state   = DIM_PARKING_TIRED;
+}
+EXPORT_SYMBOL(dim_park_tired);
+
+void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
+		    struct dim_stats *curr_stats)
+{
+	/* u32 holds up to 71 minutes, should be enough */
+	u32 delta_us = ktime_us_delta(end->time, start->time);
+	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
+	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
+			     start->byte_ctr);
+
+	if (!delta_us)
+		return;
+
+	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
+	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
+	curr_stats->epms = DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC,
+					delta_us);
+}
+EXPORT_SYMBOL(dim_calc_stats);
diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c
new file mode 100644
index 000000000000..5bcc902c5388
--- /dev/null
+++ b/lib/dim/net_dim.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ */
+
+#include <linux/dim.h>
+
+struct dim_cq_moder
+net_dim_get_rx_moderation(u8 cq_period_mode, int ix)
+{
+	struct dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
+
+	cq_moder.cq_period_mode = cq_period_mode;
+	return cq_moder;
+}
+EXPORT_SYMBOL(net_dim_get_rx_moderation);
+
+struct dim_cq_moder
+net_dim_get_def_rx_moderation(u8 cq_period_mode)
+{
+	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
+			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
+
+	return net_dim_get_rx_moderation(cq_period_mode, profile_ix);
+}
+EXPORT_SYMBOL(net_dim_get_def_rx_moderation);
+
+struct dim_cq_moder
+net_dim_get_tx_moderation(u8 cq_period_mode, int ix)
+{
+	struct dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
+
+	cq_moder.cq_period_mode = cq_period_mode;
+	return cq_moder;
+}
+EXPORT_SYMBOL(net_dim_get_tx_moderation);
+
+struct dim_cq_moder
+net_dim_get_def_tx_moderation(u8 cq_period_mode)
+{
+	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
+			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
+
+	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
+}
+EXPORT_SYMBOL(net_dim_get_def_tx_moderation);
+
+static int net_dim_step(struct dim *dim)
+{
+	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
+		return DIM_TOO_TIRED;
+
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
+		break;
+	case DIM_GOING_RIGHT:
+		if (dim->profile_ix == (NET_DIM_PARAMS_NUM_PROFILES - 1))
+			return DIM_ON_EDGE;
+		dim->profile_ix++;
+		dim->steps_right++;
+		break;
+	case DIM_GOING_LEFT:
+		if (dim->profile_ix == 0)
+			return DIM_ON_EDGE;
+		dim->profile_ix--;
+		dim->steps_left++;
+		break;
+	}
+
+	dim->tired++;
+	return DIM_STEPPED;
+}
+
+static void net_dim_exit_parking(struct dim *dim)
+{
+	dim->tune_state = dim->profile_ix ? DIM_GOING_LEFT : DIM_GOING_RIGHT;
+	net_dim_step(dim);
+}
+
+static int net_dim_stats_compare(struct dim_stats *curr,
+				 struct dim_stats *prev)
+{
+	if (!prev->bpms)
+		return curr->bpms ? DIM_STATS_BETTER : DIM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
+		return (curr->bpms > prev->bpms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
+
+	if (!prev->ppms)
+		return curr->ppms ? DIM_STATS_BETTER :
+				    DIM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
+		return (curr->ppms > prev->ppms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
+
+	if (!prev->epms)
+		return DIM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
+		return (curr->epms < prev->epms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
+
+	return DIM_STATS_SAME;
+}
+
+static bool net_dim_decision(struct dim_stats *curr_stats, struct dim *dim)
+{
+	int prev_state = dim->tune_state;
+	int prev_ix = dim->profile_ix;
+	int stats_res;
+	int step_res;
+
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+		stats_res = net_dim_stats_compare(curr_stats,
+						  &dim->prev_stats);
+		if (stats_res != DIM_STATS_SAME)
+			net_dim_exit_parking(dim);
+		break;
+
+	case DIM_PARKING_TIRED:
+		dim->tired--;
+		if (!dim->tired)
+			net_dim_exit_parking(dim);
+		break;
+
+	case DIM_GOING_RIGHT:
+	case DIM_GOING_LEFT:
+		stats_res = net_dim_stats_compare(curr_stats,
+						  &dim->prev_stats);
+		if (stats_res != DIM_STATS_BETTER)
+			dim_turn(dim);
+
+		if (dim_on_top(dim)) {
+			dim_park_on_top(dim);
+			break;
+		}
+
+		step_res = net_dim_step(dim);
+		switch (step_res) {
+		case DIM_ON_EDGE:
+			dim_park_on_top(dim);
+			break;
+		case DIM_TOO_TIRED:
+			dim_park_tired(dim);
+			break;
+		}
+
+		break;
+	}
+
+	if (prev_state != DIM_PARKING_ON_TOP ||
+	    dim->tune_state != DIM_PARKING_ON_TOP)
+		dim->prev_stats = *curr_stats;
+
+	return dim->profile_ix != prev_ix;
+}
+
+void net_dim(struct dim *dim, struct dim_sample end_sample)
+{
+	struct dim_stats curr_stats;
+	u16 nevents;
+
+	switch (dim->state) {
+	case DIM_MEASURE_IN_PROGRESS:
+		nevents = BIT_GAP(BITS_PER_TYPE(u16),
+				  end_sample.event_ctr,
+				  dim->start_sample.event_ctr);
+		if (nevents < DIM_NEVENTS)
+			break;
+		dim_calc_stats(&dim->start_sample, &end_sample, &curr_stats);
+		if (net_dim_decision(&curr_stats, dim)) {
+			dim->state = DIM_APPLY_NEW_PROFILE;
+			schedule_work(&dim->work);
+			break;
+		}
+		/* fall through */
+	case DIM_START_MEASURE:
+		dim_update_sample(end_sample.event_ctr, end_sample.pkt_ctr,
+				  end_sample.byte_ctr, &dim->start_sample);
+		dim->state = DIM_MEASURE_IN_PROGRESS;
+		break;
+	case DIM_APPLY_NEW_PROFILE:
+		break;
+	}
+}
+EXPORT_SYMBOL(net_dim);
-- 
2.21.0



* [for-next V2 07/10] linux/dim: Add completions count to dim_sample
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
                   ` (5 preceding siblings ...)
  2019-06-25 20:57 ` [for-next V2 06/10] linux/dim: Move implementation to .c files Saeed Mahameed
@ 2019-06-25 20:57 ` Saeed Mahameed
  2019-06-25 20:57 ` [for-next V2 08/10] linux/dim: Implement rdma_dim Saeed Mahameed
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Yamin Friedman, Max Gurtovoy, Saeed Mahameed

From: Yamin Friedman <yaminf@mellanox.com>

Add a measurement of completions per msec to allow for completion-based
DIM algorithms.

In order to use dynamic interrupt moderation with RDMA we need a
different measurement than packets per second. This change prepares
for adding a new DIM method.

All drivers that use net_dim and thus do not need a completion count will
have the completions set to 0.
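
As a rough illustration of the new statistics, here is a user-space sketch
(not the kernel code) that mirrors the cpms and cpe_ratio arithmetic this
patch adds to dim_calc_stats(). The names struct comp_stats and
calc_comp_stats are hypothetical, DIM_NEVENTS = 64 is an assumed value, and
unlike the patch the sketch returns zeroed stats when delta_us is 0.

```c
#include <stdint.h>

#define USEC_PER_MSEC 1000u
#define DIM_NEVENTS   64u	/* assumed value, not shown in this patch */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical container for the rates discussed in this patch. */
struct comp_stats {
	int epms;	/* events per msec */
	int cpms;	/* completions per msec */
	int cpe_ratio;	/* completions per event, scaled by 100 */
};

/* Compute the rates for one measurement window of delta_us usec. */
static struct comp_stats calc_comp_stats(uint32_t ncomps, uint32_t delta_us)
{
	struct comp_stats s = { 0, 0, 0 };

	/* Sketch-only choice: report all zeros for an empty window. */
	if (!delta_us)
		return s;

	s.epms = DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC, delta_us);
	s.cpms = DIV_ROUND_UP(ncomps * USEC_PER_MSEC, delta_us);
	if (s.epms != 0)
		s.cpe_ratio = (s.cpms * 100) / s.epms;

	return s;
}
```

For example, 640 completions over a 2000 usec window give cpms = 320,
epms = 32 and cpe_ratio = 1000, i.e. 10 completions per event.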

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/dim.h | 28 +++++++++++++++++++++++++---
 lib/dim/dim.c       |  9 +++++++++
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/linux/dim.h b/include/linux/dim.h
index f48ede3e0322..aa9bdd47a648 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -37,6 +37,7 @@
 struct dim_cq_moder {
 	u16 usec;
 	u16 pkts;
+	u16 comps;
 	u8 cq_period_mode;
 };
 
@@ -54,6 +55,7 @@ struct dim_sample {
 	u32 pkt_ctr;
 	u32 byte_ctr;
 	u16 event_ctr;
+	u32 comp_ctr;
 };
 
 /**
@@ -65,9 +67,11 @@ struct dim_sample {
  * @epms: Events per msec
  */
 struct dim_stats {
-	int ppms;
-	int bpms;
-	int epms;
+	int ppms; /* packets per msec */
+	int bpms; /* bytes per msec */
+	int epms; /* events per msec */
+	int cpms; /* completions per msec */
+	int cpe_ratio; /* ratio of completions to events */
 };
 
 /**
@@ -89,6 +93,7 @@ struct dim {
 	u8 state;
 	struct dim_stats prev_stats;
 	struct dim_sample start_sample;
+	struct dim_sample measuring_sample;
 	struct work_struct work;
 	u8 profile_ix;
 	u8 mode;
@@ -246,6 +251,23 @@ dim_update_sample(u16 event_ctr, u64 packets, u64 bytes, struct dim_sample *s)
 	s->event_ctr = event_ctr;
 }
 
+/**
+ *	dim_update_sample_with_comps - set a sample's fields with given
+ *	values including the completion parameter
+ *	@event_ctr: number of events to set
+ *	@packets: number of packets to set
+ *	@bytes: number of bytes to set
+ *	@comps: number of completions to set
+ *	@s: DIM sample
+ */
+static inline void
+dim_update_sample_with_comps(u16 event_ctr, u64 packets, u64 bytes, u64 comps,
+			     struct dim_sample *s)
+{
+	dim_update_sample(event_ctr, packets, bytes, s);
+	s->comp_ctr = comps;
+}
+
 /* Net DIM */
 
 /*
diff --git a/lib/dim/dim.c b/lib/dim/dim.c
index 17d5236759bd..439d641ec796 100644
--- a/lib/dim/dim.c
+++ b/lib/dim/dim.c
@@ -62,6 +62,8 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
 	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
 			     start->byte_ctr);
+	u32 ncomps = BIT_GAP(BITS_PER_TYPE(u32), end->comp_ctr,
+			     start->comp_ctr);
 
 	if (!delta_us)
 		return;
@@ -70,5 +72,12 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
 	curr_stats->epms = DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC,
 					delta_us);
+	curr_stats->cpms = DIV_ROUND_UP(ncomps * USEC_PER_MSEC, delta_us);
+	if (curr_stats->epms != 0)
+		curr_stats->cpe_ratio =
+				(curr_stats->cpms * 100) / curr_stats->epms;
+	else
+		curr_stats->cpe_ratio = 0;
+
 }
 EXPORT_SYMBOL(dim_calc_stats);
-- 
2.21.0



* [for-next V2 08/10] linux/dim: Implement rdma_dim
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
                   ` (6 preceding siblings ...)
  2019-06-25 20:57 ` [for-next V2 07/10] linux/dim: Add completions count to dim_sample Saeed Mahameed
@ 2019-06-25 20:57 ` Saeed Mahameed
  2019-06-25 22:02   ` Sagi Grimberg
  2019-06-25 20:57 ` [for-next V2 09/10] RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink Saeed Mahameed
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Yamin Friedman, Max Gurtovoy, Saeed Mahameed

From: Yamin Friedman <yaminf@mellanox.com>

rdma_dim implements a different algorithm than net_dim. It is based on
completions, which is how interrupt moderation can be implemented in RDMA.
The algorithm optimizes for the number of completions and for the ratio
between completions and events.
It also supports fast reduction of the moderation level when the traffic
changes such that high moderation is no longer required, in order to
avoid long latencies.

rdma_dim will be called from the ib_core module.
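
The decision logic described above can be sketched in user space roughly
as follows. This is a simplified model, not the kernel implementation: the
struct and function names are hypothetical, the parking states are left
out, the 10% significance threshold is an assumption about
IS_SIGNIFICANT_DIFF, and the guard against a zero reference value is an
addition of this sketch.

```c
#include <stdbool.h>
#include <stdlib.h>

enum { GOING_RIGHT, GOING_LEFT };
enum { STATS_WORSE, STATS_SAME, STATS_BETTER };
#define NUM_PROFILES 9	/* matches RDMA_DIM_PARAMS_NUM_PROFILES */

struct stats { int cpms; int cpe_ratio; };
struct tuner { int profile_ix; int dir; struct stats prev; };

/* Assumed threshold: a change of more than 10% counts as significant. */
static bool significant(int val, int ref)
{
	if (ref == 0)	/* sketch-only guard against division by zero */
		return val != 0;
	return (100L * labs((long)val - ref)) / ref > 10;
}

static int compare(const struct stats *curr, const struct stats *prev)
{
	if (!prev->cpms)	/* first measurement */
		return STATS_SAME;
	if (significant(curr->cpms, prev->cpms))
		return curr->cpms > prev->cpms ? STATS_BETTER : STATS_WORSE;
	if (significant(curr->cpe_ratio, prev->cpe_ratio))
		return curr->cpe_ratio > prev->cpe_ratio ?
			STATS_BETTER : STATS_WORSE;
	return STATS_SAME;
}

static void turn(struct tuner *t)
{
	t->dir = (t->dir == GOING_RIGHT) ? GOING_LEFT : GOING_RIGHT;
}

/* Move one profile in the current direction; false means "on edge". */
static bool step(struct tuner *t)
{
	if (t->dir == GOING_RIGHT) {
		if (t->profile_ix == NUM_PROFILES - 1)
			return false;
		t->profile_ix++;
	} else {
		if (t->profile_ix == 0)
			return false;
		t->profile_ix--;
	}
	return true;
}

/* Returns true when the profile index changed. */
static bool decide(struct tuner *t, const struct stats *curr)
{
	int prev_ix = t->profile_ix;

	switch (compare(curr, &t->prev)) {
	case STATS_SAME:
		/* Fast reduction: too few completions per event. */
		if (curr->cpe_ratio <= 50 * prev_ix)
			t->profile_ix = 0;
		break;
	case STATS_WORSE:
		turn(t);
		/* fall through */
	case STATS_BETTER:
		if (!step(t))
			turn(t);
		break;
	}

	t->prev = *curr;
	return t->profile_ix != prev_ix;
}
```

In this model a doubling of cpms moves the tuner one profile to the right,
a later drop turns it around, and steady traffic with a low
completions-per-event ratio drops it straight back to profile 0.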

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/dim.h |  36 ++++++++++++++
 lib/dim/Makefile    |   6 +--
 lib/dim/rdma_dim.c  | 112 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 150 insertions(+), 4 deletions(-)
 create mode 100644 lib/dim/rdma_dim.c

diff --git a/include/linux/dim.h b/include/linux/dim.h
index aa9bdd47a648..1ae32835723a 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -82,6 +82,7 @@ struct dim_stats {
  * @prev_stats: Measured rates from previous iteration (for comparison)
  * @start_sample: Sampled data at start of current iteration
  * @work: Work to perform on action required
+ * @dim_owner: A pointer to the struct that points to dim
  * @profile_ix: Current moderation profile
  * @mode: CQ period count mode
  * @tune_state: Algorithm tuning state (see below)
@@ -95,6 +96,7 @@ struct dim {
 	struct dim_sample start_sample;
 	struct dim_sample measuring_sample;
 	struct work_struct work;
+	void *dim_owner;
 	u8 profile_ix;
 	u8 mode;
 	u8 tune_state;
@@ -363,4 +365,38 @@ struct dim_cq_moder net_dim_get_def_tx_moderation(u8 cq_period_mode);
  */
 void net_dim(struct dim *dim, struct dim_sample end_sample);
 
+/* RDMA DIM */
+
+/*
+ * RDMA DIM profile:
+ * profile size must be of RDMA_DIM_PARAMS_NUM_PROFILES.
+ */
+#define RDMA_DIM_PARAMS_NUM_PROFILES 9
+#define RDMA_DIM_START_PROFILE 0
+
+static const struct dim_cq_moder
+rdma_dim_prof[RDMA_DIM_PARAMS_NUM_PROFILES] = {
+	{1,   0, 1,  0},
+	{1,   0, 4,  0},
+	{2,   0, 4,  0},
+	{2,   0, 8,  0},
+	{4,   0, 8,  0},
+	{16,  0, 8,  0},
+	{16,  0, 16, 0},
+	{32,  0, 16, 0},
+	{32,  0, 32, 0},
+};
+
+/**
+ * rdma_dim - Runs the adaptive moderation.
+ * @dim: The moderation struct.
+ * @completions: The number of completions collected in this round.
+ *
+ * Each call to rdma_dim takes the latest amount of completions that
+ * have been collected and counts them as a new event.
+ * Once enough events have been collected the algorithm decides a new
+ * moderation level.
+ */
+void rdma_dim(struct dim *dim, u64 completions);
+
 #endif /* DIM_H */
diff --git a/lib/dim/Makefile b/lib/dim/Makefile
index 160afe288df0..1d6858a108cb 100644
--- a/lib/dim/Makefile
+++ b/lib/dim/Makefile
@@ -2,8 +2,6 @@
 # DIM Dynamic Interrupt Moderation library
 #
 
-obj-$(CONFIG_DIMLIB) = net_dim.o
+obj-$(CONFIG_DIMLIB) += dim.o
 
-net_dim-y = \
-	dim.o		\
-	net_dim.o
+dim-y := dim.o net_dim.o rdma_dim.o
diff --git a/lib/dim/rdma_dim.c b/lib/dim/rdma_dim.c
new file mode 100644
index 000000000000..1bfe8f546a20
--- /dev/null
+++ b/lib/dim/rdma_dim.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2019, Mellanox Technologies inc.  All rights reserved.
+ */
+
+#include <linux/dim.h>
+
+static int rdma_dim_step(struct dim *dim)
+{
+	if (dim->tune_state == DIM_GOING_RIGHT) {
+		if (dim->profile_ix == (RDMA_DIM_PARAMS_NUM_PROFILES - 1))
+			return DIM_ON_EDGE;
+		dim->profile_ix++;
+		dim->steps_right++;
+	}
+	if (dim->tune_state == DIM_GOING_LEFT) {
+		if (dim->profile_ix == 0)
+			return DIM_ON_EDGE;
+		dim->profile_ix--;
+		dim->steps_left++;
+	}
+
+	return DIM_STEPPED;
+}
+
+static int rdma_dim_stats_compare(struct dim_stats *curr,
+				  struct dim_stats *prev)
+{
+	/* first stat */
+	if (!prev->cpms)
+		return DIM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->cpms, prev->cpms))
+		return (curr->cpms > prev->cpms) ? DIM_STATS_BETTER :
+						DIM_STATS_WORSE;
+
+	if (IS_SIGNIFICANT_DIFF(curr->cpe_ratio, prev->cpe_ratio))
+		return (curr->cpe_ratio > prev->cpe_ratio) ? DIM_STATS_BETTER :
+						DIM_STATS_WORSE;
+
+	return DIM_STATS_SAME;
+}
+
+static bool rdma_dim_decision(struct dim_stats *curr_stats, struct dim *dim)
+{
+	int prev_ix = dim->profile_ix;
+	u8 state = dim->tune_state;
+	int stats_res;
+	int step_res;
+
+	if (state != DIM_PARKING_ON_TOP && state != DIM_PARKING_TIRED) {
+		stats_res = rdma_dim_stats_compare(curr_stats,
+						   &dim->prev_stats);
+
+		switch (stats_res) {
+		case DIM_STATS_SAME:
+			if (curr_stats->cpe_ratio <= 50 * prev_ix)
+				dim->profile_ix = 0;
+			break;
+		case DIM_STATS_WORSE:
+			dim_turn(dim);
+			/* fall through */
+		case DIM_STATS_BETTER:
+			step_res = rdma_dim_step(dim);
+			if (step_res == DIM_ON_EDGE)
+				dim_turn(dim);
+			break;
+		}
+	}
+
+	dim->prev_stats = *curr_stats;
+
+	return dim->profile_ix != prev_ix;
+}
+
+void rdma_dim(struct dim *dim, u64 completions)
+{
+	struct dim_sample *curr_sample = &dim->measuring_sample;
+	struct dim_stats curr_stats;
+	u32 nevents;
+
+	dim_update_sample_with_comps(curr_sample->event_ctr + 1,
+				     curr_sample->pkt_ctr,
+				     curr_sample->byte_ctr,
+				     curr_sample->comp_ctr + completions,
+				     &dim->measuring_sample);
+
+	switch (dim->state) {
+	case DIM_MEASURE_IN_PROGRESS:
+		nevents = curr_sample->event_ctr - dim->start_sample.event_ctr;
+		if (nevents < DIM_NEVENTS)
+			break;
+		dim_calc_stats(&dim->start_sample, curr_sample, &curr_stats);
+		if (rdma_dim_decision(&curr_stats, dim)) {
+			dim->state = DIM_APPLY_NEW_PROFILE;
+			schedule_work(&dim->work);
+			break;
+		}
+		/* fall through */
+	case DIM_START_MEASURE:
+		dim->state = DIM_MEASURE_IN_PROGRESS;
+		dim_update_sample_with_comps(curr_sample->event_ctr,
+					     curr_sample->pkt_ctr,
+					     curr_sample->byte_ctr,
+					     curr_sample->comp_ctr,
+					     &dim->start_sample);
+		break;
+	case DIM_APPLY_NEW_PROFILE:
+		break;
+	}
+}
+EXPORT_SYMBOL(rdma_dim);
-- 
2.21.0



* [for-next V2 09/10] RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
                   ` (7 preceding siblings ...)
  2019-06-25 20:57 ` [for-next V2 08/10] linux/dim: Implement rdma_dim Saeed Mahameed
@ 2019-06-25 20:57 ` Saeed Mahameed
  2019-06-25 21:15   ` Sagi Grimberg
  2019-06-25 20:57 ` [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs Saeed Mahameed
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Yamin Friedman, Saeed Mahameed

From: Yamin Friedman <yaminf@mellanox.com>

Add a parameter to ib_device for enabling dynamic interrupt moderation so
that it can be configured from userspace using the rdma tool.

In order to set dim for an ib device the command is:
rdma dev set [DEV] dim [on|off]
Please set on/off.

rdma dev show
0: mlx5_0: node_type ca fw 16.26.0055 node_guid 248a:0703:00a5:29d0
sys_image_guid 248a:0703:00a5:29d0 dim on

rdma resource show cq
dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE dim off
comm [ib_core]

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/infiniband/Kconfig          |  1 +
 drivers/infiniband/core/core_priv.h |  1 +
 drivers/infiniband/core/device.c    |  9 +++++++++
 drivers/infiniband/core/nldev.c     | 14 ++++++++++++++
 include/rdma/ib_verbs.h             |  4 ++++
 include/uapi/rdma/rdma_netlink.h    |  5 +++++
 6 files changed, 34 insertions(+)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 8ba41cbf1869..060649093ee1 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -7,6 +7,7 @@ menuconfig INFINIBAND
 	depends on m || IPV6 != m
 	depends on !ALPHA
 	select IRQ_POLL
+	select DIMLIB
 	---help---
 	  Core support for InfiniBand (IB).  Make sure to also select
 	  any protocols you wish to use as well as drivers for your
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index ff40a450b5d2..9724179a7d7b 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -60,6 +60,7 @@ extern bool ib_devices_shared_netns;
 int ib_device_register_sysfs(struct ib_device *device);
 void ib_device_unregister_sysfs(struct ib_device *device);
 int ib_device_rename(struct ib_device *ibdev, const char *name);
+int ib_device_set_dim(struct ib_device *ibdev, u8 use_dim);
 
 typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
 	      struct net_device *idev, void *cookie);
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 78dc07c6ac4b..7da149f1afe2 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -432,6 +432,15 @@ int ib_device_rename(struct ib_device *ibdev, const char *name)
 	return ret;
 }
 
+int ib_device_set_dim(struct ib_device *ibdev, u8 use_dim)
+{
+	if (use_dim > 1)
+		return -EINVAL;
+	ibdev->use_cq_dim = use_dim;
+
+	return 0;
+}
+
 static int alloc_name(struct ib_device *ibdev, const char *name)
 {
 	struct ib_device *device;
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 69188cbbd99b..71d1ec0e43cb 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -120,6 +120,7 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_DEV_PROTOCOL]		= { .type = NLA_NUL_STRING,
 				    .len = RDMA_NLDEV_ATTR_ENTRY_STRLEN },
 	[RDMA_NLDEV_NET_NS_FD]			= { .type = NLA_U32 },
+	[RDMA_NLDEV_ATTR_DEV_DIM]		= { .type = NLA_U8 },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -232,6 +233,8 @@ static int fill_dev_info(struct sk_buff *msg, struct ib_device *device)
 		return -EMSGSIZE;
 	if (nla_put_u8(msg, RDMA_NLDEV_ATTR_DEV_NODE_TYPE, device->node_type))
 		return -EMSGSIZE;
+	if (nla_put_u8(msg, RDMA_NLDEV_ATTR_DEV_DIM, device->use_cq_dim))
+		return -EMSGSIZE;
 
 	/*
 	 * Link type is determined on first port and mlx4 device
@@ -532,6 +535,9 @@ static int fill_res_cq_entry(struct sk_buff *msg, bool has_cap_net_admin,
 	    nla_put_u8(msg, RDMA_NLDEV_ATTR_RES_POLL_CTX, cq->poll_ctx))
 		goto err;
 
+	if (nla_put_u8(msg, RDMA_NLDEV_ATTR_DEV_DIM, (cq->dim != NULL)))
+		goto err;
+
 	if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_CQN, res->id))
 		goto err;
 	if (!rdma_is_kernel_res(res) &&
@@ -704,6 +710,14 @@ static int nldev_set_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
 		goto put_done;
 	}
 
+	if (tb[RDMA_NLDEV_ATTR_DEV_DIM]) {
+		u8 use_dim;
+
+		use_dim = nla_get_u8(tb[RDMA_NLDEV_ATTR_DEV_DIM]);
+		err = ib_device_set_dim(device,  use_dim);
+		goto done;
+	}
+
 done:
 	ib_device_put(device);
 put_done:
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0742095355f2..074d7f4bc8a6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -65,6 +65,7 @@
 #include <rdma/restrack.h>
 #include <uapi/rdma/rdma_user_ioctl.h>
 #include <uapi/rdma/ib_user_ioctl_verbs.h>
+#include <linux/dim.h>
 
 #define IB_FW_VERSION_NAME_MAX	ETHTOOL_FWVERS_LEN
 
@@ -1638,6 +1639,7 @@ struct ib_cq {
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
 	struct rdma_restrack_entry res;
+	struct dim *dim;
 };
 
 struct ib_srq {
@@ -2692,6 +2694,8 @@ struct ib_device {
 	/* Used by iWarp CM */
 	char iw_ifname[IFNAMSIZ];
 	u32 iw_driver_flags;
+
+	bool use_cq_dim;
 };
 
 struct ib_client {
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 41db51367efa..6050c7daee83 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -479,6 +479,11 @@ enum rdma_nldev_attr {
 	 */
 	RDMA_NLDEV_NET_NS_FD,			/* u32 */
 
+	/*
+	 * Setting of dynamic interrupt moderation
+	 */
+	RDMA_NLDEV_ATTR_DEV_DIM,                /* u8 */
+
 	/*
 	 * Always the end
 	 */
-- 
2.21.0



* [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
                   ` (8 preceding siblings ...)
  2019-06-25 20:57 ` [for-next V2 09/10] RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink Saeed Mahameed
@ 2019-06-25 20:57 ` Saeed Mahameed
  2019-06-25 21:14   ` Sagi Grimberg
  2019-06-25 21:07 ` [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
  2019-06-27 19:43 ` David Miller
  11 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 20:57 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Sagi Grimberg, Tal Gilboa, netdev,
	linux-rdma, Yamin Friedman, Max Gurtovoy, Saeed Mahameed

From: Yamin Friedman <yaminf@mellanox.com>

Added the interface in the infiniband driver that applies the rdma_dim
adaptive moderation. There is now a special function for allocating an
ib_cq that uses rdma_dim.

Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
NVMf between two equal end-hosts with 56 cores across a Mellanox switch
using null_blk device:

IO READS without DIM:
blk size | BW        | IOPS  | 99th percentile latency | 99.99th latency
512B     | 3.8GiB/s  | 7.7M  | 1401  usec              | 2442  usec
4k       | 7.0GiB/s  | 1.8M  | 4817  usec              | 6587  usec
64k      | 10.7GiB/s | 175k  | 9896  usec              | 10028 usec

IO WRITES without DIM:
blk size | BW        | IOPS  | 99th percentile latency | 99.99th latency
512B     | 3.6GiB/s  | 7.5M  | 1434  usec              | 2474  usec
4k       | 6.3GiB/s  | 1.6M  | 938   usec              | 1221  usec
64k      | 10.7GiB/s | 175k  | 8979  usec              | 12780 usec

IO READS with DIM:
blk size | BW        | IOPS  | 99th percentile latency | 99.99th latency
512B     | 4GiB/s    | 8.2M  | 816   usec              | 889   usec
4k       | 10.1GiB/s | 2.65M | 3359  usec              | 5080  usec
64k      | 10.7GiB/s | 175k  | 9896  usec              | 10028 usec

IO WRITES with DIM:
blk size | BW        | IOPS  | 99th percentile latency | 99.99th latency
512B     | 3.9GiB/s  | 8.1M  | 799   usec              | 922   usec
4k       | 9.6GiB/s  | 2.5M  | 717   usec              | 1004  usec
64k      | 10.7GiB/s | 176k  | 8586  usec              | 12256 usec

The rdma_dim algorithm was designed to measure the effectiveness of
moderation on the flow in a general way and thus should be appropriate
for all RDMA storage protocols.

rdma_dim is configured to be the default option based on performance
improvement seen after extensive tests.
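
To illustrate how a profile index turns into CQ moderation settings, here
is a user-space sketch of the lookup done by the work handler in this
patch. The table values are copied from rdma_dim_prof in patch 08/10;
apply_profile, modify_cq_fn, struct recorded and record_modify are
hypothetical names, with record_modify standing in for a driver's
modify_cq callback, and the bounds check is an addition of this sketch.

```c
#include <stdint.h>

/* Same field order as struct dim_cq_moder after patch 07/10. */
struct cq_moder {
	uint16_t usec;
	uint16_t pkts;
	uint16_t comps;
	uint8_t cq_period_mode;
};

#define NUM_PROFILES 9

static const struct cq_moder prof[NUM_PROFILES] = {
	{1,   0, 1,  0},
	{1,   0, 4,  0},
	{2,   0, 4,  0},
	{2,   0, 8,  0},
	{4,   0, 8,  0},
	{16,  0, 8,  0},
	{16,  0, 16, 0},
	{32,  0, 16, 0},
	{32,  0, 32, 0},
};

/* Hypothetical stand-in for the driver's modify_cq operation. */
typedef int (*modify_cq_fn)(void *cq, uint16_t comps, uint16_t usec);

/* Apply the moderation values of profile ix; -1 on bad arguments. */
static int apply_profile(void *cq, unsigned int ix, modify_cq_fn modify)
{
	if (ix >= NUM_PROFILES || !modify)
		return -1;
	return modify(cq, prof[ix].comps, prof[ix].usec);
}

/* Test helper: remembers the last values it was asked to apply. */
struct recorded {
	uint16_t comps;
	uint16_t usec;
};

static int record_modify(void *cq, uint16_t comps, uint16_t usec)
{
	struct recorded *r = cq;

	r->comps = comps;
	r->usec = usec;
	return 0;
}
```

With this table, profile 5 requests moderation of 8 completions or 16 usec,
and the starting profile 0 requests 1 completion or 1 usec.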

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/infiniband/core/cq.c      | 71 ++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/main.c |  2 +
 2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index a4c81992267c..d8a8c466d897 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -26,6 +26,40 @@
 #define IB_POLL_FLAGS \
 	(IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
 
+static void ib_cq_rdma_dim_work(struct work_struct *w)
+{
+	struct dim *dim = container_of(w, struct dim, work);
+	struct ib_cq *cq = (struct ib_cq *)dim->dim_owner;
+
+	u16 usec = rdma_dim_prof[dim->profile_ix].usec;
+	u16 comps = rdma_dim_prof[dim->profile_ix].comps;
+
+	dim->state = DIM_START_MEASURE;
+
+	cq->device->ops.modify_cq(cq, comps, usec);
+}
+
+static void rdma_dim_init(struct ib_cq *cq)
+{
+	struct dim *dim;
+
+	if (!cq->device->ops.modify_cq || !cq->device->use_cq_dim ||
+	    cq->poll_ctx == IB_POLL_DIRECT)
+		return;
+
+	dim = kzalloc(sizeof(struct dim), GFP_KERNEL);
+	if (!dim)
+		return;
+
+	dim->state = DIM_START_MEASURE;
+	dim->tune_state = DIM_GOING_RIGHT;
+	dim->profile_ix = RDMA_DIM_START_PROFILE;
+	dim->dim_owner = cq;
+	cq->dim = dim;
+
+	INIT_WORK(&dim->work, ib_cq_rdma_dim_work);
+}
+
 static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *wcs,
 			   int batch)
 {
@@ -98,6 +132,24 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
 	return completed;
 }
 
+static int ib_poll_dim_handler(struct irq_poll *iop, int budget)
+{
+	struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
+	struct dim *dim = cq->dim;
+	int completed;
+
+	completed = __ib_process_cq(cq, budget, cq->wc, IB_POLL_BATCH);
+	if (completed < budget) {
+		irq_poll_complete(&cq->iop);
+		if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
+			irq_poll_sched(&cq->iop);
+	}
+
+	rdma_dim(dim, completed);
+
+	return completed;
+}
+
 static void ib_cq_completion_softirq(struct ib_cq *cq, void *private)
 {
 	irq_poll_sched(&cq->iop);
@@ -105,14 +157,18 @@ static void ib_cq_completion_softirq(struct ib_cq *cq, void *private)
 
 static void ib_cq_poll_work(struct work_struct *work)
 {
-	struct ib_cq *cq = container_of(work, struct ib_cq, work);
+	struct ib_cq *cq = container_of(work, struct ib_cq,
+					work);
 	int completed;
 
 	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, cq->wc,
 				    IB_POLL_BATCH);
+
 	if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
 	    ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
 		queue_work(cq->comp_wq, &cq->work);
+	else if (cq->dim)
+		rdma_dim(cq->dim, completed);
 }
 
 static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
@@ -166,6 +222,8 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
 	rdma_restrack_set_task(&cq->res, caller);
 	rdma_restrack_kadd(&cq->res);
 
+	rdma_dim_init(cq);
+
 	switch (cq->poll_ctx) {
 	case IB_POLL_DIRECT:
 		cq->comp_handler = ib_cq_completion_direct;
@@ -173,7 +231,13 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
 	case IB_POLL_SOFTIRQ:
 		cq->comp_handler = ib_cq_completion_softirq;
 
-		irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ, ib_poll_handler);
+		if (cq->dim) {
+			irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ,
+				      ib_poll_dim_handler);
+		} else
+			irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ,
+				      ib_poll_handler);
+
 		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 		break;
 	case IB_POLL_WORKQUEUE:
@@ -226,6 +290,9 @@ void ib_free_cq_user(struct ib_cq *cq, struct ib_udata *udata)
 		WARN_ON_ONCE(1);
 	}
 
+	if (cq->dim)
+		cancel_work_sync(&cq->dim->work);
+	kfree(cq->dim);
 	kfree(cq->wc);
 	rdma_restrack_del(&cq->res);
 	ret = cq->device->ops.destroy_cq(cq, udata);
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index abac70ad5c7c..b1b45dbe24a5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -6305,6 +6305,8 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
 	     MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
 		mutex_init(&dev->lb.mutex);
 
+	dev->ib_dev.use_cq_dim = true;
+
 	return 0;
 }
 
-- 
2.21.0



* Re: [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
                   ` (9 preceding siblings ...)
  2019-06-25 20:57 ` [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs Saeed Mahameed
@ 2019-06-25 21:07 ` Saeed Mahameed
  2019-06-27 19:43 ` David Miller
  11 siblings, 0 replies; 33+ messages in thread
From: Saeed Mahameed @ 2019-06-25 21:07 UTC (permalink / raw)
  To: Jason Gunthorpe, davem, dledford
  Cc: linux-rdma, Or Gerlitz, sagi, Leon Romanovsky, Tal Gilboa, netdev

On Tue, 2019-06-25 at 20:57 +0000, Saeed Mahameed wrote:
> Hi Dave, Doug & Jason
> 
> This series improves DIM - Dynamically-tuned Interrupt
> Moderation- to be generic for netdev and RDMA use-cases.
> 
> From Tal and Yamin:
> 
> First 7 patches provide the necessary refactoring to current net_dim
> library which affect some net drivers who are using the API.
> 
> The last 3 patches provide the RDMA implementation for DIM.
> These patches are included in this pull request and they are posted

correction: The last 3 patches are *NOT* included in this pull request.

The idea here is to pull the refactoring API patches that affect and
touch net drivers [0-7] to both trees; [8-10] will be sent later to the
rdma tree only.


Thanks,
Saeed.


* Re: [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs
  2019-06-25 20:57 ` [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs Saeed Mahameed
@ 2019-06-25 21:14   ` Sagi Grimberg
  2019-06-26  7:56     ` Idan Burstein
  2019-06-27  5:28     ` Yamin Friedman
  0 siblings, 2 replies; 33+ messages in thread
From: Sagi Grimberg @ 2019-06-25 21:14 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Yamin Friedman, Max Gurtovoy



> +static int ib_poll_dim_handler(struct irq_poll *iop, int budget)
> +{
> +	struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
> +	struct dim *dim = cq->dim;
> +	int completed;
> +
> +	completed = __ib_process_cq(cq, budget, cq->wc, IB_POLL_BATCH);
> +	if (completed < budget) {
> +		irq_poll_complete(&cq->iop);
> +		if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> +			irq_poll_sched(&cq->iop);
> +	}
> +
> +	rdma_dim(dim, completed);

Why duplicate the entire thing for a one-liner?

> +
> +	return completed;
> +}
> +
>   static void ib_cq_completion_softirq(struct ib_cq *cq, void *private)
>   {
>   	irq_poll_sched(&cq->iop);
> @@ -105,14 +157,18 @@ static void ib_cq_completion_softirq(struct ib_cq *cq, void *private)
>   
>   static void ib_cq_poll_work(struct work_struct *work)
>   {
> -	struct ib_cq *cq = container_of(work, struct ib_cq, work);
> +	struct ib_cq *cq = container_of(work, struct ib_cq,
> +					work);

Why was that changed?

>   	int completed;
>   
>   	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, cq->wc,
>   				    IB_POLL_BATCH);
> +

newline?

>   	if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
>   	    ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
>   		queue_work(cq->comp_wq, &cq->work);
> +	else if (cq->dim)
> +		rdma_dim(cq->dim, completed);
>   }
>   
>   static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
> @@ -166,6 +222,8 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
>   	rdma_restrack_set_task(&cq->res, caller);
>   	rdma_restrack_kadd(&cq->res);
>   
> +	rdma_dim_init(cq);
> +
>   	switch (cq->poll_ctx) {
>   	case IB_POLL_DIRECT:
>   		cq->comp_handler = ib_cq_completion_direct;
> @@ -173,7 +231,13 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
>   	case IB_POLL_SOFTIRQ:
>   		cq->comp_handler = ib_cq_completion_softirq;
>   
> -		irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ, ib_poll_handler);
> +		if (cq->dim) {
> +			irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ,
> +				      ib_poll_dim_handler);
> +		} else
> +			irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ,
> +				      ib_poll_handler);
> +
>   		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
>   		break;
>   	case IB_POLL_WORKQUEUE:
> @@ -226,6 +290,9 @@ void ib_free_cq_user(struct ib_cq *cq, struct ib_udata *udata)
>   		WARN_ON_ONCE(1);
>   	}
>   
> +	if (cq->dim)
> +		cancel_work_sync(&cq->dim->work);
> +	kfree(cq->dim);
>   	kfree(cq->wc);
>   	rdma_restrack_del(&cq->res);
>   	ret = cq->device->ops.destroy_cq(cq, udata);
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index abac70ad5c7c..b1b45dbe24a5 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -6305,6 +6305,8 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
>   	     MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
>   		mutex_init(&dev->lb.mutex);
>   
> +	dev->ib_dev.use_cq_dim = true;
> +

Please don't. This is a bad choice to opt it in by default.
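
On the duplicated poll handler questioned earlier in this review, a
minimal sketch of the alternative: a single handler where the DIM call is
guarded by a NULL check. The types and helpers below (cq_sketch,
dim_sketch, process_cq_sketch, dim_note_sketch) are hypothetical
stand-ins so the example compiles on its own; they are not the kernel's
struct ib_cq, struct dim, __ib_process_cq() or rdma_dim(), and the CQ
re-arm logic is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dim; not the kernel type. */
struct dim_sketch {
	unsigned long events;       /* number of poll rounds observed */
	unsigned long completions;  /* completions accumulated so far */
};

/* Hypothetical stand-in for struct ib_cq. */
struct cq_sketch {
	struct dim_sketch *dim;     /* NULL when moderation is disabled */
	int pending;                /* completions waiting to be reaped */
};

/* Stand-in for rdma_dim(): record one event and its completions. */
static void dim_note_sketch(struct dim_sketch *dim, int completed)
{
	dim->events++;
	dim->completions += (unsigned long)completed;
}

/* Stand-in for __ib_process_cq(): reap at most `budget` completions. */
static int process_cq_sketch(struct cq_sketch *cq, int budget)
{
	int n = cq->pending < budget ? cq->pending : budget;

	cq->pending -= n;
	return n;
}

/* One handler for both cases: the DIM call is guarded by a NULL check. */
static int poll_handler_sketch(struct cq_sketch *cq, int budget)
{
	int completed;

	assert(budget > 0);
	completed = process_cq_sketch(cq, budget);
	/* A real handler would also re-arm the CQ when completed < budget. */
	if (cq->dim)
		dim_note_sketch(cq->dim, completed);
	return completed;
}
```

With this shape the SOFTIRQ setup would not need to pick between two
handlers; whether the extra branch is acceptable in the hot path is a
separate question this sketch does not answer.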

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 09/10] RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink
  2019-06-25 20:57 ` [for-next V2 09/10] RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink Saeed Mahameed
@ 2019-06-25 21:15   ` Sagi Grimberg
  2019-06-27  5:29     ` Yamin Friedman
  0 siblings, 1 reply; 33+ messages in thread
From: Sagi Grimberg @ 2019-06-25 21:15 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Yamin Friedman



On 6/25/19 1:57 PM, Saeed Mahameed wrote:
> From: Yamin Friedman <yaminf@mellanox.com>
> 
> Added parameter in ib_device for enabling dynamic interrupt moderation so
> that it can be configured in userspace using rdma tool.
> 
> In order to set dim for an ib device the command is:
> rdma dev set [DEV] dim [on|off]
> Please set on/off.

Is "dim" what you want to expose to the user? maybe
"adaptive-moderation" is more friendly?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 01/10] linux/dim: Move logic to dim.h
  2019-06-25 20:57 ` [for-next V2 01/10] linux/dim: Move logic to dim.h Saeed Mahameed
@ 2019-06-25 21:53   ` Sagi Grimberg
  0 siblings, 0 replies; 33+ messages in thread
From: Sagi Grimberg @ 2019-06-25 21:53 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 02/10] linux/dim: Remove "net" prefix from internal DIM members
  2019-06-25 20:57 ` [for-next V2 02/10] linux/dim: Remove "net" prefix from internal DIM members Saeed Mahameed
@ 2019-06-25 21:53   ` Sagi Grimberg
  0 siblings, 0 replies; 33+ messages in thread
From: Sagi Grimberg @ 2019-06-25 21:53 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 03/10] linux/dim: Rename externally exposed macros
  2019-06-25 20:57 ` [for-next V2 03/10] linux/dim: Rename externally exposed macros Saeed Mahameed
@ 2019-06-25 21:54   ` Sagi Grimberg
  0 siblings, 0 replies; 33+ messages in thread
From: Sagi Grimberg @ 2019-06-25 21:54 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma

This can be squashed into the previous one; otherwise:

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 04/10] linux/dim: Rename net_dim_sample() to net_dim_update_sample()
  2019-06-25 20:57 ` [for-next V2 04/10] linux/dim: Rename net_dim_sample() to net_dim_update_sample() Saeed Mahameed
@ 2019-06-25 21:55   ` Sagi Grimberg
  0 siblings, 0 replies; 33+ messages in thread
From: Sagi Grimberg @ 2019-06-25 21:55 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 05/10] linux/dim: Rename externally used net_dim members
  2019-06-25 20:57 ` [for-next V2 05/10] linux/dim: Rename externally used net_dim members Saeed Mahameed
@ 2019-06-25 21:57   ` Sagi Grimberg
  2019-06-26  6:38     ` Tal Gilboa
  0 siblings, 1 reply; 33+ messages in thread
From: Sagi Grimberg @ 2019-06-25 21:57 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma

Question, do any other nics use or plan to use this?

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 08/10] linux/dim: Implement rdma_dim
  2019-06-25 20:57 ` [for-next V2 08/10] linux/dim: Implement rdma_dim Saeed Mahameed
@ 2019-06-25 22:02   ` Sagi Grimberg
  2019-06-26 11:57     ` Or Gerlitz
  2019-06-27  5:25     ` Yamin Friedman
  0 siblings, 2 replies; 33+ messages in thread
From: Sagi Grimberg @ 2019-06-25 22:02 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Yamin Friedman, Max Gurtovoy


> +void rdma_dim(struct dim *dim, u64 completions)
> +{
> +	struct dim_sample *curr_sample = &dim->measuring_sample;
> +	struct dim_stats curr_stats;
> +	u32 nevents;
> +
> +	dim_update_sample_with_comps(curr_sample->event_ctr + 1,
> +				     curr_sample->pkt_ctr,
> +				     curr_sample->byte_ctr,
> +				     curr_sample->comp_ctr + completions,
> +				     &dim->measuring_sample);

If this is the only caller, why add pkt_ctr and byte_ctr at all?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 05/10] linux/dim: Rename externally used net_dim members
  2019-06-25 21:57   ` Sagi Grimberg
@ 2019-06-26  6:38     ` Tal Gilboa
  0 siblings, 0 replies; 33+ messages in thread
From: Tal Gilboa @ 2019-06-26  6:38 UTC (permalink / raw)
  To: Sagi Grimberg, Saeed Mahameed, David S. Miller, Doug Ledford,
	Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, netdev, linux-rdma

On 6/26/2019 12:57 AM, Sagi Grimberg wrote:
> Question, do any other nics use or plan to use this?
Yes, see the changed files list under drivers/net for existing usage.

> 
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs
  2019-06-25 21:14   ` Sagi Grimberg
@ 2019-06-26  7:56     ` Idan Burstein
  2019-07-02  5:36       ` Sagi Grimberg
  2019-06-27  5:28     ` Yamin Friedman
  1 sibling, 1 reply; 33+ messages in thread
From: Idan Burstein @ 2019-06-26  7:56 UTC (permalink / raw)
  To: Sagi Grimberg, Saeed Mahameed, David S. Miller, Doug Ledford,
	Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Yamin Friedman, Max Gurtovoy

" Please don't. This is a bad choice to opt it in by default."

I disagree here. I'd prefer Linux to have a good out-of-the-box experience (e.g. reaching 100G with 4K NVMeOF on Intel servers) with the default parameters, especially since Yamin has shown it is beneficial, or at least not harmful, in terms of performance for a variety of use cases. The whole concept of DIM is that it adapts to the workload requirements in terms of bandwidth and latency.

Moreover, net-dim is enabled by default, I don't see why RDMA is different.


-----Original Message-----
From: linux-rdma-owner@vger.kernel.org <linux-rdma-owner@vger.kernel.org> On Behalf Of Sagi Grimberg
Sent: Wednesday, June 26, 2019 12:14 AM
To: Saeed Mahameed <saeedm@mellanox.com>; David S. Miller <davem@davemloft.net>; Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Tal Gilboa <talgi@mellanox.com>; netdev@vger.kernel.org; linux-rdma@vger.kernel.org; Yamin Friedman <yaminf@mellanox.com>; Max Gurtovoy <maxg@mellanox.com>
Subject: Re: [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs



> +static int ib_poll_dim_handler(struct irq_poll *iop, int budget)
> +{
> +	struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
> +	struct dim *dim = cq->dim;
> +	int completed;
> +
> +	completed = __ib_process_cq(cq, budget, cq->wc, IB_POLL_BATCH);
> +	if (completed < budget) {
> +		irq_poll_complete(&cq->iop);
> +		if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> +			irq_poll_sched(&cq->iop);
> +	}
> +
> +	rdma_dim(dim, completed);

Why duplicate the entire thing for a one-liner?

> +
> +	return completed;
> +}
> +
>   static void ib_cq_completion_softirq(struct ib_cq *cq, void *private)
>   {
>   	irq_poll_sched(&cq->iop);
> @@ -105,14 +157,18 @@ static void ib_cq_completion_softirq(struct ib_cq *cq, void *private)
>   
>   static void ib_cq_poll_work(struct work_struct *work)
>   {
> -	struct ib_cq *cq = container_of(work, struct ib_cq, work);
> +	struct ib_cq *cq = container_of(work, struct ib_cq,
> +					work);

Why was that changed?

>   	int completed;
>   
>   	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, cq->wc,
>   				    IB_POLL_BATCH);
> +

newline?

>   	if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
>   	    ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
>   		queue_work(cq->comp_wq, &cq->work);
> +	else if (cq->dim)
> +		rdma_dim(cq->dim, completed);
>   }
>   
>   static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
> @@ -166,6 +222,8 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
>   	rdma_restrack_set_task(&cq->res, caller);
>   	rdma_restrack_kadd(&cq->res);
>   
> +	rdma_dim_init(cq);
> +
>   	switch (cq->poll_ctx) {
>   	case IB_POLL_DIRECT:
>   		cq->comp_handler = ib_cq_completion_direct;
> @@ -173,7 +231,13 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
>   	case IB_POLL_SOFTIRQ:
>   		cq->comp_handler = ib_cq_completion_softirq;
>   
> -		irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ, ib_poll_handler);
> +		if (cq->dim) {
> +			irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ,
> +				      ib_poll_dim_handler);
> +		} else
> +			irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ,
> +				      ib_poll_handler);
> +
>   		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
>   		break;
>   	case IB_POLL_WORKQUEUE:
> @@ -226,6 +290,9 @@ void ib_free_cq_user(struct ib_cq *cq, struct ib_udata *udata)
>   		WARN_ON_ONCE(1);
>   	}
>   
> +	if (cq->dim)
> +		cancel_work_sync(&cq->dim->work);
> +	kfree(cq->dim);
>   	kfree(cq->wc);
>   	rdma_restrack_del(&cq->res);
>   	ret = cq->device->ops.destroy_cq(cq, udata);
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index abac70ad5c7c..b1b45dbe24a5 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -6305,6 +6305,8 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
>   	     MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
>   		mutex_init(&dev->lb.mutex);
>   
> +	dev->ib_dev.use_cq_dim = true;
> +

Please don't. This is a bad choice to opt it in by default.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 08/10] linux/dim: Implement rdma_dim
  2019-06-25 22:02   ` Sagi Grimberg
@ 2019-06-26 11:57     ` Or Gerlitz
  2019-06-27  5:25     ` Yamin Friedman
  1 sibling, 0 replies; 33+ messages in thread
From: Or Gerlitz @ 2019-06-26 11:57 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Saeed Mahameed, David S. Miller, Doug Ledford, Jason Gunthorpe,
	Leon Romanovsky, Tal Gilboa, netdev, linux-rdma, Yamin Friedman,
	Max Gurtovoy, Idan Burstein, Or Gerlitz

On Wed, Jun 26, 2019 at 1:03 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>
> > +void rdma_dim(struct dim *dim, u64 completions)
> > +{
> > +     struct dim_sample *curr_sample = &dim->measuring_sample;
> > +     struct dim_stats curr_stats;
> > +     u32 nevents;
> > +
> > +     dim_update_sample_with_comps(curr_sample->event_ctr + 1,
> > +                                  curr_sample->pkt_ctr,
> > +                                  curr_sample->byte_ctr,
> > +                                  curr_sample->comp_ctr + completions,
> > +                                  &dim->measuring_sample);
>
> If this is the only caller, why add pkt_ctr and byte_ctr at all?

Hi Sagi,

Thanks for the fast review and feedback. Other than the default
per-ib/rdma-device setup for rdma dim / adaptive-moderation, which Idan
commented on (and let's discuss it there, please), the rest of the
comments seem fine, and Yamin will respond to / address them in the
coming days.

Or.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 08/10] linux/dim: Implement rdma_dim
  2019-06-25 22:02   ` Sagi Grimberg
  2019-06-26 11:57     ` Or Gerlitz
@ 2019-06-27  5:25     ` Yamin Friedman
  1 sibling, 0 replies; 33+ messages in thread
From: Yamin Friedman @ 2019-06-27  5:25 UTC (permalink / raw)
  To: Sagi Grimberg, Saeed Mahameed, David S. Miller, Doug Ledford,
	Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Max Gurtovoy


On 6/26/2019 1:02 AM, Sagi Grimberg wrote:
>
>> +void rdma_dim(struct dim *dim, u64 completions)
>> +{
>> +    struct dim_sample *curr_sample = &dim->measuring_sample;
>> +    struct dim_stats curr_stats;
>> +    u32 nevents;
>> +
>> +    dim_update_sample_with_comps(curr_sample->event_ctr + 1,
>> +                     curr_sample->pkt_ctr,
>> +                     curr_sample->byte_ctr,
>> +                     curr_sample->comp_ctr + completions,
>> +                     &dim->measuring_sample);
>
> If this is the only caller, why add pkt_ctr and byte_ctr at all?


We wanted to keep the API general enough that if someone wants to 
implement a different algorithm using the dim library they will be able 
to use all the possible statistics. I agree though that in the rdma_dim 
function there is no point in making it seem like they are relevant 
parameters.
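
To make the trade-off above concrete, here is a rough sketch of one way
to keep a general updater while giving the RDMA path a wrapper that only
mentions the counters it actually changes. The struct layout and both
function names are assumptions made for illustration (field names follow
the quoted patch); this is not the dim library's actual API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample layout; field names follow the quoted patch. */
struct dim_sample_sketch {
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
	uint32_t comp_ctr;
};

/* General updater: stores every counter, for algorithms that need all. */
static void sample_update_all_sketch(struct dim_sample_sketch *s,
				     uint16_t events, uint32_t pkts,
				     uint32_t bytes, uint32_t comps)
{
	assert(s != 0);
	s->event_ctr = events;
	s->pkt_ctr = pkts;
	s->byte_ctr = bytes;
	s->comp_ctr = comps;
}

/*
 * RDMA-style wrapper: count one more event and add `completions`.
 * Packet and byte counters are passed through unchanged, so callers
 * never have to mention them.
 */
static void sample_add_completions_sketch(struct dim_sample_sketch *s,
					  uint32_t completions)
{
	sample_update_all_sketch(s, (uint16_t)(s->event_ctr + 1),
				 s->pkt_ctr, s->byte_ctr,
				 s->comp_ctr + completions);
}
```

The general function stays available for other algorithms, while the
RDMA call site reads as "one event, N completions" and nothing else.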


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs
  2019-06-25 21:14   ` Sagi Grimberg
  2019-06-26  7:56     ` Idan Burstein
@ 2019-06-27  5:28     ` Yamin Friedman
  1 sibling, 0 replies; 33+ messages in thread
From: Yamin Friedman @ 2019-06-27  5:28 UTC (permalink / raw)
  To: Sagi Grimberg, Saeed Mahameed, David S. Miller, Doug Ledford,
	Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Max Gurtovoy


On 6/26/2019 12:14 AM, Sagi Grimberg wrote:
>
>
>> +static int ib_poll_dim_handler(struct irq_poll *iop, int budget)
>> +{
>> +    struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
>> +    struct dim *dim = cq->dim;
>> +    int completed;
>> +
>> +    completed = __ib_process_cq(cq, budget, cq->wc, IB_POLL_BATCH);
>> +    if (completed < budget) {
>> +        irq_poll_complete(&cq->iop);
>> +        if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
>> +            irq_poll_sched(&cq->iop);
>> +    }
>> +
>> +    rdma_dim(dim, completed);
>
> Why duplicate the entire thing for a one-liner?
You are right, this was leftover from a previous version where there 
were more significant changes. I will remove the extra function.
>
>> +
>> +    return completed;
>> +}
>> +
>>   static void ib_cq_completion_softirq(struct ib_cq *cq, void *private)
>>   {
>>       irq_poll_sched(&cq->iop);
>> @@ -105,14 +157,18 @@ static void ib_cq_completion_softirq(struct ib_cq *cq, void *private)
>>
>>   static void ib_cq_poll_work(struct work_struct *work)
>>   {
>> -    struct ib_cq *cq = container_of(work, struct ib_cq, work);
>> +    struct ib_cq *cq = container_of(work, struct ib_cq,
>> +                    work);
>
> Why was that changed?

I will fix this.

>
>>       int completed;
>>
>>       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, cq->wc,
>>                       IB_POLL_BATCH);
>> +
>
> newline?

Same as above.

>
>>       if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
>>           ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
>>           queue_work(cq->comp_wq, &cq->work);
>> +    else if (cq->dim)
>> +        rdma_dim(cq->dim, completed);
>>   }
>>
>>   static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
>> @@ -166,6 +222,8 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
>>       rdma_restrack_set_task(&cq->res, caller);
>>       rdma_restrack_kadd(&cq->res);
>>
>> +    rdma_dim_init(cq);
>> +
>>       switch (cq->poll_ctx) {
>>       case IB_POLL_DIRECT:
>>           cq->comp_handler = ib_cq_completion_direct;
>> @@ -173,7 +231,13 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
>>       case IB_POLL_SOFTIRQ:
>>           cq->comp_handler = ib_cq_completion_softirq;
>>
>> -        irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ, ib_poll_handler);
>> +        if (cq->dim) {
>> +            irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ,
>> +                      ib_poll_dim_handler);
>> +        } else
>> +            irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ,
>> +                      ib_poll_handler);
>> +
>>           ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
>>           break;
>>       case IB_POLL_WORKQUEUE:
>> @@ -226,6 +290,9 @@ void ib_free_cq_user(struct ib_cq *cq, struct ib_udata *udata)
>>           WARN_ON_ONCE(1);
>>       }
>>
>> +    if (cq->dim)
>> +        cancel_work_sync(&cq->dim->work);
>> +    kfree(cq->dim);
>>       kfree(cq->wc);
>>       rdma_restrack_del(&cq->res);
>>       ret = cq->device->ops.destroy_cq(cq, udata);
>> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
>> index abac70ad5c7c..b1b45dbe24a5 100644
>> --- a/drivers/infiniband/hw/mlx5/main.c
>> +++ b/drivers/infiniband/hw/mlx5/main.c
>> @@ -6305,6 +6305,8 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
>>            MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
>>           mutex_init(&dev->lb.mutex);
>>
>> +    dev->ib_dev.use_cq_dim = true;
>> +
>
> Please don't. This is a bad choice to opt it in by default.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 09/10] RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink
  2019-06-25 21:15   ` Sagi Grimberg
@ 2019-06-27  5:29     ` Yamin Friedman
  0 siblings, 0 replies; 33+ messages in thread
From: Yamin Friedman @ 2019-06-27  5:29 UTC (permalink / raw)
  To: Sagi Grimberg, Saeed Mahameed, David S. Miller, Doug Ledford,
	Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma


On 6/26/2019 12:15 AM, Sagi Grimberg wrote:
>
>
> On 6/25/19 1:57 PM, Saeed Mahameed wrote:
>> From: Yamin Friedman <yaminf@mellanox.com>
>>
>> Added parameter in ib_device for enabling dynamic interrupt moderation so
>> that it can be configured in userspace using rdma tool.
>>
>> In order to set dim for an ib device the command is:
>> rdma dev set [DEV] dim [on|off]
>> Please set on/off.
>
> Is "dim" what you want to expose to the user? maybe
> "adaptive-moderation" is more friendly?
That makes sense, I will change it.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA
  2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
                   ` (10 preceding siblings ...)
  2019-06-25 21:07 ` [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
@ 2019-06-27 19:43 ` David Miller
  11 siblings, 0 replies; 33+ messages in thread
From: David Miller @ 2019-06-27 19:43 UTC (permalink / raw)
  To: saeedm; +Cc: dledford, jgg, leonro, ogerlitz, sagi, talgi, netdev, linux-rdma

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Tue, 25 Jun 2019 20:57:27 +0000

> Once we are all happy with the series, please pull to net-next and
> rdma-next trees.

Pulled into net-next, thanks.

I'll push it back out once I am done build testing.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs
  2019-06-26  7:56     ` Idan Burstein
@ 2019-07-02  5:36       ` Sagi Grimberg
  2019-07-02  6:41         ` Leon Romanovsky
  2019-07-04 12:30         ` Idan Burstein
  0 siblings, 2 replies; 33+ messages in thread
From: Sagi Grimberg @ 2019-07-02  5:36 UTC (permalink / raw)
  To: Idan Burstein, Saeed Mahameed, David S. Miller, Doug Ledford,
	Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Yamin Friedman, Max Gurtovoy

Hey Idan,

> " Please don't. This is a bad choice to opt it in by default."
> 
> I disagree here. I'd prefer Linux to have a good out-of-the-box experience (e.g. reaching 100G with 4K NVMeOF on Intel servers) with the default parameters, especially since Yamin has shown it is beneficial, or at least not harmful, in terms of performance for a variety of use cases. The whole concept of DIM is that it adapts to the workload requirements in terms of bandwidth and latency.

Well, it's a Mellanox device driver after all.

But do note that, by far, the vast majority of users are not saturating
100G of 4K I/O. The absolute vast majority of users are primarily
sensitive to synchronous QD=1 I/O latency, and their workloads are
much more dynamic than the synthetic 100%/50%/0% read mix.

As much as I'm a fan (IIRC I was the one giving a first pass at this),
the dim default opt-in is not only not beneficial, but potentially
harmful to the out-of-the-box experience of the majority of users.

Given that this is fresh code with almost no exposure, and that it was
not tested outside of Yamin running limited performance testing, I think
it would be a mistake to add it as a default opt-in; that can come as an
incremental stage.

Obviously, I cannot tell Mellanox what it should or shouldn't do in its
own device driver, but I just wanted to emphasize that I think
this is a mistake.

> Moreover, net-dim is enabled by default, I don't see why RDMA is different.

Very different animals.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs
  2019-07-02  5:36       ` Sagi Grimberg
@ 2019-07-02  6:41         ` Leon Romanovsky
  2019-07-03 18:56           ` Sagi Grimberg
  2019-07-04 12:30         ` Idan Burstein
  1 sibling, 1 reply; 33+ messages in thread
From: Leon Romanovsky @ 2019-07-02  6:41 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Idan Burstein, Saeed Mahameed, David S. Miller, Doug Ledford,
	Jason Gunthorpe, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Yamin Friedman, Max Gurtovoy

On Mon, Jul 01, 2019 at 10:36:41PM -0700, Sagi Grimberg wrote:
> Hey Idan,
>
> > " Please don't. This is a bad choice to opt it in by default."
> >
> > I disagree here. I'd prefer Linux to have a good out-of-the-box experience (e.g. reaching 100G with 4K NVMeOF on Intel servers) with the default parameters, especially since Yamin has shown it is beneficial, or at least not harmful, in terms of performance for a variety of use cases. The whole concept of DIM is that it adapts to the workload requirements in terms of bandwidth and latency.
>
> Well, it's a Mellanox device driver after all.
>
> But do note that, by far, the vast majority of users are not saturating
> 100G of 4K I/O. The absolute vast majority of users are primarily
> sensitive to synchronous QD=1 I/O latency, and their workloads are
> much more dynamic than the synthetic 100%/50%/0% read mix.
>
> As much as I'm a fan (IIRC I was the one giving a first pass at this),
> the dim default opt-in is not only not beneficial, but potentially
> harmful to the out-of-the-box experience of the majority of users.
>
> Given that this is fresh code with almost no exposure, and that it was
> not tested outside of Yamin running limited performance testing, I think
> it would be a mistake to add it as a default opt-in; that can come as an
> incremental stage.
>
> Obviously, I cannot tell Mellanox what it should or shouldn't do in its
> own device driver, but I just wanted to emphasize that I think
> this is a mistake.

Hi Sagi,

I don't share your worries about a bad out-of-the-box experience, for a
number of reasons.

First of all, this code is part of the upstream kernel, and it will take
time until users actually start to use it as is and not as part of some
distro backports or MOFED packages.

Second, Yamin did extensive testing and worked very closely with Or G.,
and I have very high confidence in the results of their teamwork.

Third (an outcome of the first), the opposite is actually true: setting
this option as the default will give us more time to fix/adjust the code
if needed, before users see any potential degradation.

>
> > Moreover, net-dim is enabled by default, I don't see why RDMA is different.
>
> Very different animals.

Yes and no: the logic behind them is the same, and both solutions have
the same constraints of throughput vs. latency.

Thanks

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 06/10] linux/dim: Move implementation to .c files
  2019-06-25 20:57 ` [for-next V2 06/10] linux/dim: Move implementation to .c files Saeed Mahameed
@ 2019-07-02 16:15   ` Geert Uytterhoeven
  0 siblings, 0 replies; 33+ messages in thread
From: Geert Uytterhoeven @ 2019-07-02 16:15 UTC (permalink / raw)
  To: Saeed Mahameed, Tal Gilboa
  Cc: David S. Miller, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Or Gerlitz, Sagi Grimberg, netdev, linux-rdma, linux-kernel

 	Hi Saeed, Tal,

On Tue, 25 Jun 2019, Saeed Mahameed wrote:
> From: Tal Gilboa <talgi@mellanox.com>
>
> Moved all logic from dim.h and net_dim.h to dim.c and net_dim.c.
> This is both more structurally appealing and would allow to only
> expose externally used functions.
>
> Signed-off-by: Tal Gilboa <talgi@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

This is now commit 4f75da3666c0c572 ("linux/dim: Move implementation to
.c files") in net-next.

> --- a/drivers/net/ethernet/broadcom/Kconfig
> +++ b/drivers/net/ethernet/broadcom/Kconfig
> @@ -8,6 +8,7 @@ config NET_VENDOR_BROADCOM
> 	default y
> 	depends on (SSB_POSSIBLE && HAS_DMA) || PCI || BCM63XX || \
> 		   SIBYTE_SB1xxx_SOC
> +	select DIMLIB

Merely enabling a NET_VENDOR_* symbol should not enable inclusion of
any additional code, cfr. the help text for the NET_VENDOR_BROADCOM
option.

Hence please move the select to the config symbol(s) for the driver(s)
that need it.

> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -562,6 +562,14 @@ config SIGNATURE
> 	  Digital signature verification. Currently only RSA is supported.
> 	  Implementation is done using GnuPG MPI library
>
> +config DIMLIB
> +	bool "DIM library"
> +	default y

Please drop this line, as optional library code should never be included
by default.

Thanks!

Gr{oetje,eeting}s,

 						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
 							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs
  2019-07-02  6:41         ` Leon Romanovsky
@ 2019-07-03 18:56           ` Sagi Grimberg
  2019-07-04  7:51             ` Leon Romanovsky
  0 siblings, 1 reply; 33+ messages in thread
From: Sagi Grimberg @ 2019-07-03 18:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Idan Burstein, Saeed Mahameed, David S. Miller, Doug Ledford,
	Jason Gunthorpe, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Yamin Friedman, Max Gurtovoy


> Hi Sagi,
> 
> I don't share your worries about a bad out-of-the-box experience, for a
> number of reasons.
> 
> First of all, this code is part of the upstream kernel, and it will take
> time until users actually start to use it as is and not as part of some
> distro backports or MOFED packages.

True, but I am still saying that this feature is damaging sync IO, which
represents the majority of the users. It might not be an extreme impact,
but it is still a degradation (from very limited testing I did this
morning, I'm seeing a consistent 5%-10% latency increase for low QD
workloads, which is consistent with what Yamin reported AFAIR).

But having said that, the call is for you guys to make, as this is a
Mellanox device. I absolutely think that this is useful (as I said
before); I just don't think it's necessarily a good idea to opt in by
default, given that only a limited set of users would take full advantage
of it while the rest would see a negative impact (even if it's 10%).

I don't have a hard objection here; I just wanted to give you my
opinion on this because mlx5 is an important driver for rdma
users.

> Second, Yamin did extensive testing and worked very closely with Or G.,
> and I have very high confidence in the results of their teamwork.

Has anyone tested other RDMA ulps? NFS/RDMA or SRP/iSER?

Would be interesting to understand how other subsystems with different
characteristics behave with this.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs
  2019-07-03 18:56           ` Sagi Grimberg
@ 2019-07-04  7:51             ` Leon Romanovsky
  0 siblings, 0 replies; 33+ messages in thread
From: Leon Romanovsky @ 2019-07-04  7:51 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Idan Burstein, Saeed Mahameed, David S. Miller, Doug Ledford,
	Jason Gunthorpe, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Yamin Friedman, Max Gurtovoy

On Wed, Jul 03, 2019 at 11:56:04AM -0700, Sagi Grimberg wrote:
>
> > Hi Sagi,
> >
> > I don't share your worries about a bad out-of-the-box experience, for a
> > number of reasons.
> >
> > First of all, this code is part of the upstream kernel, and it will take
> > time until users actually start to use it as is and not as part of some
> > distro backports or MOFED packages.
>
> True, but I am still saying that this feature is damaging sync IO, which
> represents the majority of the users. It might not be an extreme impact,
> but it is still a degradation (from very limited testing I did this
> morning, I'm seeing a consistent 5%-10% latency increase for low QD
> workloads, which is consistent with what Yamin reported AFAIR).
>
> But having said that, the call is for you guys to make, as this is a
> Mellanox device. I absolutely think that this is useful (as I said
> before); I just don't think it's necessarily a good idea to opt in by
> default, given that only a limited set of users would take full advantage
> of it while the rest would see a negative impact (even if it's 10%).
>
> I don't have a hard objection here; I just wanted to give you my
> opinion on this because mlx5 is an important driver for rdma
> users.

Your opinion is very valuable to us, and we have started an internal
thread to challenge this "enable by default". It just takes time, and I
prefer to enable this code to get test coverage as wide as possible.

>
> > Second, Yamin did extensive testing and worked very closely with Or G.,
> > and I have very high confidence in the results of their teamwork.
>
> Has anyone tested other RDMA ulps? NFS/RDMA or SRP/iSER?
>
> Would be interesting to understand how other subsystems with different
> characteristics behave with this.

Me too, and I'll revert this default if needed.

Thanks



* RE: [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs
  2019-07-02  5:36       ` Sagi Grimberg
  2019-07-02  6:41         ` Leon Romanovsky
@ 2019-07-04 12:30         ` Idan Burstein
  1 sibling, 0 replies; 33+ messages in thread
From: Idan Burstein @ 2019-07-04 12:30 UTC (permalink / raw)
  To: Sagi Grimberg, Saeed Mahameed, David S. Miller, Doug Ledford,
	Jason Gunthorpe
  Cc: Leon Romanovsky, Or Gerlitz, Tal Gilboa, netdev, linux-rdma,
	Yamin Friedman, Max Gurtovoy

The essence of the "dynamic" in DIM is that it adapts to the workload running on the cores, so that users do not have to trade bandwidth/CPU% against latency through a module parameter they don't know how to configure. If DIM consistently hurts the latency of latency-critical workloads, we should debug and fix it.

This is where we should go: an end goal of no configuration, with good out-of-the-box performance in terms of both bandwidth/CPU% and latency.

We could take several steps in this direction if we are not mature enough today, but let's define them (e.g. tests on different ULPs).

-----Original Message-----
From: linux-rdma-owner@vger.kernel.org <linux-rdma-owner@vger.kernel.org> On Behalf Of Sagi Grimberg
Sent: Tuesday, July 2, 2019 8:37 AM
To: Idan Burstein <idanb@mellanox.com>; Saeed Mahameed <saeedm@mellanox.com>; David S. Miller <davem@davemloft.net>; Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>; Or Gerlitz <ogerlitz@mellanox.com>; Tal Gilboa <talgi@mellanox.com>; netdev@vger.kernel.org; linux-rdma@vger.kernel.org; Yamin Friedman <yaminf@mellanox.com>; Max Gurtovoy <maxg@mellanox.com>
Subject: Re: [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs

Hey Idan,

> " Please don't. This is a bad choice to opt it in by default."
> 
> I disagree here. I'd prefer Linux to have a good out-of-the-box experience (e.g. reach 100G in 4K NVMeOF on Intel servers) with the default parameters. Especially since Yamin has shown it is beneficial / not hurting in terms of performance for a variety of use cases. The whole concept of DIM is that it adapts to the workload requirements in terms of bandwidth and latency.

Well, it's a Mellanox device driver after all.

But do note that, by far, the vast majority of users are not saturating 100G of 4K I/O. The vast majority of users are primarily sensitive to synchronous QD=1 I/O latency, and their workloads are much more dynamic than the synthetic 100%/50%/0% read mix.

As much as I'm a fan (IIRC I was the one giving a first pass at this), the dim default opt-in is not only not beneficial, but potentially harmful to the out-of-the-box experience of the majority of users.

Given that this is fresh code with almost no exposure, and that it was not tested beyond Yamin's limited performance testing, I think it would be a mistake to add it as a default opt-in; that can come as an incremental stage.

Obviously, I cannot tell Mellanox what it should or shouldn't do in its own device driver, but I just wanted to emphasize that I think this is a mistake.

> Moreover, net-dim is enabled by default, I don't see why RDMA is different.

Very different animals.



Thread overview: 33+ messages
2019-06-25 20:57 [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
2019-06-25 20:57 ` [for-next V2 01/10] linux/dim: Move logic to dim.h Saeed Mahameed
2019-06-25 21:53   ` Sagi Grimberg
2019-06-25 20:57 ` [for-next V2 02/10] linux/dim: Remove "net" prefix from internal DIM members Saeed Mahameed
2019-06-25 21:53   ` Sagi Grimberg
2019-06-25 20:57 ` [for-next V2 03/10] linux/dim: Rename externally exposed macros Saeed Mahameed
2019-06-25 21:54   ` Sagi Grimberg
2019-06-25 20:57 ` [for-next V2 04/10] linux/dim: Rename net_dim_sample() to net_dim_update_sample() Saeed Mahameed
2019-06-25 21:55   ` Sagi Grimberg
2019-06-25 20:57 ` [for-next V2 05/10] linux/dim: Rename externally used net_dim members Saeed Mahameed
2019-06-25 21:57   ` Sagi Grimberg
2019-06-26  6:38     ` Tal Gilboa
2019-06-25 20:57 ` [for-next V2 06/10] linux/dim: Move implementation to .c files Saeed Mahameed
2019-07-02 16:15   ` Geert Uytterhoeven
2019-06-25 20:57 ` [for-next V2 07/10] linux/dim: Add completions count to dim_sample Saeed Mahameed
2019-06-25 20:57 ` [for-next V2 08/10] linux/dim: Implement rdma_dim Saeed Mahameed
2019-06-25 22:02   ` Sagi Grimberg
2019-06-26 11:57     ` Or Gerlitz
2019-06-27  5:25     ` Yamin Friedman
2019-06-25 20:57 ` [for-next V2 09/10] RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink Saeed Mahameed
2019-06-25 21:15   ` Sagi Grimberg
2019-06-27  5:29     ` Yamin Friedman
2019-06-25 20:57 ` [for-next V2 10/10] RDMA/core: Provide RDMA DIM support for ULPs Saeed Mahameed
2019-06-25 21:14   ` Sagi Grimberg
2019-06-26  7:56     ` Idan Burstein
2019-07-02  5:36       ` Sagi Grimberg
2019-07-02  6:41         ` Leon Romanovsky
2019-07-03 18:56           ` Sagi Grimberg
2019-07-04  7:51             ` Leon Romanovsky
2019-07-04 12:30         ` Idan Burstein
2019-06-27  5:28     ` Yamin Friedman
2019-06-25 21:07 ` [pull request][for-next V2 0/7] Generic DIM lib for netdev and RDMA Saeed Mahameed
2019-06-27 19:43 ` David Miller
