* [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
@ 2019-03-06  8:48 ` Tal Gilboa
  0 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)
  To: linux-rdma, linux-nvme, linux-block
  Cc: Yishai Hadas, Leon Romanovsky, Jason Gunthorpe, Doug Ledford,
	Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy

The net_dim.h library exposes an implementation of the DIM algorithm for dynamically-tuned
interrupt moderation of networking interfaces.

We need the same behavior for any block CQ. The main motivation is to benefit from the maximized
completion rate and reduced interrupt overhead that DIM may provide.

The current DIM implementation prioritizes reducing interrupt overhead over latency. Also, in
order to reduce DIM's own overhead, the algorithm might take some time to identify that it needs
to change profiles. For these reasons we concluded that a slightly modified algorithm is needed.
Early tests with the current implementation show it doesn't react fast and sharply enough to
satisfy the block CQ needs.

I would like to suggest an implementation for block DIM. The idea is to expose the new
functionality without risking the existing Net DIM behavior for netdev. Below are the main
similarities and differences between the two implementations, along with general guidelines for
the suggested solution.

Performance tests over a ConnectX-5 100GbE NIC show a 200% improvement in tail latency when
switching from high-load traffic to low-load traffic.

Common logic, main DIM procedure:
- Calculate current stats from a given sample
- Compare current stats vs. previous iteration stats
- Make a decision -> choose a new profile
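
To make this flow concrete, here is a minimal plain-C sketch of it. This is an illustration
only, with made-up names (sample/stats/calc_stats()/decide()); the real logic lives in the
net_dim()/net_dim_decision() helpers touched by this series.

struct sample { unsigned long time_us, events, bytes; };
struct stats  { unsigned long epms, bpms; };	/* events/bytes per msec */

static void calc_stats(const struct sample *start, const struct sample *end,
		       struct stats *res)
{
	unsigned long delta_us = end->time_us - start->time_us;

	if (!delta_us)
		return;
	res->epms = (end->events - start->events) * 1000 / delta_us;
	res->bpms = (end->bytes - start->bytes) * 1000 / delta_us;
}

/* "Significantly different" means more than 10% apart, as in net_dim.h. */
static int significant_diff(unsigned long val, unsigned long ref)
{
	return ref && (100 * (val > ref ? val - ref : ref - val)) / ref > 10;
}

/*
 * Compare the current window against the previous one and pick the next
 * profile index: keep stepping in the same direction while throughput
 * improves, otherwise turn around.
 */
static int decide(const struct stats *curr, struct stats *prev,
		  int profile_ix, int num_profiles, int *dir)
{
	if (significant_diff(curr->bpms, prev->bpms) && curr->bpms < prev->bpms)
		*dir = -*dir;

	*prev = *curr;
	profile_ix += *dir;
	if (profile_ix < 0)
		profile_ix = 0;
	if (profile_ix > num_profiles - 1)
		profile_ix = num_profiles - 1;
	return profile_ix;	/* the caller applies this profile's moderation values */
}

The real implementation also has "parking" and "tired" states to avoid oscillating between
profiles; those can be seen in dim.h in the patches below.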

Differences:
- Different parameters for moving between profiles
- Different moderation values and number of profiles
- Different sampled data

Suggested solution:
- Common logic will be declared in include/linux/dim.h and implemented in lib/dim/dim.c
- Net DIM (existing) logic will be declared in include/linux/net_dim.h and implemented in
  lib/dim/net_dim.c, which will use the common logic from dim.h
- Block DIM logic will be declared in include/linux/blk_dim.h and implemented in
  lib/dim/blk_dim.c.
  This new implementation will expose modified versions of the profiles, dim_step() and dim_decision().
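
As a rough illustration of the intended layering (placeholder names and a placeholder table
size -- not the code added later in the series), each flavor would be a thin wrapper around
the shared core, e.g. the decide() helper sketched above, supplying its own profile table and
policy parameters:

/* "lib/dim/net_dim.c" -- the netdev flavor keeps its existing five profiles. */
static int net_dim_decide(const struct stats *curr, struct stats *prev,
			  int profile_ix, int *dir)
{
	return decide(curr, prev, profile_ix,
		      5 /* NET_DIM_PARAMS_NUM_PROFILES */, dir);
}

/* "lib/dim/blk_dim.c" -- the block flavor supplies its own (placeholder)
 * table size and would also sample completions instead of packets/bytes. */
static int blk_dim_decide(const struct stats *curr, struct stats *prev,
			  int profile_ix, int *dir)
{
	return decide(curr, prev, profile_ix, 3 /* placeholder */, dir);
}

This is what keeps the netdev behavior untouched: net_dim keeps its current profiles and
policy, while blk_dim can diverge (different profiles, sharper reaction) without affecting it.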

Pros for this solution are:
- Zero impact on existing net_dim implementation and usage
- More code reuse (compared to maintaining two fully separate solutions)
- Readiness for future implementations

Tal Gilboa (6):
  linux/dim: Move logic to dim.h
  linux/dim: Remove "net" prefix from internal DIM members
  linux/dim: Rename externally exposed macros
  linux/dim: Rename net_dim_sample() to net_dim_create_sample()
  linux/dim: Rename externally used net_dim members
  linux/dim: Move implementation to .c files

Yamin Friedman (3):
  linux/dim: Add completions count to dim_sample
  linux/dim: Implement blk_dim.h
  drivers/infiniband: Use blk_dim in infiniband driver

 MAINTAINERS                                   |   3 +
 drivers/infiniband/core/cq.c                  |  75 +++-
 drivers/infiniband/hw/mlx4/qp.c               |   2 +-
 drivers/infiniband/hw/mlx5/qp.c               |   2 +-
 drivers/net/ethernet/broadcom/bcmsysport.c    |  20 +-
 drivers/net/ethernet/broadcom/bcmsysport.h    |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  13 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   2 +-
 .../net/ethernet/broadcom/bnxt/bnxt_debugfs.c |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c |   7 +-
 .../net/ethernet/broadcom/genet/bcmgenet.c    |  18 +-
 .../net/ethernet/broadcom/genet/bcmgenet.h    |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   8 +-
 .../net/ethernet/mellanox/mlx5/core/en_dim.c  |  12 +-
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |   4 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  22 +-
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c |  10 +-
 include/linux/blk_dim.h                       |  56 +++
 include/linux/dim.h                           | 126 +++++++
 include/linux/irq_poll.h                      |   7 +
 include/linux/net_dim.h                       | 338 +-----------------
 include/rdma/ib_verbs.h                       |  11 +-
 lib/Kconfig                                   |   7 +
 lib/Makefile                                  |   1 +
 lib/dim/Makefile                              |  14 +
 lib/dim/blk_dim.c                             | 114 ++++++
 lib/dim/dim.c                                 |  92 +++++
 lib/dim/net_dim.c                             | 193 ++++++++++
 lib/irq_poll.c                                |  13 +-
 29 files changed, 778 insertions(+), 400 deletions(-)
 create mode 100644 include/linux/blk_dim.h
 create mode 100644 include/linux/dim.h
 create mode 100644 lib/dim/Makefile
 create mode 100644 lib/dim/blk_dim.c
 create mode 100644 lib/dim/dim.c
 create mode 100644 lib/dim/net_dim.c

-- 
2.19.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 1/9] linux/dim: Move logic to dim.h
  2019-03-06  8:48 ` Tal Gilboa
@ 2019-03-06  8:48   ` Tal Gilboa
  -1 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)
  To: linux-rdma, linux-nvme, linux-block
  Cc: Yishai Hadas, Leon Romanovsky, Jason Gunthorpe, Doug Ledford,
	Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy

In preparation for supporting more implementations of the DIM
algorithm, I'm moving what will become the common logic to a
common library. Downstream DIM implementations will build on
this common library.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
---
 include/linux/dim.h     | 183 ++++++++++++++++++++++++++++++++++++++++
 include/linux/net_dim.h | 148 +-------------------------------
 2 files changed, 185 insertions(+), 146 deletions(-)
 create mode 100644 include/linux/dim.h

diff --git a/include/linux/dim.h b/include/linux/dim.h
new file mode 100644
index 000000000000..c745c5d457ff
--- /dev/null
+++ b/include/linux/dim.h
@@ -0,0 +1,183 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2017-2018, Broadcom Limited. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef DIM_H
+#define DIM_H
+
+#include <linux/module.h>
+
+#define NET_DIM_NEVENTS 64
+#define IS_SIGNIFICANT_DIFF(val, ref) \
+	(((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
+#define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
+
+
+struct net_dim_cq_moder {
+	u16 usec;
+	u16 pkts;
+	u8 cq_period_mode;
+};
+
+struct net_dim_sample {
+	ktime_t time;
+	u32     pkt_ctr;
+	u32     byte_ctr;
+	u16     event_ctr;
+};
+
+struct net_dim_stats {
+	int ppms; /* packets per msec */
+	int bpms; /* bytes per msec */
+	int epms; /* events per msec */
+};
+
+struct net_dim { /* Dynamic Interrupt Moderation */
+	u8                                      state;
+	struct net_dim_stats                    prev_stats;
+	struct net_dim_sample                   start_sample;
+	struct work_struct                      work;
+	u8                                      profile_ix;
+	u8                                      mode;
+	u8                                      tune_state;
+	u8                                      steps_right;
+	u8                                      steps_left;
+	u8                                      tired;
+};
+
+enum {
+	NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
+	NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
+	NET_DIM_CQ_PERIOD_NUM_MODES
+};
+
+enum {
+	NET_DIM_START_MEASURE,
+	NET_DIM_MEASURE_IN_PROGRESS,
+	NET_DIM_APPLY_NEW_PROFILE,
+};
+
+enum {
+	NET_DIM_PARKING_ON_TOP,
+	NET_DIM_PARKING_TIRED,
+	NET_DIM_GOING_RIGHT,
+	NET_DIM_GOING_LEFT,
+};
+
+enum {
+	NET_DIM_STATS_WORSE,
+	NET_DIM_STATS_SAME,
+	NET_DIM_STATS_BETTER,
+};
+
+enum {
+	NET_DIM_STEPPED,
+	NET_DIM_TOO_TIRED,
+	NET_DIM_ON_EDGE,
+};
+
+static inline bool net_dim_on_top(struct net_dim *net_dim)
+{
+	switch (net_dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+	case NET_DIM_PARKING_TIRED:
+		return true;
+	case NET_DIM_GOING_RIGHT:
+		return (net_dim->steps_left > 1) && (net_dim->steps_right == 1);
+	default: /* NET_DIM_GOING_LEFT */
+		return (net_dim->steps_right > 1) && (net_dim->steps_left == 1);
+	}
+}
+
+static inline void net_dim_turn(struct net_dim *net_dim)
+{
+	switch (net_dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+	case NET_DIM_PARKING_TIRED:
+		break;
+	case NET_DIM_GOING_RIGHT:
+		net_dim->tune_state = NET_DIM_GOING_LEFT;
+		net_dim->steps_left = 0;
+		break;
+	case NET_DIM_GOING_LEFT:
+		net_dim->tune_state = NET_DIM_GOING_RIGHT;
+		net_dim->steps_right = 0;
+		break;
+	}
+}
+
+static inline void net_dim_park_on_top(struct net_dim *net_dim)
+{
+	net_dim->steps_right  = 0;
+	net_dim->steps_left   = 0;
+	net_dim->tired        = 0;
+	net_dim->tune_state   = NET_DIM_PARKING_ON_TOP;
+}
+
+static inline void net_dim_park_tired(struct net_dim *net_dim)
+{
+	net_dim->steps_right  = 0;
+	net_dim->steps_left   = 0;
+	net_dim->tune_state   = NET_DIM_PARKING_TIRED;
+}
+
+static inline void net_dim_sample(u16 event_ctr,
+				  u64 packets,
+				  u64 bytes,
+				  struct net_dim_sample *s)
+{
+	s->time	     = ktime_get();
+	s->pkt_ctr   = packets;
+	s->byte_ctr  = bytes;
+	s->event_ctr = event_ctr;
+}
+
+static inline void net_dim_calc_stats(struct net_dim_sample *start,
+				  struct net_dim_sample *end,
+				  struct net_dim_stats *curr_stats)
+{
+	/* u32 holds up to 71 minutes, should be enough */
+	u32 delta_us = ktime_us_delta(end->time, start->time);
+	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
+	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
+			     start->byte_ctr);
+
+	if (!delta_us)
+		return;
+
+	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
+	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
+	curr_stats->epms = DIV_ROUND_UP(NET_DIM_NEVENTS * USEC_PER_MSEC,
+					delta_us);
+}
+
+#endif /* DIM_H */
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index fd458389f7d1..373cda74b167 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -35,73 +35,10 @@
 #define NET_DIM_H
 
 #include <linux/module.h>
-
-struct net_dim_cq_moder {
-	u16 usec;
-	u16 pkts;
-	u8 cq_period_mode;
-};
-
-struct net_dim_sample {
-	ktime_t time;
-	u32     pkt_ctr;
-	u32     byte_ctr;
-	u16     event_ctr;
-};
-
-struct net_dim_stats {
-	int ppms; /* packets per msec */
-	int bpms; /* bytes per msec */
-	int epms; /* events per msec */
-};
-
-struct net_dim { /* Adaptive Moderation */
-	u8                                      state;
-	struct net_dim_stats                    prev_stats;
-	struct net_dim_sample                   start_sample;
-	struct work_struct                      work;
-	u8                                      profile_ix;
-	u8                                      mode;
-	u8                                      tune_state;
-	u8                                      steps_right;
-	u8                                      steps_left;
-	u8                                      tired;
-};
-
-enum {
-	NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
-	NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
-	NET_DIM_CQ_PERIOD_NUM_MODES
-};
-
-/* Adaptive moderation logic */
-enum {
-	NET_DIM_START_MEASURE,
-	NET_DIM_MEASURE_IN_PROGRESS,
-	NET_DIM_APPLY_NEW_PROFILE,
-};
-
-enum {
-	NET_DIM_PARKING_ON_TOP,
-	NET_DIM_PARKING_TIRED,
-	NET_DIM_GOING_RIGHT,
-	NET_DIM_GOING_LEFT,
-};
-
-enum {
-	NET_DIM_STATS_WORSE,
-	NET_DIM_STATS_SAME,
-	NET_DIM_STATS_BETTER,
-};
-
-enum {
-	NET_DIM_STEPPED,
-	NET_DIM_TOO_TIRED,
-	NET_DIM_ON_EDGE,
-};
+#include <linux/dim.h>
 
 #define NET_DIM_PARAMS_NUM_PROFILES 5
-/* Adaptive moderation profiles */
+/* Netdev dynamic interrupt moderation profiles */
 #define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
 #define NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE 128
 #define NET_DIM_DEF_PROFILE_CQE 1
@@ -188,36 +125,6 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
 	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
 }
 
-static inline bool net_dim_on_top(struct net_dim *dim)
-{
-	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
-		return true;
-	case NET_DIM_GOING_RIGHT:
-		return (dim->steps_left > 1) && (dim->steps_right == 1);
-	default: /* NET_DIM_GOING_LEFT */
-		return (dim->steps_right > 1) && (dim->steps_left == 1);
-	}
-}
-
-static inline void net_dim_turn(struct net_dim *dim)
-{
-	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
-		break;
-	case NET_DIM_GOING_RIGHT:
-		dim->tune_state = NET_DIM_GOING_LEFT;
-		dim->steps_left = 0;
-		break;
-	case NET_DIM_GOING_LEFT:
-		dim->tune_state = NET_DIM_GOING_RIGHT;
-		dim->steps_right = 0;
-		break;
-	}
-}
-
 static inline int net_dim_step(struct net_dim *dim)
 {
 	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
@@ -245,21 +152,6 @@ static inline int net_dim_step(struct net_dim *dim)
 	return NET_DIM_STEPPED;
 }
 
-static inline void net_dim_park_on_top(struct net_dim *dim)
-{
-	dim->steps_right  = 0;
-	dim->steps_left   = 0;
-	dim->tired        = 0;
-	dim->tune_state   = NET_DIM_PARKING_ON_TOP;
-}
-
-static inline void net_dim_park_tired(struct net_dim *dim)
-{
-	dim->steps_right  = 0;
-	dim->steps_left   = 0;
-	dim->tune_state   = NET_DIM_PARKING_TIRED;
-}
-
 static inline void net_dim_exit_parking(struct net_dim *dim)
 {
 	dim->tune_state = dim->profile_ix ? NET_DIM_GOING_LEFT :
@@ -267,9 +159,6 @@ static inline void net_dim_exit_parking(struct net_dim *dim)
 	net_dim_step(dim);
 }
 
-#define IS_SIGNIFICANT_DIFF(val, ref) \
-	(((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
-
 static inline int net_dim_stats_compare(struct net_dim_stats *curr,
 					struct net_dim_stats *prev)
 {
@@ -351,39 +240,6 @@ static inline bool net_dim_decision(struct net_dim_stats *curr_stats,
 	return dim->profile_ix != prev_ix;
 }
 
-static inline void net_dim_sample(u16 event_ctr,
-				  u64 packets,
-				  u64 bytes,
-				  struct net_dim_sample *s)
-{
-	s->time	     = ktime_get();
-	s->pkt_ctr   = packets;
-	s->byte_ctr  = bytes;
-	s->event_ctr = event_ctr;
-}
-
-#define NET_DIM_NEVENTS 64
-#define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
-
-static inline void net_dim_calc_stats(struct net_dim_sample *start,
-				      struct net_dim_sample *end,
-				      struct net_dim_stats *curr_stats)
-{
-	/* u32 holds up to 71 minutes, should be enough */
-	u32 delta_us = ktime_us_delta(end->time, start->time);
-	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
-	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
-			     start->byte_ctr);
-
-	if (!delta_us)
-		return;
-
-	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
-	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
-	curr_stats->epms = DIV_ROUND_UP(NET_DIM_NEVENTS * USEC_PER_MSEC,
-					delta_us);
-}
-
 static inline void net_dim(struct net_dim *dim,
 			   struct net_dim_sample end_sample)
 {
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 2/9] linux/dim: Remove "net" prefix from internal DIM members
  2019-03-06  8:48 ` Tal Gilboa
@ 2019-03-06  8:48   ` Tal Gilboa
  -1 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)
  To: linux-rdma, linux-nvme, linux-block
  Cc: Yishai Hadas, Leon Romanovsky, Jason Gunthorpe, Doug Ledford,
	Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy

Only renaming functions and structs which aren't used by external code.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
---
 include/linux/dim.h     | 86 ++++++++++++++++++++---------------------
 include/linux/net_dim.h | 86 ++++++++++++++++++++---------------------
 2 files changed, 86 insertions(+), 86 deletions(-)

diff --git a/include/linux/dim.h b/include/linux/dim.h
index c745c5d457ff..16ef59524b69 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -36,7 +36,7 @@
 
 #include <linux/module.h>
 
-#define NET_DIM_NEVENTS 64
+#define DIM_NEVENTS 64
 #define IS_SIGNIFICANT_DIFF(val, ref) \
 	(((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
 #define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
@@ -55,7 +55,7 @@ struct net_dim_sample {
 	u16     event_ctr;
 };
 
-struct net_dim_stats {
+struct dim_stats {
 	int ppms; /* packets per msec */
 	int bpms; /* bytes per msec */
 	int epms; /* events per msec */
@@ -63,7 +63,7 @@ struct net_dim_stats {
 
 struct net_dim { /* Dynamic Interrupt Moderation */
 	u8                                      state;
-	struct net_dim_stats                    prev_stats;
+	struct dim_stats                        prev_stats;
 	struct net_dim_sample                   start_sample;
 	struct work_struct                      work;
 	u8                                      profile_ix;
@@ -87,67 +87,67 @@ enum {
 };
 
 enum {
-	NET_DIM_PARKING_ON_TOP,
-	NET_DIM_PARKING_TIRED,
-	NET_DIM_GOING_RIGHT,
-	NET_DIM_GOING_LEFT,
+	DIM_PARKING_ON_TOP,
+	DIM_PARKING_TIRED,
+	DIM_GOING_RIGHT,
+	DIM_GOING_LEFT,
 };
 
 enum {
-	NET_DIM_STATS_WORSE,
-	NET_DIM_STATS_SAME,
-	NET_DIM_STATS_BETTER,
+	DIM_STATS_WORSE,
+	DIM_STATS_SAME,
+	DIM_STATS_BETTER,
 };
 
 enum {
-	NET_DIM_STEPPED,
-	NET_DIM_TOO_TIRED,
-	NET_DIM_ON_EDGE,
+	DIM_STEPPED,
+	DIM_TOO_TIRED,
+	DIM_ON_EDGE,
 };
 
-static inline bool net_dim_on_top(struct net_dim *net_dim)
+static inline bool dim_on_top(struct net_dim *dim)
 {
-	switch (net_dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
 		return true;
-	case NET_DIM_GOING_RIGHT:
-		return (net_dim->steps_left > 1) && (net_dim->steps_right == 1);
-	default: /* NET_DIM_GOING_LEFT */
-		return (net_dim->steps_right > 1) && (net_dim->steps_left == 1);
+	case DIM_GOING_RIGHT:
+		return (dim->steps_left > 1) && (dim->steps_right == 1);
+	default: /* DIM_GOING_LEFT */
+		return (dim->steps_right > 1) && (dim->steps_left == 1);
 	}
 }
 
-static inline void net_dim_turn(struct net_dim *net_dim)
+static inline void dim_turn(struct net_dim *dim)
 {
-	switch (net_dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
 		break;
-	case NET_DIM_GOING_RIGHT:
-		net_dim->tune_state = NET_DIM_GOING_LEFT;
-		net_dim->steps_left = 0;
+	case DIM_GOING_RIGHT:
+		dim->tune_state = DIM_GOING_LEFT;
+		dim->steps_left = 0;
 		break;
-	case NET_DIM_GOING_LEFT:
-		net_dim->tune_state = NET_DIM_GOING_RIGHT;
-		net_dim->steps_right = 0;
+	case DIM_GOING_LEFT:
+		dim->tune_state = DIM_GOING_RIGHT;
+		dim->steps_right = 0;
 		break;
 	}
 }
 
-static inline void net_dim_park_on_top(struct net_dim *net_dim)
+static inline void dim_park_on_top(struct net_dim *dim)
 {
-	net_dim->steps_right  = 0;
-	net_dim->steps_left   = 0;
-	net_dim->tired        = 0;
-	net_dim->tune_state   = NET_DIM_PARKING_ON_TOP;
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tired        = 0;
+	dim->tune_state   = DIM_PARKING_ON_TOP;
 }
 
-static inline void net_dim_park_tired(struct net_dim *net_dim)
+static inline void dim_park_tired(struct net_dim *dim)
 {
-	net_dim->steps_right  = 0;
-	net_dim->steps_left   = 0;
-	net_dim->tune_state   = NET_DIM_PARKING_TIRED;
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tune_state   = DIM_PARKING_TIRED;
 }
 
 static inline void net_dim_sample(u16 event_ctr,
@@ -161,9 +161,9 @@ static inline void net_dim_sample(u16 event_ctr,
 	s->event_ctr = event_ctr;
 }
 
-static inline void net_dim_calc_stats(struct net_dim_sample *start,
+static inline void dim_calc_stats(struct net_dim_sample *start,
 				  struct net_dim_sample *end,
-				  struct net_dim_stats *curr_stats)
+				  struct dim_stats *curr_stats)
 {
 	/* u32 holds up to 71 minutes, should be enough */
 	u32 delta_us = ktime_us_delta(end->time, start->time);
@@ -176,7 +176,7 @@ static inline void net_dim_calc_stats(struct net_dim_sample *start,
 
 	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
 	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
-	curr_stats->epms = DIV_ROUND_UP(NET_DIM_NEVENTS * USEC_PER_MSEC,
+	curr_stats->epms = DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC,
 					delta_us);
 }
 
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index 373cda74b167..1ce0899b5f30 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -128,67 +128,67 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
 static inline int net_dim_step(struct net_dim *dim)
 {
 	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
-		return NET_DIM_TOO_TIRED;
+		return DIM_TOO_TIRED;
 
 	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
 		break;
-	case NET_DIM_GOING_RIGHT:
+	case DIM_GOING_RIGHT:
 		if (dim->profile_ix == (NET_DIM_PARAMS_NUM_PROFILES - 1))
-			return NET_DIM_ON_EDGE;
+			return DIM_ON_EDGE;
 		dim->profile_ix++;
 		dim->steps_right++;
 		break;
-	case NET_DIM_GOING_LEFT:
+	case DIM_GOING_LEFT:
 		if (dim->profile_ix == 0)
-			return NET_DIM_ON_EDGE;
+			return DIM_ON_EDGE;
 		dim->profile_ix--;
 		dim->steps_left++;
 		break;
 	}
 
 	dim->tired++;
-	return NET_DIM_STEPPED;
+	return DIM_STEPPED;
 }
 
 static inline void net_dim_exit_parking(struct net_dim *dim)
 {
-	dim->tune_state = dim->profile_ix ? NET_DIM_GOING_LEFT :
-					  NET_DIM_GOING_RIGHT;
+	dim->tune_state = dim->profile_ix ? DIM_GOING_LEFT :
+					  DIM_GOING_RIGHT;
 	net_dim_step(dim);
 }
 
-static inline int net_dim_stats_compare(struct net_dim_stats *curr,
-					struct net_dim_stats *prev)
+static inline int net_dim_stats_compare(struct dim_stats *curr,
+					struct dim_stats *prev)
 {
 	if (!prev->bpms)
-		return curr->bpms ? NET_DIM_STATS_BETTER :
-				    NET_DIM_STATS_SAME;
+		return curr->bpms ? DIM_STATS_BETTER :
+				    DIM_STATS_SAME;
 
 	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
-		return (curr->bpms > prev->bpms) ? NET_DIM_STATS_BETTER :
-						   NET_DIM_STATS_WORSE;
+		return (curr->bpms > prev->bpms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
 
 	if (!prev->ppms)
-		return curr->ppms ? NET_DIM_STATS_BETTER :
-				    NET_DIM_STATS_SAME;
+		return curr->ppms ? DIM_STATS_BETTER :
+				    DIM_STATS_SAME;
 
 	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
-		return (curr->ppms > prev->ppms) ? NET_DIM_STATS_BETTER :
-						   NET_DIM_STATS_WORSE;
+		return (curr->ppms > prev->ppms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
 
 	if (!prev->epms)
-		return NET_DIM_STATS_SAME;
+		return DIM_STATS_SAME;
 
 	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
-		return (curr->epms < prev->epms) ? NET_DIM_STATS_BETTER :
-						   NET_DIM_STATS_WORSE;
+		return (curr->epms < prev->epms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
 
-	return NET_DIM_STATS_SAME;
+	return DIM_STATS_SAME;
 }
 
-static inline bool net_dim_decision(struct net_dim_stats *curr_stats,
+static inline bool net_dim_decision(struct dim_stats *curr_stats,
 				    struct net_dim *dim)
 {
 	int prev_state = dim->tune_state;
@@ -197,44 +197,44 @@ static inline bool net_dim_decision(struct net_dim_stats *curr_stats,
 	int step_res;
 
 	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
+	case DIM_PARKING_ON_TOP:
 		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
-		if (stats_res != NET_DIM_STATS_SAME)
+		if (stats_res != DIM_STATS_SAME)
 			net_dim_exit_parking(dim);
 		break;
 
-	case NET_DIM_PARKING_TIRED:
+	case DIM_PARKING_TIRED:
 		dim->tired--;
 		if (!dim->tired)
 			net_dim_exit_parking(dim);
 		break;
 
-	case NET_DIM_GOING_RIGHT:
-	case NET_DIM_GOING_LEFT:
+	case DIM_GOING_RIGHT:
+	case DIM_GOING_LEFT:
 		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
-		if (stats_res != NET_DIM_STATS_BETTER)
-			net_dim_turn(dim);
+		if (stats_res != DIM_STATS_BETTER)
+			dim_turn(dim);
 
-		if (net_dim_on_top(dim)) {
-			net_dim_park_on_top(dim);
+		if (dim_on_top(dim)) {
+			dim_park_on_top(dim);
 			break;
 		}
 
 		step_res = net_dim_step(dim);
 		switch (step_res) {
-		case NET_DIM_ON_EDGE:
-			net_dim_park_on_top(dim);
+		case DIM_ON_EDGE:
+			dim_park_on_top(dim);
 			break;
-		case NET_DIM_TOO_TIRED:
-			net_dim_park_tired(dim);
+		case DIM_TOO_TIRED:
+			dim_park_tired(dim);
 			break;
 		}
 
 		break;
 	}
 
-	if ((prev_state      != NET_DIM_PARKING_ON_TOP) ||
-	    (dim->tune_state != NET_DIM_PARKING_ON_TOP))
+	if ((prev_state      != DIM_PARKING_ON_TOP) ||
+	    (dim->tune_state != DIM_PARKING_ON_TOP))
 		dim->prev_stats = *curr_stats;
 
 	return dim->profile_ix != prev_ix;
@@ -243,7 +243,7 @@ static inline bool net_dim_decision(struct net_dim_stats *curr_stats,
 static inline void net_dim(struct net_dim *dim,
 			   struct net_dim_sample end_sample)
 {
-	struct net_dim_stats curr_stats;
+	struct dim_stats curr_stats;
 	u16 nevents;
 
 	switch (dim->state) {
@@ -251,9 +251,9 @@ static inline void net_dim(struct net_dim *dim,
 		nevents = BIT_GAP(BITS_PER_TYPE(u16),
 				  end_sample.event_ctr,
 				  dim->start_sample.event_ctr);
-		if (nevents < NET_DIM_NEVENTS)
+		if (nevents < DIM_NEVENTS)
 			break;
-		net_dim_calc_stats(&dim->start_sample, &end_sample,
+		dim_calc_stats(&dim->start_sample, &end_sample,
 				   &curr_stats);
 		if (net_dim_decision(&curr_stats, dim)) {
 			dim->state = NET_DIM_APPLY_NEW_PROFILE;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 3/9] linux/dim: Rename externally exposed macros
  2019-03-06  8:48 ` Tal Gilboa
@ 2019-03-06  8:48   ` Tal Gilboa
  -1 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)
  To: linux-rdma, linux-nvme, linux-block
  Cc: Yishai Hadas, Leon Romanovsky, Jason Gunthorpe, Doug Ledford,
	Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy

Rename macros that are in use by external drivers.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c     |  4 ++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c      |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c  |  2 +-
 drivers/net/ethernet/broadcom/genet/bcmgenet.c |  4 ++--
 .../net/ethernet/mellanox/mlx5/core/en_dim.c   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c  | 10 +++++-----
 include/linux/dim.h                            | 12 ++++++------
 include/linux/net_dim.h                        | 18 +++++++++---------
 8 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index f9521d0274b7..0dbdef541f3c 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1114,7 +1114,7 @@ static void bcm_sysport_dim_work(struct work_struct *work)
 			net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	bcm_sysport_set_rx_coalesce(priv, cur_profile.usec, cur_profile.pkts);
-	dim->state = NET_DIM_START_MEASURE;
+	dim->state = DIM_START_MEASURE;
 }
 
 /* RX and misc interrupt routine */
@@ -1466,7 +1466,7 @@ static void bcm_sysport_init_dim(struct bcm_sysport_priv *priv,
 	struct bcm_sysport_net_dim *dim = &priv->dim;
 
 	INIT_WORK(&dim->dim.work, cb);
-	dim->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+	dim->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 	dim->event_ctr = 0;
 	dim->packets = 0;
 	dim->bytes = 0;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6a512871176b..08d0679ccf63 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7673,7 +7673,7 @@ static void bnxt_enable_napi(struct bnxt *bp)
 
 		if (bp->bnapi[i]->rx_ring) {
 			INIT_WORK(&cpr->dim.work, bnxt_dim_work);
-			cpr->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+			cpr->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 		}
 		napi_enable(&bp->bnapi[i]->napi);
 	}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
index afa97c8bb081..16a4588709d1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
@@ -28,5 +28,5 @@ void bnxt_dim_work(struct work_struct *work)
 	cpr->rx_ring_coal.coal_bufs = cur_moder.pkts;
 
 	bnxt_hwrm_set_ring_coal(bnapi->bp, bnapi);
-	dim->state = NET_DIM_START_MEASURE;
+	dim->state = DIM_START_MEASURE;
 }
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 983245c0867c..eee48dae8ed5 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1928,7 +1928,7 @@ static void bcmgenet_dim_work(struct work_struct *work)
 			net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	bcmgenet_set_rx_coalesce(ring, cur_profile.usec, cur_profile.pkts);
-	dim->state = NET_DIM_START_MEASURE;
+	dim->state = DIM_START_MEASURE;
 }
 
 /* Assign skb to RX DMA descriptor. */
@@ -2085,7 +2085,7 @@ static void bcmgenet_init_dim(struct bcmgenet_rx_ring *ring,
 	struct bcmgenet_net_dim *dim = &ring->dim;
 
 	INIT_WORK(&dim->dim.work, cb);
-	dim->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+	dim->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 	dim->event_ctr = 0;
 	dim->packets = 0;
 	dim->bytes = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index d67adf70a97b..a80303add7c0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -38,7 +38,7 @@ mlx5e_complete_dim_work(struct net_dim *dim, struct net_dim_cq_moder moder,
 			struct mlx5_core_dev *mdev, struct mlx5_core_cq *mcq)
 {
 	mlx5_core_modify_cq_moderation(mdev, mcq, moder.usec, moder.pkts);
-	dim->state = NET_DIM_START_MEASURE;
+	dim->state = DIM_START_MEASURE;
 }
 
 void mlx5e_rx_dim_work(struct work_struct *work)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 099d307e6f25..7d80f803f445 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -679,11 +679,11 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 
 	switch (params->rx_cq_moderation.cq_period_mode) {
 	case MLX5_CQ_PERIOD_MODE_START_FROM_CQE:
-		rq->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE;
+		rq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_CQE;
 		break;
 	case MLX5_CQ_PERIOD_MODE_START_FROM_EQE:
 	default:
-		rq->dim.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+		rq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 	}
 
 	rq->page_cache.head = 0;
@@ -2345,7 +2345,7 @@ static void mlx5e_build_ico_cq_param(struct mlx5e_priv *priv,
 
 	mlx5e_build_common_cq_param(priv, param);
 
-	param->cq_period_mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+	param->cq_period_mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 }
 
 static void mlx5e_build_icosq_param(struct mlx5e_priv *priv,
@@ -4545,8 +4545,8 @@ static struct net_dim_cq_moder mlx5e_get_def_rx_moderation(u8 cq_period_mode)
 static u8 mlx5_to_net_dim_cq_period_mode(u8 cq_period_mode)
 {
 	return cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE ?
-		NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
-		NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+		DIM_CQ_PERIOD_MODE_START_FROM_CQE :
+		DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 }
 
 void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
diff --git a/include/linux/dim.h b/include/linux/dim.h
index 16ef59524b69..adb20e58bf19 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -75,15 +75,15 @@ struct net_dim { /* Dynamic Interrupt Moderation */
 };
 
 enum {
-	NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
-	NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
-	NET_DIM_CQ_PERIOD_NUM_MODES
+	DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
+	DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
+	DIM_CQ_PERIOD_NUM_MODES
 };
 
 enum {
-	NET_DIM_START_MEASURE,
-	NET_DIM_MEASURE_IN_PROGRESS,
-	NET_DIM_APPLY_NEW_PROFILE,
+	DIM_START_MEASURE,
+	DIM_MEASURE_IN_PROGRESS,
+	DIM_APPLY_NEW_PROFILE,
 };
 
 enum {
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index 1ce0899b5f30..98ea0bb6f130 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -78,13 +78,13 @@
 }
 
 static const struct net_dim_cq_moder
-rx_profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
+rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_RX_EQE_PROFILES,
 	NET_DIM_RX_CQE_PROFILES,
 };
 
 static const struct net_dim_cq_moder
-tx_profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
+tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_TX_EQE_PROFILES,
 	NET_DIM_TX_CQE_PROFILES,
 };
@@ -101,7 +101,7 @@ net_dim_get_rx_moderation(u8 cq_period_mode, int ix)
 static inline struct net_dim_cq_moder
 net_dim_get_def_rx_moderation(u8 cq_period_mode)
 {
-	u8 profile_ix = cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
+	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
 			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
 
 	return net_dim_get_rx_moderation(cq_period_mode, profile_ix);
@@ -119,7 +119,7 @@ net_dim_get_tx_moderation(u8 cq_period_mode, int ix)
 static inline struct net_dim_cq_moder
 net_dim_get_def_tx_moderation(u8 cq_period_mode)
 {
-	u8 profile_ix = cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
+	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
 			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
 
 	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
@@ -247,7 +247,7 @@ static inline void net_dim(struct net_dim *dim,
 	u16 nevents;
 
 	switch (dim->state) {
-	case NET_DIM_MEASURE_IN_PROGRESS:
+	case DIM_MEASURE_IN_PROGRESS:
 		nevents = BIT_GAP(BITS_PER_TYPE(u16),
 				  end_sample.event_ctr,
 				  dim->start_sample.event_ctr);
@@ -256,17 +256,17 @@ static inline void net_dim(struct net_dim *dim,
 		dim_calc_stats(&dim->start_sample, &end_sample,
 				   &curr_stats);
 		if (net_dim_decision(&curr_stats, dim)) {
-			dim->state = NET_DIM_APPLY_NEW_PROFILE;
+			dim->state = DIM_APPLY_NEW_PROFILE;
 			schedule_work(&dim->work);
 			break;
 		}
 		/* fall through */
-	case NET_DIM_START_MEASURE:
+	case DIM_START_MEASURE:
 		net_dim_sample(end_sample.event_ctr, end_sample.pkt_ctr, end_sample.byte_ctr,
 			       &dim->start_sample);
-		dim->state = NET_DIM_MEASURE_IN_PROGRESS;
+		dim->state = DIM_MEASURE_IN_PROGRESS;
 		break;
-	case NET_DIM_APPLY_NEW_PROFILE:
+	case DIM_APPLY_NEW_PROFILE:
 		break;
 	}
 }
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 4/9] linux/dim: Rename net_dim_sample() to net_dim_create_sample()
  2019-03-06  8:48 ` Tal Gilboa
@ 2019-03-06  8:48   ` Tal Gilboa
  -1 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)
  To: linux-rdma, linux-nvme, linux-block
  Cc: Yishai Hadas, Leon Romanovsky, Jason Gunthorpe, Doug Ledford,
	Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy

Renamed net_dim_sample() to net_dim_create_sample() to avoid confusion between
the function and the similarly named struct, and in preparation for removing
the 'net' prefix from dim members.
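
As a minimal illustration, taken from the bnxt poll hunk further down in this
patch, a call site now reads (the struct keeps its name until the next patch):

	struct net_dim_sample dim_sample;

	net_dim_create_sample(cpr->event_ctr, cpr->rx_packets, cpr->rx_bytes,
			      &dim_sample);
	net_dim(&cpr->dim, dim_sample);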

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c        | 4 ++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 8 ++++----
 drivers/net/ethernet/broadcom/genet/bcmgenet.c    | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 6 ++----
 include/linux/dim.h                               | 8 ++++----
 include/linux/net_dim.h                           | 4 ++--
 6 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 0dbdef541f3c..00958c4bf740 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1034,8 +1034,8 @@ static int bcm_sysport_poll(struct napi_struct *napi, int budget)
 	}
 
 	if (priv->dim.use_dim) {
-		net_dim_sample(priv->dim.event_ctr, priv->dim.packets,
-			       priv->dim.bytes, &dim_sample);
+		net_dim_create_sample(priv->dim.event_ctr, priv->dim.packets,
+				      priv->dim.bytes, &dim_sample);
 		net_dim(&priv->dim.dim, dim_sample);
 	}
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 08d0679ccf63..9bf7b51d9405 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2107,10 +2107,10 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
 	if (bp->flags & BNXT_FLAG_DIM) {
 		struct net_dim_sample dim_sample;
 
-		net_dim_sample(cpr->event_ctr,
-			       cpr->rx_packets,
-			       cpr->rx_bytes,
-			       &dim_sample);
+		net_dim_create_sample(cpr->event_ctr,
+				      cpr->rx_packets,
+				      cpr->rx_bytes,
+				      &dim_sample);
 		net_dim(&cpr->dim, dim_sample);
 	}
 	mmiowb();
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index eee48dae8ed5..8ab2d672d2ba 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1909,8 +1909,8 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 	}
 
 	if (ring->dim.use_dim) {
-		net_dim_sample(ring->dim.event_ctr, ring->dim.packets,
-			       ring->dim.bytes, &dim_sample);
+		net_dim_create_sample(ring->dim.event_ctr, ring->dim.packets,
+				      ring->dim.bytes, &dim_sample);
 		net_dim(&ring->dim.dim, dim_sample);
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index b4af5e19f6ac..6dd820242f4b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -53,8 +53,7 @@ static void mlx5e_handle_tx_dim(struct mlx5e_txqsq *sq)
 	if (unlikely(!test_bit(MLX5E_SQ_STATE_AM, &sq->state)))
 		return;
 
-	net_dim_sample(sq->cq.event_ctr, stats->packets, stats->bytes,
-		       &dim_sample);
+	net_dim_create_sample(sq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
 	net_dim(&sq->dim, dim_sample);
 }
 
@@ -66,8 +65,7 @@ static void mlx5e_handle_rx_dim(struct mlx5e_rq *rq)
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_AM, &rq->state)))
 		return;
 
-	net_dim_sample(rq->cq.event_ctr, stats->packets, stats->bytes,
-		       &dim_sample);
+	net_dim_create_sample(rq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
 	net_dim(&rq->dim, dim_sample);
 }
 
diff --git a/include/linux/dim.h b/include/linux/dim.h
index adb20e58bf19..7809ffe470ff 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -150,10 +150,10 @@ static inline void dim_park_tired(struct net_dim *dim)
 	dim->tune_state   = DIM_PARKING_TIRED;
 }
 
-static inline void net_dim_sample(u16 event_ctr,
-				  u64 packets,
-				  u64 bytes,
-				  struct net_dim_sample *s)
+static inline void net_dim_create_sample(u16 event_ctr,
+					 u64 packets,
+					 u64 bytes,
+					 struct net_dim_sample *s)
 {
 	s->time	     = ktime_get();
 	s->pkt_ctr   = packets;
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index 98ea0bb6f130..8cddfc93819c 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -262,8 +262,8 @@ static inline void net_dim(struct net_dim *dim,
 		}
 		/* fall through */
 	case DIM_START_MEASURE:
-		net_dim_sample(end_sample.event_ctr, end_sample.pkt_ctr, end_sample.byte_ctr,
-			       &dim->start_sample);
+		net_dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr,
+				      end_sample.byte_ctr, &dim->start_sample);
 		dim->state = DIM_MEASURE_IN_PROGRESS;
 		break;
 	case DIM_APPLY_NEW_PROFILE:
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 5/9] linux/dim: Rename externally used net_dim members
  2019-03-06  8:48 ` Tal Gilboa
@ 2019-03-06  8:48   ` Tal Gilboa
  -1 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)
  To: linux-rdma, linux-nvme, linux-block
  Cc: Yishai Hadas, Leon Romanovsky, Jason Gunthorpe, Doug Ledford,
	Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy

Removed 'net' prefix from functions and structs used by external drivers.
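
As a minimal illustration, mirroring the bnxt poll hunk further down in this
patch, the same call site after the rename reads:

	struct dim_sample dim_sample;

	dim_create_sample(cpr->event_ctr, cpr->rx_packets, cpr->rx_bytes,
			  &dim_sample);
	net_dim(&cpr->dim, dim_sample);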

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c    | 16 +++++-----
 drivers/net/ethernet/broadcom/bcmsysport.h    |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 10 +++----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  2 +-
 .../net/ethernet/broadcom/bnxt/bnxt_debugfs.c |  4 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c |  5 ++--
 .../net/ethernet/broadcom/genet/bcmgenet.c    | 14 ++++-----
 .../net/ethernet/broadcom/genet/bcmgenet.h    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  8 ++---
 .../net/ethernet/mellanox/mlx5/core/en_dim.c  | 10 +++----
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  4 +--
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 12 ++++----
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c |  8 ++---
 include/linux/dim.h                           | 28 ++++++++---------
 include/linux/net_dim.h                       | 30 +++++++++----------
 15 files changed, 77 insertions(+), 78 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 00958c4bf740..840b3bf1ae3e 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -627,7 +627,7 @@ static int bcm_sysport_set_coalesce(struct net_device *dev,
 				    struct ethtool_coalesce *ec)
 {
 	struct bcm_sysport_priv *priv = netdev_priv(dev);
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 	unsigned int i;
 
@@ -1010,7 +1010,7 @@ static int bcm_sysport_poll(struct napi_struct *napi, int budget)
 {
 	struct bcm_sysport_priv *priv =
 		container_of(napi, struct bcm_sysport_priv, napi);
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 	unsigned int work_done = 0;
 
 	work_done = bcm_sysport_desc_rx(priv, budget);
@@ -1034,8 +1034,8 @@ static int bcm_sysport_poll(struct napi_struct *napi, int budget)
 	}
 
 	if (priv->dim.use_dim) {
-		net_dim_create_sample(priv->dim.event_ctr, priv->dim.packets,
-				      priv->dim.bytes, &dim_sample);
+		dim_create_sample(priv->dim.event_ctr, priv->dim.packets,
+				  priv->dim.bytes, &dim_sample);
 		net_dim(&priv->dim.dim, dim_sample);
 	}
 
@@ -1105,13 +1105,13 @@ static void bcm_sysport_resume_from_wol(struct bcm_sysport_priv *priv)
 
 static void bcm_sysport_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct bcm_sysport_net_dim *ndim =
 			container_of(dim, struct bcm_sysport_net_dim, dim);
 	struct bcm_sysport_priv *priv =
 			container_of(ndim, struct bcm_sysport_priv, dim);
-	struct net_dim_cq_moder cur_profile =
-			net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+	struct dim_cq_moder cur_profile = net_dim_get_rx_moderation(dim->mode,
+								    dim->profile_ix);
 
 	bcm_sysport_set_rx_coalesce(priv, cur_profile.usec, cur_profile.pkts);
 	dim->state = DIM_START_MEASURE;
@@ -1475,7 +1475,7 @@ static void bcm_sysport_init_dim(struct bcm_sysport_priv *priv,
 static void bcm_sysport_init_rx_coalesce(struct bcm_sysport_priv *priv)
 {
 	struct bcm_sysport_net_dim *dim = &priv->dim;
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 
 	usecs = priv->rx_coalesce_usecs;
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h b/drivers/net/ethernet/broadcom/bcmsysport.h
index 0887e6356649..db9143eeb817 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -710,7 +710,7 @@ struct bcm_sysport_net_dim {
 	u16			event_ctr;
 	unsigned long		packets;
 	unsigned long		bytes;
-	struct net_dim		dim;
+	struct dim		dim;
 };
 
 /* Software view of the TX ring */
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9bf7b51d9405..131ab07aad83 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2105,12 +2105,12 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
 		}
 	}
 	if (bp->flags & BNXT_FLAG_DIM) {
-		struct net_dim_sample dim_sample;
+		struct dim_sample dim_sample;
 
-		net_dim_create_sample(cpr->event_ctr,
-				      cpr->rx_packets,
-				      cpr->rx_bytes,
-				      &dim_sample);
+		dim_create_sample(cpr->event_ctr,
+				  cpr->rx_packets,
+				  cpr->rx_bytes,
+				  &dim_sample);
 		net_dim(&cpr->dim, dim_sample);
 	}
 	mmiowb();
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 5c886a700bcc..211f68d3e14b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -810,7 +810,7 @@ struct bnxt_cp_ring_info {
 	u64			rx_bytes;
 	u64			event_ctr;
 
-	struct net_dim		dim;
+	struct dim		dim;
 
 	union {
 		struct tx_cmp	*cp_desc_ring[MAX_CP_PAGES];
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
index 94e208e9789f..3d1d53fbb135 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
@@ -21,7 +21,7 @@ static ssize_t debugfs_dim_read(struct file *filep,
 				char __user *buffer,
 				size_t count, loff_t *ppos)
 {
-	struct net_dim *dim = filep->private_data;
+	struct dim *dim = filep->private_data;
 	int len;
 	char *buf;
 
@@ -61,7 +61,7 @@ static const struct file_operations debugfs_dim_fops = {
 	.read = debugfs_dim_read,
 };
 
-static struct dentry *debugfs_dim_ring_init(struct net_dim *dim, int ring_idx,
+static struct dentry *debugfs_dim_ring_init(struct dim *dim, int ring_idx,
 					    struct dentry *dd)
 {
 	static char qname[16];
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
index 16a4588709d1..11605f9fa61e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
@@ -13,15 +13,14 @@
 
 void bnxt_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim,
-					   work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct bnxt_cp_ring_info *cpr = container_of(dim,
 						     struct bnxt_cp_ring_info,
 						     dim);
 	struct bnxt_napi *bnapi = container_of(cpr,
 					       struct bnxt_napi,
 					       cp_ring);
-	struct net_dim_cq_moder cur_moder =
+	struct dim_cq_moder cur_moder =
 		net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	cpr->rx_ring_coal.coal_ticks = cur_moder.usec;
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 8ab2d672d2ba..68d96e333c6d 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -643,7 +643,7 @@ static void bcmgenet_set_rx_coalesce(struct bcmgenet_rx_ring *ring,
 static void bcmgenet_set_ring_rx_coalesce(struct bcmgenet_rx_ring *ring,
 					  struct ethtool_coalesce *ec)
 {
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 
 	ring->rx_coalesce_usecs = ec->rx_coalesce_usecs;
@@ -1898,7 +1898,7 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 {
 	struct bcmgenet_rx_ring *ring = container_of(napi,
 			struct bcmgenet_rx_ring, napi);
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 	unsigned int work_done;
 
 	work_done = bcmgenet_desc_rx(ring, budget);
@@ -1909,8 +1909,8 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 	}
 
 	if (ring->dim.use_dim) {
-		net_dim_create_sample(ring->dim.event_ctr, ring->dim.packets,
-				      ring->dim.bytes, &dim_sample);
+		dim_create_sample(ring->dim.event_ctr, ring->dim.packets,
+				  ring->dim.bytes, &dim_sample);
 		net_dim(&ring->dim.dim, dim_sample);
 	}
 
@@ -1919,12 +1919,12 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 
 static void bcmgenet_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct bcmgenet_net_dim *ndim =
 			container_of(dim, struct bcmgenet_net_dim, dim);
 	struct bcmgenet_rx_ring *ring =
 			container_of(ndim, struct bcmgenet_rx_ring, dim);
-	struct net_dim_cq_moder cur_profile =
+	struct dim_cq_moder cur_profile =
 			net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	bcmgenet_set_rx_coalesce(ring, cur_profile.usec, cur_profile.pkts);
@@ -2094,7 +2094,7 @@ static void bcmgenet_init_dim(struct bcmgenet_rx_ring *ring,
 static void bcmgenet_init_rx_coalesce(struct bcmgenet_rx_ring *ring)
 {
 	struct bcmgenet_net_dim *dim = &ring->dim;
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 
 	usecs = ring->rx_coalesce_usecs;
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 14b49612aa86..6e418d9c3706 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -581,7 +581,7 @@ struct bcmgenet_net_dim {
 	u16		event_ctr;
 	unsigned long	packets;
 	unsigned long	bytes;
-	struct net_dim	dim;
+	struct dim	dim;
 };
 
 struct bcmgenet_rx_ring {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6dd74ef69389..94826b48aead 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -238,8 +238,8 @@ struct mlx5e_params {
 	u16 num_channels;
 	u8  num_tc;
 	bool rx_cqe_compress_def;
-	struct net_dim_cq_moder rx_cq_moderation;
-	struct net_dim_cq_moder tx_cq_moderation;
+	struct dim_cq_moder rx_cq_moderation;
+	struct dim_cq_moder tx_cq_moderation;
 	bool lro_en;
 	u32 lro_wqe_sz;
 	u8  tx_min_inline_mode;
@@ -356,7 +356,7 @@ struct mlx5e_txqsq {
 	/* dirtied @completion */
 	u16                        cc;
 	u32                        dma_fifo_cc;
-	struct net_dim             dim; /* Adaptive Moderation */
+	struct dim                 dim; /* Adaptive Moderation */
 
 	/* dirtied @xmit */
 	u16                        pc ____cacheline_aligned_in_smp;
@@ -593,7 +593,7 @@ struct mlx5e_rq {
 	int                    ix;
 	unsigned int           hw_mtu;
 
-	struct net_dim         dim; /* Dynamic Interrupt Moderation */
+	struct dim         dim; /* Dynamic Interrupt Moderation */
 
 	/* XDP */
 	struct bpf_prog       *xdp_prog;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index a80303add7c0..ba3c1be9f2d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -34,7 +34,7 @@
 #include "en.h"
 
 static void
-mlx5e_complete_dim_work(struct net_dim *dim, struct net_dim_cq_moder moder,
+mlx5e_complete_dim_work(struct dim *dim, struct dim_cq_moder moder,
 			struct mlx5_core_dev *mdev, struct mlx5_core_cq *mcq)
 {
 	mlx5_core_modify_cq_moderation(mdev, mcq, moder.usec, moder.pkts);
@@ -43,9 +43,9 @@ mlx5e_complete_dim_work(struct net_dim *dim, struct net_dim_cq_moder moder,
 
 void mlx5e_rx_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct mlx5e_rq *rq = container_of(dim, struct mlx5e_rq, dim);
-	struct net_dim_cq_moder cur_moder =
+	struct dim_cq_moder cur_moder =
 		net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	mlx5e_complete_dim_work(dim, cur_moder, rq->mdev, &rq->cq.mcq);
@@ -53,9 +53,9 @@ void mlx5e_rx_dim_work(struct work_struct *work)
 
 void mlx5e_tx_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct mlx5e_txqsq *sq = container_of(dim, struct mlx5e_txqsq, dim);
-	struct net_dim_cq_moder cur_moder =
+	struct dim_cq_moder cur_moder =
 		net_dim_get_tx_moderation(dim->mode, dim->profile_ix);
 
 	mlx5e_complete_dim_work(dim, cur_moder, sq->cq.mdev, &sq->cq.mcq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 3bbccead2f63..5c693556cd1e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -399,7 +399,7 @@ static int mlx5e_set_channels(struct net_device *dev,
 int mlx5e_ethtool_get_coalesce(struct mlx5e_priv *priv,
 			       struct ethtool_coalesce *coal)
 {
-	struct net_dim_cq_moder *rx_moder, *tx_moder;
+	struct dim_cq_moder *rx_moder, *tx_moder;
 
 	if (!MLX5_CAP_GEN(priv->mdev, cq_moderation))
 		return -EOPNOTSUPP;
@@ -454,7 +454,7 @@ mlx5e_set_priv_channels_coalesce(struct mlx5e_priv *priv, struct ethtool_coalesc
 int mlx5e_ethtool_set_coalesce(struct mlx5e_priv *priv,
 			       struct ethtool_coalesce *coal)
 {
-	struct net_dim_cq_moder *rx_moder, *tx_moder;
+	struct dim_cq_moder *rx_moder, *tx_moder;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	struct mlx5e_channels new_channels = {};
 	int err = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 7d80f803f445..6ed12a77cdb4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1773,7 +1773,7 @@ static void mlx5e_destroy_cq(struct mlx5e_cq *cq)
 }
 
 static int mlx5e_open_cq(struct mlx5e_channel *c,
-			 struct net_dim_cq_moder moder,
+			 struct dim_cq_moder moder,
 			 struct mlx5e_cq_param *param,
 			 struct mlx5e_cq *cq)
 {
@@ -1978,7 +1978,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 			      struct mlx5e_channel **cp)
 {
 	int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
-	struct net_dim_cq_moder icocq_moder = {0, 0};
+	struct dim_cq_moder icocq_moder = {0, 0};
 	struct net_device *netdev = priv->netdev;
 	struct mlx5e_channel *c;
 	unsigned int irq;
@@ -4516,9 +4516,9 @@ static bool slow_pci_heuristic(struct mlx5_core_dev *mdev)
 		link_speed > MLX5E_SLOW_PCI_RATIO * pci_bw;
 }
 
-static struct net_dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
+static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
 {
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 
 	moder.cq_period_mode = cq_period_mode;
 	moder.pkts = MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS;
@@ -4529,9 +4529,9 @@ static struct net_dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
 	return moder;
 }
 
-static struct net_dim_cq_moder mlx5e_get_def_rx_moderation(u8 cq_period_mode)
+static struct dim_cq_moder mlx5e_get_def_rx_moderation(u8 cq_period_mode)
 {
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 
 	moder.cq_period_mode = cq_period_mode;
 	moder.pkts = MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 6dd820242f4b..432474754d77 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -48,24 +48,24 @@ static inline bool mlx5e_channel_no_affinity_change(struct mlx5e_channel *c)
 static void mlx5e_handle_tx_dim(struct mlx5e_txqsq *sq)
 {
 	struct mlx5e_sq_stats *stats = sq->stats;
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 
 	if (unlikely(!test_bit(MLX5E_SQ_STATE_AM, &sq->state)))
 		return;
 
-	net_dim_create_sample(sq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
+	dim_create_sample(sq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
 	net_dim(&sq->dim, dim_sample);
 }
 
 static void mlx5e_handle_rx_dim(struct mlx5e_rq *rq)
 {
 	struct mlx5e_rq_stats *stats = rq->stats;
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_AM, &rq->state)))
 		return;
 
-	net_dim_create_sample(rq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
+	dim_create_sample(rq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
 	net_dim(&rq->dim, dim_sample);
 }
 
diff --git a/include/linux/dim.h b/include/linux/dim.h
index 7809ffe470ff..8d9f8279ebee 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -42,13 +42,13 @@
 #define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
 
 
-struct net_dim_cq_moder {
+struct dim_cq_moder {
 	u16 usec;
 	u16 pkts;
 	u8 cq_period_mode;
 };
 
-struct net_dim_sample {
+struct dim_sample {
 	ktime_t time;
 	u32     pkt_ctr;
 	u32     byte_ctr;
@@ -61,10 +61,10 @@ struct dim_stats {
 	int epms; /* events per msec */
 };
 
-struct net_dim { /* Dynamic Interrupt Moderation */
+struct dim { /* Dynamic Interrupt Moderation */
 	u8                                      state;
 	struct dim_stats                        prev_stats;
-	struct net_dim_sample                   start_sample;
+	struct dim_sample                       start_sample;
 	struct work_struct                      work;
 	u8                                      profile_ix;
 	u8                                      mode;
@@ -105,7 +105,7 @@ enum {
 	DIM_ON_EDGE,
 };
 
-static inline bool dim_on_top(struct net_dim *dim)
+static inline bool dim_on_top(struct dim *dim)
 {
 	switch (dim->tune_state) {
 	case DIM_PARKING_ON_TOP:
@@ -118,7 +118,7 @@ static inline bool dim_on_top(struct net_dim *dim)
 	}
 }
 
-static inline void dim_turn(struct net_dim *dim)
+static inline void dim_turn(struct dim *dim)
 {
 	switch (dim->tune_state) {
 	case DIM_PARKING_ON_TOP:
@@ -135,7 +135,7 @@ static inline void dim_turn(struct net_dim *dim)
 	}
 }
 
-static inline void dim_park_on_top(struct net_dim *dim)
+static inline void dim_park_on_top(struct dim *dim)
 {
 	dim->steps_right  = 0;
 	dim->steps_left   = 0;
@@ -143,17 +143,17 @@ static inline void dim_park_on_top(struct net_dim *dim)
 	dim->tune_state   = DIM_PARKING_ON_TOP;
 }
 
-static inline void dim_park_tired(struct net_dim *dim)
+static inline void dim_park_tired(struct dim *dim)
 {
 	dim->steps_right  = 0;
 	dim->steps_left   = 0;
 	dim->tune_state   = DIM_PARKING_TIRED;
 }
 
-static inline void net_dim_create_sample(u16 event_ctr,
-					 u64 packets,
-					 u64 bytes,
-					 struct net_dim_sample *s)
+static inline void dim_create_sample(u16 event_ctr,
+				     u64 packets,
+				     u64 bytes,
+				     struct dim_sample *s)
 {
 	s->time	     = ktime_get();
 	s->pkt_ctr   = packets;
@@ -161,8 +161,8 @@ static inline void net_dim_create_sample(u16 event_ctr,
 	s->event_ctr = event_ctr;
 }
 
-static inline void dim_calc_stats(struct net_dim_sample *start,
-				  struct net_dim_sample *end,
+static inline void dim_calc_stats(struct dim_sample *start,
+				  struct dim_sample *end,
 				  struct dim_stats *curr_stats)
 {
 	/* u32 holds up to 71 minutes, should be enough */
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index 8cddfc93819c..e9363c372b68 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -77,28 +77,28 @@
 	{64, 32}   \
 }
 
-static const struct net_dim_cq_moder
+static const struct dim_cq_moder
 rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_RX_EQE_PROFILES,
 	NET_DIM_RX_CQE_PROFILES,
 };
 
-static const struct net_dim_cq_moder
+static const struct dim_cq_moder
 tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_TX_EQE_PROFILES,
 	NET_DIM_TX_CQE_PROFILES,
 };
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_rx_moderation(u8 cq_period_mode, int ix)
 {
-	struct net_dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
+	struct dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
 
 	cq_moder.cq_period_mode = cq_period_mode;
 	return cq_moder;
 }
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_def_rx_moderation(u8 cq_period_mode)
 {
 	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
@@ -107,16 +107,16 @@ net_dim_get_def_rx_moderation(u8 cq_period_mode)
 	return net_dim_get_rx_moderation(cq_period_mode, profile_ix);
 }
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_tx_moderation(u8 cq_period_mode, int ix)
 {
-	struct net_dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
+	struct dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
 
 	cq_moder.cq_period_mode = cq_period_mode;
 	return cq_moder;
 }
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_def_tx_moderation(u8 cq_period_mode)
 {
 	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
@@ -125,7 +125,7 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
 	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
 }
 
-static inline int net_dim_step(struct net_dim *dim)
+static inline int net_dim_step(struct dim *dim)
 {
 	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
 		return DIM_TOO_TIRED;
@@ -152,7 +152,7 @@ static inline int net_dim_step(struct net_dim *dim)
 	return DIM_STEPPED;
 }
 
-static inline void net_dim_exit_parking(struct net_dim *dim)
+static inline void net_dim_exit_parking(struct dim *dim)
 {
 	dim->tune_state = dim->profile_ix ? DIM_GOING_LEFT :
 					  DIM_GOING_RIGHT;
@@ -189,7 +189,7 @@ static inline int net_dim_stats_compare(struct dim_stats *curr,
 }
 
 static inline bool net_dim_decision(struct dim_stats *curr_stats,
-				    struct net_dim *dim)
+				    struct dim *dim)
 {
 	int prev_state = dim->tune_state;
 	int prev_ix = dim->profile_ix;
@@ -240,8 +240,8 @@ static inline bool net_dim_decision(struct dim_stats *curr_stats,
 	return dim->profile_ix != prev_ix;
 }
 
-static inline void net_dim(struct net_dim *dim,
-			   struct net_dim_sample end_sample)
+static inline void net_dim(struct dim *dim,
+			   struct dim_sample end_sample)
 {
 	struct dim_stats curr_stats;
 	u16 nevents;
@@ -262,8 +262,8 @@ static inline void net_dim(struct net_dim *dim,
 		}
 		/* fall through */
 	case DIM_START_MEASURE:
-		net_dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr,
-				      end_sample.byte_ctr, &dim->start_sample);
+		dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr,
+				  end_sample.byte_ctr, &dim->start_sample);
 		dim->state = DIM_MEASURE_IN_PROGRESS;
 		break;
 	case DIM_APPLY_NEW_PROFILE:
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 5/9] linux/dim: Rename externally used net_dim members
@ 2019-03-06  8:48   ` Tal Gilboa
  0 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)


Removed 'net' prefix from functions and structs used by external drivers.

Signed-off-by: Tal Gilboa <talgi at mellanox.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c    | 16 +++++-----
 drivers/net/ethernet/broadcom/bcmsysport.h    |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 10 +++----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  2 +-
 .../net/ethernet/broadcom/bnxt/bnxt_debugfs.c |  4 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c |  5 ++--
 .../net/ethernet/broadcom/genet/bcmgenet.c    | 14 ++++-----
 .../net/ethernet/broadcom/genet/bcmgenet.h    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  8 ++---
 .../net/ethernet/mellanox/mlx5/core/en_dim.c  | 10 +++----
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  4 +--
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 12 ++++----
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c |  8 ++---
 include/linux/dim.h                           | 28 ++++++++---------
 include/linux/net_dim.h                       | 30 +++++++++----------
 15 files changed, 77 insertions(+), 78 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 00958c4bf740..840b3bf1ae3e 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -627,7 +627,7 @@ static int bcm_sysport_set_coalesce(struct net_device *dev,
 				    struct ethtool_coalesce *ec)
 {
 	struct bcm_sysport_priv *priv = netdev_priv(dev);
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 	unsigned int i;
 
@@ -1010,7 +1010,7 @@ static int bcm_sysport_poll(struct napi_struct *napi, int budget)
 {
 	struct bcm_sysport_priv *priv =
 		container_of(napi, struct bcm_sysport_priv, napi);
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 	unsigned int work_done = 0;
 
 	work_done = bcm_sysport_desc_rx(priv, budget);
@@ -1034,8 +1034,8 @@ static int bcm_sysport_poll(struct napi_struct *napi, int budget)
 	}
 
 	if (priv->dim.use_dim) {
-		net_dim_create_sample(priv->dim.event_ctr, priv->dim.packets,
-				      priv->dim.bytes, &dim_sample);
+		dim_create_sample(priv->dim.event_ctr, priv->dim.packets,
+				  priv->dim.bytes, &dim_sample);
 		net_dim(&priv->dim.dim, dim_sample);
 	}
 
@@ -1105,13 +1105,13 @@ static void bcm_sysport_resume_from_wol(struct bcm_sysport_priv *priv)
 
 static void bcm_sysport_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct bcm_sysport_net_dim *ndim =
 			container_of(dim, struct bcm_sysport_net_dim, dim);
 	struct bcm_sysport_priv *priv =
 			container_of(ndim, struct bcm_sysport_priv, dim);
-	struct net_dim_cq_moder cur_profile =
-			net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+	struct dim_cq_moder cur_profile = net_dim_get_rx_moderation(dim->mode,
+								    dim->profile_ix);
 
 	bcm_sysport_set_rx_coalesce(priv, cur_profile.usec, cur_profile.pkts);
 	dim->state = DIM_START_MEASURE;
@@ -1475,7 +1475,7 @@ static void bcm_sysport_init_dim(struct bcm_sysport_priv *priv,
 static void bcm_sysport_init_rx_coalesce(struct bcm_sysport_priv *priv)
 {
 	struct bcm_sysport_net_dim *dim = &priv->dim;
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 
 	usecs = priv->rx_coalesce_usecs;
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.h b/drivers/net/ethernet/broadcom/bcmsysport.h
index 0887e6356649..db9143eeb817 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.h
+++ b/drivers/net/ethernet/broadcom/bcmsysport.h
@@ -710,7 +710,7 @@ struct bcm_sysport_net_dim {
 	u16			event_ctr;
 	unsigned long		packets;
 	unsigned long		bytes;
-	struct net_dim		dim;
+	struct dim		dim;
 };
 
 /* Software view of the TX ring */
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9bf7b51d9405..131ab07aad83 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2105,12 +2105,12 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
 		}
 	}
 	if (bp->flags & BNXT_FLAG_DIM) {
-		struct net_dim_sample dim_sample;
+		struct dim_sample dim_sample;
 
-		net_dim_create_sample(cpr->event_ctr,
-				      cpr->rx_packets,
-				      cpr->rx_bytes,
-				      &dim_sample);
+		dim_create_sample(cpr->event_ctr,
+				  cpr->rx_packets,
+				  cpr->rx_bytes,
+				  &dim_sample);
 		net_dim(&cpr->dim, dim_sample);
 	}
 	mmiowb();
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 5c886a700bcc..211f68d3e14b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -810,7 +810,7 @@ struct bnxt_cp_ring_info {
 	u64			rx_bytes;
 	u64			event_ctr;
 
-	struct net_dim		dim;
+	struct dim		dim;
 
 	union {
 		struct tx_cmp	*cp_desc_ring[MAX_CP_PAGES];
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
index 94e208e9789f..3d1d53fbb135 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
@@ -21,7 +21,7 @@ static ssize_t debugfs_dim_read(struct file *filep,
 				char __user *buffer,
 				size_t count, loff_t *ppos)
 {
-	struct net_dim *dim = filep->private_data;
+	struct dim *dim = filep->private_data;
 	int len;
 	char *buf;
 
@@ -61,7 +61,7 @@ static const struct file_operations debugfs_dim_fops = {
 	.read = debugfs_dim_read,
 };
 
-static struct dentry *debugfs_dim_ring_init(struct net_dim *dim, int ring_idx,
+static struct dentry *debugfs_dim_ring_init(struct dim *dim, int ring_idx,
 					    struct dentry *dd)
 {
 	static char qname[16];
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
index 16a4588709d1..11605f9fa61e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
@@ -13,15 +13,14 @@
 
 void bnxt_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim,
-					   work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct bnxt_cp_ring_info *cpr = container_of(dim,
 						     struct bnxt_cp_ring_info,
 						     dim);
 	struct bnxt_napi *bnapi = container_of(cpr,
 					       struct bnxt_napi,
 					       cp_ring);
-	struct net_dim_cq_moder cur_moder =
+	struct dim_cq_moder cur_moder =
 		net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	cpr->rx_ring_coal.coal_ticks = cur_moder.usec;
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 8ab2d672d2ba..68d96e333c6d 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -643,7 +643,7 @@ static void bcmgenet_set_rx_coalesce(struct bcmgenet_rx_ring *ring,
 static void bcmgenet_set_ring_rx_coalesce(struct bcmgenet_rx_ring *ring,
 					  struct ethtool_coalesce *ec)
 {
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 
 	ring->rx_coalesce_usecs = ec->rx_coalesce_usecs;
@@ -1898,7 +1898,7 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 {
 	struct bcmgenet_rx_ring *ring = container_of(napi,
 			struct bcmgenet_rx_ring, napi);
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 	unsigned int work_done;
 
 	work_done = bcmgenet_desc_rx(ring, budget);
@@ -1909,8 +1909,8 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 	}
 
 	if (ring->dim.use_dim) {
-		net_dim_create_sample(ring->dim.event_ctr, ring->dim.packets,
-				      ring->dim.bytes, &dim_sample);
+		dim_create_sample(ring->dim.event_ctr, ring->dim.packets,
+				  ring->dim.bytes, &dim_sample);
 		net_dim(&ring->dim.dim, dim_sample);
 	}
 
@@ -1919,12 +1919,12 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 
 static void bcmgenet_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct bcmgenet_net_dim *ndim =
 			container_of(dim, struct bcmgenet_net_dim, dim);
 	struct bcmgenet_rx_ring *ring =
 			container_of(ndim, struct bcmgenet_rx_ring, dim);
-	struct net_dim_cq_moder cur_profile =
+	struct dim_cq_moder cur_profile =
 			net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	bcmgenet_set_rx_coalesce(ring, cur_profile.usec, cur_profile.pkts);
@@ -2094,7 +2094,7 @@ static void bcmgenet_init_dim(struct bcmgenet_rx_ring *ring,
 static void bcmgenet_init_rx_coalesce(struct bcmgenet_rx_ring *ring)
 {
 	struct bcmgenet_net_dim *dim = &ring->dim;
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 	u32 usecs, pkts;
 
 	usecs = ring->rx_coalesce_usecs;
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
index 14b49612aa86..6e418d9c3706 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -581,7 +581,7 @@ struct bcmgenet_net_dim {
 	u16		event_ctr;
 	unsigned long	packets;
 	unsigned long	bytes;
-	struct net_dim	dim;
+	struct dim	dim;
 };
 
 struct bcmgenet_rx_ring {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6dd74ef69389..94826b48aead 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -238,8 +238,8 @@ struct mlx5e_params {
 	u16 num_channels;
 	u8  num_tc;
 	bool rx_cqe_compress_def;
-	struct net_dim_cq_moder rx_cq_moderation;
-	struct net_dim_cq_moder tx_cq_moderation;
+	struct dim_cq_moder rx_cq_moderation;
+	struct dim_cq_moder tx_cq_moderation;
 	bool lro_en;
 	u32 lro_wqe_sz;
 	u8  tx_min_inline_mode;
@@ -356,7 +356,7 @@ struct mlx5e_txqsq {
 	/* dirtied @completion */
 	u16                        cc;
 	u32                        dma_fifo_cc;
-	struct net_dim             dim; /* Adaptive Moderation */
+	struct dim                 dim; /* Adaptive Moderation */
 
 	/* dirtied @xmit */
 	u16                        pc ____cacheline_aligned_in_smp;
@@ -593,7 +593,7 @@ struct mlx5e_rq {
 	int                    ix;
 	unsigned int           hw_mtu;
 
-	struct net_dim         dim; /* Dynamic Interrupt Moderation */
+	struct dim         dim; /* Dynamic Interrupt Moderation */
 
 	/* XDP */
 	struct bpf_prog       *xdp_prog;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index a80303add7c0..ba3c1be9f2d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -34,7 +34,7 @@
 #include "en.h"
 
 static void
-mlx5e_complete_dim_work(struct net_dim *dim, struct net_dim_cq_moder moder,
+mlx5e_complete_dim_work(struct dim *dim, struct dim_cq_moder moder,
 			struct mlx5_core_dev *mdev, struct mlx5_core_cq *mcq)
 {
 	mlx5_core_modify_cq_moderation(mdev, mcq, moder.usec, moder.pkts);
@@ -43,9 +43,9 @@ mlx5e_complete_dim_work(struct net_dim *dim, struct net_dim_cq_moder moder,
 
 void mlx5e_rx_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct mlx5e_rq *rq = container_of(dim, struct mlx5e_rq, dim);
-	struct net_dim_cq_moder cur_moder =
+	struct dim_cq_moder cur_moder =
 		net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
 
 	mlx5e_complete_dim_work(dim, cur_moder, rq->mdev, &rq->cq.mcq);
@@ -53,9 +53,9 @@ void mlx5e_rx_dim_work(struct work_struct *work)
 
 void mlx5e_tx_dim_work(struct work_struct *work)
 {
-	struct net_dim *dim = container_of(work, struct net_dim, work);
+	struct dim *dim = container_of(work, struct dim, work);
 	struct mlx5e_txqsq *sq = container_of(dim, struct mlx5e_txqsq, dim);
-	struct net_dim_cq_moder cur_moder =
+	struct dim_cq_moder cur_moder =
 		net_dim_get_tx_moderation(dim->mode, dim->profile_ix);
 
 	mlx5e_complete_dim_work(dim, cur_moder, sq->cq.mdev, &sq->cq.mcq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 3bbccead2f63..5c693556cd1e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -399,7 +399,7 @@ static int mlx5e_set_channels(struct net_device *dev,
 int mlx5e_ethtool_get_coalesce(struct mlx5e_priv *priv,
 			       struct ethtool_coalesce *coal)
 {
-	struct net_dim_cq_moder *rx_moder, *tx_moder;
+	struct dim_cq_moder *rx_moder, *tx_moder;
 
 	if (!MLX5_CAP_GEN(priv->mdev, cq_moderation))
 		return -EOPNOTSUPP;
@@ -454,7 +454,7 @@ mlx5e_set_priv_channels_coalesce(struct mlx5e_priv *priv, struct ethtool_coalesc
 int mlx5e_ethtool_set_coalesce(struct mlx5e_priv *priv,
 			       struct ethtool_coalesce *coal)
 {
-	struct net_dim_cq_moder *rx_moder, *tx_moder;
+	struct dim_cq_moder *rx_moder, *tx_moder;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	struct mlx5e_channels new_channels = {};
 	int err = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 7d80f803f445..6ed12a77cdb4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1773,7 +1773,7 @@ static void mlx5e_destroy_cq(struct mlx5e_cq *cq)
 }
 
 static int mlx5e_open_cq(struct mlx5e_channel *c,
-			 struct net_dim_cq_moder moder,
+			 struct dim_cq_moder moder,
 			 struct mlx5e_cq_param *param,
 			 struct mlx5e_cq *cq)
 {
@@ -1978,7 +1978,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 			      struct mlx5e_channel **cp)
 {
 	int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
-	struct net_dim_cq_moder icocq_moder = {0, 0};
+	struct dim_cq_moder icocq_moder = {0, 0};
 	struct net_device *netdev = priv->netdev;
 	struct mlx5e_channel *c;
 	unsigned int irq;
@@ -4516,9 +4516,9 @@ static bool slow_pci_heuristic(struct mlx5_core_dev *mdev)
 		link_speed > MLX5E_SLOW_PCI_RATIO * pci_bw;
 }
 
-static struct net_dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
+static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
 {
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 
 	moder.cq_period_mode = cq_period_mode;
 	moder.pkts = MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS;
@@ -4529,9 +4529,9 @@ static struct net_dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
 	return moder;
 }
 
-static struct net_dim_cq_moder mlx5e_get_def_rx_moderation(u8 cq_period_mode)
+static struct dim_cq_moder mlx5e_get_def_rx_moderation(u8 cq_period_mode)
 {
-	struct net_dim_cq_moder moder;
+	struct dim_cq_moder moder;
 
 	moder.cq_period_mode = cq_period_mode;
 	moder.pkts = MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 6dd820242f4b..432474754d77 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -48,24 +48,24 @@ static inline bool mlx5e_channel_no_affinity_change(struct mlx5e_channel *c)
 static void mlx5e_handle_tx_dim(struct mlx5e_txqsq *sq)
 {
 	struct mlx5e_sq_stats *stats = sq->stats;
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 
 	if (unlikely(!test_bit(MLX5E_SQ_STATE_AM, &sq->state)))
 		return;
 
-	net_dim_create_sample(sq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
+	dim_create_sample(sq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
 	net_dim(&sq->dim, dim_sample);
 }
 
 static void mlx5e_handle_rx_dim(struct mlx5e_rq *rq)
 {
 	struct mlx5e_rq_stats *stats = rq->stats;
-	struct net_dim_sample dim_sample;
+	struct dim_sample dim_sample;
 
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_AM, &rq->state)))
 		return;
 
-	net_dim_create_sample(rq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
+	dim_create_sample(rq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
 	net_dim(&rq->dim, dim_sample);
 }
 
diff --git a/include/linux/dim.h b/include/linux/dim.h
index 7809ffe470ff..8d9f8279ebee 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -42,13 +42,13 @@
 #define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
 
 
-struct net_dim_cq_moder {
+struct dim_cq_moder {
 	u16 usec;
 	u16 pkts;
 	u8 cq_period_mode;
 };
 
-struct net_dim_sample {
+struct dim_sample {
 	ktime_t time;
 	u32     pkt_ctr;
 	u32     byte_ctr;
@@ -61,10 +61,10 @@ struct dim_stats {
 	int epms; /* events per msec */
 };
 
-struct net_dim { /* Dynamic Interrupt Moderation */
+struct dim { /* Dynamic Interrupt Moderation */
 	u8                                      state;
 	struct dim_stats                        prev_stats;
-	struct net_dim_sample                   start_sample;
+	struct dim_sample                       start_sample;
 	struct work_struct                      work;
 	u8                                      profile_ix;
 	u8                                      mode;
@@ -105,7 +105,7 @@ enum {
 	DIM_ON_EDGE,
 };
 
-static inline bool dim_on_top(struct net_dim *dim)
+static inline bool dim_on_top(struct dim *dim)
 {
 	switch (dim->tune_state) {
 	case DIM_PARKING_ON_TOP:
@@ -118,7 +118,7 @@ static inline bool dim_on_top(struct net_dim *dim)
 	}
 }
 
-static inline void dim_turn(struct net_dim *dim)
+static inline void dim_turn(struct dim *dim)
 {
 	switch (dim->tune_state) {
 	case DIM_PARKING_ON_TOP:
@@ -135,7 +135,7 @@ static inline void dim_turn(struct net_dim *dim)
 	}
 }
 
-static inline void dim_park_on_top(struct net_dim *dim)
+static inline void dim_park_on_top(struct dim *dim)
 {
 	dim->steps_right  = 0;
 	dim->steps_left   = 0;
@@ -143,17 +143,17 @@ static inline void dim_park_on_top(struct net_dim *dim)
 	dim->tune_state   = DIM_PARKING_ON_TOP;
 }
 
-static inline void dim_park_tired(struct net_dim *dim)
+static inline void dim_park_tired(struct dim *dim)
 {
 	dim->steps_right  = 0;
 	dim->steps_left   = 0;
 	dim->tune_state   = DIM_PARKING_TIRED;
 }
 
-static inline void net_dim_create_sample(u16 event_ctr,
-					 u64 packets,
-					 u64 bytes,
-					 struct net_dim_sample *s)
+static inline void dim_create_sample(u16 event_ctr,
+				     u64 packets,
+				     u64 bytes,
+				     struct dim_sample *s)
 {
 	s->time	     = ktime_get();
 	s->pkt_ctr   = packets;
@@ -161,8 +161,8 @@ static inline void net_dim_create_sample(u16 event_ctr,
 	s->event_ctr = event_ctr;
 }
 
-static inline void dim_calc_stats(struct net_dim_sample *start,
-				  struct net_dim_sample *end,
+static inline void dim_calc_stats(struct dim_sample *start,
+				  struct dim_sample *end,
 				  struct dim_stats *curr_stats)
 {
 	/* u32 holds up to 71 minutes, should be enough */
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index 8cddfc93819c..e9363c372b68 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -77,28 +77,28 @@
 	{64, 32}   \
 }
 
-static const struct net_dim_cq_moder
+static const struct dim_cq_moder
 rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_RX_EQE_PROFILES,
 	NET_DIM_RX_CQE_PROFILES,
 };
 
-static const struct net_dim_cq_moder
+static const struct dim_cq_moder
 tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_TX_EQE_PROFILES,
 	NET_DIM_TX_CQE_PROFILES,
 };
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_rx_moderation(u8 cq_period_mode, int ix)
 {
-	struct net_dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
+	struct dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
 
 	cq_moder.cq_period_mode = cq_period_mode;
 	return cq_moder;
 }
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_def_rx_moderation(u8 cq_period_mode)
 {
 	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
@@ -107,16 +107,16 @@ net_dim_get_def_rx_moderation(u8 cq_period_mode)
 	return net_dim_get_rx_moderation(cq_period_mode, profile_ix);
 }
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_tx_moderation(u8 cq_period_mode, int ix)
 {
-	struct net_dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
+	struct dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
 
 	cq_moder.cq_period_mode = cq_period_mode;
 	return cq_moder;
 }
 
-static inline struct net_dim_cq_moder
+static inline struct dim_cq_moder
 net_dim_get_def_tx_moderation(u8 cq_period_mode)
 {
 	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
@@ -125,7 +125,7 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
 	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
 }
 
-static inline int net_dim_step(struct net_dim *dim)
+static inline int net_dim_step(struct dim *dim)
 {
 	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
 		return DIM_TOO_TIRED;
@@ -152,7 +152,7 @@ static inline int net_dim_step(struct net_dim *dim)
 	return DIM_STEPPED;
 }
 
-static inline void net_dim_exit_parking(struct net_dim *dim)
+static inline void net_dim_exit_parking(struct dim *dim)
 {
 	dim->tune_state = dim->profile_ix ? DIM_GOING_LEFT :
 					  DIM_GOING_RIGHT;
@@ -189,7 +189,7 @@ static inline int net_dim_stats_compare(struct dim_stats *curr,
 }
 
 static inline bool net_dim_decision(struct dim_stats *curr_stats,
-				    struct net_dim *dim)
+				    struct dim *dim)
 {
 	int prev_state = dim->tune_state;
 	int prev_ix = dim->profile_ix;
@@ -240,8 +240,8 @@ static inline bool net_dim_decision(struct dim_stats *curr_stats,
 	return dim->profile_ix != prev_ix;
 }
 
-static inline void net_dim(struct net_dim *dim,
-			   struct net_dim_sample end_sample)
+static inline void net_dim(struct dim *dim,
+			   struct dim_sample end_sample)
 {
 	struct dim_stats curr_stats;
 	u16 nevents;
@@ -262,8 +262,8 @@ static inline void net_dim(struct net_dim *dim,
 		}
 		/* fall through */
 	case DIM_START_MEASURE:
-		net_dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr,
-				      end_sample.byte_ctr, &dim->start_sample);
+		dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr,
+				  end_sample.byte_ctr, &dim->start_sample);
 		dim->state = DIM_MEASURE_IN_PROGRESS;
 		break;
 	case DIM_APPLY_NEW_PROFILE:
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 6/9] linux/dim: Move implementation to .c files
  2019-03-06  8:48 ` Tal Gilboa
@ 2019-03-06  8:48   ` Tal Gilboa
  -1 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)
  To: linux-rdma, linux-nvme, linux-block
  Cc: Yishai Hadas, Leon Romanovsky, Jason Gunthorpe, Doug Ledford,
	Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy

Move all function logic from dim.h and net_dim.h into lib/dim/dim.c and
lib/dim/net_dim.c and export the functions; the headers keep only the
declarations and the moderation profile tables.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
---
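For context, here is a minimal sketch (not part of this series) of how a driver
consumes the library once the logic lives in lib/dim/. The my_ring / my_dim_work
names are hypothetical; only the dim_* and net_dim_* calls come from these
patches:

#include <linux/workqueue.h>
#include <linux/dim.h>
#include <linux/net_dim.h>

/* Hypothetical ring context; only the fields DIM needs are shown. */
struct my_ring {
        struct dim      dim;
        u16             events;
        u64             packets;
        u64             bytes;
};

/* Work handler: apply the profile net_dim() picked, then re-arm. */
static void my_dim_work(struct work_struct *work)
{
        struct dim *dim = container_of(work, struct dim, work);
        struct dim_cq_moder moder =
                net_dim_get_rx_moderation(dim->mode, dim->profile_ix);

        /* program moder.usec / moder.pkts into the device here */
        dim->state = DIM_START_MEASURE;
}

/* NAPI poll path: feed a sample; scheduled work applies any new profile. */
static void my_ring_poll_done(struct my_ring *ring)
{
        struct dim_sample sample;

        dim_create_sample(ring->events, ring->packets, ring->bytes, &sample);
        net_dim(&ring->dim, sample);
}

The driver still initializes the work item itself, e.g.
INIT_WORK(&ring->dim.work, my_dim_work), exactly as the existing bnxt,
bcmgenet and mlx5e work handlers do.
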
 MAINTAINERS             |   2 +
 include/linux/dim.h     |  86 +++---------------
 include/linux/net_dim.h | 182 ++-----------------------------------
 lib/Kconfig             |   7 ++
 lib/Makefile            |   1 +
 lib/dim/Makefile        |   9 ++
 lib/dim/dim.c           |  83 +++++++++++++++++
 lib/dim/net_dim.c       | 193 ++++++++++++++++++++++++++++++++++++++++
 8 files changed, 312 insertions(+), 251 deletions(-)
 create mode 100644 lib/dim/Makefile
 create mode 100644 lib/dim/dim.c
 create mode 100644 lib/dim/net_dim.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 019a2bcfbd09..6ae949be8b83 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5337,6 +5337,8 @@ DYNAMIC INTERRUPT MODERATION
 M:	Tal Gilboa <talgi@mellanox.com>
 S:	Maintained
 F:	include/linux/net_dim.h
+F:	include/linux/dim.h
+F:	lib/dim/
 
 DZ DECSTATION DZ11 SERIAL DRIVER
 M:	"Maciej W. Rozycki" <macro@linux-mips.org>
diff --git a/include/linux/dim.h b/include/linux/dim.h
index 8d9f8279ebee..88a74cd83d95 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -105,79 +105,17 @@ enum {
 	DIM_ON_EDGE,
 };
 
-static inline bool dim_on_top(struct dim *dim)
-{
-	switch (dim->tune_state) {
-	case DIM_PARKING_ON_TOP:
-	case DIM_PARKING_TIRED:
-		return true;
-	case DIM_GOING_RIGHT:
-		return (dim->steps_left > 1) && (dim->steps_right == 1);
-	default: /* DIM_GOING_LEFT */
-		return (dim->steps_right > 1) && (dim->steps_left == 1);
-	}
-}
-
-static inline void dim_turn(struct dim *dim)
-{
-	switch (dim->tune_state) {
-	case DIM_PARKING_ON_TOP:
-	case DIM_PARKING_TIRED:
-		break;
-	case DIM_GOING_RIGHT:
-		dim->tune_state = DIM_GOING_LEFT;
-		dim->steps_left = 0;
-		break;
-	case DIM_GOING_LEFT:
-		dim->tune_state = DIM_GOING_RIGHT;
-		dim->steps_right = 0;
-		break;
-	}
-}
-
-static inline void dim_park_on_top(struct dim *dim)
-{
-	dim->steps_right  = 0;
-	dim->steps_left   = 0;
-	dim->tired        = 0;
-	dim->tune_state   = DIM_PARKING_ON_TOP;
-}
-
-static inline void dim_park_tired(struct dim *dim)
-{
-	dim->steps_right  = 0;
-	dim->steps_left   = 0;
-	dim->tune_state   = DIM_PARKING_TIRED;
-}
-
-static inline void dim_create_sample(u16 event_ctr,
-				     u64 packets,
-				     u64 bytes,
-				     struct dim_sample *s)
-{
-	s->time	     = ktime_get();
-	s->pkt_ctr   = packets;
-	s->byte_ctr  = bytes;
-	s->event_ctr = event_ctr;
-}
-
-static inline void dim_calc_stats(struct dim_sample *start,
-				  struct dim_sample *end,
-				  struct dim_stats *curr_stats)
-{
-	/* u32 holds up to 71 minutes, should be enough */
-	u32 delta_us = ktime_us_delta(end->time, start->time);
-	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
-	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
-			     start->byte_ctr);
-
-	if (!delta_us)
-		return;
-
-	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
-	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
-	curr_stats->epms = DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC,
-					delta_us);
-}
+bool dim_on_top(struct dim *dim);
+
+void dim_turn(struct dim *dim);
+
+void dim_park_on_top(struct dim *dim);
+
+void dim_park_tired(struct dim *dim);
+
+void dim_create_sample(u16 event_ctr, u64 packets, u64 bytes, struct dim_sample *s);
+
+void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
+		    struct dim_stats *curr_stats);
 
 #endif /* DIM_H */
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index e9363c372b68..debf548cfb33 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -89,186 +89,14 @@ tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
 	NET_DIM_TX_CQE_PROFILES,
 };
 
-static inline struct dim_cq_moder
-net_dim_get_rx_moderation(u8 cq_period_mode, int ix)
-{
-	struct dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
+struct dim_cq_moder net_dim_get_rx_moderation(u8 cq_period_mode, int ix);
 
-	cq_moder.cq_period_mode = cq_period_mode;
-	return cq_moder;
-}
-
-static inline struct dim_cq_moder
-net_dim_get_def_rx_moderation(u8 cq_period_mode)
-{
-	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
-			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
-
-	return net_dim_get_rx_moderation(cq_period_mode, profile_ix);
-}
-
-static inline struct dim_cq_moder
-net_dim_get_tx_moderation(u8 cq_period_mode, int ix)
-{
-	struct dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
-
-	cq_moder.cq_period_mode = cq_period_mode;
-	return cq_moder;
-}
-
-static inline struct dim_cq_moder
-net_dim_get_def_tx_moderation(u8 cq_period_mode)
-{
-	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
-			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
-
-	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
-}
-
-static inline int net_dim_step(struct dim *dim)
-{
-	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
-		return DIM_TOO_TIRED;
-
-	switch (dim->tune_state) {
-	case DIM_PARKING_ON_TOP:
-	case DIM_PARKING_TIRED:
-		break;
-	case DIM_GOING_RIGHT:
-		if (dim->profile_ix == (NET_DIM_PARAMS_NUM_PROFILES - 1))
-			return DIM_ON_EDGE;
-		dim->profile_ix++;
-		dim->steps_right++;
-		break;
-	case DIM_GOING_LEFT:
-		if (dim->profile_ix == 0)
-			return DIM_ON_EDGE;
-		dim->profile_ix--;
-		dim->steps_left++;
-		break;
-	}
-
-	dim->tired++;
-	return DIM_STEPPED;
-}
-
-static inline void net_dim_exit_parking(struct dim *dim)
-{
-	dim->tune_state = dim->profile_ix ? DIM_GOING_LEFT :
-					  DIM_GOING_RIGHT;
-	net_dim_step(dim);
-}
-
-static inline int net_dim_stats_compare(struct dim_stats *curr,
-					struct dim_stats *prev)
-{
-	if (!prev->bpms)
-		return curr->bpms ? DIM_STATS_BETTER :
-				    DIM_STATS_SAME;
+struct dim_cq_moder net_dim_get_def_rx_moderation(u8 cq_period_mode);
 
-	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
-		return (curr->bpms > prev->bpms) ? DIM_STATS_BETTER :
-						   DIM_STATS_WORSE;
+struct dim_cq_moder net_dim_get_tx_moderation(u8 cq_period_mode, int ix);
 
-	if (!prev->ppms)
-		return curr->ppms ? DIM_STATS_BETTER :
-				    DIM_STATS_SAME;
+struct dim_cq_moder net_dim_get_def_tx_moderation(u8 cq_period_mode);
 
-	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
-		return (curr->ppms > prev->ppms) ? DIM_STATS_BETTER :
-						   DIM_STATS_WORSE;
-
-	if (!prev->epms)
-		return DIM_STATS_SAME;
-
-	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
-		return (curr->epms < prev->epms) ? DIM_STATS_BETTER :
-						   DIM_STATS_WORSE;
-
-	return DIM_STATS_SAME;
-}
-
-static inline bool net_dim_decision(struct dim_stats *curr_stats,
-				    struct dim *dim)
-{
-	int prev_state = dim->tune_state;
-	int prev_ix = dim->profile_ix;
-	int stats_res;
-	int step_res;
-
-	switch (dim->tune_state) {
-	case DIM_PARKING_ON_TOP:
-		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
-		if (stats_res != DIM_STATS_SAME)
-			net_dim_exit_parking(dim);
-		break;
-
-	case DIM_PARKING_TIRED:
-		dim->tired--;
-		if (!dim->tired)
-			net_dim_exit_parking(dim);
-		break;
-
-	case DIM_GOING_RIGHT:
-	case DIM_GOING_LEFT:
-		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
-		if (stats_res != DIM_STATS_BETTER)
-			dim_turn(dim);
-
-		if (dim_on_top(dim)) {
-			dim_park_on_top(dim);
-			break;
-		}
-
-		step_res = net_dim_step(dim);
-		switch (step_res) {
-		case DIM_ON_EDGE:
-			dim_park_on_top(dim);
-			break;
-		case DIM_TOO_TIRED:
-			dim_park_tired(dim);
-			break;
-		}
-
-		break;
-	}
-
-	if ((prev_state      != DIM_PARKING_ON_TOP) ||
-	    (dim->tune_state != DIM_PARKING_ON_TOP))
-		dim->prev_stats = *curr_stats;
-
-	return dim->profile_ix != prev_ix;
-}
-
-static inline void net_dim(struct dim *dim,
-			   struct dim_sample end_sample)
-{
-	struct dim_stats curr_stats;
-	u16 nevents;
-
-	switch (dim->state) {
-	case DIM_MEASURE_IN_PROGRESS:
-		nevents = BIT_GAP(BITS_PER_TYPE(u16),
-				  end_sample.event_ctr,
-				  dim->start_sample.event_ctr);
-		if (nevents < DIM_NEVENTS)
-			break;
-		dim_calc_stats(&dim->start_sample, &end_sample,
-				   &curr_stats);
-		if (net_dim_decision(&curr_stats, dim)) {
-			dim->state = DIM_APPLY_NEW_PROFILE;
-			schedule_work(&dim->work);
-			break;
-		}
-		/* fall through */
-	case DIM_START_MEASURE:
-		dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr,
-				  end_sample.byte_ctr, &dim->start_sample);
-		dim->state = DIM_MEASURE_IN_PROGRESS;
-		break;
-	case DIM_APPLY_NEW_PROFILE:
-		break;
-	}
-}
+void net_dim(struct dim *dim, struct dim_sample end_sample);
 
 #endif /* NET_DIM_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index a9e56539bd11..153465c65624 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -551,6 +551,13 @@ config SIGNATURE
 	  Digital signature verification. Currently only RSA is supported.
 	  Implementation is done using GnuPG MPI library
 
+config DIMLIB
+	bool "DIM library"
+	default y
+	help
+	  Dynamic Interrupt Moderation library.
+	  It is used to implement DIM logic.
+
 #
 # libfdt files, only selected if needed.
 #
diff --git a/lib/Makefile b/lib/Makefile
index e1b59da71418..0f2155eb9a85 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -185,6 +185,7 @@ obj-$(CONFIG_GLOB) += glob.o
 obj-$(CONFIG_GLOB_SELFTEST) += globtest.o
 
 obj-$(CONFIG_MPILIB) += mpi/
+obj-$(CONFIG_DIMLIB) += dim/
 obj-$(CONFIG_SIGNATURE) += digsig.o
 
 lib-$(CONFIG_CLZ_TAB) += clz_tab.o
diff --git a/lib/dim/Makefile b/lib/dim/Makefile
new file mode 100644
index 000000000000..160afe288df0
--- /dev/null
+++ b/lib/dim/Makefile
@@ -0,0 +1,9 @@
+#
+# DIM Dynamic Interrupt Moderation library
+#
+
+obj-$(CONFIG_DIMLIB) = net_dim.o
+
+net_dim-y = \
+	dim.o		\
+	net_dim.o
diff --git a/lib/dim/dim.c b/lib/dim/dim.c
new file mode 100644
index 000000000000..93e1ddd701b0
--- /dev/null
+++ b/lib/dim/dim.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ */
+
+#include <linux/dim.h>
+
+bool dim_on_top(struct dim *dim)
+{
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
+		return true;
+	case DIM_GOING_RIGHT:
+		return (dim->steps_left > 1) && (dim->steps_right == 1);
+	default: /* DIM_GOING_LEFT */
+		return (dim->steps_right > 1) && (dim->steps_left == 1);
+	}
+}
+EXPORT_SYMBOL(dim_on_top);
+
+void dim_turn(struct dim *dim)
+{
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
+		break;
+	case DIM_GOING_RIGHT:
+		dim->tune_state = DIM_GOING_LEFT;
+		dim->steps_left = 0;
+		break;
+	case DIM_GOING_LEFT:
+		dim->tune_state = DIM_GOING_RIGHT;
+		dim->steps_right = 0;
+		break;
+	}
+}
+EXPORT_SYMBOL(dim_turn);
+
+void dim_park_on_top(struct dim *dim)
+{
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tired        = 0;
+	dim->tune_state   = DIM_PARKING_ON_TOP;
+}
+EXPORT_SYMBOL(dim_park_on_top);
+
+void dim_park_tired(struct dim *dim)
+{
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tune_state   = DIM_PARKING_TIRED;
+}
+EXPORT_SYMBOL(dim_park_tired);
+
+void dim_create_sample(u16 event_ctr, u64 packets, u64 bytes, struct dim_sample *s)
+{
+	s->time	     = ktime_get();
+	s->pkt_ctr   = packets;
+	s->byte_ctr  = bytes;
+	s->event_ctr = event_ctr;
+}
+EXPORT_SYMBOL(dim_create_sample);
+
+void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
+		    struct dim_stats *curr_stats)
+{
+	/* u32 holds up to 71 minutes, should be enough */
+	u32 delta_us = ktime_us_delta(end->time, start->time);
+	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
+	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
+			     start->byte_ctr);
+
+	if (!delta_us)
+		return;
+
+	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
+	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
+	curr_stats->epms = DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC,
+					delta_us);
+}
+EXPORT_SYMBOL(dim_calc_stats);
diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c
new file mode 100644
index 000000000000..cf95cd20cf02
--- /dev/null
+++ b/lib/dim/net_dim.c
@@ -0,0 +1,193 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ */
+
+#include <linux/net_dim.h>
+#include <linux/dim.h>
+
+struct dim_cq_moder
+net_dim_get_rx_moderation(u8 cq_period_mode, int ix)
+{
+	struct dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
+
+	cq_moder.cq_period_mode = cq_period_mode;
+	return cq_moder;
+}
+EXPORT_SYMBOL(net_dim_get_rx_moderation);
+
+struct dim_cq_moder
+net_dim_get_def_rx_moderation(u8 cq_period_mode)
+{
+	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
+			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
+
+	return net_dim_get_rx_moderation(cq_period_mode, profile_ix);
+}
+EXPORT_SYMBOL(net_dim_get_def_rx_moderation);
+
+struct dim_cq_moder
+net_dim_get_tx_moderation(u8 cq_period_mode, int ix)
+{
+	struct dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
+
+	cq_moder.cq_period_mode = cq_period_mode;
+	return cq_moder;
+}
+EXPORT_SYMBOL(net_dim_get_tx_moderation);
+
+struct dim_cq_moder
+net_dim_get_def_tx_moderation(u8 cq_period_mode)
+{
+	u8 profile_ix = cq_period_mode == DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
+			NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
+
+	return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
+}
+EXPORT_SYMBOL(net_dim_get_def_tx_moderation);
+
+static inline int net_dim_step(struct dim *dim)
+{
+	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
+		return DIM_TOO_TIRED;
+
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
+		break;
+	case DIM_GOING_RIGHT:
+		if (dim->profile_ix == (NET_DIM_PARAMS_NUM_PROFILES - 1))
+			return DIM_ON_EDGE;
+		dim->profile_ix++;
+		dim->steps_right++;
+		break;
+	case DIM_GOING_LEFT:
+		if (dim->profile_ix == 0)
+			return DIM_ON_EDGE;
+		dim->profile_ix--;
+		dim->steps_left++;
+		break;
+	}
+
+	dim->tired++;
+	return DIM_STEPPED;
+}
+
+static inline void net_dim_exit_parking(struct dim *dim)
+{
+	dim->tune_state = dim->profile_ix ? DIM_GOING_LEFT :
+					  DIM_GOING_RIGHT;
+	net_dim_step(dim);
+}
+
+static inline int net_dim_stats_compare(struct dim_stats *curr,
+					struct dim_stats *prev)
+{
+	if (!prev->bpms)
+		return curr->bpms ? DIM_STATS_BETTER :
+				    DIM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
+		return (curr->bpms > prev->bpms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
+
+	if (!prev->ppms)
+		return curr->ppms ? DIM_STATS_BETTER :
+				    DIM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
+		return (curr->ppms > prev->ppms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
+
+	if (!prev->epms)
+		return DIM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
+		return (curr->epms < prev->epms) ? DIM_STATS_BETTER :
+						   DIM_STATS_WORSE;
+
+	return DIM_STATS_SAME;
+}
+
+static inline bool net_dim_decision(struct dim_stats *curr_stats,
+				    struct dim *dim)
+{
+	int prev_state = dim->tune_state;
+	int prev_ix = dim->profile_ix;
+	int stats_res;
+	int step_res;
+
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
+		if (stats_res != DIM_STATS_SAME)
+			net_dim_exit_parking(dim);
+		break;
+
+	case DIM_PARKING_TIRED:
+		dim->tired--;
+		if (!dim->tired)
+			net_dim_exit_parking(dim);
+		break;
+
+	case DIM_GOING_RIGHT:
+	case DIM_GOING_LEFT:
+		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
+		if (stats_res != DIM_STATS_BETTER)
+			dim_turn(dim);
+
+		if (dim_on_top(dim)) {
+			dim_park_on_top(dim);
+			break;
+		}
+
+		step_res = net_dim_step(dim);
+		switch (step_res) {
+		case DIM_ON_EDGE:
+			dim_park_on_top(dim);
+			break;
+		case DIM_TOO_TIRED:
+			dim_park_tired(dim);
+			break;
+		}
+
+		break;
+	}
+
+	if ((prev_state      != DIM_PARKING_ON_TOP) ||
+	    (dim->tune_state != DIM_PARKING_ON_TOP))
+		dim->prev_stats = *curr_stats;
+
+	return dim->profile_ix != prev_ix;
+}
+
+void net_dim(struct dim *dim, struct dim_sample end_sample)
+{
+	struct dim_stats curr_stats;
+	u16 nevents;
+
+	switch (dim->state) {
+	case DIM_MEASURE_IN_PROGRESS:
+		nevents = BIT_GAP(BITS_PER_TYPE(u16),
+				  end_sample.event_ctr,
+				  dim->start_sample.event_ctr);
+		if (nevents < DIM_NEVENTS)
+			break;
+		dim_calc_stats(&dim->start_sample, &end_sample,
+				   &curr_stats);
+		if (net_dim_decision(&curr_stats, dim)) {
+			dim->state = DIM_APPLY_NEW_PROFILE;
+			schedule_work(&dim->work);
+			break;
+		}
+		/* fall through */
+	case DIM_START_MEASURE:
+		dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr,
+				  end_sample.byte_ctr, &dim->start_sample);
+		dim->state = DIM_MEASURE_IN_PROGRESS;
+		break;
+	case DIM_APPLY_NEW_PROFILE:
+		break;
+	}
+}
+EXPORT_SYMBOL(net_dim);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 7/9] linux/dim: Add completions count to dim_sample
  2019-03-06  8:48 ` Tal Gilboa
@ 2019-03-06  8:48   ` Tal Gilboa
  -1 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)
  To: linux-rdma, linux-nvme, linux-block
  Cc: Yishai Hadas, Leon Romanovsky, Jason Gunthorpe, Doug Ledford,
	Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy

From: Yamin Friedman <yaminf@mellanox.com>

Add a measurement of completions per msec (and the derived completions-to-events
ratio) to allow for completion-based DIM algorithms.

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
---
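A worked example of the new stats, with illustrative numbers and assuming
DIM_NEVENTS is 64 as in the current code: for a 10 ms window (delta_us = 10000)
that saw 96 completions, dim_calc_stats() yields
cpms = DIV_ROUND_UP(96 * 1000, 10000) = 10 and
epms = DIV_ROUND_UP(64 * 1000, 10000) = 7, so
cpe_ratio = (10 * 100) / 7 = 142, i.e. roughly 1.4 completions per interrupt
event. Note that net_dim() itself still passes 0 completions when it restarts a
measurement, so cpe_ratio stays 0 on the netdev path; the ratio only becomes
meaningful for the completion-based users added later in the series.
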
 drivers/net/ethernet/broadcom/bcmsysport.c        |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         |  1 +
 drivers/net/ethernet/broadcom/genet/bcmgenet.c    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c |  4 ++--
 include/linux/dim.h                               |  7 ++++++-
 lib/dim/dim.c                                     | 11 ++++++++++-
 lib/dim/net_dim.c                                 |  2 +-
 7 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 840b3bf1ae3e..df38c8fd373f 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1035,7 +1035,7 @@ static int bcm_sysport_poll(struct napi_struct *napi, int budget)
 
 	if (priv->dim.use_dim) {
 		dim_create_sample(priv->dim.event_ctr, priv->dim.packets,
-				  priv->dim.bytes, &dim_sample);
+				  priv->dim.bytes, 0, &dim_sample);
 		net_dim(&priv->dim.dim, dim_sample);
 	}
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 131ab07aad83..516703ac0009 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2110,6 +2110,7 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
 		dim_create_sample(cpr->event_ctr,
 				  cpr->rx_packets,
 				  cpr->rx_bytes,
+				  0,
 				  &dim_sample);
 		net_dim(&cpr->dim, dim_sample);
 	}
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 68d96e333c6d..aca82ef12d28 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1910,7 +1910,7 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 
 	if (ring->dim.use_dim) {
 		dim_create_sample(ring->dim.event_ctr, ring->dim.packets,
-				  ring->dim.bytes, &dim_sample);
+				  ring->dim.bytes, 0, &dim_sample);
 		net_dim(&ring->dim.dim, dim_sample);
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 432474754d77..76fc57762083 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -53,7 +53,7 @@ static void mlx5e_handle_tx_dim(struct mlx5e_txqsq *sq)
 	if (unlikely(!test_bit(MLX5E_SQ_STATE_AM, &sq->state)))
 		return;
 
-	dim_create_sample(sq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
+	dim_create_sample(sq->cq.event_ctr, stats->packets, stats->bytes, 0, &dim_sample);
 	net_dim(&sq->dim, dim_sample);
 }
 
@@ -65,7 +65,7 @@ static void mlx5e_handle_rx_dim(struct mlx5e_rq *rq)
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_AM, &rq->state)))
 		return;
 
-	dim_create_sample(rq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
+	dim_create_sample(rq->cq.event_ctr, stats->packets, stats->bytes, 0, &dim_sample);
 	net_dim(&rq->dim, dim_sample);
 }
 
diff --git a/include/linux/dim.h b/include/linux/dim.h
index 88a74cd83d95..39b621dc8e3e 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -45,6 +45,7 @@
 struct dim_cq_moder {
 	u16 usec;
 	u16 pkts;
+	u16 comps;
 	u8 cq_period_mode;
 };
 
@@ -53,18 +54,22 @@ struct dim_sample {
 	u32     pkt_ctr;
 	u32     byte_ctr;
 	u16     event_ctr;
+	u32     comp_ctr;
 };
 
 struct dim_stats {
 	int ppms; /* packets per msec */
 	int bpms; /* bytes per msec */
 	int epms; /* events per msec */
+	int cpms; /* completions per msec */
+	int cpe_ratio; /* ratio of completions to events */
 };
 
 struct dim { /* Dynamic Interrupt Moderation */
 	u8                                      state;
 	struct dim_stats                        prev_stats;
 	struct dim_sample                       start_sample;
+	struct dim_sample                       measuring_sample;
 	struct work_struct                      work;
 	u8                                      profile_ix;
 	u8                                      mode;
@@ -113,7 +118,7 @@ void dim_park_on_top(struct dim *dim);
 
 void dim_park_tired(struct dim *dim);
 
-void dim_create_sample(u16 event_ctr, u64 packets, u64 bytes, struct dim_sample *s);
+void dim_create_sample(u16 event_ctr, u64 packets, u64 bytes, u64 comps, struct dim_sample *s);
 
 void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 		    struct dim_stats *curr_stats);
diff --git a/lib/dim/dim.c b/lib/dim/dim.c
index 93e1ddd701b0..b7283f1cb000 100644
--- a/lib/dim/dim.c
+++ b/lib/dim/dim.c
@@ -54,12 +54,13 @@ void dim_park_tired(struct dim *dim)
 }
 EXPORT_SYMBOL(dim_park_tired);
 
-void dim_create_sample(u16 event_ctr, u64 packets, u64 bytes, struct dim_sample *s)
+void dim_create_sample(u16 event_ctr, u64 packets, u64 bytes, u64 comps, struct dim_sample *s)
 {
 	s->time	     = ktime_get();
 	s->pkt_ctr   = packets;
 	s->byte_ctr  = bytes;
 	s->event_ctr = event_ctr;
+	s->comp_ctr  = comps;
 }
 EXPORT_SYMBOL(dim_create_sample);
 
@@ -71,6 +72,8 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
 	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
 			     start->byte_ctr);
+	u32 ncomps = BIT_GAP(BITS_PER_TYPE(u32), end->comp_ctr,
+			     start->comp_ctr);
 
 	if (!delta_us)
 		return;
@@ -79,5 +82,11 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
 	curr_stats->epms = DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC,
 					delta_us);
+	curr_stats->cpms = DIV_ROUND_UP(ncomps * USEC_PER_MSEC, delta_us);
+	if (curr_stats->epms != 0)
+		curr_stats->cpe_ratio = (curr_stats->cpms * 100) / curr_stats->epms;
+	else
+		curr_stats->cpe_ratio = 0;
+
 }
 EXPORT_SYMBOL(dim_calc_stats);
diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c
index cf95cd20cf02..10605b77bbc5 100644
--- a/lib/dim/net_dim.c
+++ b/lib/dim/net_dim.c
@@ -183,7 +183,7 @@ void net_dim(struct dim *dim, struct dim_sample end_sample)
 		/* fall through */
 	case DIM_START_MEASURE:
 		dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr,
-				  end_sample.byte_ctr, &dim->start_sample);
+				  end_sample.byte_ctr, 0, &dim->start_sample);
 		dim->state = DIM_MEASURE_IN_PROGRESS;
 		break;
 	case DIM_APPLY_NEW_PROFILE:
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 7/9] linux/dim: Add completions count to dim_sample
@ 2019-03-06  8:48   ` Tal Gilboa
  0 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)


From: Yamin Friedman <yaminf@mellanox.com>

Add a measurement of completions per msec (and the derived completions-to-events
ratio) to allow for completion-based DIM algorithms.

Signed-off-by: Yamin Friedman <yaminf at mellanox.com>
Signed-off-by: Tal Gilboa <talgi at mellanox.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c        |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         |  1 +
 drivers/net/ethernet/broadcom/genet/bcmgenet.c    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c |  4 ++--
 include/linux/dim.h                               |  7 ++++++-
 lib/dim/dim.c                                     | 11 ++++++++++-
 lib/dim/net_dim.c                                 |  2 +-
 7 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 840b3bf1ae3e..df38c8fd373f 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1035,7 +1035,7 @@ static int bcm_sysport_poll(struct napi_struct *napi, int budget)
 
 	if (priv->dim.use_dim) {
 		dim_create_sample(priv->dim.event_ctr, priv->dim.packets,
-				  priv->dim.bytes, &dim_sample);
+				  priv->dim.bytes, 0, &dim_sample);
 		net_dim(&priv->dim.dim, dim_sample);
 	}
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 131ab07aad83..516703ac0009 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2110,6 +2110,7 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
 		dim_create_sample(cpr->event_ctr,
 				  cpr->rx_packets,
 				  cpr->rx_bytes,
+				  0,
 				  &dim_sample);
 		net_dim(&cpr->dim, dim_sample);
 	}
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 68d96e333c6d..aca82ef12d28 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1910,7 +1910,7 @@ static int bcmgenet_rx_poll(struct napi_struct *napi, int budget)
 
 	if (ring->dim.use_dim) {
 		dim_create_sample(ring->dim.event_ctr, ring->dim.packets,
-				  ring->dim.bytes, &dim_sample);
+				  ring->dim.bytes, 0, &dim_sample);
 		net_dim(&ring->dim.dim, dim_sample);
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 432474754d77..76fc57762083 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -53,7 +53,7 @@ static void mlx5e_handle_tx_dim(struct mlx5e_txqsq *sq)
 	if (unlikely(!test_bit(MLX5E_SQ_STATE_AM, &sq->state)))
 		return;
 
-	dim_create_sample(sq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
+	dim_create_sample(sq->cq.event_ctr, stats->packets, stats->bytes, 0, &dim_sample);
 	net_dim(&sq->dim, dim_sample);
 }
 
@@ -65,7 +65,7 @@ static void mlx5e_handle_rx_dim(struct mlx5e_rq *rq)
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_AM, &rq->state)))
 		return;
 
-	dim_create_sample(rq->cq.event_ctr, stats->packets, stats->bytes, &dim_sample);
+	dim_create_sample(rq->cq.event_ctr, stats->packets, stats->bytes, 0, &dim_sample);
 	net_dim(&rq->dim, dim_sample);
 }
 
diff --git a/include/linux/dim.h b/include/linux/dim.h
index 88a74cd83d95..39b621dc8e3e 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -45,6 +45,7 @@
 struct dim_cq_moder {
 	u16 usec;
 	u16 pkts;
+	u16 comps;
 	u8 cq_period_mode;
 };
 
@@ -53,18 +54,22 @@ struct dim_sample {
 	u32     pkt_ctr;
 	u32     byte_ctr;
 	u16     event_ctr;
+	u32     comp_ctr;
 };
 
 struct dim_stats {
 	int ppms; /* packets per msec */
 	int bpms; /* bytes per msec */
 	int epms; /* events per msec */
+	int cpms; /* completions per msec */
+	int cpe_ratio; /* ratio of completions to events */
 };
 
 struct dim { /* Dynamic Interrupt Moderation */
 	u8                                      state;
 	struct dim_stats                        prev_stats;
 	struct dim_sample                       start_sample;
+	struct dim_sample                       measuring_sample;
 	struct work_struct                      work;
 	u8                                      profile_ix;
 	u8                                      mode;
@@ -113,7 +118,7 @@ void dim_park_on_top(struct dim *dim);
 
 void dim_park_tired(struct dim *dim);
 
-void dim_create_sample(u16 event_ctr, u64 packets, u64 bytes, struct dim_sample *s);
+void dim_create_sample(u16 event_ctr, u64 packets, u64 bytes, u64 comps, struct dim_sample *s);
 
 void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 		    struct dim_stats *curr_stats);
diff --git a/lib/dim/dim.c b/lib/dim/dim.c
index 93e1ddd701b0..b7283f1cb000 100644
--- a/lib/dim/dim.c
+++ b/lib/dim/dim.c
@@ -54,12 +54,13 @@ void dim_park_tired(struct dim *dim)
 }
 EXPORT_SYMBOL(dim_park_tired);
 
-void dim_create_sample(u16 event_ctr, u64 packets, u64 bytes, struct dim_sample *s)
+void dim_create_sample(u16 event_ctr, u64 packets, u64 bytes, u64 comps, struct dim_sample *s)
 {
 	s->time	     = ktime_get();
 	s->pkt_ctr   = packets;
 	s->byte_ctr  = bytes;
 	s->event_ctr = event_ctr;
+	s->comp_ctr  = comps;
 }
 EXPORT_SYMBOL(dim_create_sample);
 
@@ -71,6 +72,8 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
 	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
 			     start->byte_ctr);
+	u32 ncomps = BIT_GAP(BITS_PER_TYPE(u32), end->comp_ctr,
+			     start->comp_ctr);
 
 	if (!delta_us)
 		return;
@@ -79,5 +82,11 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
 	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
 	curr_stats->epms = DIV_ROUND_UP(DIM_NEVENTS * USEC_PER_MSEC,
 					delta_us);
+	curr_stats->cpms = DIV_ROUND_UP(ncomps * USEC_PER_MSEC, delta_us);
+	if (curr_stats->epms != 0)
+		curr_stats->cpe_ratio = (curr_stats->cpms * 100) / curr_stats->epms;
+	else
+		curr_stats->cpe_ratio = 0;
+
 }
 EXPORT_SYMBOL(dim_calc_stats);
diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c
index cf95cd20cf02..10605b77bbc5 100644
--- a/lib/dim/net_dim.c
+++ b/lib/dim/net_dim.c
@@ -183,7 +183,7 @@ void net_dim(struct dim *dim, struct dim_sample end_sample)
 		/* fall through */
 	case DIM_START_MEASURE:
 		dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr,
-				  end_sample.byte_ctr, &dim->start_sample);
+				  end_sample.byte_ctr, 0, &dim->start_sample);
 		dim->state = DIM_MEASURE_IN_PROGRESS;
 		break;
 	case DIM_APPLY_NEW_PROFILE:
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 8/9] linux/dim: Implement blk_dim.h
  2019-03-06  8:48 ` Tal Gilboa
@ 2019-03-06  8:48   ` Tal Gilboa
  -1 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)
  To: linux-rdma, linux-nvme, linux-block
  Cc: Yishai Hadas, Leon Romanovsky, Jason Gunthorpe, Doug Ledford,
	Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy

From: Yamin Friedman <yaminf@mellanox.com>

blk_dim implements a different algorithm than net_dim, optimized for NVMf
storage applications.
The algorithm optimizes for the number of completions and the ratio between
completions and events.
It also quickly reduces the moderation level when the traffic changes such
that high moderation is no longer needed, in order to avoid long latencies.

The blk_dim logic will be called from the ib_core module.
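
As a rough stand-alone illustration of the fast-reduction rule (the real
check lives in blk_dim_decision() below; values here are made up):

#include <stdio.h>
#include <stdbool.h>

/* Sketch: when the stats compare as "same" but the completions-per-event
 * ratio is low for the current profile, blk_dim falls straight back to the
 * lowest moderation level instead of stepping down one level at a time.
 */
static bool should_fast_reduce(int cpe_ratio, int profile_ix)
{
	return cpe_ratio <= 50 * profile_ix;	/* same threshold as the patch */
}

int main(void)
{
	int profile_ix = 5;	/* assumed current (aggressive) profile */
	int cpe_ratio = 120;	/* assumed measured ratio, scaled by 100 */

	if (should_fast_reduce(cpe_ratio, profile_ix))
		profile_ix = 0;	/* drop moderation immediately */

	printf("new profile index: %d\n", profile_ix);
	return 0;
}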

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
---
 MAINTAINERS             |   1 +
 include/linux/blk_dim.h |  56 ++++++++++++++++++++
 lib/dim/Makefile        |   7 ++-
 lib/dim/blk_dim.c       | 114 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 177 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/blk_dim.h
 create mode 100644 lib/dim/blk_dim.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6ae949be8b83..2860a3316be5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5337,6 +5337,7 @@ DYNAMIC INTERRUPT MODERATION
 M:	Tal Gilboa <talgi@mellanox.com>
 S:	Maintained
 F:	include/linux/net_dim.h
+F:	include/linux/blk_dim.h
 F:	include/linux/dim.h
 F:	lib/dim/
 
diff --git a/include/linux/blk_dim.h b/include/linux/blk_dim.h
new file mode 100644
index 000000000000..a044f62ec8fe
--- /dev/null
+++ b/include/linux/blk_dim.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef BLK_DIM_H
+#define BLK_DIM_H
+
+#include <linux/module.h>
+#include <linux/dim.h>
+
+#define BLK_DIM_PARAMS_NUM_PROFILES 8
+#define BLK_DIM_START_PROFILE 0
+
+static const struct dim_cq_moder
+blk_dim_prof[BLK_DIM_PARAMS_NUM_PROFILES] = {
+	{1,   0, 1,  0},
+	{2,   0, 2,  0},
+	{4,   0, 4,  0},
+	{16,  0, 4,  0},
+	{32,  0, 4,  0},
+	{32,  0, 16, 0},
+	{256, 0, 16, 0},
+	{256, 0, 32, 0},
+};
+
+void blk_dim(struct dim *dim, struct dim_sample end_sample);
+
+#endif /* BLK_DIM_H */
diff --git a/lib/dim/Makefile b/lib/dim/Makefile
index 160afe288df0..2b3c57318dbb 100644
--- a/lib/dim/Makefile
+++ b/lib/dim/Makefile
@@ -2,8 +2,13 @@
 # DIM Dynamic Interrupt Moderation library
 #
 
-obj-$(CONFIG_DIMLIB) = net_dim.o
+obj-$(CONFIG_DIMLIB) += net_dim.o
+obj-$(CONFIG_DIMLIB) += blk_dim.o
 
 net_dim-y = \
 	dim.o		\
 	net_dim.o
+
+blk_dim-y = \
+	dim.o		\
+	blk_dim.o
diff --git a/lib/dim/blk_dim.c b/lib/dim/blk_dim.c
new file mode 100644
index 000000000000..49107c169b56
--- /dev/null
+++ b/lib/dim/blk_dim.c
@@ -0,0 +1,114 @@
+#include <linux/blk_dim.h>
+
+static inline int blk_dim_step(struct dim *dim)
+{
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+	case DIM_PARKING_TIRED:
+		break;
+	case DIM_GOING_RIGHT:
+		if (dim->profile_ix == (BLK_DIM_PARAMS_NUM_PROFILES - 1))
+			return DIM_ON_EDGE;
+		dim->profile_ix++;
+		dim->steps_right++;
+		break;
+	case DIM_GOING_LEFT:
+		if (dim->profile_ix == 0)
+			return DIM_ON_EDGE;
+		dim->profile_ix--;
+		dim->steps_left++;
+		break;
+	}
+
+	return DIM_STEPPED;
+}
+
+static inline int blk_dim_stats_compare(struct dim_stats *curr, struct dim_stats *prev)
+{
+	/* first stat */
+	if (!prev->cpms)
+		return DIM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->cpms, prev->cpms))
+		return (curr->cpms > prev->cpms) ? DIM_STATS_BETTER :
+						DIM_STATS_WORSE;
+
+	if (IS_SIGNIFICANT_DIFF(curr->cpe_ratio, prev->cpe_ratio))
+		return (curr->cpe_ratio > prev->cpe_ratio) ? DIM_STATS_BETTER :
+						DIM_STATS_WORSE;
+
+	return DIM_STATS_SAME;
+}
+
+static inline bool blk_dim_decision(struct dim_stats *curr_stats, struct dim *dim)
+{
+	int prev_ix = dim->profile_ix;
+	int stats_res;
+	int step_res;
+
+	switch (dim->tune_state) {
+	case DIM_PARKING_ON_TOP:
+		break;
+	case DIM_PARKING_TIRED:
+		break;
+
+	case DIM_GOING_RIGHT:
+	case DIM_GOING_LEFT:
+		stats_res = blk_dim_stats_compare(curr_stats, &dim->prev_stats);
+
+		switch (stats_res) {
+		case DIM_STATS_SAME:
+			if (curr_stats->cpe_ratio <= 50*prev_ix)
+				dim->profile_ix = 0;
+			break;
+		case DIM_STATS_WORSE:
+			dim_turn(dim);
+		default:
+		case DIM_STATS_BETTER:
+			/* fall through */
+			step_res = blk_dim_step(dim);
+			if (step_res == DIM_ON_EDGE)
+				dim_turn(dim);
+			break;
+		}
+		break;
+	}
+
+	dim->prev_stats = *curr_stats;
+
+	return dim->profile_ix != prev_ix;
+}
+
+void blk_dim(struct dim *dim, struct dim_sample end_sample)
+{
+	struct dim_stats curr_stats;
+	u16 nevents;
+
+	switch (dim->state) {
+	case DIM_MEASURE_IN_PROGRESS:
+		nevents = end_sample.event_ctr - dim->start_sample.event_ctr;
+		if (nevents < DIM_NEVENTS) {
+			dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr,
+				end_sample.byte_ctr, end_sample.comp_ctr, &dim->measuring_sample);
+			break;
+		}
+		dim_calc_stats(&dim->start_sample, &end_sample,
+				   &curr_stats);
+		if (blk_dim_decision(&curr_stats, dim)) {
+			dim->state = DIM_APPLY_NEW_PROFILE;
+			schedule_work(&dim->work);
+			break;
+		}
+		/* fall through */
+	case DIM_START_MEASURE:
+		dim->state = DIM_MEASURE_IN_PROGRESS;
+		dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr, end_sample.byte_ctr,
+				end_sample.comp_ctr, &dim->start_sample);
+		dim_create_sample(end_sample.event_ctr, end_sample.pkt_ctr, end_sample.byte_ctr,
+				end_sample.comp_ctr, &dim->measuring_sample);
+		break;
+	case DIM_APPLY_NEW_PROFILE:
+		break;
+	}
+}
+EXPORT_SYMBOL(blk_dim);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread


* [RFC/PATCH net-next 9/9] drivers/infiniband: Use blk_dim in infiniband driver
  2019-03-06  8:48 ` Tal Gilboa
@ 2019-03-06  8:48   ` Tal Gilboa
  -1 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-03-06  8:48 UTC (permalink / raw)
  To: linux-rdma, linux-nvme, linux-block
  Cc: Yishai Hadas, Leon Romanovsky, Jason Gunthorpe, Doug Ledford,
	Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy

From: Yamin Friedman <yaminf@mellanox.com>

Added the interface in the infiniband driver that applies the blk_dim
adaptive moderation.

Performance improvement (ConnectX-5 100GbE, x86) running the FIO benchmark
over NVMf between two equal end-hosts across a Mellanox switch:
Running long tests that switch between periods of high bandwidth/high latency
and low bandwidth/low latency, the blk_dim algorithm gives a much shorter
wait before the moderation is reduced, and thus tail latency is reduced.
There is a 200% improvement in tail latency when switching from high
bandwidth to low bandwidth traffic, without degradation of other flow
parameters.

The blk_dim algorithm was designed to measure the effectiveness of moderation
on the flow in a general way and thus should be appropriate for all RDMA storage
protocols.
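
For reference, a stand-alone sketch of how a chosen profile level ends up as
CQ moderation parameters (it mirrors ib_cq_dim_modify_cq() below and the
blk_dim_prof[] table from patch 8/9; types and names are local to the
example):

#include <stdio.h>

/* Local copy of the proposed blk_dim_prof[] values: {usec, pkts, comps,
 * cq_period_mode}. Only usec and comps are used for CQ moderation here.
 */
struct moder { unsigned short usec, pkts, comps, mode; };

static const struct moder prof[] = {
	{1, 0, 1, 0},  {2, 0, 2, 0},   {4, 0, 4, 0},    {16, 0, 4, 0},
	{32, 0, 4, 0}, {32, 0, 16, 0}, {256, 0, 16, 0}, {256, 0, 32, 0},
};

int main(void)
{
	/* Equivalent of ib_cq_dim_modify_cq(cq, level): the level's
	 * comps/usec pair is handed to the device's modify_cq() verb,
	 * i.e. roughly "fire the interrupt after comps completions or
	 * usec microseconds, whichever comes first".
	 */
	for (unsigned int i = 0; i < sizeof(prof) / sizeof(prof[0]); i++)
		printf("level %u: %hu comps / %hu usec\n",
		       i, prof[i].comps, prof[i].usec);
	return 0;
}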

Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/infiniband/core/cq.c    | 75 ++++++++++++++++++++++++++++++---
 drivers/infiniband/hw/mlx4/qp.c |  2 +-
 drivers/infiniband/hw/mlx5/qp.c |  2 +-
 include/linux/irq_poll.h        |  7 +++
 include/rdma/ib_verbs.h         | 11 ++++-
 lib/irq_poll.c                  | 13 +++++-
 6 files changed, 100 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index d61e5e1427c2..065b54978dae 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <rdma/ib_verbs.h>
+#include <linux/blk_dim.h>
 
 /* # of WCs to poll for with a single call to ib_poll_cq */
 #define IB_POLL_BATCH			16
@@ -26,6 +27,51 @@
 #define IB_POLL_FLAGS \
 	(IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
 
+static bool use_am = true;
+module_param(use_am, bool, 0444);
+MODULE_PARM_DESC(use_am, "Use cq adaptive moderation");
+
+static int ib_cq_dim_modify_cq(struct ib_cq *cq, unsigned short level)
+{
+	u16 usec = blk_dim_prof[level].usec;
+	u16 comps = blk_dim_prof[level].comps;
+
+	return cq->device->modify_cq(cq, comps, usec);
+}
+
+static void update_cq_moderation(struct dim *dim, struct ib_cq *cq)
+{
+	dim->state = DIM_START_MEASURE;
+
+	ib_cq_dim_modify_cq(cq, dim->profile_ix);
+}
+
+static void ib_cq_blk_dim_workqueue_work(struct work_struct *w)
+{
+	struct dim *dim = container_of(w, struct dim, work);
+	struct ib_cq *cq = container_of(dim, struct ib_cq, workqueue_poll.dim);
+
+	update_cq_moderation(dim, cq);
+}
+
+static void ib_cq_blk_dim_irqpoll_work(struct work_struct *w)
+{
+	struct dim *dim = container_of(w, struct dim, work);
+	struct irq_poll *iop = container_of(dim, struct irq_poll, dim);
+	struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
+
+	update_cq_moderation(dim, cq);
+}
+
+void blk_dim_init(struct dim *dim, work_func_t func)
+{
+	memset(dim, 0, sizeof(*dim));
+	dim->state = DIM_START_MEASURE;
+	dim->tune_state = DIM_GOING_RIGHT;
+	dim->profile_ix = BLK_DIM_START_PROFILE;
+	INIT_WORK(&dim->work, func);
+}
+
 static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *wcs,
 			   int batch)
 {
@@ -105,19 +151,28 @@ static void ib_cq_completion_softirq(struct ib_cq *cq, void *private)
 
 static void ib_cq_poll_work(struct work_struct *work)
 {
-	struct ib_cq *cq = container_of(work, struct ib_cq, work);
+	struct ib_cq *cq = container_of(work, struct ib_cq, workqueue_poll.work);
 	int completed;
+	struct dim_sample e_sample;
+	struct dim_sample *m_sample = &cq->workqueue_poll.dim.measuring_sample;
 
 	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, cq->wc,
 				    IB_POLL_BATCH);
+
+	if (cq->workqueue_poll.dim_used)
+		dim_create_sample(m_sample->event_ctr + 1, m_sample->pkt_ctr, m_sample->byte_ctr,
+							m_sample->comp_ctr + completed, &e_sample);
+
 	if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
 	    ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
-		queue_work(cq->comp_wq, &cq->work);
+		queue_work(cq->comp_wq, &cq->workqueue_poll.work);
+	else if (cq->workqueue_poll.dim_used)
+		blk_dim(&cq->workqueue_poll.dim, e_sample);
 }
 
 static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
 {
-	queue_work(cq->comp_wq, &cq->work);
+	queue_work(cq->comp_wq, &cq->workqueue_poll.work);
 }
 
 /**
@@ -172,12 +227,20 @@ struct ib_cq *__ib_alloc_cq(struct ib_device *dev, void *private,
 		cq->comp_handler = ib_cq_completion_softirq;
 
 		irq_poll_init(&cq->iop, IB_POLL_BUDGET_IRQ, ib_poll_handler);
+		if (cq->device->modify_cq && use_am) {
+			blk_dim_init(&cq->iop.dim, ib_cq_blk_dim_irqpoll_work);
+			cq->iop.dim_used = true;
+		}
 		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 		break;
 	case IB_POLL_WORKQUEUE:
 	case IB_POLL_UNBOUND_WORKQUEUE:
 		cq->comp_handler = ib_cq_completion_workqueue;
-		INIT_WORK(&cq->work, ib_cq_poll_work);
+		INIT_WORK(&cq->workqueue_poll.work, ib_cq_poll_work);
+		if (cq->device->modify_cq && use_am) {
+			blk_dim_init(&cq->workqueue_poll.dim, ib_cq_blk_dim_workqueue_work);
+			cq->workqueue_poll.dim_used = true;
+		}
 		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 		cq->comp_wq = (cq->poll_ctx == IB_POLL_WORKQUEUE) ?
 				ib_comp_wq : ib_comp_unbound_wq;
@@ -217,7 +280,9 @@ void ib_free_cq(struct ib_cq *cq)
 		break;
 	case IB_POLL_WORKQUEUE:
 	case IB_POLL_UNBOUND_WORKQUEUE:
-		cancel_work_sync(&cq->work);
+		cancel_work_sync(&cq->workqueue_poll.work);
+		if (cq->workqueue_poll.dim_used)
+			flush_work(&cq->iop.dim.work);
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 971e9a9ebdaf..f3e5dbe4689a 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -4365,7 +4365,7 @@ static void handle_drain_completion(struct ib_cq *cq,
 				irq_poll_enable(&cq->iop);
 				break;
 			case IB_POLL_WORKQUEUE:
-				cancel_work_sync(&cq->work);
+				cancel_work_sync(&cq->workqueue_poll.work);
 				break;
 			default:
 				WARN_ON_ONCE(1);
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index dd2ae640bc84..4b65147010cc 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -6102,7 +6102,7 @@ static void handle_drain_completion(struct ib_cq *cq,
 				irq_poll_enable(&cq->iop);
 				break;
 			case IB_POLL_WORKQUEUE:
-				cancel_work_sync(&cq->work);
+				cancel_work_sync(&cq->workqueue_poll.work);
 				break;
 			default:
 				WARN_ON_ONCE(1);
diff --git a/include/linux/irq_poll.h b/include/linux/irq_poll.h
index 16aaeccb65cb..ede1a390159b 100644
--- a/include/linux/irq_poll.h
+++ b/include/linux/irq_poll.h
@@ -2,14 +2,21 @@
 #ifndef IRQ_POLL_H
 #define IRQ_POLL_H
 
+#include <linux/blk_dim.h>
+
 struct irq_poll;
 typedef int (irq_poll_fn)(struct irq_poll *, int);
+typedef int (irq_poll_dim_fn)(struct irq_poll *);
 
 struct irq_poll {
 	struct list_head list;
 	unsigned long state;
 	int weight;
 	irq_poll_fn *poll;
+
+	bool dim_used;
+	struct dim dim;
+	irq_poll_dim_fn *dimfn;
 };
 
 enum {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index a3ceed3a040a..d8060c3cee06 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1584,6 +1584,13 @@ enum ib_poll_context {
 	IB_POLL_UNBOUND_WORKQUEUE, /* poll from unbound workqueue */
 };
 
+struct ib_cq_workqueue_poll {
+	struct dim              dim;
+	struct work_struct      work;
+	bool                    dim_used;
+};
+
+
 struct ib_cq {
 	struct ib_device       *device;
 	struct ib_uobject      *uobject;
@@ -1595,8 +1602,8 @@ struct ib_cq {
 	enum ib_poll_context	poll_ctx;
 	struct ib_wc		*wc;
 	union {
-		struct irq_poll		iop;
-		struct work_struct	work;
+		struct irq_poll			iop;
+		struct ib_cq_workqueue_poll	workqueue_poll;
 	};
 	struct workqueue_struct *comp_wq;
 	/*
diff --git a/lib/irq_poll.c b/lib/irq_poll.c
index 86a709954f5a..2b5e41f0e583 100644
--- a/lib/irq_poll.c
+++ b/lib/irq_poll.c
@@ -53,6 +53,8 @@ static void __irq_poll_complete(struct irq_poll *iop)
 	list_del(&iop->list);
 	smp_mb__before_atomic();
 	clear_bit_unlock(IRQ_POLL_F_SCHED, &iop->state);
+	if (iop->dim_used)
+		blk_dim(&iop->dim, iop->dim.measuring_sample);
 }
 
 /**
@@ -86,6 +88,7 @@ static void __latent_entropy irq_poll_softirq(struct softirq_action *h)
 	while (!list_empty(list)) {
 		struct irq_poll *iop;
 		int work, weight;
+		struct dim_sample *m_sample;
 
 		/*
 		 * If softirq window is exhausted then punt.
@@ -104,10 +107,16 @@ static void __latent_entropy irq_poll_softirq(struct softirq_action *h)
 		 */
 		iop = list_entry(list->next, struct irq_poll, list);
 
+		m_sample = &iop->dim.measuring_sample;
 		weight = iop->weight;
 		work = 0;
-		if (test_bit(IRQ_POLL_F_SCHED, &iop->state))
+		if (test_bit(IRQ_POLL_F_SCHED, &iop->state)) {
 			work = iop->poll(iop, weight);
+			if (iop->dim_used)
+				dim_create_sample(m_sample->event_ctr + 1, m_sample->pkt_ctr,
+					m_sample->byte_ctr, m_sample->comp_ctr + work,
+						&iop->dim.measuring_sample);
+		}
 
 		budget -= work;
 
@@ -144,6 +153,8 @@ static void __latent_entropy irq_poll_softirq(struct softirq_action *h)
  **/
 void irq_poll_disable(struct irq_poll *iop)
 {
+	if (iop->dim_used)
+		flush_work(&iop->dim.work);
 	set_bit(IRQ_POLL_F_DISABLE, &iop->state);
 	while (test_and_set_bit(IRQ_POLL_F_SCHED, &iop->state))
 		msleep(1);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread


* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-06  8:48 ` Tal Gilboa
@ 2019-03-06 16:15   ` Bart Van Assche
  -1 siblings, 0 replies; 48+ messages in thread
From: Bart Van Assche @ 2019-03-06 16:15 UTC (permalink / raw)
  To: Tal Gilboa, linux-rdma, linux-nvme, linux-block
  Cc: Yamin Friedman, Leon Romanovsky, Saeed Mahameed, Yishai Hadas,
	Jason Gunthorpe, Doug Ledford, Max Gurtovoy, Idan Burstein,
	Tariq Toukan

On Wed, 2019-03-06 at 10:48 +0200, Tal Gilboa wrote:
> net_dim.h lib exposes an implementation of the DIM algorithm for dynamically-tuned interrupt
> moderation for networking interfaces.
> 
> We need the same behavior for any block CQ. The main motivation is two benefit from maximized
> completion rate and reduced interrupt overhead that DIM may provide.

What is a "block CQ"? How does net_dim compare to lib/irq_poll? Which approach
results in the best performance and lowest latency?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-06 16:15   ` Bart Van Assche
@ 2019-03-07  1:56     ` Sagi Grimberg
  -1 siblings, 0 replies; 48+ messages in thread
From: Sagi Grimberg @ 2019-03-07  1:56 UTC (permalink / raw)
  To: Bart Van Assche, Tal Gilboa, linux-rdma, linux-nvme, linux-block
  Cc: Yamin Friedman, Leon Romanovsky, Jason Gunthorpe, Yishai Hadas,
	Saeed Mahameed, Doug Ledford, Max Gurtovoy, Idan Burstein,
	Tariq Toukan


>> net_dim.h lib exposes an implementation of the DIM algorithm for dynamically-tuned interrupt
>> moderation for networking interfaces.
>>
>> We need the same behavior for any block CQ. The main motivation is two benefit from maximized
>> completion rate and reduced interrupt overhead that DIM may provide.
> 
> What is a "block CQ"?

There is no such thing... Also, it makes no difference
whether a block/file/whatever is using the rdma cq.

The naming should really be something like rdma_dim as it accounts
for completions and not bytes/packets.

> How does net_dim compare to lib/irq_poll?

It's orthogonal; it's basically adaptive interrupt moderation for
RDMA devices. It sits sort of below the irq_poll code. It basically
configures interrupt moderation based on stats collected by
the rdma driver.

> Which approach results in the best performance and lowest latency?

I guess it depends on the test case. This approach tries to
apply some time or completion count limit to when the HW should fire
an interrupt based on the load in an adaptive fashion.

The scheme is to try to detect what the load characteristics are and
come up with moderation parameters that fit. For high interrupt rate
(usually seen with small size high queue-depth workloads) it configures
the device to aggregate some more before firing an interrupt - so fewer
interrupts, better efficiency per interrupt (finds more completions).
For low interrupt rate (low queue depth) the load is probably low to
moderate and aggregating before firing an interrupt is just added
latency for no benefit. So the algorithm tries to transition between a
number of pre-defined levels according to the load it samples.

This has been widely used by the network drivers for the past decade.

Now, this algorithm, while trying to adjust itself by learning the load,
also adds entropy to the overall system performance and latency.
So this is not a trivial trade-off for any workload.

I took a stab at this once (came up with something very similar).
For large queue-depth workloads I got up to 2x IOPS, as the algorithm
chose aggressive moderation parameters which improved the efficiency a
lot, but when the workload varied the algorithm wasn't very successful
at detecting the load and the step direction (I used a variation of the
same basic algorithm from the mlx5 driver that net_dim is based on).

Also, QD=1 resulted in higher latency as the algorithm was dangling
between the two lowest levels. So I guess this needs to undergo a
thorough performance evaluation for steady and varying workloads before
we can consider this.

Overall, I think it's a great idea to add this to the rdma subsystem,
but we cannot make it the default, especially without being able
to turn it off. So this needs to be opt-in with a sysctl option.

Moreover, not every device supports cq moderation, so you need to check
the device capabilities before you apply any of this.

^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-07  1:56     ` Sagi Grimberg
@ 2019-03-14 11:45       ` Max Gurtovoy
  -1 siblings, 0 replies; 48+ messages in thread
From: Max Gurtovoy @ 2019-03-14 11:45 UTC (permalink / raw)
  To: Sagi Grimberg, Bart Van Assche, Tal Gilboa, linux-rdma,
	linux-nvme, linux-block
  Cc: Yamin Friedman, Leon Romanovsky, Jason Gunthorpe, Yishai Hadas,
	Saeed Mahameed, Doug Ledford, Idan Burstein, Tariq Toukan


On 3/7/2019 3:56 AM, Sagi Grimberg wrote:
>
>>> net_dim.h lib exposes an implementation of the DIM algorithm for 
>>> dynamically-tuned interrupt
>>> moderation for networking interfaces.
>>>
>>> We need the same behavior for any block CQ. The main motivation is 
>>> two benefit from maximized
>>> completion rate and reduced interrupt overhead that DIM may provide.
>>
>> What is a "block CQ"?
>
> There is no such thing... Also, this has no difference
> if a block/file/whatever is using the rdma cq.
>
> The naming should really be something like rdma_dim as it accounts
> for completions and not bytes/packets.

Sagi,

I think that in the future we could use it in nvme, since there is an
option to set interrupt coalescing in the NVMe spec.

This might improve performance for the NVMe driver.

We already see some bottlenecks in performance (maybe driver ones) while 
developing the NVMe SNAP feature in Bluefield (NVMe emulation using a
Smart NIC).

We're trying to get 2.5-2.7 MIOPs OOB from 1 controller and it's not 
trivial for today's driver.

So let's take this into consideration when we set the naming.


>
>
>> How does net_dim compare to lib/irq_poll?
>
> Its orthogonal, its basically adaptive interrupt moderation for
> RDMA devices. Its sort of below the irq_poll code. It basically
> configures interrupt moderation based on stats collected by
> the rdma driver.
>
>> Which approach results in the best performance and lowest latency?
>
> I guess it depends on what is the test case. This approach tries to
> apply some time or completion count limit to when the HW should fire
> an interrupt based on the load in an adaptive fashion.
>
> The scheme is to try and detect what are the load characteristics and
> come up with a moderation parameters that fit. For high interrupt rate
> (usually seen with small size high queue-depth workloads) it configures
> the device to aggregate some more before firing an interrupt - so less
> interrupts, better efficiency per interrupt (finds more completions).
> For low interrupt rate (low queue depth) the load is probably low to
> moderate and aggregating before firing an interrupt is just added
> latency for no benefit. So the algorithm tries to transition between a
> number of pre-defined levels according to the load it samples.
>
> This has been widely used by the network drivers for the past decade.
>
> Now, this algorithm while trying to adjust itself by learning the load,
> also adds entropy to the overall system performance and latency.
> So this is not a trivial trade-off for any workload.
>
> I took a stab at this once (came up with something very similar),
> and while for large queue-depth workloads I got up to 2x IOPs as the
> algorithm chose aggressive moderation parameters which improved the
> efficiency a lot, but when the workload varied the algorithm wasn't very
> successful detecting the load and the step direction (I used a variation
> of the same basic algorithm from mlx5 driver that net_dim is based on).
>
> Also, QD=1 resulted in higher latency as the algorithm was dangling
> between the two lowest levels. So I guess this needs to undergo a
> thorough performance evaluation for steady and varying workloads before
> we can consider this.
>
> Overall, I think its a great idea to add that to the rdma subsystem
> but we cannot make it the default and especially without being able
> to turn it off. So this needs to be opt in with a sysctl option.

We can add a flag in the create_cq command (something like
try_coalescing_is_possible) instead of a module parameter, of course.

Storage ULPs can set it to true.
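
Purely as a hypothetical sketch of what such an opt-in could look like (none
of these names exist in this series or in the rdma core; they only illustrate
the idea):

#include <stdio.h>

/* Hypothetical flag a storage ULP could pass at CQ creation time; the core
 * would enable adaptive moderation only when the device also implements
 * modify_cq, as patch 9/9 already checks.
 */
#define CQ_FLAG_TRY_ADAPTIVE_MODERATION	(1u << 0)

struct cq_create_hint {		/* example-local type, not a kernel struct */
	unsigned int cqe;
	int comp_vector;
	unsigned int flags;
};

int main(void)
{
	struct cq_create_hint hint = {
		.cqe = 4096,
		.comp_vector = 0,
		.flags = CQ_FLAG_TRY_ADAPTIVE_MODERATION,
	};

	if (hint.flags & CQ_FLAG_TRY_ADAPTIVE_MODERATION)
		printf("ULP opted in to adaptive CQ moderation\n");
	return 0;
}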

Also, in the internal review Yamin added a table that summarizes all the
testing that was done using NVMeoF (I guess it somehow didn't make it
into this RFC).

I guess we can do the same for iSER to get more confidence and then set
both to create a modifiable cq (if the HCA supports it, of course).

Agreed ?

>
>
> Moreover, not every device support cq moderation so you need to check
> the device capabilities before you apply any of this.

for sure.



^ permalink raw reply	[flat|nested] 48+ messages in thread


* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-14 11:45       ` Max Gurtovoy
@ 2019-03-14 21:53         ` Sagi Grimberg
  -1 siblings, 0 replies; 48+ messages in thread
From: Sagi Grimberg @ 2019-03-14 21:53 UTC (permalink / raw)
  To: Max Gurtovoy, Bart Van Assche, Tal Gilboa, linux-rdma,
	linux-nvme, linux-block
  Cc: Yamin Friedman, Leon Romanovsky, Jason Gunthorpe, Yishai Hadas,
	Saeed Mahameed, Doug Ledford, Idan Burstein, Tariq Toukan


>>> What is a "block CQ"?
>>
>> There is no such thing... Also, this has no difference
>> if a block/file/whatever is using the rdma cq.
>>
>> The naming should really be something like rdma_dim as it accounts
>> for completions and not bytes/packets.
> 
> Sagi,
> 
> I think that in the future we could use it in nvme since there is an 
> option to set the interrupt coalescing in NVMe spec.
> 
> This might improve performance for NVMe driver.

That would require changing the spec to make the moderation config
per-queue and not controller-wide. This does not apply specifically to
the block layer, so naming it with a blk prefix does not make sense.

>> Overall, I think its a great idea to add that to the rdma subsystem
>> but we cannot make it the default and especially without being able
>> to turn it off. So this needs to be opt in with a sysctl option.
> 
> We can add flag in create_cq command that will 
> try_coalescing_is_possible instead of module parameter of course.
> 
> Storage ULPs can set it to True.

The point is that it can't be universally on.

> Also in the internal review Yamin added a table that summarize all the 
> testing that were done using NVMeoF (I guess it somehow didn't get to 
> this RFC).
> 
> I guess we can do the same for iSER to get more confidence and then set 
> both to create modifiable cq (if HCA supports, of course).
> 
> Agreed ?

Sure.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-14 11:45       ` Max Gurtovoy
@ 2019-03-18  9:24         ` Yamin Friedman
  -1 siblings, 0 replies; 48+ messages in thread
From: Yamin Friedman @ 2019-03-18  9:24 UTC (permalink / raw)
  To: Max Gurtovoy, Sagi Grimberg, Bart Van Assche, Tal Gilboa,
	linux-rdma, linux-nvme, linux-block
  Cc: Leon Romanovsky, Jason Gunthorpe, Yishai Hadas, Saeed Mahameed,
	Doug Ledford, Idan Burstein, Tariq Toukan


On 3/14/2019 1:45 PM, Max Gurtovoy wrote:
>
> On 3/7/2019 3:56 AM, Sagi Grimberg wrote:
>>
>>>> net_dim.h lib exposes an implementation of the DIM algorithm for 
>>>> dynamically-tuned interrupt
>>>> moderation for networking interfaces.
>>>>
>>>> We need the same behavior for any block CQ. The main motivation is 
>>>> to benefit from maximized
>>>> completion rate and reduced interrupt overhead that DIM may provide.
>>>
>>> What is a "block CQ"?
>>
>> There is no such thing... Also, this has no difference
>> if a block/file/whatever is using the rdma cq.
>>
>> The naming should really be something like rdma_dim as it accounts
>> for completions and not bytes/packets.
>
> Sagi,
>
> I think that in the future we could use it in nvme since there is an 
> option to set the interrupt coalescing in NVMe spec.
>
> This might improve performance for NVMe driver.
>
> We already see some bottlenecks in performance (maybe driver ones) 
> while developing the NVMe SNAP feature in Bluefield (NVMe emulation 
> using Smart NIC).
>
> We're trying to get 2.5-2.7 MIOPs OOB from 1 controller and it's not 
> trivial for today's driver.
>
> So let's take this into consideration when we set the naming.
>
>
I agree that blk is not the best name; we were trying to find 
something that would work for general storage applications. I think 
rdma_dim would work, as it is completion-based, but when we later want to 
use it for nvme it will probably require code duplication.
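
Roughly, the difference in sampled data looks like this (illustrative
structures only, not the ones proposed in this series): net_dim decides
based on packet and byte rates, while a completion-based variant only
needs an event and completion count:

    /* net_dim-style sample: driven by traffic volume. */
    struct netdev_style_sample {
            unsigned short event_ctr;        /* interrupt events        */
            unsigned long long packets;      /* packets since last run  */
            unsigned long long bytes;        /* bytes since last run    */
    };

    /* completion-based sample: driven only by CQ completions. */
    struct cq_style_sample {
            unsigned short event_ctr;        /* interrupt events           */
            unsigned long long completions;  /* completions since last run */
    };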

>>
>>
>>> How does net_dim compare to lib/irq_poll?
>>
>> Its orthogonal, its basically adaptive interrupt moderation for
>> RDMA devices. Its sort of below the irq_poll code. It basically
>> configures interrupt moderation based on stats collected by
>> the rdma driver.
>>
>>> Which approach results in the best performance and lowest latency?
>>
>> I guess it depends on what is the test case. This approach tries to
>> apply some time or completion count limit to when the HW should fire
>> an interrupt based on the load in an adaptive fashion.
>>
>> The scheme is to try and detect what are the load characteristics and
>> come up with a moderation parameters that fit. For high interrupt rate
>> (usually seen with small size high queue-depth workloads) it configures
>> the device to aggregate some more before firing an interrupt - so less
>> interrupts, better efficiency per interrupt (finds more completions).
>> For low interrupt rate (low queue depth) the load is probably low to
>> moderate and aggregating before firing an interrupt is just added
>> latency for no benefit. So the algorithm tries to transition between a
>> number of pre-defined levels according to the load it samples.
>>
>> This has been widely used by the network drivers for the past decade.
>>
>> Now, this algorithm while trying to adjust itself by learning the load,
>> also adds entropy to the overall system performance and latency.
>> So this is not a trivial trade-off for any workload.
>>
>> I took a stab at this once (came up with something very similar),
>> and while for large queue-depth workloads I got up to 2x IOPs as the
>> algorithm chose aggressive moderation parameters which improved the
>> efficiency a lot, but when the workload varied the algorithm wasn't very
>> successful detecting the load and the step direction (I used a variation
>> of the same basic algorithm from mlx5 driver that net_dim is based on).
>>
>> Also, QD=1 resulted in higher latency as the algorithm was dangling
>> between the two lowest levels. So I guess this needs to undergo a
>> thorough performance evaluation for steady and varying workloads before
>> we can consider this.
>>
>> Overall, I think its a great idea to add that to the rdma subsystem
>> but we cannot make it the default and especially without being able
>> to turn it off. So this needs to be opt in with a sysctl option.
>
> We can add flag in create_cq command that will 
> try_coalescing_is_possible instead of module parameter of course.
>
> Storage ULPs can set it to True.
>
> Also in the internal review Yamin added a table that summarize all the 
> testing that were done using NVMeoF (I guess it somehow didn't get to 
> this RFC).
>
> I guess we can do the same for iSER to get more confidence and then 
> set both to create modifiable cq (if HCA supports, of course).
>
> Agreed ?
>
I think that adding a flag in create_cq would be less clean, as it would 
require more work from anyone writing applications that should not have 
to consider this feature.

Based on the results I saw during testing I would enable it by default, 
as I could not find a use case where it significantly reduces 
performance, and in many cases it is a large improvement. It should be 
more of an opt-out situation.

Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
     NVMf between two equal end-hosts with 56 cores across a Mellanox switch
     using null_blk device:

     IO READS before:
     blk size | BW      | IOPS | 99th percentile latency
     512B     | 3.2GiB  | 6.6M | 1549  usec
     4k       | 7.2GiB  | 1.8M | 7177  usec
     64k      | 10.7GiB | 176k | 82314 usec

     IO READS after:
     blk size | BW      | IOPS | 99th percentile latency
     512B     | 4.2GiB  | 8.6M | 1729   usec
     4k       | 10.5GiB | 2.7M | 5669   usec
     64k      | 10.7GiB | 176k | 102000 usec

     IO WRITES before:
     blk size | BW      | IOPS | 99th percentile latency
     512B     | 3GiB    | 6.2M | 2573  usec
     4k       | 7.2GiB  | 1.8M | 5342  usec
     64k      | 10.7GiB | 176k | 62129 usec

     IO WRITES after:
     blk size | BW      | IOPS  | 99th percentile latency
     512B     | 4.2GiB  | 8.6M  | 938   usec
     4k       | 10.2GiB | 2.68M | 2769  usec
     64k      | 10.6GiB | 173k  | 87557 usec

It doesn't really make a difference to me how the option is implemented, 
but I think it makes more sense for it to be handled on our side, for 
example in a module parameter, rather than something like a flag that 
has a larger radius of effect.
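
For scale, if it ends up as a per-ULP module parameter, the knob itself
is tiny; a hypothetical sketch (names and default are assumptions, not
part of this series):

    #include <linux/module.h>

    /* Hypothetical opt-out knob: adaptive CQ moderation on by default. */
    static bool use_dim = true;
    module_param(use_dim, bool, 0444);
    MODULE_PARM_DESC(use_dim,
                     "Use adaptive interrupt moderation (DIM) for CQs");

    MODULE_LICENSE("GPL");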

>>
>>
>> Moreover, not every device support cq moderation so you need to check
>> the device capabilities before you apply any of this.
>
> for sure.
>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-18  9:24         ` Yamin Friedman
@ 2019-03-18 11:08           ` Max Gurtovoy
  -1 siblings, 0 replies; 48+ messages in thread
From: Max Gurtovoy @ 2019-03-18 11:08 UTC (permalink / raw)
  To: Yamin Friedman, Sagi Grimberg, Bart Van Assche, Tal Gilboa,
	linux-rdma, linux-nvme, linux-block
  Cc: Leon Romanovsky, Jason Gunthorpe, Yishai Hadas, Saeed Mahameed,
	Doug Ledford, Idan Burstein, Tariq Toukan, Christoph Hellwig


On 3/18/2019 11:24 AM, Yamin Friedman wrote:
> On 3/14/2019 1:45 PM, Max Gurtovoy wrote:
>> On 3/7/2019 3:56 AM, Sagi Grimberg wrote:
>>>>> net_dim.h lib exposes an implementation of the DIM algorithm for
>>>>> dynamically-tuned interrupt
>>>>> moderation for networking interfaces.
>>>>>
>>>>> We need the same behavior for any block CQ. The main motivation is
>>>>> to benefit from maximized
>>>>> completion rate and reduced interrupt overhead that DIM may provide.
>>>> What is a "block CQ"?
>>> There is no such thing... Also, this has no difference
>>> if a block/file/whatever is using the rdma cq.
>>>
>>> The naming should really be something like rdma_dim as it accounts
>>> for completions and not bytes/packets.
>> Sagi,
>>
>> I think that in the future we could use it in nvme since there is an
>> option to set the interrupt coalescing in NVMe spec.
>>
>> This might improve performance for NVMe driver.
>>
>> We already see some bottlenecks in performance (maybe driver ones)
>> while developing the NVMe SNAP feature in Bluefield (NVMe emulation
>> using Smart NIC).
>>
>> We're trying to get 2.5-2.7 MIOPs OOB from 1 controller and it's not
>> trivial for today's driver.
>>
>> So let's take this into consideration when we set the naming.
>>
>>
> I agree that blk is not the most successful name, we were trying to find
> something that would work for general storage applications. I think
> rdma_dim would work as it is completion based but then when we want to
> use it for nvme it will probably require code duplication.

Agreed on rdma_dim.

Yamin/Idan,

let's discuss internally Sagi's proposal for moderation/coalescing per 
NVMe queue (and not per controller).

We may need to update the specification.


>
>>>
>>>> How does net_dim compare to lib/irq_poll?
>>> Its orthogonal, its basically adaptive interrupt moderation for
>>> RDMA devices. Its sort of below the irq_poll code. It basically
>>> configures interrupt moderation based on stats collected by
>>> the rdma driver.
>>>
>>>> Which approach results in the best performance and lowest latency?
>>> I guess it depends on what is the test case. This approach tries to
>>> apply some time or completion count limit to when the HW should fire
>>> an interrupt based on the load in an adaptive fashion.
>>>
>>> The scheme is to try and detect what are the load characteristics and
>>> come up with a moderation parameters that fit. For high interrupt rate
>>> (usually seen with small size high queue-depth workloads) it configures
>>> the device to aggregate some more before firing an interrupt - so less
>>> interrupts, better efficiency per interrupt (finds more completions).
>>> For low interrupt rate (low queue depth) the load is probably low to
>>> moderate and aggregating before firing an interrupt is just added
>>> latency for no benefit. So the algorithm tries to transition between a
>>> number of pre-defined levels according to the load it samples.
>>>
>>> This has been widely used by the network drivers for the past decade.
>>>
>>> Now, this algorithm while trying to adjust itself by learning the load,
>>> also adds entropy to the overall system performance and latency.
>>> So this is not a trivial trade-off for any workload.
>>>
>>> I took a stab at this once (came up with something very similar),
>>> and while for large queue-depth workloads I got up to 2x IOPs as the
>>> algorithm chose aggressive moderation parameters which improved the
>>> efficiency a lot, but when the workload varied the algorithm wasn't very
>>> successful detecting the load and the step direction (I used a variation
>>> of the same basic algorithm from mlx5 driver that net_dim is based on).
>>>
>>> Also, QD=1 resulted in higher latency as the algorithm was dangling
>>> between the two lowest levels. So I guess this needs to undergo a
>>> thorough performance evaluation for steady and varying workloads before
>>> we can consider this.
>>>
>>> Overall, I think its a great idea to add that to the rdma subsystem
>>> but we cannot make it the default and especially without being able
>>> to turn it off. So this needs to be opt in with a sysctl option.
>> We can add flag in create_cq command that will
>> try_coalescing_is_possible instead of module parameter of course.
>>
>> Storage ULPs can set it to True.
>>
>> Also in the internal review Yamin added a table that summarize all the
>> testing that were done using NVMeoF (I guess it somehow didn't get to
>> this RFC).
>>
>> I guess we can do the same for iSER to get more confidence and then
>> set both to create modifiable cq (if HCA supports, of course).
>>
>> Agreed ?
>>
> I think that adding a flag in create_cq will be less clean as it will
> require more work for anyone writing applications that should not have
> to consider this feature.


As we discussed, let's check with the RDMA maintainers whether it's 
better to extend the alloc_cq API or to create a new alloc_cq_dim API 
function.

Sagi/Christoph,

how about adding a module param per ULP? Just as we use register_always 
today, we could create a use_dimm module param for iSER/NVMe-RDMA.

>
> Based on the results I saw during testing I would set it to work by
> default as I could not find a use case where it significantly reduces
> performance and in many cases it is a large improvement. It should be
> more of an opt out situation.
>
> Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
>       NVMf between two equal end-hosts with 56 cores across a Mellanox switch
>       using null_blk device:
>
>       IO READS before:
>       blk size | BW      | IOPS | 99th percentile latency
>       512B     | 3.2GiB  | 6.6M | 1549  usec
>       4k       | 7.2GiB  | 1.8M | 7177  usec
>       64k      | 10.7GiB | 176k | 82314 usec
>
>       IO READS after:
>       blk size | BW      | IOPS | 99th percentile latency
>       512B     | 4.2GiB  | 8.6M | 1729   usec
>       4k       | 10.5GiB | 2.7M | 5669   usec
>       64k      | 10.7GiB | 176k | 102000 usec
>
>       IO WRITES before:
>       blk size | BW      | IOPS | 99th percentile latency
>       512B     | 3GiB    | 6.2M | 2573  usec
>       4k       | 7.2GiB  | 1.8M | 5342  usec
>       64k      | 10.7GiB | 176k | 62129 usec
>
>       IO WRITES after:
>       blk size | BW      | IOPS  | 99th percentile latency
>       512B     | 4.2GiB  | 8.6M  | 938   usec
>       4k       | 10.2GiB | 2.68M | 2769  usec
>       64k      | 10.6GiB | 173k  | 87557 usec
>
> It doesn't really make a difference to me how the option is implemented
> but I think it makes more sense to have it dealt with by us such as in a
> module parameter and not something like a flag that has a larger radius
> of effect.
>
>>>
>>> Moreover, not every device support cq moderation so you need to check
>>> the device capabilities before you apply any of this.
>> for sure.
>>
>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-18 11:08           ` Max Gurtovoy
@ 2019-03-18 15:05             ` Max Gurtovoy
  -1 siblings, 0 replies; 48+ messages in thread
From: Max Gurtovoy @ 2019-03-18 15:05 UTC (permalink / raw)
  To: Yamin Friedman, Sagi Grimberg, Bart Van Assche, Tal Gilboa,
	linux-rdma, linux-nvme, linux-block
  Cc: Leon Romanovsky, Jason Gunthorpe, Yishai Hadas, Saeed Mahameed,
	Doug Ledford, Idan Burstein, Tariq Toukan, Christoph Hellwig


On 3/18/2019 1:08 PM, Max Gurtovoy wrote:
>
> On 3/18/2019 11:24 AM, Yamin Friedman wrote:
>> On 3/14/2019 1:45 PM, Max Gurtovoy wrote:
>>> On 3/7/2019 3:56 AM, Sagi Grimberg wrote:
>>>>>> net_dim.h lib exposes an implementation of the DIM algorithm for
>>>>>> dynamically-tuned interrupt
>>>>>> moderation for networking interfaces.
>>>>>>
>>>>>> We need the same behavior for any block CQ. The main motivation is
>>>>>> to benefit from maximized
>>>>>> completion rate and reduced interrupt overhead that DIM may provide.
>>>>> What is a "block CQ"?
>>>> There is no such thing... Also, this has no difference
>>>> if a block/file/whatever is using the rdma cq.
>>>>
>>>> The naming should really be something like rdma_dim as it accounts
>>>> for completions and not bytes/packets.
>>> Sagi,
>>>
>>> I think that in the future we could use it in nvme since there is an
>>> option to set the interrupt coalescing in NVMe spec.
>>>
>>> This might improve performance for NVMe driver.
>>>
>>> We already see some bottlenecks in performance (maybe driver ones)
>>> while developing the NVMe SNAP feature in Bluefield (NVMe emulation
>>> using Smart NIC).
>>>
>>> We're trying to get 2.5-2.7 MIOPs OOB from 1 controller and it's not
>>> trivial for today's driver.
>>>
>>> So let's take this into consideration when we set the naming.
>>>
>>>
>> I agree that blk is not the most successful name, we were trying to find
>> something that would work for general storage applications. I think
>> rdma_dim would work as it is completion based but then when we want to
>> use it for nvme it will probably require code duplication.
>
> agreed for rdma_dim.
>
> Yamin/Idan,
>
> let's discuss internally regarding Sagi's proposal for 
> moderation/coalescing per NVMe queue (and not per controller).
>
> maybe need to update specification.
>
>
>>
>>>>
>>>>> How does net_dim compare to lib/irq_poll?
>>>> Its orthogonal, its basically adaptive interrupt moderation for
>>>> RDMA devices. Its sort of below the irq_poll code. It basically
>>>> configures interrupt moderation based on stats collected by
>>>> the rdma driver.
>>>>
>>>>> Which approach results in the best performance and lowest latency?
>>>> I guess it depends on what is the test case. This approach tries to
>>>> apply some time or completion count limit to when the HW should fire
>>>> an interrupt based on the load in an adaptive fashion.
>>>>
>>>> The scheme is to try and detect what are the load characteristics and
>>>> come up with a moderation parameters that fit. For high interrupt rate
>>>> (usually seen with small size high queue-depth workloads) it 
>>>> configures
>>>> the device to aggregate some more before firing an interrupt - so less
>>>> interrupts, better efficiency per interrupt (finds more completions).
>>>> For low interrupt rate (low queue depth) the load is probably low to
>>>> moderate and aggregating before firing an interrupt is just added
>>>> latency for no benefit. So the algorithm tries to transition between a
>>>> number of pre-defined levels according to the load it samples.
>>>>
>>>> This has been widely used by the network drivers for the past decade.
>>>>
>>>> Now, this algorithm while trying to adjust itself by learning the 
>>>> load,
>>>> also adds entropy to the overall system performance and latency.
>>>> So this is not a trivial trade-off for any workload.
>>>>
>>>> I took a stab at this once (came up with something very similar),
>>>> and while for large queue-depth workloads I got up to 2x IOPs as the
>>>> algorithm chose aggressive moderation parameters which improved the
>>>> efficiency a lot, but when the workload varied the algorithm wasn't 
>>>> very
>>>> successful detecting the load and the step direction (I used a 
>>>> variation
>>>> of the same basic algorithm from mlx5 driver that net_dim is based 
>>>> on).
>>>>
>>>> Also, QD=1 resulted in higher latency as the algorithm was dangling
>>>> between the two lowest levels. So I guess this needs to undergo a
>>>> thorough performance evaluation for steady and varying workloads 
>>>> before
>>>> we can consider this.
>>>>
>>>> Overall, I think its a great idea to add that to the rdma subsystem
>>>> but we cannot make it the default and especially without being able
>>>> to turn it off. So this needs to be opt in with a sysctl option.
>>> We can add flag in create_cq command that will
>>> try_coalescing_is_possible instead of module parameter of course.
>>>
>>> Storage ULPs can set it to True.
>>>
>>> Also in the internal review Yamin added a table that summarize all the
>>> testing that were done using NVMeoF (I guess it somehow didn't get to
>>> this RFC).
>>>
>>> I guess we can do the same for iSER to get more confidence and then
>>> set both to create modifiable cq (if HCA supports, of course).
>>>
>>> Agreed ?
>>>
>> I think that adding a flag in create_cq will be less clean as it will
>> require more work for anyone writing applications that should not have
>> to consider this feature.
>
>
> As we discussed, let's check with RDMA maintainers if it's better to 
> extend alloc_cq API or create alloc_cq_dim API function.
>
> Sagi/Christoph,
>
> how about adding a module param per ULP ? as we use register_always 
> today, create a use_dimm module param for iSER/NVMe-RDMA ?

Another option is to add it to each ULP's user-space utility (nvme-cli, 
iscsiadm, etc.).

Thoughts?


>
>
>>
>> Based on the results I saw during testing I would set it to work by
>> default as I could not find a use case where it significantly reduces
>> performance and in many cases it is a large improvement. It should be
>> more of an opt out situation.
>>
>> Performance improvement (ConnectX-5 100GbE, x86) running FIO 
>> benchmark over
>>       NVMf between two equal end-hosts with 56 cores across a 
>> Mellanox switch
>>       using null_blk device:
>>
>>       IO READS before:
>>       blk size | BW      | IOPS | 99th percentile latency
>>       512B     | 3.2GiB  | 6.6M | 1549  usec
>>       4k       | 7.2GiB  | 1.8M | 7177  usec
>>       64k      | 10.7GiB | 176k | 82314 usec
>>
>>       IO READS after:
>>       blk size | BW      | IOPS | 99th percentile latency
>>       512B     | 4.2GiB  | 8.6M | 1729   usec
>>       4k       | 10.5GiB | 2.7M | 5669   usec
>>       64k      | 10.7GiB | 176k | 102000 usec
>>
>>       IO WRITES before:
>>       blk size | BW      | IOPS | 99th percentile latency
>>       512B     | 3GiB    | 6.2M | 2573  usec
>>       4k       | 7.2GiB  | 1.8M | 5342  usec
>>       64k      | 10.7GiB | 176k | 62129 usec
>>
>>       IO WRITES after:
>>       blk size | BW      | IOPS  | 99th percentile latency
>>       512B     | 4.2GiB  | 8.6M  | 938   usec
>>       4k       | 10.2GiB | 2.68M | 2769  usec
>>       64k      | 10.6GiB | 173k  | 87557 usec
>>
>> It doesn't really make a difference to me how the option is implemented
>> but I think it makes more sense to have it dealt with by us such as in a
>> module parameter and not something like a flag that has a larger radius
>> of effect.
>>
>>>>
>>>> Moreover, not every device support cq moderation so you need to check
>>>> the device capabilities before you apply any of this.
>>> for sure.
>>>
>>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-18  9:24         ` Yamin Friedman
@ 2019-03-18 21:32           ` Sagi Grimberg
  -1 siblings, 0 replies; 48+ messages in thread
From: Sagi Grimberg @ 2019-03-18 21:32 UTC (permalink / raw)
  To: Yamin Friedman, Max Gurtovoy, Bart Van Assche, Tal Gilboa,
	linux-rdma, linux-nvme, linux-block
  Cc: Leon Romanovsky, Jason Gunthorpe, Yishai Hadas, Saeed Mahameed,
	Doug Ledford, Idan Burstein, Tariq Toukan


> I agree that blk is not the most successful name, we were trying to find
> something that would work for general storage applications. I think
> rdma_dim would work as it is completion based but then when we want to
> use it for nvme it will probably require code duplication.

Let's worry about reusing it for NVMe when that is actually applicable.

>> Also in the internal review Yamin added a table that summarize all the
>> testing that were done using NVMeoF (I guess it somehow didn't get to
>> this RFC).
>>
>> I guess we can do the same for iSER to get more confidence and then
>> set both to create modifiable cq (if HCA supports, of course).
>>
>> Agreed ?
>>
> I think that adding a flag in create_cq will be less clean as it will
> require more work for anyone writing applications that should not have
> to consider this feature.
> 
> Based on the results I saw during testing I would set it to work by
> default as I could not find a use case where it significantly reduces
> performance and in many cases it is a large improvement. It should be
> more of an opt out situation.

By detailed performance results I meant:
1. Full latency histogram for QD=1 both for single queue and multi-queue
(including max, 99.99% and 99.999% percentiles)
2. latency vs. IOPs graph/table for both single queue and multi-queue
3. At least some measurement/analysis of how well the algorithm is
    adapting to workload change dynamically and how quickly.
4. Test also with real NVMe devices.

Also, we need to separate the host-side moderation and the target-side
moderation to understand if/how they affect each other.

It's very easy to show an improvement for high-stress workloads, as there
is obviously a clear win for interrupt moderation there; however, if that
were the only interesting metric, we wouldn't need it to be adaptive.

As I said before, this adds entropy to the equation, which in certain use
cases can do more harm than good, and we need to quantify where the impact
is and understand how important those cases are compared to the extremely
niche use case of a single host pushing 2M-8M IOPS.

> Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
>       NVMf between two equal end-hosts with 56 cores across a Mellanox switch
>       using null_blk device:
> 
>       IO READS before:
>       blk size | BW      | IOPS | 99th percentile latency
>       512B     | 3.2GiB  | 6.6M | 1549  usec
>       4k       | 7.2GiB  | 1.8M | 7177  usec
>       64k      | 10.7GiB | 176k | 82314 usec

I've seen this before; why are we not getting 100Gb/s for 4k with CX5?
I recall we used to get it with CX4.
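
For rough scale, from the table above: 1.8M IOPS x 4 KiB ~= 7.4 GB/s ~=
59 Gb/s of payload, i.e. well short of the 100 Gb/s link, while the
"after" figure (2.7M IOPS x 4 KiB ~= 11 GB/s ~= 88 Gb/s) is close to line
rate once transport and protocol headers are accounted for.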

>       IO READS after:
>       blk size | BW      | IOPS | 99th percentile latency
>       512B     | 4.2GiB  | 8.6M | 1729   usec
>       4k       | 10.5GiB | 2.7M | 5669   usec
>       64k      | 10.7GiB | 176k | 102000 usec
> 
>       IO WRITES before:
>       blk size | BW      | IOPS | 99th percentile latency
>       512B     | 3GiB    | 6.2M | 2573  usec
>       4k       | 7.2GiB  | 1.8M | 5342  usec
>       64k      | 10.7GiB | 176k | 62129 usec
> 
>       IO WRITES after:
>       blk size | BW      | IOPS  | 99th percentile latency
>       512B     | 4.2GiB  | 8.6M  | 938   usec
>       4k       | 10.2GiB | 2.68M | 2769  usec
>       64k      | 10.6GiB | 173k  | 87557 usec

The fact that the 64k 99th percentile latency is substantially higher (20+
milliseconds) without any BW benefit, while not a very interesting
measurement in itself, gives me an indication that a more detailed
analysis needs to be made here to understand where the trade-offs are.

> It doesn't really make a difference to me how the option is implemented
> but I think it makes more sense to have it dealt with by us such as in a
> module parameter and not something like a flag that has a larger radius
> of effect.

I was suggesting a global sysctl parameter for global behavior, and if
someone wants to override it they can add a CQ flag (which follows the
common net params exactly).
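
A minimal sketch of what such a global knob could look like, assuming a
sysctl under a hypothetical "rdma" directory (names, path and default are
assumptions, not an existing interface):

    #include <linux/module.h>
    #include <linux/init.h>
    #include <linux/errno.h>
    #include <linux/sysctl.h>

    static int rdma_use_dim = 1;            /* hypothetical default: on */

    static struct ctl_table rdma_dim_table[] = {
            {
                    .procname     = "use_dim",
                    .data         = &rdma_use_dim,
                    .maxlen       = sizeof(int),
                    .mode         = 0644,
                    .proc_handler = proc_dointvec,
            },
            { }
    };

    static struct ctl_table_header *rdma_dim_hdr;

    static int __init rdma_dim_sysctl_init(void)
    {
            /* would show up as /proc/sys/rdma/use_dim */
            rdma_dim_hdr = register_sysctl("rdma", rdma_dim_table);
            return rdma_dim_hdr ? 0 : -ENOMEM;
    }

    static void __exit rdma_dim_sysctl_exit(void)
    {
            unregister_sysctl_table(rdma_dim_hdr);
    }

    module_init(rdma_dim_sysctl_init);
    module_exit(rdma_dim_sysctl_exit);
    MODULE_LICENSE("GPL");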

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-18 11:08           ` Max Gurtovoy
@ 2019-03-18 21:34             ` Sagi Grimberg
  -1 siblings, 0 replies; 48+ messages in thread
From: Sagi Grimberg @ 2019-03-18 21:34 UTC (permalink / raw)
  To: Max Gurtovoy, Yamin Friedman, Bart Van Assche, Tal Gilboa,
	linux-rdma, linux-nvme, linux-block
  Cc: Leon Romanovsky, Jason Gunthorpe, Yishai Hadas, Saeed Mahameed,
	Doug Ledford, Idan Burstein, Tariq Toukan, Christoph Hellwig


> As we discussed, let's check with RDMA maintainers if it's better to 
> extend alloc_cq API or create alloc_cq_dim API function.
> 
> Sagi/Christoph,
> 
> how about adding a module param per ULP ? as we use register_always 
> today, create a use_dimm module param for iSER/NVMe-RDMA ?

I would say that it's better (and simpler) to do a global sysctl knob for
it. No need for a per-ULP param as a starting point.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-18 21:34             ` Sagi Grimberg
@ 2019-03-20  9:17               ` Max Gurtovoy
  -1 siblings, 0 replies; 48+ messages in thread
From: Max Gurtovoy @ 2019-03-20  9:17 UTC (permalink / raw)
  To: Sagi Grimberg, Yamin Friedman, Bart Van Assche, Tal Gilboa,
	linux-rdma, linux-nvme, linux-block
  Cc: Leon Romanovsky, Jason Gunthorpe, Yishai Hadas, Saeed Mahameed,
	Doug Ledford, Idan Burstein, Tariq Toukan, Christoph Hellwig


On 3/18/2019 11:34 PM, Sagi Grimberg wrote:
>
>> As we discussed, let's check with RDMA maintainers if it's better to 
>> extend alloc_cq API or create alloc_cq_dim API function.
>>
>> Sagi/Christoph,
>>
>> how about adding a module param per ULP ? as we use register_always 
>> today, create a use_dimm module param for iSER/NVMe-RDMA ?
>
> I would say that it's better (and simpler) to do a global sysctl knob for
> it. No need for a per-ULP param as a starting point.

For sure it's simpler, but in that case everyone should agree that if we
run more than one ULP on a server, all of them will have the same
configuration (no QoS).

Are you suggesting something like:

/proc/sys/rdma/mlx5_0/dim

Jason/Leon/Doug,

thoughts on the best way to configure this feature from your
perspective?
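
For what it's worth, a knob under /proc/sys/rdma/ would reduce to
registering a sysctl table. A simplified, global-only sketch (a per-device
path like the one above would need a table registered per device); all names
and placement here are illustrative, the series itself does not add this:

/* Sketch only: a global /proc/sys/rdma/dim knob. */
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/sysctl.h>

static int rdma_dim_enable = 1;

static struct ctl_table rdma_dim_table[] = {
	{
		.procname	= "dim",
		.data		= &rdma_dim_enable,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	{ }
};

static struct ctl_table_header *rdma_dim_sysctl;

static int __init rdma_dim_sysctl_init(void)
{
	rdma_dim_sysctl = register_sysctl("rdma", rdma_dim_table);
	return rdma_dim_sysctl ? 0 : -ENOMEM;
}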


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-20  9:17               ` Max Gurtovoy
@ 2019-03-20 11:10                 ` Leon Romanovsky
  -1 siblings, 0 replies; 48+ messages in thread
From: Leon Romanovsky @ 2019-03-20 11:10 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Sagi Grimberg, Yamin Friedman, Bart Van Assche, Tal Gilboa,
	linux-rdma, linux-nvme, linux-block, Jason Gunthorpe,
	Yishai Hadas, Saeed Mahameed, Doug Ledford, Idan Burstein,
	Tariq Toukan, Christoph Hellwig

On Wed, Mar 20, 2019 at 11:17:36AM +0200, Max Gurtovoy wrote:
>
> On 3/18/2019 11:34 PM, Sagi Grimberg wrote:
> >
> > > As we discussed, let's check with RDMA maintainers if it's better to
> > > extend alloc_cq API or create alloc_cq_dim API function.
> > >
> > > Sagi/Christoph,
> > >
> > > how about adding a module param per ULP ? as we use register_always
> > > today, create a use_dimm module param for iSER/NVMe-RDMA ?
> >
> > I would say that it's better (and simpler) to do a global sysctl knob for
> > it. No need for a per-ULP param as a starting point.
>
> For sure it's simpler, but in that case everyone should agree that if we run
> more than one ULP on a server, all of them will have the same configuration
> (no QoS).
>
> Are you suggesting something like:
>
> /proc/sys/rdma/mlx5_0/dim
>
> Jason/Leon/Doug,
>
> thoughts on the best way to configure this feature from your perspective?

It doesn't sound reasonable to have a per-ULP feature (DIM is
per-ULP) configured globally, especially given that users can find
themselves running different workloads with different requirements on
the same system.

Currently each ULP has some sort of tool to configure itself, and I think
that once a ULP is converted to use DIM, it should have an on/off knob in
the tool its users already use.

Thanks

>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-20 11:10                 ` Leon Romanovsky
@ 2019-03-20 18:34                   ` Sagi Grimberg
  -1 siblings, 0 replies; 48+ messages in thread
From: Sagi Grimberg @ 2019-03-20 18:34 UTC (permalink / raw)
  To: Leon Romanovsky, Max Gurtovoy
  Cc: Yamin Friedman, Bart Van Assche, Tal Gilboa, linux-rdma,
	linux-nvme, linux-block, Jason Gunthorpe, Yishai Hadas,
	Saeed Mahameed, Doug Ledford, Idan Burstein, Tariq Toukan,
	Christoph Hellwig


> It doesn't sound reasonable to have a per-ULP feature (DIM is
> per-ULP) configured globally, especially given that users can find
> themselves running different workloads with different requirements on
> the same system.
> 
> Currently each ULP has some sort of tool to configure itself, and I think
> that once a ULP is converted to use DIM, it should have an on/off knob in
> the tool its users already use.

It's no different from socket options, which can have a global sysctl
knob that individual socket consumers can override.
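
The socket precedent referred to here has the familiar shape below: a
system-wide sysctl default (e.g. net.ipv4.tcp_keepalive_time) that an
individual consumer can override per socket. This is only an illustration
of the interface shape, not something the series adds:

/* Userspace illustration of the sysctl-default-plus-per-consumer-override
 * pattern: keepalive idle time defaults to the sysctl value, but one socket
 * can override it for itself with setsockopt().
 */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int open_socket_with_override(void)
{
	int on = 1;
	int idle = 30;	/* per-socket override, in seconds */
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
	/* Only this socket deviates from net.ipv4.tcp_keepalive_time. */
	setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
	return fd;
}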

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-18 21:32           ` Sagi Grimberg
  (?)
@ 2019-03-21  5:53           ` Yamin Friedman
  -1 siblings, 0 replies; 48+ messages in thread
From: Yamin Friedman @ 2019-03-21  5:53 UTC (permalink / raw)



On 3/18/2019 11:32 PM, Sagi Grimberg wrote:
>
>> I agree that blk is not the most successful name, we were trying to find
>> something that would work for general storage applications. I think
>> rdma_dim would work as it is completion based but then when we want to
>> use it for nvme it will probably require code duplication.
>
> Lets worry about reuse NVMe when it is actually applicable.
>
>>> Also in the internal review Yamin added a table that summarize all the
>>> testing that were done using NVMeoF (I guess it somehow didn't get to
>>> this RFC).
>>>
>>> I guess we can do the same for iSER to get more confidence and then
>>> set both to create modifiable cq (if HCA supports, of course).
>>>
>>> Agreed ?
>>>
>> I think that adding a flag in create_cq will be less clean as it will
>> require more work for anyone writing applications that should not have
>> to consider this feature.
>>
>> Based on the results I saw during testing I would set it to work by
>> default as I could not find a use case where it significantly reduces
>> performance and in many cases it is a large improvement. It should be
>> more of an opt out situation.
>
> By detailed performance results I meant:
> 1. Full latency histogram for QD=1 both for single queue and multi-queue
> (including max, 99.99% and 99.999% percentiles)
> 2. latency vs. IOPs graph/table for both single queue and multi-queue
> 3. At least some measurement/analysis of how well the algorithm is
>    adapting to workload change dynamically and how quickly.
> 4. Test also with real NVMe devices.
>
> Also, we need to separate the host side moderation and the target
> side moderation to understand if/how they affect each other.
>
> It's very easy to show that for high-stress workloads you can get an
> improvement, as there is obviously a clear win for interrupt moderation;
> however, if that were the only interesting metric, we wouldn't need it
> to be adaptive.
>
> As I said before, this adds entropy to the equation, which in certain use
> cases can do more harm than good, and we need to quantify where the
> impact is and understand how important those cases are compared to the
> extremely niche use case of a single host pushing 2M-8M IOPS.


I ran extensive tests between two hosts using FIO and NVMeoF with and
without DIM on each side, separately and together. I am attaching the
results of the tests. The main point I would like to make is that,
especially when DIM is used on both sides, we see improvements across
the board in both IOPS and latency.

>
>> Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
>>   NVMf between two equal end-hosts with 56 cores across a Mellanox switch
>>   using null_blk device:
>>
>>   IO READS before:
>>   blk size | BW      | IOPS | 99th percentile latency
>>   512B     | 3.2GiB  | 6.6M | 1549  usec
>>   4k       | 7.2GiB  | 1.8M | 7177  usec
>>   64k      | 10.7GiB | 176k | 82314 usec
>
> I've seen this before; why are we not getting 100Gb/s for 4k with CX5?
> I recall we used to get it with CX4.

It depends on the host server and whether or not the amount of interrupts
causes backpressure on the PCI bus.
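
As a rough sanity check of those 4k numbers: at 4 KiB per I/O, roughly 3M
IOPS are needed to fill a 100 Gb/s link, so the quoted 1.8M and 2.7M IOPS
correspond to about 59 and 88 Gb/s of payload before protocol overhead. A
standalone snippet for the arithmetic, included only as an illustration:

/* Back-of-the-envelope arithmetic for the 4k results quoted above:
 * payload Gb/s = IOPS * block_bytes * 8 / 1e9, protocol overhead ignored.
 */
#include <stdio.h>

int main(void)
{
	const double blk_bytes = 4096.0;
	const double link_gbps = 100.0;
	const double iops[] = { 1.8e6, 2.7e6 };

	printf("IOPS needed for %.0f Gb/s at 4k: %.2fM\n",
	       link_gbps, link_gbps * 1e9 / 8.0 / blk_bytes / 1e6);
	for (int i = 0; i < 2; i++)
		printf("%.2fM IOPS at 4k -> %.1f Gb/s payload\n",
		       iops[i] / 1e6, iops[i] * blk_bytes * 8.0 / 1e9);
	return 0;
}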

>
>>   IO READS after:
>>   blk size | BW      | IOPS | 99th percentile latency
>>   512B     | 4.2GiB  | 8.6M | 1729   usec
>>   4k       | 10.5GiB | 2.7M | 5669   usec
>>   64k      | 10.7GiB | 176k | 102000 usec
>>
>>   IO WRITES before:
>>   blk size | BW      | IOPS | 99th percentile latency
>>   512B     | 3GiB    | 6.2M | 2573  usec
>>   4k       | 7.2GiB  | 1.8M | 5342  usec
>>   64k      | 10.7GiB | 176k | 62129 usec
>>
>>   IO WRITES after:
>>   blk size | BW      | IOPS  | 99th percentile latency
>>   512B     | 4.2GiB  | 8.6M  | 938   usec
>>   4k       | 10.2GiB | 2.68M | 2769  usec
>>   64k      | 10.6GiB | 173k  | 87557 usec
>
> The fact that the 64k 99th-percentile latency is substantially higher (20+
> milliseconds) without any BW benefit, while not a very interesting
> measurement in itself, indicates that a more detailed analysis is needed
> here to understand where the trade-offs are.
>
>> It doesn't really make a difference to me how the option is implemented,
>> but I think it makes more sense to have it handled by us, such as in a
>> module parameter, rather than something like a flag that has a larger
>> radius of effect.
>
> I was suggesting a sysctl global parameter for the global behavior; if
> someone wants to override it, they can add a CQ flag (which follows the
> common net params exactly).
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: comp_DIM_to_no_DIM.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20190321/11839723/attachment-0003.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: comp_initiator_DIM_to_no_DIM.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20190321/11839723/attachment-0004.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: comp_target_DIM_to_no_DIM.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20190321/11839723/attachment-0005.txt>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
  2019-03-20 18:34                   ` Sagi Grimberg
@ 2019-03-24  9:11                     ` Leon Romanovsky
  -1 siblings, 0 replies; 48+ messages in thread
From: Leon Romanovsky @ 2019-03-24  9:11 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Max Gurtovoy, Yamin Friedman, Bart Van Assche, Tal Gilboa,
	linux-rdma, linux-nvme, linux-block, Jason Gunthorpe,
	Yishai Hadas, Saeed Mahameed, Doug Ledford, Idan Burstein,
	Tariq Toukan, Christoph Hellwig

On Wed, Mar 20, 2019 at 11:34:27AM -0700, Sagi Grimberg wrote:
>
> > It doesn't sound reasonable to have a per-ULP feature (DIM is
> > per-ULP) configured globally, especially given that users can find
> > themselves running different workloads with different requirements on
> > the same system.
> >
> > Currently each ULP has some sort of tool to configure itself, and I think
> > that once a ULP is converted to use DIM, it should have an on/off knob in
> > the tool its users already use.
>
> It's no different from socket options, which can have a global sysctl
> knob that individual socket consumers can override.

Right, but there is a major difference between the socket example and
your proposal.

The combination of a per-socket option with a general knob gives you
maximum versatility: you can disable/enable/configure through the
program or fall back to some sensible default. In your proposal, you
limit yourself to a system-wide default, without any ability to
override it specifically for your workload.

Thanks

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations
@ 2019-02-03 13:50 Tal Gilboa
  0 siblings, 0 replies; 48+ messages in thread
From: Tal Gilboa @ 2019-02-03 13:50 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Tariq Toukan, Tal Gilboa, Saeed Mahameed, Idan Burstein,
	Yamin Friedman, Max Gurtovoy, Florian Fainelli, Andy Gospodarek

net_dim.h lib exposes an implementation of the DIM algorithm for dynamically-tuned interrupt
moderation for networking interfaces.

We need the same behavior for any block CQ. The main motivation is two benefit from maximized
completion rate and reduced interrupt overhead that DIM may provide.

Current DIM implementation prioritizes reducing interrupt overhead over latency. Also, in
order to reduce DIM's own overhead, the algorithm might take take some time to identify it
needs to change profiles. For these reasons we got to the understanding that a slightly
modified algorithm is needed. Early tests with current implementation show it doesn't react
fast and sharply enough in order to satisfy the block CQ needs.

I would like to suggest an implementation for block DIM. The idea is to expose the new
functionality without the risk of breaking Net DIM behavior for netdev. Below are main
similarities and differences between the two implementations and general guidelines for the
suggested solution.

Performance tests over ConnectX-5 100GbE NIC show a 200% improvement on tail latency when
switching from high load traffic to low load traffic.

Common logic, main DIM procedure:
- Calculate current stats from a given sample
- Compare current stats vs. previous iteration stats
- Make a decision -> choose a new profile

Differences:
- Different parameters for moving between profiles
- Different moderation values and number of profiles
- Different sampled data

Suggested solution:
- Common logic will be declared in include/linux/dim.h and implemented in lib/dim/dim.c
- Net DIM (existing) logic will be declared in include/linux/net_dim.h and implemented in
  lib/dim/net_dim.c, which will use the common logic from dim.h
- Block DIM logic will be declared in /include/linux/block_dim.h and implemented in
  lib/dim/blk_dim.c.
  This new implementation will expose modified versions of profiles, dim_step() and dim_decision()

Pros for this solution are:
- Zero impact on existing net_dim implementation and usage
- Relatively more code reuse (compared to two separate solutions)
- Readiness for future implementations

Tal Gilboa (6):
  linux/dim: Move logic to dim.h
  linux/dim: Remove "net" prefix from internal DIM members
  linux/dim: Rename externally exposed macros
  linux/dim: Rename net_dim_sample() to net_dim_create_sample()
  linux/dim: Rename externally used net_dim members
  linux/dim: Move implementation to .c files

Yamin Friedman (3):
  linux/dim: Add completions count to dim_sample
  linux/dim: Implement blk_dim.h
  drivers/infiniband: Use blk_dim in infiniband driver

 MAINTAINERS                                   |   3 +
 drivers/infiniband/core/cq.c                  |  75 +++-
 drivers/infiniband/hw/mlx4/qp.c               |   2 +-
 drivers/infiniband/hw/mlx5/qp.c               |   2 +-
 drivers/net/ethernet/broadcom/bcmsysport.c    |  20 +-
 drivers/net/ethernet/broadcom/bcmsysport.h    |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  13 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   2 +-
 .../net/ethernet/broadcom/bnxt/bnxt_debugfs.c |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c |   7 +-
 .../net/ethernet/broadcom/genet/bcmgenet.c    |  18 +-
 .../net/ethernet/broadcom/genet/bcmgenet.h    |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   8 +-
 .../net/ethernet/mellanox/mlx5/core/en_dim.c  |  12 +-
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |   4 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  22 +-
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c |  10 +-
 include/linux/blk_dim.h                       |  56 +++
 include/linux/dim.h                           | 126 +++++++
 include/linux/irq_poll.h                      |   7 +
 include/linux/net_dim.h                       | 338 +-----------------
 include/rdma/ib_verbs.h                       |  11 +-
 lib/Kconfig                                   |   7 +
 lib/Makefile                                  |   1 +
 lib/dim/Makefile                              |  14 +
 lib/dim/blk_dim.c                             | 114 ++++++
 lib/dim/dim.c                                 |  92 +++++
 lib/dim/net_dim.c                             | 193 ++++++++++
 lib/irq_poll.c                                |  13 +-
 29 files changed, 778 insertions(+), 400 deletions(-)
 create mode 100644 include/linux/blk_dim.h
 create mode 100644 include/linux/dim.h
 create mode 100644 lib/dim/Makefile
 create mode 100644 lib/dim/blk_dim.c
 create mode 100644 lib/dim/dim.c
 create mode 100644 lib/dim/net_dim.c

-- 
2.19.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2019-03-24  9:11 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-06  8:48 [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations Tal Gilboa
2019-03-06  8:48 ` Tal Gilboa
2019-03-06  8:48 ` [RFC/PATCH net-next 1/9] linux/dim: Move logic to dim.h Tal Gilboa
2019-03-06  8:48   ` Tal Gilboa
2019-03-06  8:48 ` [RFC/PATCH net-next 2/9] linux/dim: Remove "net" prefix from internal DIM members Tal Gilboa
2019-03-06  8:48   ` Tal Gilboa
2019-03-06  8:48 ` [RFC/PATCH net-next 3/9] linux/dim: Rename externally exposed macros Tal Gilboa
2019-03-06  8:48   ` Tal Gilboa
2019-03-06  8:48 ` [RFC/PATCH net-next 4/9] linux/dim: Rename net_dim_sample() to net_dim_create_sample() Tal Gilboa
2019-03-06  8:48   ` Tal Gilboa
2019-03-06  8:48 ` [RFC/PATCH net-next 5/9] linux/dim: Rename externally used net_dim members Tal Gilboa
2019-03-06  8:48   ` Tal Gilboa
2019-03-06  8:48 ` [RFC/PATCH net-next 6/9] linux/dim: Move implementation to .c files Tal Gilboa
2019-03-06  8:48   ` Tal Gilboa
2019-03-06  8:48 ` [RFC/PATCH net-next 7/9] linux/dim: Add completions count to dim_sample Tal Gilboa
2019-03-06  8:48   ` Tal Gilboa
2019-03-06  8:48 ` [RFC/PATCH net-next 8/9] linux/dim: Implement blk_dim.h Tal Gilboa
2019-03-06  8:48   ` Tal Gilboa
2019-03-06  8:48 ` [RFC/PATCH net-next 9/9] drivers/infiniband: Use blk_dim in infiniband driver Tal Gilboa
2019-03-06  8:48   ` Tal Gilboa
2019-03-06 16:15 ` [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations Bart Van Assche
2019-03-06 16:15   ` Bart Van Assche
2019-03-07  1:56   ` Sagi Grimberg
2019-03-07  1:56     ` Sagi Grimberg
2019-03-14 11:45     ` Max Gurtovoy
2019-03-14 11:45       ` Max Gurtovoy
2019-03-14 21:53       ` Sagi Grimberg
2019-03-14 21:53         ` Sagi Grimberg
2019-03-18  9:24       ` Yamin Friedman
2019-03-18  9:24         ` Yamin Friedman
2019-03-18 11:08         ` Max Gurtovoy
2019-03-18 11:08           ` Max Gurtovoy
2019-03-18 15:05           ` Max Gurtovoy
2019-03-18 15:05             ` Max Gurtovoy
2019-03-18 21:34           ` Sagi Grimberg
2019-03-18 21:34             ` Sagi Grimberg
2019-03-20  9:17             ` Max Gurtovoy
2019-03-20  9:17               ` Max Gurtovoy
2019-03-20 11:10               ` Leon Romanovsky
2019-03-20 11:10                 ` Leon Romanovsky
2019-03-20 18:34                 ` Sagi Grimberg
2019-03-20 18:34                   ` Sagi Grimberg
2019-03-24  9:11                   ` Leon Romanovsky
2019-03-24  9:11                     ` Leon Romanovsky
2019-03-18 21:32         ` Sagi Grimberg
2019-03-18 21:32           ` Sagi Grimberg
2019-03-21  5:53           ` Yamin Friedman
  -- strict thread matches above, loose matches on Subject: below --
2019-02-03 13:50 Tal Gilboa
