netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [pull request][net 00/11] mlx5 fixes 2020-07-02
@ 2020-07-02 22:19 Saeed Mahameed
  2020-07-02 22:19 ` [net 01/11] net/mlx5: Fix eeprom support for SFP module Saeed Mahameed
                   ` (11 more replies)
  0 siblings, 12 replies; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski; +Cc: netdev, Saeed Mahameed

Hi Dave,

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v5.1
 ('net/mlx5e: Hold reference on mirred devices while accessing them')

For -stable v5.2
 ('net/mlx5: Fix eeprom support for SFP module')

For -stable v5.4
 ('net/mlx5e: Fix multicast counter not up-to-date in "ip -s"')
 ('net/mlx5e: Fix 50G per lane indication')

For -stable v5.5
 ('net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash')
 ('net/mlx5e: Fix VXLAN configuration restore after function reload')

For -stable v5.7
 ('net/mlx5e: CT: Fix memory leak in cleanup')

Thanks,
Saeed.

---
The following changes since commit ad4e2b64839710e3b6e17a11b2684ceaaeae795e:

  MAINTAINERS: net: macb: add Claudiu as co-maintainer (2020-07-02 14:33:50 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2020-07-02

for you to fetch changes up to c422d24e732c1dd73033f821c3a91e6021a62e19:

  net/mlx5e: CT: Fix memory leak in cleanup (2020-07-02 15:12:37 -0700)

----------------------------------------------------------------
mlx5-fixes-2020-07-02

----------------------------------------------------------------
Aya Levin (3):
      net/mlx5e: Fix VXLAN configuration restore after function reload
      net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
      net/mlx5e: Fix 50G per lane indication

Eli Britstein (1):
      net/mlx5e: CT: Fix memory leak in cleanup

Eli Cohen (1):
      net/mlx5e: Hold reference on mirred devices while accessing them

Eran Ben Elisha (2):
      net/mlx5: Fix eeprom support for SFP module
      net/mlx5e: Fix port buffers cell size value

Ron Diskin (1):
      net/mlx5e: Fix multicast counter not up-to-date in "ip -s"

Vlad Buslov (2):
      net/mxl5e: Verify that rpriv is not NULL
      net/mlx5e: Fix usage of rcu-protected pointer

Vu Pham (1):
      net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode

 drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/port.c  | 21 ++++-
 drivers/net/ethernet/mellanox/mlx5/core/en/port.h  |  2 +-
 .../ethernet/mellanox/mlx5/core/en/port_buffer.c   | 53 ++++++------
 .../ethernet/mellanox/mlx5/core/en/port_buffer.h   |  1 -
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h  |  5 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 19 +++++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 23 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  6 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c |  6 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |  4 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 22 +++--
 .../mellanox/mlx5/core/esw/acl/ingress_lgcy.c      |  1 -
 drivers/net/ethernet/mellanox/mlx5/core/port.c     | 93 ++++++++++++++++++----
 include/linux/mlx5/driver.h                        |  1 +
 include/linux/mlx5/mlx5_ifc.h                      | 28 +++++++
 18 files changed, 224 insertions(+), 71 deletions(-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [net 01/11] net/mlx5: Fix eeprom support for SFP module
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-02 22:19 ` [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s" Saeed Mahameed
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Eran Ben Elisha, Huy Nguyen, Saeed Mahameed

From: Eran Ben Elisha <eranbe@mellanox.com>

Fix eeprom SFP query support by setting i2c_addr, offset and page number
correctly. Unlike QSFP modules, SFP eeprom params are as follow:
- i2c_addr is 0x50 for offset 0 - 255 and 0x51 for offset 256 - 511.
- Page number is always zero.
- Page offset is always relative to zero.

As part of eeprom query, query the module ID (SFP / QSFP*) via helper
function to set the params accordingly.

In addition, change mlx5_qsfp_eeprom_page() input type to be u16 to avoid
unnecessary casting.

Fixes: a708fb7b1f8d ("net/mlx5e: ethtool, Add support for EEPROM high pages query")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/port.c    | 93 +++++++++++++++----
 1 file changed, 77 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 9f829e68fc73..e4186e84b3ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -293,7 +293,40 @@ static int mlx5_query_module_num(struct mlx5_core_dev *dev, int *module_num)
 	return 0;
 }
 
-static int mlx5_eeprom_page(int offset)
+static int mlx5_query_module_id(struct mlx5_core_dev *dev, int module_num,
+				u8 *module_id)
+{
+	u32 in[MLX5_ST_SZ_DW(mcia_reg)] = {};
+	u32 out[MLX5_ST_SZ_DW(mcia_reg)];
+	int err, status;
+	u8 *ptr;
+
+	MLX5_SET(mcia_reg, in, i2c_device_address, MLX5_I2C_ADDR_LOW);
+	MLX5_SET(mcia_reg, in, module, module_num);
+	MLX5_SET(mcia_reg, in, device_address, 0);
+	MLX5_SET(mcia_reg, in, page_number, 0);
+	MLX5_SET(mcia_reg, in, size, 1);
+	MLX5_SET(mcia_reg, in, l, 0);
+
+	err = mlx5_core_access_reg(dev, in, sizeof(in), out,
+				   sizeof(out), MLX5_REG_MCIA, 0, 0);
+	if (err)
+		return err;
+
+	status = MLX5_GET(mcia_reg, out, status);
+	if (status) {
+		mlx5_core_err(dev, "query_mcia_reg failed: status: 0x%x\n",
+			      status);
+		return -EIO;
+	}
+	ptr = MLX5_ADDR_OF(mcia_reg, out, dword_0);
+
+	*module_id = ptr[0];
+
+	return 0;
+}
+
+static int mlx5_qsfp_eeprom_page(u16 offset)
 {
 	if (offset < MLX5_EEPROM_PAGE_LENGTH)
 		/* Addresses between 0-255 - page 00 */
@@ -307,7 +340,7 @@ static int mlx5_eeprom_page(int offset)
 		    MLX5_EEPROM_HIGH_PAGE_LENGTH);
 }
 
-static int mlx5_eeprom_high_page_offset(int page_num)
+static int mlx5_qsfp_eeprom_high_page_offset(int page_num)
 {
 	if (!page_num) /* Page 0 always start from low page */
 		return 0;
@@ -316,35 +349,62 @@ static int mlx5_eeprom_high_page_offset(int page_num)
 	return page_num * MLX5_EEPROM_HIGH_PAGE_LENGTH;
 }
 
+static void mlx5_qsfp_eeprom_params_set(u16 *i2c_addr, int *page_num, u16 *offset)
+{
+	*i2c_addr = MLX5_I2C_ADDR_LOW;
+	*page_num = mlx5_qsfp_eeprom_page(*offset);
+	*offset -=  mlx5_qsfp_eeprom_high_page_offset(*page_num);
+}
+
+static void mlx5_sfp_eeprom_params_set(u16 *i2c_addr, int *page_num, u16 *offset)
+{
+	*i2c_addr = MLX5_I2C_ADDR_LOW;
+	*page_num = 0;
+
+	if (*offset < MLX5_EEPROM_PAGE_LENGTH)
+		return;
+
+	*i2c_addr = MLX5_I2C_ADDR_HIGH;
+	*offset -= MLX5_EEPROM_PAGE_LENGTH;
+}
+
 int mlx5_query_module_eeprom(struct mlx5_core_dev *dev,
 			     u16 offset, u16 size, u8 *data)
 {
-	int module_num, page_num, status, err;
+	int module_num, status, err, page_num = 0;
+	u32 in[MLX5_ST_SZ_DW(mcia_reg)] = {};
 	u32 out[MLX5_ST_SZ_DW(mcia_reg)];
-	u32 in[MLX5_ST_SZ_DW(mcia_reg)];
-	u16 i2c_addr;
-	void *ptr = MLX5_ADDR_OF(mcia_reg, out, dword_0);
+	u16 i2c_addr = 0;
+	u8 module_id;
+	void *ptr;
 
 	err = mlx5_query_module_num(dev, &module_num);
 	if (err)
 		return err;
 
-	memset(in, 0, sizeof(in));
-	size = min_t(int, size, MLX5_EEPROM_MAX_BYTES);
-
-	/* Get the page number related to the given offset */
-	page_num = mlx5_eeprom_page(offset);
+	err = mlx5_query_module_id(dev, module_num, &module_id);
+	if (err)
+		return err;
 
-	/* Set the right offset according to the page number,
-	 * For page_num > 0, relative offset is always >= 128 (high page).
-	 */
-	offset -= mlx5_eeprom_high_page_offset(page_num);
+	switch (module_id) {
+	case MLX5_MODULE_ID_SFP:
+		mlx5_sfp_eeprom_params_set(&i2c_addr, &page_num, &offset);
+		break;
+	case MLX5_MODULE_ID_QSFP:
+	case MLX5_MODULE_ID_QSFP_PLUS:
+	case MLX5_MODULE_ID_QSFP28:
+		mlx5_qsfp_eeprom_params_set(&i2c_addr, &page_num, &offset);
+		break;
+	default:
+		mlx5_core_err(dev, "Module ID not recognized: 0x%x\n", module_id);
+		return -EINVAL;
+	}
 
 	if (offset + size > MLX5_EEPROM_PAGE_LENGTH)
 		/* Cross pages read, read until offset 256 in low page */
 		size -= offset + size - MLX5_EEPROM_PAGE_LENGTH;
 
-	i2c_addr = MLX5_I2C_ADDR_LOW;
+	size = min_t(int, size, MLX5_EEPROM_MAX_BYTES);
 
 	MLX5_SET(mcia_reg, in, l, 0);
 	MLX5_SET(mcia_reg, in, module, module_num);
@@ -365,6 +425,7 @@ int mlx5_query_module_eeprom(struct mlx5_core_dev *dev,
 		return -EIO;
 	}
 
+	ptr = MLX5_ADDR_OF(mcia_reg, out, dword_0);
 	memcpy(data, ptr, size);
 
 	return size;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
  2020-07-02 22:19 ` [net 01/11] net/mlx5: Fix eeprom support for SFP module Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-03  1:47   ` Jakub Kicinski
  2020-07-02 22:19 ` [net 03/11] net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode Saeed Mahameed
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Ron Diskin, Tariq Toukan, Saeed Mahameed

From: Ron Diskin <rondi@mellanox.com>

Currently the FW does not generate events for counters other than error
counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which ip -s
uses) might run in atomic context, while the FW interface is non atomic.
Thus, 'ip' is not allowed to issue fw commands, so it will only display
cached counters in the driver.

Add a SW counter (mcast_packets) in the driver to count rx multicast
packets. The counter also counts broadcast packets, as we consider it a
special case of multicast.
Use the counter value when calling "ip -s"/"ifconfig".  Display the new
counter when calling "ethtool -S", and add a matching counter
(mcast_bytes) for completeness.

Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h  | 5 +++++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 8 +-------
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    | 6 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 6 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 4 ++++
 5 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index bfd3e1161bc6..5b554e79d734 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -13,6 +13,11 @@ enum mlx5e_icosq_wqe_type {
 	MLX5E_ICOSQ_WQE_UMR_RX,
 };
 
+static inline bool mlx5e_skb_is_multicast(struct sk_buff *skb)
+{
+	return skb->pkt_type == PACKET_MULTICAST || skb->pkt_type == PACKET_BROADCAST;
+}
+
 static inline bool
 mlx5e_wqc_has_room_for(struct mlx5_wq_cyc *wq, u16 cc, u16 pc, u16 n)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a836a02a2116..ee4cc723d225 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3558,6 +3558,7 @@ void mlx5e_fold_sw_stats64(struct mlx5e_priv *priv, struct rtnl_link_stats64 *s)
 
 		s->rx_packets   += rq_stats->packets + xskrq_stats->packets;
 		s->rx_bytes     += rq_stats->bytes + xskrq_stats->bytes;
+		s->multicast    += rq_stats->mcast_packets + xskrq_stats->mcast_packets;
 
 		for (j = 0; j < priv->max_opened_tc; j++) {
 			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
@@ -3573,7 +3574,6 @@ void
 mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
-	struct mlx5e_vport_stats *vstats = &priv->stats.vport;
 	struct mlx5e_pport_stats *pstats = &priv->stats.pport;
 
 	/* In switchdev mode, monitor counters doesn't monitor
@@ -3608,12 +3608,6 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	stats->rx_errors = stats->rx_length_errors + stats->rx_crc_errors +
 			   stats->rx_frame_errors;
 	stats->tx_errors = stats->tx_aborted_errors + stats->tx_carrier_errors;
-
-	/* vport multicast also counts packets that are dropped due to steering
-	 * or rx out of buffer
-	 */
-	stats->multicast =
-		VPORT_COUNTER_GET(vstats, received_eth_multicast.packets);
 }
 
 static void mlx5e_set_rx_mode(struct net_device *dev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index dbb1c6323967..ce023f1c45d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -50,6 +50,7 @@
 #include "en/xdp.h"
 #include "en/xsk/rx.h"
 #include "en/health.h"
+#include "en/txrx.h"
 
 static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
 {
@@ -1020,6 +1021,11 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 		mlx5e_enable_ecn(rq, skb);
 
 	skb->protocol = eth_type_trans(skb, netdev);
+
+	if (unlikely(mlx5e_skb_is_multicast(skb))) {
+		stats->mcast_packets++;
+		stats->mcast_bytes += cqe_bcnt;
+	}
 }
 
 static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index f009fe09e99b..f028971016d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -114,6 +114,8 @@ static const struct counter_desc sw_stats_desc[] = {
 
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_lro_packets) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_lro_bytes) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_mcast_packets) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_mcast_bytes) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_ecn_mark) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_removed_vlan_packets) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_unnecessary) },
@@ -243,6 +245,8 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
 		s->rx_bytes	+= rq_stats->bytes;
 		s->rx_lro_packets += rq_stats->lro_packets;
 		s->rx_lro_bytes	+= rq_stats->lro_bytes;
+		s->rx_mcast_packets += rq_stats->mcast_packets;
+		s->rx_mcast_bytes += rq_stats->mcast_bytes;
 		s->rx_ecn_mark	+= rq_stats->ecn_mark;
 		s->rx_removed_vlan_packets += rq_stats->removed_vlan_packets;
 		s->rx_csum_none	+= rq_stats->csum_none;
@@ -1458,6 +1462,8 @@ static const struct counter_desc rq_stats_desc[] = {
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_redirect) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, lro_packets) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, lro_bytes) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, mcast_packets) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, mcast_bytes) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, ecn_mark) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, removed_vlan_packets) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, wqe_err) },
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 2b83ba990714..86af203f5b05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -119,6 +119,8 @@ struct mlx5e_sw_stats {
 	u64 tx_nop;
 	u64 rx_lro_packets;
 	u64 rx_lro_bytes;
+	u64 rx_mcast_packets;
+	u64 rx_mcast_bytes;
 	u64 rx_ecn_mark;
 	u64 rx_removed_vlan_packets;
 	u64 rx_csum_unnecessary;
@@ -286,6 +288,8 @@ struct mlx5e_rq_stats {
 	u64 csum_none;
 	u64 lro_packets;
 	u64 lro_bytes;
+	u64 mcast_packets;
+	u64 mcast_bytes;
 	u64 ecn_mark;
 	u64 removed_vlan_packets;
 	u64 xdp_drop;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [net 03/11] net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
  2020-07-02 22:19 ` [net 01/11] net/mlx5: Fix eeprom support for SFP module Saeed Mahameed
  2020-07-02 22:19 ` [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s" Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-02 22:19 ` [net 04/11] net/mxl5e: Verify that rpriv is not NULL Saeed Mahameed
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski; +Cc: netdev, Vu Pham, Saeed Mahameed

From: Vu Pham <vuhuong@mellanox.com>

Refactoring eswitch ingress acl codes accidentally inserts extra
memset zero that removes vlan and/or qos setting in legacy mode.

Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c
index 5dc335e621c5..b68976b378b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c
@@ -217,7 +217,6 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw,
 	}
 
 	/* Create ingress allow rule */
-	memset(spec, 0, sizeof(*spec));
 	spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_ALLOW;
 	vport->ingress.allow_rule = mlx5_add_flow_rules(vport->ingress.acl, spec,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [net 04/11] net/mxl5e: Verify that rpriv is not NULL
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2020-07-02 22:19 ` [net 03/11] net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-02 22:19 ` [net 05/11] net/mlx5e: Hold reference on mirred devices while accessing them Saeed Mahameed
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Vlad Buslov, David Ahern, Roi Dayan, Saeed Mahameed

From: Vlad Buslov <vladbu@mellanox.com>

In helper function is_flow_rule_duplicate_allowed() verify that rpviv
pointer is not NULL before dereferencing it. This can happen when device is
in NIC mode and leads to following crash:

[90444.046419] BUG: kernel NULL pointer dereference, address: 0000000000000000
[90444.048149] #PF: supervisor read access in kernel mode
[90444.049781] #PF: error_code(0x0000) - not-present page
[90444.051386] PGD 80000003d35a4067 P4D 80000003d35a4067 PUD 3d35a3067 PMD 0
[90444.053051] Oops: 0000 [#1] SMP PTI
[90444.054683] CPU: 16 PID: 31736 Comm: tc Not tainted 5.8.0-rc1+ #1157
[90444.056340] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[90444.058079] RIP: 0010:mlx5e_configure_flower+0x3aa/0x9b0 [mlx5_core]
[90444.059753] Code: 24 50 49 8b 95 08 02 00 00 48 b8 00 08 00 00 04 00 00 00 48 21 c2 48 39 c2 74 0a 41 f6 85 0d 02 00 00 20 74 16 48 8b 44 24 20 <48> 8b 00 66 83 78 20 ff 74 07 4d 89 aa e0 00 00 00 48 83 7d 28 00
[90444.063232] RSP: 0018:ffffabe9c61ff768 EFLAGS: 00010246
[90444.065014] RAX: 0000000000000000 RBX: ffff9b13c4c91e80 RCX: 00000000000093fa
[90444.066784] RDX: 0000000400000800 RSI: 0000000000000000 RDI: 000000000002d5e0
[90444.068533] RBP: ffff9b174d308468 R08: 0000000000000000 R09: ffff9b17d63003f0
[90444.070285] R10: ffff9b17ea288600 R11: 0000000000000000 R12: ffffabe9c61ff878
[90444.072032] R13: ffff9b174d300000 R14: ffffabe9c61ffbb8 R15: ffff9b174d300880
[90444.073760] FS:  00007f3c23775480(0000) GS:ffff9b13efc80000(0000) knlGS:0000000000000000
[90444.075492] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[90444.077266] CR2: 0000000000000000 CR3: 00000003e2a60002 CR4: 00000000001606e0
[90444.079024] Call Trace:
[90444.080753]  tc_setup_cb_add+0xca/0x1e0
[90444.082415]  fl_hw_replace_filter+0x15f/0x1f0 [cls_flower]
[90444.084119]  fl_change+0xa59/0x13dc [cls_flower]
[90444.085772]  ? wait_for_completion+0xa8/0xf0
[90444.087364]  tc_new_tfilter+0x3f5/0xa60
[90444.088960]  rtnetlink_rcv_msg+0xeb/0x360
[90444.090514]  ? __d_lookup_done+0x76/0xe0
[90444.092034]  ? proc_alloc_inode+0x16/0x70
[90444.093560]  ? prep_new_page+0x8c/0xf0
[90444.095048]  ? _cond_resched+0x15/0x30
[90444.096483]  ? rtnl_calcit.isra.0+0x110/0x110
[90444.097907]  netlink_rcv_skb+0x49/0x110
[90444.099289]  netlink_unicast+0x191/0x230
[90444.100629]  netlink_sendmsg+0x243/0x480
[90444.101984]  sock_sendmsg+0x5e/0x60
[90444.103305]  ____sys_sendmsg+0x1f3/0x260
[90444.104597]  ? copy_msghdr_from_user+0x5c/0x90
[90444.105916]  ? __mod_lruvec_state+0x3c/0xe0
[90444.107210]  ___sys_sendmsg+0x81/0xc0
[90444.108484]  ? do_filp_open+0xa5/0x100
[90444.109732]  ? handle_mm_fault+0x117b/0x1e00
[90444.110970]  ? __check_object_size+0x46/0x147
[90444.112205]  ? __check_object_size+0x136/0x147
[90444.113402]  __sys_sendmsg+0x59/0xa0
[90444.114587]  do_syscall_64+0x4d/0x90
[90444.115782]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[90444.116953] RIP: 0033:0x7f3c2393b7b8
[90444.118101] Code: Bad RIP value.
[90444.119240] RSP: 002b:00007ffc6ad8e6c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[90444.120408] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3c2393b7b8
[90444.121583] RDX: 0000000000000000 RSI: 00007ffc6ad8e740 RDI: 0000000000000003
[90444.122750] RBP: 000000005eea0c3a R08: 0000000000000001 R09: 00007ffc6ad8e68c
[90444.123928] R10: 0000000000404fa8 R11: 0000000000000246 R12: 0000000000000001
[90444.125073] R13: 0000000000000000 R14: 00007ffc6ad92a00 R15: 00000000004866a0
[90444.126221] Modules linked in: act_skbedit act_tunnel_key act_mirred bonding vxlan ip6_udp_tunnel udp_tunnel nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nfsv3 nfs_acl nfs lockd grace fscache tun bridge stp llc sunrpc rdma_ucm rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core intel_r
apl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlxfw kvm act_ct nf_flow_table nf_nat nf_conntrack irqbypass crct10dif_pclmul nf_defrag_ipv6 igb ipmi_ssif libcrc32c crc32_pclmul crc32c_intel ipmi_si nf_defrag_ipv4 ptp ghash_clmulni_intel mei_me ses iTCO_wdt i2c_i801 pps_core
ioatdma iTCO_vendor_support joydev mei enclosure intel_cstate i2c_smbus wmi dca ipmi_devintf intel_uncore lpc_ich ipmi_msghandler pcspkr acpi_pad acpi_power_meter ast i2c_algo_bit drm_vram_helper drm_kms_helper drm_ttm_helper ttm drm mpt3sas raid_class scsi_transport_sas
[90444.136253] CR2: 0000000000000000
[90444.137621] ---[ end trace 924af62aa2b151bd ]---

Fixes: 553f9328385d ("net/mlx5e: Support tc block sharing for representors")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 7fc84f58e28a..75f169aef1cf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4670,9 +4670,10 @@ static bool is_flow_rule_duplicate_allowed(struct net_device *dev,
 					   struct mlx5e_rep_priv *rpriv)
 {
 	/* Offloaded flow rule is allowed to duplicate on non-uplink representor
-	 * sharing tc block with other slaves of a lag device.
+	 * sharing tc block with other slaves of a lag device. Rpriv can be NULL if this
+	 * function is called from NIC mode.
 	 */
-	return netif_is_lag_port(dev) && rpriv->rep->vport != MLX5_VPORT_UPLINK;
+	return netif_is_lag_port(dev) && rpriv && rpriv->rep->vport != MLX5_VPORT_UPLINK;
 }
 
 int mlx5e_configure_flower(struct net_device *dev, struct mlx5e_priv *priv,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [net 05/11] net/mlx5e: Hold reference on mirred devices while accessing them
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2020-07-02 22:19 ` [net 04/11] net/mxl5e: Verify that rpriv is not NULL Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-03  9:33   ` Or Gerlitz
  2020-07-02 22:19 ` [net 06/11] net/mlx5e: Fix usage of rcu-protected pointer Saeed Mahameed
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Eli Cohen, Vlad Buslov, Saeed Mahameed

From: Eli Cohen <eli@mellanox.com>

Net devices might be removed. For example, a vxlan device could be
deleted and its ifnidex would become invalid. Use dev_get_by_index()
instead of __dev_get_by_index() to hold reference on the device while
accessing it and release after done.

Fixes: 3c37745ec614 ("net/mlx5e: Properly deal with encap flows add/del under neigh update")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 75f169aef1cf..e88f98ab062f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1327,10 +1327,14 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 			continue;
 
 		mirred_ifindex = parse_attr->mirred_ifindex[out_index];
-		out_dev = __dev_get_by_index(dev_net(priv->netdev),
-					     mirred_ifindex);
+		out_dev = dev_get_by_index(dev_net(priv->netdev),
+					   mirred_ifindex);
+		if (!out_dev)
+			return -ENODEV;
+
 		err = mlx5e_attach_encap(priv, flow, out_dev, out_index,
 					 extack, &encap_dev, &encap_valid);
+		dev_put(out_dev);
 		if (err)
 			return err;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [net 06/11] net/mlx5e: Fix usage of rcu-protected pointer
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2020-07-02 22:19 ` [net 05/11] net/mlx5e: Hold reference on mirred devices while accessing them Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-02 22:19 ` [net 07/11] net/mlx5e: Fix VXLAN configuration restore after function reload Saeed Mahameed
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Vlad Buslov, Roi Dayan, Saeed Mahameed

From: Vlad Buslov <vladbu@mellanox.com>

In mlx5e_configure_flower() flow pointer is protected by rcu read lock.
However, after cited commit the pointer is being used outside of rcu read
block. Extend the block to protect all pointer accesses.

Fixes: 553f9328385d ("net/mlx5e: Support tc block sharing for representors")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index e88f98ab062f..518957d82b1c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4691,13 +4691,12 @@ int mlx5e_configure_flower(struct net_device *dev, struct mlx5e_priv *priv,
 
 	rcu_read_lock();
 	flow = rhashtable_lookup(tc_ht, &f->cookie, tc_ht_params);
-	rcu_read_unlock();
 	if (flow) {
 		/* Same flow rule offloaded to non-uplink representor sharing tc block,
 		 * just return 0.
 		 */
 		if (is_flow_rule_duplicate_allowed(dev, rpriv) && flow->orig_dev != dev)
-			goto out;
+			goto rcu_unlock;
 
 		NL_SET_ERR_MSG_MOD(extack,
 				   "flow cookie already exists, ignoring");
@@ -4705,8 +4704,12 @@ int mlx5e_configure_flower(struct net_device *dev, struct mlx5e_priv *priv,
 				 "flow cookie %lx already exists, ignoring\n",
 				 f->cookie);
 		err = -EEXIST;
-		goto out;
+		goto rcu_unlock;
 	}
+rcu_unlock:
+	rcu_read_unlock();
+	if (flow)
+		goto out;
 
 	trace_mlx5e_configure_flower(f);
 	err = mlx5e_tc_add_flow(priv, f, flags, dev, &flow);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [net 07/11] net/mlx5e: Fix VXLAN configuration restore after function reload
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2020-07-02 22:19 ` [net 06/11] net/mlx5e: Fix usage of rcu-protected pointer Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-02 22:19 ` [net 08/11] net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash Saeed Mahameed
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Aya Levin, Eran Ben Elisha, Saeed Mahameed

From: Aya Levin <ayal@mellanox.com>

When detaching netdev, remove vxlan port configuration using
udp_tunnel_drop_rx_info. During function reload, configuration will be
restored using udp_tunnel_get_rx_info. This ensures sync between
firmware and driver. Use udp_tunnel_get_rx_info even if its physical
interface is down.

Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ee4cc723d225..3193b0e50d2d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3104,9 +3104,6 @@ int mlx5e_open(struct net_device *netdev)
 		mlx5_set_port_admin_status(priv->mdev, MLX5_PORT_UP);
 	mutex_unlock(&priv->state_lock);
 
-	if (mlx5_vxlan_allowed(priv->mdev->vxlan))
-		udp_tunnel_get_rx_info(netdev);
-
 	return err;
 }
 
@@ -5196,6 +5193,8 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 	rtnl_lock();
 	if (netif_running(netdev))
 		mlx5e_open(netdev);
+	if (mlx5_vxlan_allowed(priv->mdev->vxlan))
+		udp_tunnel_get_rx_info(netdev);
 	netif_device_attach(netdev);
 	rtnl_unlock();
 }
@@ -5210,6 +5209,8 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 	rtnl_lock();
 	if (netif_running(priv->netdev))
 		mlx5e_close(priv->netdev);
+	if (mlx5_vxlan_allowed(priv->mdev->vxlan))
+		udp_tunnel_drop_rx_info(priv->netdev);
 	netif_device_detach(priv->netdev);
 	rtnl_unlock();
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [net 08/11] net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2020-07-02 22:19 ` [net 07/11] net/mlx5e: Fix VXLAN configuration restore after function reload Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-02 22:19 ` [net 09/11] net/mlx5e: Fix 50G per lane indication Saeed Mahameed
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Aya Levin, Eran Ben Elisha, Saeed Mahameed

From: Aya Levin <ayal@mellanox.com>

After function reload, CPU mapping used by aRFS RX is broken, leading to
a kernel panic. Fix by moving initialization of rx_cpu_rmap from
netdev_init to netdev_attach. IRQ table is re-allocated on mlx5_load,
but netdev is not re-initialize.

Trace of the panic:
[ 22.055672] general protection fault, probably for non-canonical address 0x785634120000ff1c: 0000 [#1] SMP PTI
[ 22.065010] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.7.0-rc2-for-upstream-perf-2020-04-21_16-34-03-31 #1
[ 22.067967] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 22.071174] RIP: 0010:get_rps_cpu+0x267/0x300
[ 22.075692] RSP: 0018:ffffc90000244d60 EFLAGS: 00010202
[ 22.076888] RAX: ffff888459b0e400 RBX: 0000000000000000 RCX:0000000000000007
[ 22.078364] RDX: 0000000000008884 RSI: ffff888467cb5b00 RDI:0000000000000000
[ 22.079815] RBP: 00000000ff342b27 R08: 0000000000000007 R09:0000000000000003
[ 22.081289] R10: ffffffffffffffff R11: 00000000000070cc R12:ffff888454900000
[ 22.082767] R13: ffffc90000e5a950 R14: ffffc90000244dc0 R15:0000000000000007
[ 22.084190] FS: 0000000000000000(0000) GS:ffff88846fc80000(0000)knlGS:0000000000000000
[ 22.086161] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 22.087427] CR2: ffffffffffffffff CR3: 0000000464426003 CR4:0000000000760ee0
[ 22.088888] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
[ 22.090336] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
[ 22.091764] PKRU: 55555554
[ 22.092618] Call Trace:
[ 22.093442] <IRQ>
[ 22.094211] ? kvm_clock_get_cycles+0xd/0x10
[ 22.095272] netif_receive_skb_list_internal+0x258/0x2a0
[ 22.096460] gro_normal_list.part.137+0x19/0x40
[ 22.097547] napi_complete_done+0xc6/0x110
[ 22.098685] mlx5e_napi_poll+0x190/0x670 [mlx5_core]
[ 22.099859] net_rx_action+0x2a0/0x400
[ 22.100848] __do_softirq+0xd8/0x2a8
[ 22.101829] irq_exit+0xa5/0xb0
[ 22.102750] do_IRQ+0x52/0xd0
[ 22.103654] common_interrupt+0xf/0xf
[ 22.104641] </IRQ>

Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3193b0e50d2d..a38b79a22d6b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5112,6 +5112,10 @@ static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
 	if (err)
 		goto err_destroy_flow_steering;
 
+#ifdef CONFIG_MLX5_EN_ARFS
+	priv->netdev->rx_cpu_rmap =  mlx5_eq_table_get_rmap(priv->mdev);
+#endif
+
 	return 0;
 
 err_destroy_flow_steering:
@@ -5283,10 +5287,6 @@ int mlx5e_netdev_init(struct net_device *netdev,
 	/* netdev init */
 	netif_carrier_off(netdev);
 
-#ifdef CONFIG_MLX5_EN_ARFS
-	netdev->rx_cpu_rmap =  mlx5_eq_table_get_rmap(mdev);
-#endif
-
 	return 0;
 
 err_free_cpumask:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [net 09/11] net/mlx5e: Fix 50G per lane indication
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
                   ` (7 preceding siblings ...)
  2020-07-02 22:19 ` [net 08/11] net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-02 22:19 ` [net 10/11] net/mlx5e: Fix port buffers cell size value Saeed Mahameed
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Aya Levin, Eran Ben Elisha, Saeed Mahameed

From: Aya Levin <ayal@mellanox.com>

Some released FW versions mistakenly don't set the capability that 50G
per lane link-modes are supported for VFs (ptys_extended_ethernet
capability bit). When the capability is unset, read
PTYS.ext_eth_proto_capability (always reliable).
If PTYS.ext_eth_proto_capability is valid (has a non-zero value)
conclude that the HCA supports 50G per lane. Otherwise, conclude that
the HCA doesn't support 50G per lane.

Fixes: a08b4ed1373d ("net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/port.c | 21 ++++++++++++++++---
 .../net/ethernet/mellanox/mlx5/core/en/port.h |  2 +-
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  8 +++----
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
index 2a8950b3056f..3cf3e35053f7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
@@ -78,11 +78,26 @@ static const u32 mlx5e_ext_link_speed[MLX5E_EXT_LINK_MODES_NUMBER] = {
 	[MLX5E_400GAUI_8]			= 400000,
 };
 
+bool mlx5e_ptys_ext_supported(struct mlx5_core_dev *mdev)
+{
+	struct mlx5e_port_eth_proto eproto;
+	int err;
+
+	if (MLX5_CAP_PCAM_FEATURE(mdev, ptys_extended_ethernet))
+		return true;
+
+	err = mlx5_port_query_eth_proto(mdev, 1, true, &eproto);
+	if (err)
+		return false;
+
+	return !!eproto.cap;
+}
+
 static void mlx5e_port_get_speed_arr(struct mlx5_core_dev *mdev,
 				     const u32 **arr, u32 *size,
 				     bool force_legacy)
 {
-	bool ext = force_legacy ? false : MLX5_CAP_PCAM_FEATURE(mdev, ptys_extended_ethernet);
+	bool ext = force_legacy ? false : mlx5e_ptys_ext_supported(mdev);
 
 	*size = ext ? ARRAY_SIZE(mlx5e_ext_link_speed) :
 		      ARRAY_SIZE(mlx5e_link_speed);
@@ -177,7 +192,7 @@ int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
 	bool ext;
 	int err;
 
-	ext = MLX5_CAP_PCAM_FEATURE(mdev, ptys_extended_ethernet);
+	ext = mlx5e_ptys_ext_supported(mdev);
 	err = mlx5_port_query_eth_proto(mdev, 1, ext, &eproto);
 	if (err)
 		goto out;
@@ -205,7 +220,7 @@ int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
 	int err;
 	int i;
 
-	ext = MLX5_CAP_PCAM_FEATURE(mdev, ptys_extended_ethernet);
+	ext = mlx5e_ptys_ext_supported(mdev);
 	err = mlx5_port_query_eth_proto(mdev, 1, ext, &eproto);
 	if (err)
 		return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
index a2ddd446dd59..7a7defe60792 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
@@ -54,7 +54,7 @@ int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
 int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
 u32 mlx5e_port_speed2linkmodes(struct mlx5_core_dev *mdev, u32 speed,
 			       bool force_legacy);
-
+bool mlx5e_ptys_ext_supported(struct mlx5_core_dev *mdev);
 int mlx5e_port_query_pbmc(struct mlx5_core_dev *mdev, void *out);
 int mlx5e_port_set_pbmc(struct mlx5_core_dev *mdev, void *in);
 int mlx5e_port_query_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index ec5658bbe3c5..c2464c349117 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -200,7 +200,7 @@ static void mlx5e_ethtool_get_speed_arr(struct mlx5_core_dev *mdev,
 					struct ptys2ethtool_config **arr,
 					u32 *size)
 {
-	bool ext = MLX5_CAP_PCAM_FEATURE(mdev, ptys_extended_ethernet);
+	bool ext = mlx5e_ptys_ext_supported(mdev);
 
 	*arr = ext ? ptys2ext_ethtool_table : ptys2legacy_ethtool_table;
 	*size = ext ? ARRAY_SIZE(ptys2ext_ethtool_table) :
@@ -883,7 +883,7 @@ static void get_lp_advertising(struct mlx5_core_dev *mdev, u32 eth_proto_lp,
 			       struct ethtool_link_ksettings *link_ksettings)
 {
 	unsigned long *lp_advertising = link_ksettings->link_modes.lp_advertising;
-	bool ext = MLX5_CAP_PCAM_FEATURE(mdev, ptys_extended_ethernet);
+	bool ext = mlx5e_ptys_ext_supported(mdev);
 
 	ptys2ethtool_adver_link(lp_advertising, eth_proto_lp, ext);
 }
@@ -913,7 +913,7 @@ int mlx5e_ethtool_get_link_ksettings(struct mlx5e_priv *priv,
 			   __func__, err);
 		goto err_query_regs;
 	}
-	ext = MLX5_CAP_PCAM_FEATURE(mdev, ptys_extended_ethernet);
+	ext = !!MLX5_GET_ETH_PROTO(ptys_reg, out, true, eth_proto_capability);
 	eth_proto_cap    = MLX5_GET_ETH_PROTO(ptys_reg, out, ext,
 					      eth_proto_capability);
 	eth_proto_admin  = MLX5_GET_ETH_PROTO(ptys_reg, out, ext,
@@ -1066,7 +1066,7 @@ int mlx5e_ethtool_set_link_ksettings(struct mlx5e_priv *priv,
 	autoneg = link_ksettings->base.autoneg;
 	speed = link_ksettings->base.speed;
 
-	ext_supported = MLX5_CAP_PCAM_FEATURE(mdev, ptys_extended_ethernet);
+	ext_supported = mlx5e_ptys_ext_supported(mdev);
 	ext = ext_requested(autoneg, adver, ext_supported);
 	if (!ext_supported && ext)
 		return -EOPNOTSUPP;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [net 10/11] net/mlx5e: Fix port buffers cell size value
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
                   ` (8 preceding siblings ...)
  2020-07-02 22:19 ` [net 09/11] net/mlx5e: Fix 50G per lane indication Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-02 22:19 ` [net 11/11] net/mlx5e: CT: Fix memory leak in cleanup Saeed Mahameed
  2020-07-07 20:33 ` [pull request][net 00/11] mlx5 fixes 2020-07-02 David Miller
  11 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Eran Ben Elisha, Huy Nguyen, Saeed Mahameed

From: Eran Ben Elisha <eranbe@mellanox.com>

Device unit for port buffers size, xoff_threshold and xon_threshold is
cells. Fix a bug in driver where cell unit size was hard-coded to
128 bytes. This hard-coded value is buggy, as it is wrong for some hardware
versions.

Driver to read cell size from SBCAM register and translate bytes to cell
units accordingly.

In order to fix the bug, this patch exposes SBCAM (Shared buffer
capabilities mask) layout and defines.

If SBCAM.cap_cell_size is valid, use it for all bytes to cells
calculations. If not valid, fallback to 128.

Cell size do not change on the fly per device. Instead of issuing SBCAM
access reg command every time such translation is needed, cache it in
mlx5e_dcbx as part of mlx5e_dcbnl_initialize(). Pass dcbx.port_buff_cell_sz
as a param to every function that needs bytes to cells translation.

While fixing the bug, move MLX5E_BUFFER_CELL_SHIFT macro to
en_dcbnl.c, as it is only used by that file.

Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/en/dcbnl.h    |  1 +
 .../mellanox/mlx5/core/en/port_buffer.c       | 53 ++++++++++---------
 .../mellanox/mlx5/core/en/port_buffer.h       |  1 -
 .../ethernet/mellanox/mlx5/core/en_dcbnl.c    | 19 +++++++
 include/linux/mlx5/driver.h                   |  1 +
 include/linux/mlx5/mlx5_ifc.h                 | 28 ++++++++++
 6 files changed, 78 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h b/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h
index 7be6b2d36b60..9976de8b9047 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h
@@ -29,6 +29,7 @@ struct mlx5e_dcbx {
 	bool                       manual_buffer;
 	u32                        cable_len;
 	u32                        xoff;
+	u16                        port_buff_cell_sz;
 };
 
 #define MLX5E_MAX_DSCP (64)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
index ae99fac08b53..673f1c82d381 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
@@ -34,6 +34,7 @@
 int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
 			    struct mlx5e_port_buffer *port_buffer)
 {
+	u16 port_buff_cell_sz = priv->dcbx.port_buff_cell_sz;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
 	u32 total_used = 0;
@@ -57,11 +58,11 @@ int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
 		port_buffer->buffer[i].epsb =
 			MLX5_GET(bufferx_reg, buffer, epsb);
 		port_buffer->buffer[i].size =
-			MLX5_GET(bufferx_reg, buffer, size) << MLX5E_BUFFER_CELL_SHIFT;
+			MLX5_GET(bufferx_reg, buffer, size) * port_buff_cell_sz;
 		port_buffer->buffer[i].xon =
-			MLX5_GET(bufferx_reg, buffer, xon_threshold) << MLX5E_BUFFER_CELL_SHIFT;
+			MLX5_GET(bufferx_reg, buffer, xon_threshold) * port_buff_cell_sz;
 		port_buffer->buffer[i].xoff =
-			MLX5_GET(bufferx_reg, buffer, xoff_threshold) << MLX5E_BUFFER_CELL_SHIFT;
+			MLX5_GET(bufferx_reg, buffer, xoff_threshold) * port_buff_cell_sz;
 		total_used += port_buffer->buffer[i].size;
 
 		mlx5e_dbg(HW, priv, "buffer %d: size=%d, xon=%d, xoff=%d, epsb=%d, lossy=%d\n", i,
@@ -73,7 +74,7 @@ int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
 	}
 
 	port_buffer->port_buffer_size =
-		MLX5_GET(pbmc_reg, out, port_buffer_size) << MLX5E_BUFFER_CELL_SHIFT;
+		MLX5_GET(pbmc_reg, out, port_buffer_size) * port_buff_cell_sz;
 	port_buffer->spare_buffer_size =
 		port_buffer->port_buffer_size - total_used;
 
@@ -88,9 +89,9 @@ int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
 static int port_set_buffer(struct mlx5e_priv *priv,
 			   struct mlx5e_port_buffer *port_buffer)
 {
+	u16 port_buff_cell_sz = priv->dcbx.port_buff_cell_sz;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
-	void *buffer;
 	void *in;
 	int err;
 	int i;
@@ -104,16 +105,18 @@ static int port_set_buffer(struct mlx5e_priv *priv,
 		goto out;
 
 	for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
-		buffer = MLX5_ADDR_OF(pbmc_reg, in, buffer[i]);
-
-		MLX5_SET(bufferx_reg, buffer, size,
-			 port_buffer->buffer[i].size >> MLX5E_BUFFER_CELL_SHIFT);
-		MLX5_SET(bufferx_reg, buffer, lossy,
-			 port_buffer->buffer[i].lossy);
-		MLX5_SET(bufferx_reg, buffer, xoff_threshold,
-			 port_buffer->buffer[i].xoff >> MLX5E_BUFFER_CELL_SHIFT);
-		MLX5_SET(bufferx_reg, buffer, xon_threshold,
-			 port_buffer->buffer[i].xon >> MLX5E_BUFFER_CELL_SHIFT);
+		void *buffer = MLX5_ADDR_OF(pbmc_reg, in, buffer[i]);
+		u64 size = port_buffer->buffer[i].size;
+		u64 xoff = port_buffer->buffer[i].xoff;
+		u64 xon = port_buffer->buffer[i].xon;
+
+		do_div(size, port_buff_cell_sz);
+		do_div(xoff, port_buff_cell_sz);
+		do_div(xon, port_buff_cell_sz);
+		MLX5_SET(bufferx_reg, buffer, size, size);
+		MLX5_SET(bufferx_reg, buffer, lossy, port_buffer->buffer[i].lossy);
+		MLX5_SET(bufferx_reg, buffer, xoff_threshold, xoff);
+		MLX5_SET(bufferx_reg, buffer, xon_threshold, xon);
 	}
 
 	err = mlx5e_port_set_pbmc(mdev, in);
@@ -143,7 +146,7 @@ static u32 calculate_xoff(struct mlx5e_priv *priv, unsigned int mtu)
 }
 
 static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
-				 u32 xoff, unsigned int max_mtu)
+				 u32 xoff, unsigned int max_mtu, u16 port_buff_cell_sz)
 {
 	int i;
 
@@ -155,7 +158,7 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
 		}
 
 		if (port_buffer->buffer[i].size <
-		    (xoff + max_mtu + (1 << MLX5E_BUFFER_CELL_SHIFT))) {
+		    (xoff + max_mtu + port_buff_cell_sz)) {
 			pr_err("buffer_size[%d]=%d is not enough for lossless buffer\n",
 			       i, port_buffer->buffer[i].size);
 			return -ENOMEM;
@@ -175,6 +178,7 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
  *	@pfc_en: <input> current pfc configuration
  *	@buffer: <input> current prio to buffer mapping
  *	@xoff:   <input> xoff value
+ *	@port_buff_cell_sz: <input> port buffer cell_size
  *	@port_buffer: <output> port receive buffer configuration
  *	@change: <output>
  *
@@ -189,7 +193,7 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
  *	sets change to true if buffer configuration was modified.
  */
 static int update_buffer_lossy(unsigned int max_mtu,
-			       u8 pfc_en, u8 *buffer, u32 xoff,
+			       u8 pfc_en, u8 *buffer, u32 xoff, u16 port_buff_cell_sz,
 			       struct mlx5e_port_buffer *port_buffer,
 			       bool *change)
 {
@@ -225,7 +229,7 @@ static int update_buffer_lossy(unsigned int max_mtu,
 	}
 
 	if (changed) {
-		err = update_xoff_threshold(port_buffer, xoff, max_mtu);
+		err = update_xoff_threshold(port_buffer, xoff, max_mtu, port_buff_cell_sz);
 		if (err)
 			return err;
 
@@ -262,6 +266,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
 				    u32 *buffer_size,
 				    u8 *prio2buffer)
 {
+	u16 port_buff_cell_sz = priv->dcbx.port_buff_cell_sz;
 	struct mlx5e_port_buffer port_buffer;
 	u32 xoff = calculate_xoff(priv, mtu);
 	bool update_prio2buffer = false;
@@ -282,7 +287,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
 
 	if (change & MLX5E_PORT_BUFFER_CABLE_LEN) {
 		update_buffer = true;
-		err = update_xoff_threshold(&port_buffer, xoff, max_mtu);
+		err = update_xoff_threshold(&port_buffer, xoff, max_mtu, port_buff_cell_sz);
 		if (err)
 			return err;
 	}
@@ -292,7 +297,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
 		if (err)
 			return err;
 
-		err = update_buffer_lossy(max_mtu, pfc->pfc_en, buffer, xoff,
+		err = update_buffer_lossy(max_mtu, pfc->pfc_en, buffer, xoff, port_buff_cell_sz,
 					  &port_buffer, &update_buffer);
 		if (err)
 			return err;
@@ -304,7 +309,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
 		if (err)
 			return err;
 
-		err = update_buffer_lossy(max_mtu, curr_pfc_en, prio2buffer,
+		err = update_buffer_lossy(max_mtu, curr_pfc_en, prio2buffer, port_buff_cell_sz,
 					  xoff, &port_buffer, &update_buffer);
 		if (err)
 			return err;
@@ -329,7 +334,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
 			return -EINVAL;
 
 		update_buffer = true;
-		err = update_xoff_threshold(&port_buffer, xoff, max_mtu);
+		err = update_xoff_threshold(&port_buffer, xoff, max_mtu, port_buff_cell_sz);
 		if (err)
 			return err;
 	}
@@ -337,7 +342,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
 	/* Need to update buffer configuration if xoff value is changed */
 	if (!update_buffer && xoff != priv->dcbx.xoff) {
 		update_buffer = true;
-		err = update_xoff_threshold(&port_buffer, xoff, max_mtu);
+		err = update_xoff_threshold(&port_buffer, xoff, max_mtu, port_buff_cell_sz);
 		if (err)
 			return err;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
index 34f55b81a0de..80af7a5ac604 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
@@ -36,7 +36,6 @@
 #include "port.h"
 
 #define MLX5E_MAX_BUFFER 8
-#define MLX5E_BUFFER_CELL_SHIFT 7
 #define MLX5E_DEFAULT_CABLE_LEN 7 /* 7 meters */
 
 #define MLX5_BUFFER_SUPPORTED(mdev) (MLX5_CAP_GEN(mdev, pcam_reg) && \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index bc102d094bbd..d20243d6a032 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -1217,6 +1217,24 @@ static int mlx5e_trust_initialize(struct mlx5e_priv *priv)
 	return 0;
 }
 
+#define MLX5E_BUFFER_CELL_SHIFT 7
+
+static u16 mlx5e_query_port_buffers_cell_size(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	u32 out[MLX5_ST_SZ_DW(sbcam_reg)] = {};
+	u32 in[MLX5_ST_SZ_DW(sbcam_reg)] = {};
+
+	if (!MLX5_CAP_GEN(mdev, sbcam_reg))
+		return (1 << MLX5E_BUFFER_CELL_SHIFT);
+
+	if (mlx5_core_access_reg(mdev, in, sizeof(in), out, sizeof(out),
+				 MLX5_REG_SBCAM, 0, 0))
+		return (1 << MLX5E_BUFFER_CELL_SHIFT);
+
+	return MLX5_GET(sbcam_reg, out, cap_cell_size);
+}
+
 void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv)
 {
 	struct mlx5e_dcbx *dcbx = &priv->dcbx;
@@ -1234,6 +1252,7 @@ void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv)
 	if (priv->dcbx.mode == MLX5E_DCBX_PARAM_VER_OPER_HOST)
 		priv->dcbx.cap |= DCB_CAP_DCBX_HOST;
 
+	priv->dcbx.port_buff_cell_sz = mlx5e_query_port_buffers_cell_size(priv);
 	priv->dcbx.manual_buffer = false;
 	priv->dcbx.cable_len = MLX5E_DEFAULT_CABLE_LEN;
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 13c0e4556eda..1e6ca716635a 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -147,6 +147,7 @@ enum {
 	MLX5_REG_MCDA		 = 0x9063,
 	MLX5_REG_MCAM		 = 0x907f,
 	MLX5_REG_MIRC		 = 0x9162,
+	MLX5_REG_SBCAM		 = 0xB01F,
 	MLX5_REG_RESOURCE_DUMP   = 0xC000,
 };
 
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index ca1887dd0423..073b79eacc99 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -9960,6 +9960,34 @@ struct mlx5_ifc_pptb_reg_bits {
 	u8         untagged_buff[0x4];
 };
 
+struct mlx5_ifc_sbcam_reg_bits {
+	u8         reserved_at_0[0x8];
+	u8         feature_group[0x8];
+	u8         reserved_at_10[0x8];
+	u8         access_reg_group[0x8];
+
+	u8         reserved_at_20[0x20];
+
+	u8         sb_access_reg_cap_mask[4][0x20];
+
+	u8         reserved_at_c0[0x80];
+
+	u8         sb_feature_cap_mask[4][0x20];
+
+	u8         reserved_at_1c0[0x40];
+
+	u8         cap_total_buffer_size[0x20];
+
+	u8         cap_cell_size[0x10];
+	u8         cap_max_pg_buffers[0x8];
+	u8         cap_num_pool_supported[0x8];
+
+	u8         reserved_at_240[0x8];
+	u8         cap_sbsr_stat_size[0x8];
+	u8         cap_max_tclass_data[0x8];
+	u8         cap_max_cpu_ingress_tclass_sb[0x8];
+};
+
 struct mlx5_ifc_pbmc_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         local_port[0x8];
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [net 11/11] net/mlx5e: CT: Fix memory leak in cleanup
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
                   ` (9 preceding siblings ...)
  2020-07-02 22:19 ` [net 10/11] net/mlx5e: Fix port buffers cell size value Saeed Mahameed
@ 2020-07-02 22:19 ` Saeed Mahameed
  2020-07-07 20:33 ` [pull request][net 00/11] mlx5 fixes 2020-07-02 David Miller
  11 siblings, 0 replies; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-02 22:19 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Eli Britstein, Roi Dayan, Saeed Mahameed

From: Eli Britstein <elibr@mellanox.com>

CT entries are deleted via a workqueue from netfilter. If removing the
module before that, the rules are cleaned by the driver itself, but the
memory entries for them are not freed. Fix that.

Fixes: ac991b48d43c ("net/mlx5e: CT: Offload established flows")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 430025550fad..aad1c29b23db 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -1097,6 +1097,7 @@ mlx5_tc_ct_flush_ft_entry(void *ptr, void *arg)
 	struct mlx5_ct_entry *entry = ptr;
 
 	mlx5_tc_ct_entry_del_rules(ct_priv, entry);
+	kfree(entry);
 }
 
 static void
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-02 22:19 ` [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s" Saeed Mahameed
@ 2020-07-03  1:47   ` Jakub Kicinski
  2020-07-03  3:45     ` Saeed Mahameed
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-07-03  1:47 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: David S. Miller, netdev, Ron Diskin, Tariq Toukan

On Thu,  2 Jul 2020 15:19:14 -0700 Saeed Mahameed wrote:
> From: Ron Diskin <rondi@mellanox.com>
> 
> Currently the FW does not generate events for counters other than error
> counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which ip -s
> uses) might run in atomic context, while the FW interface is non atomic.
> Thus, 'ip' is not allowed to issue fw commands, so it will only display
> cached counters in the driver.
> 
> Add a SW counter (mcast_packets) in the driver to count rx multicast
> packets. The counter also counts broadcast packets, as we consider it a
> special case of multicast.
> Use the counter value when calling "ip -s"/"ifconfig".  Display the new
> counter when calling "ethtool -S", and add a matching counter
> (mcast_bytes) for completeness.

What is the problem that is being solved here exactly?

Device counts mcast wrong / unsuitably?

> Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
> Signed-off-by: Ron Diskin <rondi@mellanox.com>
> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-03  1:47   ` Jakub Kicinski
@ 2020-07-03  3:45     ` Saeed Mahameed
  2020-07-03  4:25       ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-03  3:45 UTC (permalink / raw)
  To: kuba; +Cc: Tariq Toukan, davem, netdev, Ron Diskin

On Thu, 2020-07-02 at 18:47 -0700, Jakub Kicinski wrote:
> On Thu,  2 Jul 2020 15:19:14 -0700 Saeed Mahameed wrote:
> > From: Ron Diskin <rondi@mellanox.com>
> > 
> > Currently the FW does not generate events for counters other than
> > error
> > counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which ip
> > -s
> > uses) might run in atomic context, while the FW interface is non
> > atomic.
> > Thus, 'ip' is not allowed to issue fw commands, so it will only
> > display
> > cached counters in the driver.
> > 
> > Add a SW counter (mcast_packets) in the driver to count rx
> > multicast
> > packets. The counter also counts broadcast packets, as we consider
> > it a
> > special case of multicast.
> > Use the counter value when calling "ip -s"/"ifconfig".  Display the
> > new
> > counter when calling "ethtool -S", and add a matching counter
> > (mcast_bytes) for completeness.
> 
> What is the problem that is being solved here exactly?
> 
> Device counts mcast wrong / unsuitably?
> 

To read mcast counter we need to execute FW command which is blocking,
we can't block in atomic context .ndo_get_stats64 :( .. we have to
count in SW. 

the previous approach wasn't accurate as we read the mcast counter in a
background thread triggered by the previous read.. so we were off by
the interval between two reads.

> > Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support
> > ConnectX-4 Ethernet functionality")
> > Signed-off-by: Ron Diskin <rondi@mellanox.com>
> > Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-03  3:45     ` Saeed Mahameed
@ 2020-07-03  4:25       ` Jakub Kicinski
  2020-07-03  6:15         ` Saeed Mahameed
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-07-03  4:25 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: Tariq Toukan, davem, netdev, Ron Diskin

On Fri, 3 Jul 2020 03:45:45 +0000 Saeed Mahameed wrote:
> On Thu, 2020-07-02 at 18:47 -0700, Jakub Kicinski wrote:
> > On Thu,  2 Jul 2020 15:19:14 -0700 Saeed Mahameed wrote:  
> > > From: Ron Diskin <rondi@mellanox.com>
> > > 
> > > Currently the FW does not generate events for counters other than
> > > error
> > > counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which ip
> > > -s
> > > uses) might run in atomic context, while the FW interface is non
> > > atomic.
> > > Thus, 'ip' is not allowed to issue fw commands, so it will only
> > > display
> > > cached counters in the driver.
> > > 
> > > Add a SW counter (mcast_packets) in the driver to count rx
> > > multicast
> > > packets. The counter also counts broadcast packets, as we consider
> > > it a
> > > special case of multicast.
> > > Use the counter value when calling "ip -s"/"ifconfig".  Display the
> > > new
> > > counter when calling "ethtool -S", and add a matching counter
> > > (mcast_bytes) for completeness.  
> > 
> > What is the problem that is being solved here exactly?
> > 
> > Device counts mcast wrong / unsuitably?
> >   
> 
> To read mcast counter we need to execute FW command which is blocking,
> we can't block in atomic context .ndo_get_stats64 :( .. we have to
> count in SW. 
> 
> the previous approach wasn't accurate as we read the mcast counter in a
> background thread triggered by the previous read.. so we were off by
> the interval between two reads.

And that's bad enough to cause trouble? What's the worst case time
delta you're seeing?

> > > Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support
> > > ConnectX-4 Ethernet functionality")
> > > Signed-off-by: Ron Diskin <rondi@mellanox.com>
> > > Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>  


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-03  4:25       ` Jakub Kicinski
@ 2020-07-03  6:15         ` Saeed Mahameed
  2020-07-03 17:59           ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-03  6:15 UTC (permalink / raw)
  To: kuba; +Cc: davem, Tariq Toukan, Ron Diskin, netdev

On Thu, 2020-07-02 at 21:25 -0700, Jakub Kicinski wrote:
> On Fri, 3 Jul 2020 03:45:45 +0000 Saeed Mahameed wrote:
> > On Thu, 2020-07-02 at 18:47 -0700, Jakub Kicinski wrote:
> > > On Thu,  2 Jul 2020 15:19:14 -0700 Saeed Mahameed wrote:  
> > > > From: Ron Diskin <rondi@mellanox.com>
> > > > 
> > > > Currently the FW does not generate events for counters other
> > > > than
> > > > error
> > > > counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64"
> > > > (which ip
> > > > -s
> > > > uses) might run in atomic context, while the FW interface is
> > > > non
> > > > atomic.
> > > > Thus, 'ip' is not allowed to issue fw commands, so it will only
> > > > display
> > > > cached counters in the driver.
> > > > 
> > > > Add a SW counter (mcast_packets) in the driver to count rx
> > > > multicast
> > > > packets. The counter also counts broadcast packets, as we
> > > > consider
> > > > it a
> > > > special case of multicast.
> > > > Use the counter value when calling "ip -s"/"ifconfig".  Display
> > > > the
> > > > new
> > > > counter when calling "ethtool -S", and add a matching counter
> > > > (mcast_bytes) for completeness.  
> > > 
> > > What is the problem that is being solved here exactly?
> > > 
> > > Device counts mcast wrong / unsuitably?
> > >   
> > 
> > To read mcast counter we need to execute FW command which is
> > blocking,
> > we can't block in atomic context .ndo_get_stats64 :( .. we have to
> > count in SW. 
> > 
> > the previous approach wasn't accurate as we read the mcast counter
> > in a
> > background thread triggered by the previous read.. so we were off
> > by
> > the interval between two reads.
> 
> And that's bad enough to cause trouble? What's the worst case time
> delta you're seeing?
> 

Depends on the user frequency to read stats,
if you read stats once every 5 minutes then mcast stats are off by 5
minutes..

Just thinking out loud, is it ok of we busy loop and wait for FW
response for mcast stats commands ? 

In ethtool -S though, they are accurate since we grab them on the spot
from FW.

> > > > Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support
> > > > ConnectX-4 Ethernet functionality")
> > > > Signed-off-by: Ron Diskin <rondi@mellanox.com>
> > > > Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> > > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 05/11] net/mlx5e: Hold reference on mirred devices while accessing them
  2020-07-02 22:19 ` [net 05/11] net/mlx5e: Hold reference on mirred devices while accessing them Saeed Mahameed
@ 2020-07-03  9:33   ` Or Gerlitz
  2020-07-05  7:19     ` Eli Cohen
  0 siblings, 1 reply; 29+ messages in thread
From: Or Gerlitz @ 2020-07-03  9:33 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jakub Kicinski, Linux Netdev List, Eli Cohen,
	Vlad Buslov

On Fri, Jul 3, 2020 at 1:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
> From: Eli Cohen <eli@mellanox.com>
>
> Net devices might be removed. For example, a vxlan device could be
> deleted and its ifnidex would become invalid. Use dev_get_by_index()
> instead of __dev_get_by_index() to hold reference on the device while
> accessing it and release after done.

So if user space app installed a tc rule and then crashed or just
exited without
uninstalling the rule, the mirred (vxlan, vf rep, etc) device could
never be removed?

Why this is any better from the current situation?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-03  6:15         ` Saeed Mahameed
@ 2020-07-03 17:59           ` Jakub Kicinski
  2020-07-06 19:40             ` Saeed Mahameed
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-07-03 17:59 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: davem, Tariq Toukan, Ron Diskin, netdev

On Fri, 3 Jul 2020 06:15:09 +0000 Saeed Mahameed wrote:
> > > To read mcast counter we need to execute FW command which is
> > > blocking,
> > > we can't block in atomic context .ndo_get_stats64 :( .. we have to
> > > count in SW. 
> > > 
> > > the previous approach wasn't accurate as we read the mcast counter
> > > in a
> > > background thread triggered by the previous read.. so we were off
> > > by
> > > the interval between two reads.  
> > 
> > And that's bad enough to cause trouble? What's the worst case time
> > delta you're seeing?
> 
> Depends on the user frequency to read stats,
> if you read stats once every 5 minutes then mcast stats are off by 5
> minutes..
> 
> Just thinking out loud, is it ok of we busy loop and wait for FW
> response for mcast stats commands ? 
> 
> In ethtool -S though, they are accurate since we grab them on the spot
> from FW.

I don't really feel too strongly, I'm just trying to get the details
because I feel like the situation is going to be increasingly common.
It'd be quite sad if drivers had to reimplement all stats in sw.

I thought it would be entirely reasonable for the driver to read the
stats from a delayed work every 1/2 HZ and cache that. We do have a
knob in ethtool IRQ coalescing settings for stats writeback frequency.

I'm not sure what locks procfs actually holds, if its something that
could impact reading other files - it'd probably be a bad idea to busy
wait.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 05/11] net/mlx5e: Hold reference on mirred devices while accessing them
  2020-07-03  9:33   ` Or Gerlitz
@ 2020-07-05  7:19     ` Eli Cohen
  2020-07-06  6:13       ` Or Gerlitz
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Cohen @ 2020-07-05  7:19 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Saeed Mahameed, David S. Miller, Jakub Kicinski,
	Linux Netdev List, Vlad Buslov

On Fri, Jul 03, 2020 at 12:33:58PM +0300, Or Gerlitz wrote:
> On Fri, Jul 3, 2020 at 1:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
> > From: Eli Cohen <eli@mellanox.com>
> >
> > Net devices might be removed. For example, a vxlan device could be
> > deleted and its ifnidex would become invalid. Use dev_get_by_index()
> > instead of __dev_get_by_index() to hold reference on the device while
> > accessing it and release after done.
> 
> So if user space app installed a tc rule and then crashed or just
> exited without
> uninstalling the rule, the mirred (vxlan, vf rep, etc) device could
> never be removed?

Why do you think so? I decrease ref count, unconditionally, right after
returning from mlx5e_attach_encap().


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 05/11] net/mlx5e: Hold reference on mirred devices while accessing them
  2020-07-05  7:19     ` Eli Cohen
@ 2020-07-06  6:13       ` Or Gerlitz
  2020-07-07 14:57         ` Eli Cohen
  0 siblings, 1 reply; 29+ messages in thread
From: Or Gerlitz @ 2020-07-06  6:13 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Saeed Mahameed, David S. Miller, Jakub Kicinski,
	Linux Netdev List, Vlad Buslov

On Sun, Jul 5, 2020 at 10:19 AM Eli Cohen <eli@mellanox.com> wrote:
>
> On Fri, Jul 03, 2020 at 12:33:58PM +0300, Or Gerlitz wrote:
> > On Fri, Jul 3, 2020 at 1:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
> > > From: Eli Cohen <eli@mellanox.com>
> > >
> > > Net devices might be removed. For example, a vxlan device could be
> > > deleted and its ifnidex would become invalid. Use dev_get_by_index()
> > > instead of __dev_get_by_index() to hold reference on the device while
> > > accessing it and release after done.
> >
> > So if user space app installed a tc rule and then crashed or just
> > exited without
> > uninstalling the rule, the mirred (vxlan, vf rep, etc) device could
> > never be removed?
>
> Why do you think so? I decrease ref count, unconditionally, right after
> returning from mlx5e_attach_encap().

so what are we protecting here against? someone removing the device
while the tc rule is being added?

why do it in the driver and not higher in the tc stack? if I got you
correctly, the same problem can
happen for sw only (skip-hw) rules

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-03 17:59           ` Jakub Kicinski
@ 2020-07-06 19:40             ` Saeed Mahameed
  2020-07-06 19:57               ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-06 19:40 UTC (permalink / raw)
  To: kuba; +Cc: davem, Tariq Toukan, Ron Diskin, netdev

On Fri, 2020-07-03 at 10:59 -0700, Jakub Kicinski wrote:
> On Fri, 3 Jul 2020 06:15:09 +0000 Saeed Mahameed wrote:
> > > > To read mcast counter we need to execute FW command which is
> > > > blocking,
> > > > we can't block in atomic context .ndo_get_stats64 :( .. we have
> > > > to
> > > > count in SW. 
> > > > 
> > > > the previous approach wasn't accurate as we read the mcast
> > > > counter
> > > > in a
> > > > background thread triggered by the previous read.. so we were
> > > > off
> > > > by
> > > > the interval between two reads.  
> > > 
> > > And that's bad enough to cause trouble? What's the worst case
> > > time
> > > delta you're seeing?
> > 
> > Depends on the user frequency to read stats,
> > if you read stats once every 5 minutes then mcast stats are off by
> > 5
> > minutes..
> > 
> > Just thinking out loud, is it ok of we busy loop and wait for FW
> > response for mcast stats commands ? 
> > 
> > In ethtool -S though, they are accurate since we grab them on the
> > spot
> > from FW.
> 
> I don't really feel too strongly, I'm just trying to get the details
> because I feel like the situation is going to be increasingly common.
> It'd be quite sad if drivers had to reimplement all stats in sw.
> 

Depends on HW, our HW/FW supports providing stats per (Vport/function).
which means if a packet got lost between the NIC and the netdev queue,
it will be counted as rx-packet/mcast, although we have a private
counter to show this drop in ethtool but will be counted in rx counter
in netdev stats, if we used hw stats.

so this is why i always prefer SW stats for netdev reported stats, all
we need to count in SW {rx,tx} X {packets, bytes} + rx mcast packets.

This gives more flexibility and correctness, any given HW can create
multiple netdevs on the same function, we need the netdev stats to
reflect traffic that only went through that netdev.

> I thought it would be entirely reasonable for the driver to read the
> stats from a delayed work every 1/2 HZ and cache that. We do have a
> knob in ethtool IRQ coalescing settings for stats writeback
> frequency.
> 

Some customers didn't like this since for drivers that implement this
their CPU power utilization will be slightly higher on idle.

> I'm not sure what locks procfs actually holds, if its something that
> could impact reading other files - it'd probably be a bad idea to
> busy
> wait.

Agreed.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-06 19:40             ` Saeed Mahameed
@ 2020-07-06 19:57               ` Jakub Kicinski
  2020-07-07  1:51                 ` Saeed Mahameed
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-07-06 19:57 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: davem, Tariq Toukan, Ron Diskin, netdev

On Mon, 6 Jul 2020 19:40:50 +0000 Saeed Mahameed wrote:
> > I don't really feel too strongly, I'm just trying to get the details
> > because I feel like the situation is going to be increasingly common.
> > It'd be quite sad if drivers had to reimplement all stats in sw.
> 
> Depends on HW, our HW/FW supports providing stats per (Vport/function).
> which means if a packet got lost between the NIC and the netdev queue,
> it will be counted as rx-packet/mcast, although we have a private
> counter to show this drop in ethtool but will be counted in rx counter
> in netdev stats, if we used hw stats.
> 
> so this is why i always prefer SW stats for netdev reported stats, all
> we need to count in SW {rx,tx} X {packets, bytes} + rx mcast packets.

If that was indeed the intention it'd had been done in the core, not
each driver separately..

> This gives more flexibility and correctness, any given HW can create
> multiple netdevs on the same function, we need the netdev stats to
> reflect traffic that only went through that netdev.
> 
> > I thought it would be entirely reasonable for the driver to read the
> > stats from a delayed work every 1/2 HZ and cache that. We do have a
> > knob in ethtool IRQ coalescing settings for stats writeback
> > frequency.
> 
> Some customers didn't like this since for drivers that implement this
> their CPU power utilization will be slightly higher on idle.

Other customers may dislike the per packet cycles.

I don't really mind, I just found the commit message to be lacking 
for a fix, which this supposedly is.

Also looks like you report the total number of mcast packets in ethtool
-S, which should be identical to ip -s? If so please remove that.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-06 19:57               ` Jakub Kicinski
@ 2020-07-07  1:51                 ` Saeed Mahameed
  2020-07-07  2:07                   ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-07  1:51 UTC (permalink / raw)
  To: kuba; +Cc: davem, Tariq Toukan, Ron Diskin, netdev

On Mon, 2020-07-06 at 12:57 -0700, Jakub Kicinski wrote:
> On Mon, 6 Jul 2020 19:40:50 +0000 Saeed Mahameed wrote:
> > > I don't really feel too strongly, I'm just trying to get the
> > > details
> > > because I feel like the situation is going to be increasingly
> > > common.
> > > It'd be quite sad if drivers had to reimplement all stats in sw.
> > 
> > Depends on HW, our HW/FW supports providing stats per
> > (Vport/function).
> > which means if a packet got lost between the NIC and the netdev
> > queue,
> > it will be counted as rx-packet/mcast, although we have a private
> > counter to show this drop in ethtool but will be counted in rx
> > counter
> > in netdev stats, if we used hw stats.
> > 
> > so this is why i always prefer SW stats for netdev reported stats,
> > all
> > we need to count in SW {rx,tx} X {packets, bytes} + rx mcast
> > packets.
> 
> If that was indeed the intention it'd had been done in the core, not
> each driver separately..
> 

this is why it depends on the HW. unfortunately in our case HW stats !=
SW stats.

> > This gives more flexibility and correctness, any given HW can
> > create
> > multiple netdevs on the same function, we need the netdev stats to
> > reflect traffic that only went through that netdev.
> > 
> > > I thought it would be entirely reasonable for the driver to read
> > > the
> > > stats from a delayed work every 1/2 HZ and cache that. We do have
> > > a
> > > knob in ethtool IRQ coalescing settings for stats writeback
> > > frequency.
> > 
> > Some customers didn't like this since for drivers that implement
> > this
> > their CPU power utilization will be slightly higher on idle.
> 
> Other customers may dislike the per packet cycles.
> 
> I don't really mind, I just found the commit message to be lacking 
> for a fix, which this supposedly is.
> 

Yes commit message could use some improvement.

I think it is even worse than i thought, we removed  the background
refreshing of stats due to the FW events support to updates some
counters. Multicast counter can't be refreshed by FW events since it is
a data path counters. 

> Also looks like you report the total number of mcast packets in
> ethtool
> -S, which should be identical to ip -s? If so please remove that.

why ? it is ok to report the same counter both in ehttool and netdev
stats.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-07  1:51                 ` Saeed Mahameed
@ 2020-07-07  2:07                   ` Jakub Kicinski
  2020-07-07  3:29                     ` Saeed Mahameed
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-07-07  2:07 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: davem, Tariq Toukan, Ron Diskin, netdev

On Tue, 7 Jul 2020 01:51:21 +0000 Saeed Mahameed wrote:
> > Also looks like you report the total number of mcast packets in
> > ethtool
> > -S, which should be identical to ip -s? If so please remove that.  
> 
> why ? it is ok to report the same counter both in ehttool and netdev
> stats.

I don't think it is, Stephen and I have been trying to catch this in
review for a while now. It's an entirely unnecessary code duplication.
We should steer towards proper APIs first if those exist.

Using ethtool -S stats gets very messy very quickly in production.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-07  2:07                   ` Jakub Kicinski
@ 2020-07-07  3:29                     ` Saeed Mahameed
  2020-07-07 16:37                       ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Saeed Mahameed @ 2020-07-07  3:29 UTC (permalink / raw)
  To: kuba; +Cc: davem, Tariq Toukan, Ron Diskin, netdev

On Mon, 2020-07-06 at 19:07 -0700, Jakub Kicinski wrote:
> On Tue, 7 Jul 2020 01:51:21 +0000 Saeed Mahameed wrote:
> > > Also looks like you report the total number of mcast packets in
> > > ethtool
> > > -S, which should be identical to ip -s? If so please remove
> > > that.  
> > 
> > why ? it is ok to report the same counter both in ehttool and
> > netdev
> > stats.
> 
> I don't think it is, Stephen and I have been trying to catch this in
> review for a while now. It's an entirely unnecessary code
> duplication.

Code duplication shouldn't be a factor. For example, in mlx5 we have a
generic mechanism to define and report stats, for the netdev stats to
be reported in ethtoool all we need to do is define the representing
string and store those counters in the SW stats struct.

> We should steer towards proper APIs first if those exist.
> 
> Using ethtool -S stats gets very messy very quickly in production.

I agree on ethtool getting messy very quickly, but i disagree on not
reporting netdev stats, I don't think the 4 netdev stats are the reason
for messy ethtool.

Ethtool -S is meant for verbosity and debug, and it is full of
driver/HW specific counters, how are you planing to audit all of those
?

We had an idea in the past to allow users to choose what stats groups
to report to ethtool, we can follow up on this if this is something
other drivers might be interested in.

example: 

ethtool -S eth0 --groups XDP,SW,PER_QUEUE,PCI,PORT,DRIVER_SPECIFIC
Where non DRIVER_SPECIFC groups are standardize stats objects.. 

-Saeed.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 05/11] net/mlx5e: Hold reference on mirred devices while accessing them
  2020-07-06  6:13       ` Or Gerlitz
@ 2020-07-07 14:57         ` Eli Cohen
  2020-07-07 16:57           ` Or Gerlitz
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Cohen @ 2020-07-07 14:57 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Saeed Mahameed, David S. Miller, Jakub Kicinski,
	Linux Netdev List, Vlad Buslov

On Mon, Jul 06, 2020 at 09:13:06AM +0300, Or Gerlitz wrote:
> On Sun, Jul 5, 2020 at 10:19 AM Eli Cohen <eli@mellanox.com> wrote:
> 
> so what are we protecting here against? someone removing the device
> while the tc rule is being added?
>
Not necessairly. In case of ecmp, the rule may be copied to another
eswitch. At this time, if the mirred device does not exsist anymore, we
will crash.

I saw this problem at a customer's machine and this solved the problem.
It was an older kernel but I think we have the same issue here as well.
All you have is the ifindex of the mirred device so what I did here is
required.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  2020-07-07  3:29                     ` Saeed Mahameed
@ 2020-07-07 16:37                       ` Jakub Kicinski
  0 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2020-07-07 16:37 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: davem, Tariq Toukan, Ron Diskin, netdev

On Tue, 7 Jul 2020 03:29:18 +0000 Saeed Mahameed wrote:
> On Mon, 2020-07-06 at 19:07 -0700, Jakub Kicinski wrote:
> > On Tue, 7 Jul 2020 01:51:21 +0000 Saeed Mahameed wrote:  
> > > > Also looks like you report the total number of mcast packets in
> > > > ethtool
> > > > -S, which should be identical to ip -s? If so please remove
> > > > that.    
> > > 
> > > why ? it is ok to report the same counter both in ehttool and
> > > netdev
> > > stats.  
> > 
> > I don't think it is, Stephen and I have been trying to catch this in
> > review for a while now. It's an entirely unnecessary code
> > duplication.  
> 
> Code duplication shouldn't be a factor. For example, in mlx5 we have a
> generic mechanism to define and report stats, for the netdev stats to
> be reported in ethtoool all we need to do is define the representing
> string and store those counters in the SW stats struct.

And sum them up in a loop. One day we'll have a better API for standard
stats and will will we still argue then that ethtool -S should
duplicate _everything_.

Drivers should put more focus on providing useful information in
standard statistics in the first place, then think about exposing more
details as needed.

> > We should steer towards proper APIs first if those exist.
> > 
> > Using ethtool -S stats gets very messy very quickly in production.  
> 
> I agree on ethtool getting messy very quickly, but i disagree on not
> reporting netdev stats, I don't think the 4 netdev stats are the reason
> for messy ethtool.
> 
> Ethtool -S is meant for verbosity and debug, and it is full of
> driver/HW specific counters, how are you planing to audit all of those
> ?

I don't understand how verbosity, debug, and being full of HW specific
counters has any relevance for this patch - which is reporting a pure
SW driver stat.

> We had an idea in the past to allow users to choose what stats groups
> to report to ethtool, we can follow up on this if this is something
> other drivers might be interested in.
> 
> example: 
> 
> ethtool -S eth0 --groups XDP,SW,PER_QUEUE,PCI,PORT,DRIVER_SPECIFIC
> Where non DRIVER_SPECIFC groups are standardize stats objects.. 

Attributes are useful but the primary problem is the fact that we,
driver developers, seem to be funneling all our creative passion into
coming up with new names for statistics.

Look for example how many names we have for etherStatsPkts256to511Octets
And nobody(!) thought it's a good idea to just name the counter what
it's called in the RFC.

Policing a free form string interface in review is just unworkable.

Anyway.. I'm sidetracking.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net 05/11] net/mlx5e: Hold reference on mirred devices while accessing them
  2020-07-07 14:57         ` Eli Cohen
@ 2020-07-07 16:57           ` Or Gerlitz
  0 siblings, 0 replies; 29+ messages in thread
From: Or Gerlitz @ 2020-07-07 16:57 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Saeed Mahameed, David S. Miller, Jakub Kicinski,
	Linux Netdev List, Vlad Buslov

On Tue, Jul 7, 2020 at 5:57 PM Eli Cohen <eli@mellanox.com> wrote:
> On Mon, Jul 06, 2020 at 09:13:06AM +0300, Or Gerlitz wrote:
> > On Sun, Jul 5, 2020 at 10:19 AM Eli Cohen <eli@mellanox.com> wrote:
> >
> > so what are we protecting here against? someone removing the device
> > while the tc rule is being added?
> >
> Not necessairly. In case of ecmp, the rule may be copied to another
> eswitch. At this time, if the mirred device does not exsist anymore, we
> will crash.
>
> I saw this problem at a customer's machine and this solved the problem.
> It was an older kernel but I think we have the same issue here as well.
> All you have is the ifindex of the mirred device so what I did here is
> required.

Repeating myself, why do it in the driver and not higher in the tc stack?
if I got you correctly, the same problem can happen for sw only (skip-hw) rules

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [pull request][net 00/11] mlx5 fixes 2020-07-02
  2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
                   ` (10 preceding siblings ...)
  2020-07-02 22:19 ` [net 11/11] net/mlx5e: CT: Fix memory leak in cleanup Saeed Mahameed
@ 2020-07-07 20:33 ` David Miller
  11 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2020-07-07 20:33 UTC (permalink / raw)
  To: saeedm; +Cc: kuba, netdev

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu,  2 Jul 2020 15:19:12 -0700

> This series introduces some fixes to mlx5 driver.
> 
> Please pull and let me know if there is any problem.

This needs more work.

I agree with Jakub that duplicating standard stats in ethtool -S is
and has always been a very bad idea.  We let people get away with it
initially in order to allow per-queue stats, but that was a mistake
and even if it was the right way forward it doesn't mean the rest of
the ip -s stats should have been duplicated as well.

Jakub's base argument is true, if we ever do something better in the
generic 'ip -s' stuff it will not propagate to all of the 'ethtool -S'
because it is duplicated around in every driver.

We let driver developers get away with a lot with ethtool custom
statistics and we're past due to perform some push back on this issue.

There also seems to be some necessary discussion about where the tc
reference count fix really belongs.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-07-07 20:33 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-02 22:19 [pull request][net 00/11] mlx5 fixes 2020-07-02 Saeed Mahameed
2020-07-02 22:19 ` [net 01/11] net/mlx5: Fix eeprom support for SFP module Saeed Mahameed
2020-07-02 22:19 ` [net 02/11] net/mlx5e: Fix multicast counter not up-to-date in "ip -s" Saeed Mahameed
2020-07-03  1:47   ` Jakub Kicinski
2020-07-03  3:45     ` Saeed Mahameed
2020-07-03  4:25       ` Jakub Kicinski
2020-07-03  6:15         ` Saeed Mahameed
2020-07-03 17:59           ` Jakub Kicinski
2020-07-06 19:40             ` Saeed Mahameed
2020-07-06 19:57               ` Jakub Kicinski
2020-07-07  1:51                 ` Saeed Mahameed
2020-07-07  2:07                   ` Jakub Kicinski
2020-07-07  3:29                     ` Saeed Mahameed
2020-07-07 16:37                       ` Jakub Kicinski
2020-07-02 22:19 ` [net 03/11] net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode Saeed Mahameed
2020-07-02 22:19 ` [net 04/11] net/mxl5e: Verify that rpriv is not NULL Saeed Mahameed
2020-07-02 22:19 ` [net 05/11] net/mlx5e: Hold reference on mirred devices while accessing them Saeed Mahameed
2020-07-03  9:33   ` Or Gerlitz
2020-07-05  7:19     ` Eli Cohen
2020-07-06  6:13       ` Or Gerlitz
2020-07-07 14:57         ` Eli Cohen
2020-07-07 16:57           ` Or Gerlitz
2020-07-02 22:19 ` [net 06/11] net/mlx5e: Fix usage of rcu-protected pointer Saeed Mahameed
2020-07-02 22:19 ` [net 07/11] net/mlx5e: Fix VXLAN configuration restore after function reload Saeed Mahameed
2020-07-02 22:19 ` [net 08/11] net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash Saeed Mahameed
2020-07-02 22:19 ` [net 09/11] net/mlx5e: Fix 50G per lane indication Saeed Mahameed
2020-07-02 22:19 ` [net 10/11] net/mlx5e: Fix port buffers cell size value Saeed Mahameed
2020-07-02 22:19 ` [net 11/11] net/mlx5e: CT: Fix memory leak in cleanup Saeed Mahameed
2020-07-07 20:33 ` [pull request][net 00/11] mlx5 fixes 2020-07-02 David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).