All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting
@ 2017-06-21 13:04 Gal Pressman
  2017-06-21 13:04 ` [RFC PATCH net-next 1/3] ethtool: Add link down reason callback Gal Pressman
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Gal Pressman @ 2017-06-21 13:04 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team,
	Gal Pressman

Hi All,

Currently, drivers can only tell whether the link is up/down through
ETHTOOL_GLINK callback, but no additional information is given in case
of link down/failure.

This series provides an infrastructure to ethtool that allows 
netdevice drivers to hint the user with the reason why the link is down,
in order to ease the debug process.

In addition two more mlx5 patches to demonstrate the usage.

Reasons are separated to two types, generic and vendor specific.
Drivers can reply with a generic reason using the predefined enums 
(and the ones that will be added in the future), which will be
translated to strings by the userspace ethtool.
In case of a vendor specific reason, drivers can reply with
ETHTOOL_VENDOR_SPECIFIC reason, in this case the vendor_reason field
should be filled with a vendor specific status code which will be parsed
by the vendor specific userspace parser if one is available.

This kind of information can save system administrators precious time,
which will not be wasted trying to understand why the link won't go up.
    
For example, when the cable is unplugged:
    $ ethtool ethXX
    ...
    Link detected: no (Cable unplugged)

Thanks,
Gal.

Gal Pressman (3):
  ethtool: Add link down reason callback
  net/mlx5: Add PDDR register infrastructure
  net/mlx5e: Expose link down reason to ethtool

 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 107 +++++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |  33 +++++++
 drivers/net/ethernet/mellanox/mlx5/core/port.c     |  39 ++++++++
 include/linux/ethtool.h                            |   2 +
 include/linux/mlx5/device.h                        |   3 +
 include/linux/mlx5/driver.h                        |   1 +
 include/linux/mlx5/mlx5_ifc.h                      |  51 ++++++++++
 include/uapi/linux/ethtool.h                       |  33 +++++++
 net/core/ethtool.c                                 |  24 +++++
 9 files changed, 293 insertions(+)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-21 13:04 [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting Gal Pressman
@ 2017-06-21 13:04 ` Gal Pressman
  2017-06-21 13:58   ` Andrew Lunn
                     ` (2 more replies)
  2017-06-21 13:04 ` [RFC PATCH net-next 2/3] net/mlx5: Add PDDR register infrastructure Gal Pressman
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 30+ messages in thread
From: Gal Pressman @ 2017-06-21 13:04 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team,
	Gal Pressman

Currently, drivers can only tell whether the link is up/down through
ETHTOOL_GLINK callback, but no additional information is given in case
of link down.
This patch provides an infrastructure that allows drivers to hint
the user with the reason why the link is down, in order to ease the
debug process.

Reasons are separated to two types, generic and vendor specific.
Drivers can reply with a generic reason using the enums provided in this
patch (and the ones that will be added in the future), which will be
translated to strings by the userspace ethtool.
In case of a vendor specific reason (not suitable for most vendors),
drivers can reply with ETHTOOL_VENDOR_SPECIFIC reason, in this case the
vendor_reason field should be filled with a vendor specific status code
which will be parsed by the vendor specific userspace parser if one is
available.

This kind of information can save system administrators precious time
which will not be wasted trying to understand why the link won't go
up.

For example, when the cable is unplugged:
$ ethtool ethXX
...
Link detected: no (Cable unplugged)

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/ethtool.h      |  2 ++
 include/uapi/linux/ethtool.h | 33 +++++++++++++++++++++++++++++++++
 net/core/ethtool.c           | 24 ++++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 83cc986..d472047 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -374,5 +374,7 @@ struct ethtool_ops {
 				      struct ethtool_link_ksettings *);
 	int	(*set_link_ksettings)(struct net_device *,
 				      const struct ethtool_link_ksettings *);
+	int	(*get_link_down_reason)(struct net_device *,
+					struct ethtool_link_down_reason *);
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 7d4a594..8cf9d2c 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -550,6 +550,13 @@ struct ethtool_pauseparam {
 
 #define ETH_GSTRING_LEN		32
 
+struct ethtool_link_down_reason {
+	__u32	cmd;
+	__u32	reason;
+	__u32	vendor_reason;
+	__u32	reserved[4];
+};
+
 /**
  * enum ethtool_stringset - string set ID
  * @ETH_SS_TEST: Self-test result names, for use with %ETHTOOL_TEST
@@ -1331,6 +1338,8 @@ struct ethtool_per_queue_op {
 #define ETHTOOL_PHY_GTUNABLE	0x0000004e /* Get PHY tunable configuration */
 #define ETHTOOL_PHY_STUNABLE	0x0000004f /* Set PHY tunable configuration */
 
+#define ETHTOOL_GLINK_DOWN_RSN	0x00000050 /* Get link down reason */
+
 /* compatibility with older code */
 #define SPARC_ETH_GSET		ETHTOOL_GSET
 #define SPARC_ETH_SSET		ETHTOOL_SSET
@@ -1766,4 +1775,28 @@ struct ethtool_link_settings {
 	 * __u32 map_lp_advertising[link_mode_masks_nwords];
 	 */
 };
+
+enum {
+	ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in vendor_reason */
+	ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
+	ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
+	ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
+	ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
+	ETHTOOL_LINK_AN_FAILED, /* Auto negotiation failed */
+	ETHTOOL_LINK_TRAINING_FAILED, /* Link training failed */
+	ETHTOOL_LINK_RMT_FAULT, /* Remote fault indication */
+	ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY, /* Bad signal integrity */
+	ETHTOOL_LINK_CABLE_MISMATCH, /* Cable protocol mismatch */
+	ETHTOOL_LINK_INTERNAL_ERR, /* Internal error */
+	ETHTOOL_LINK_CABLE_UNPLUGGED, /* Cable unplugged */
+	ETHTOOL_LINK_UNSUPP_MODULE, /* Unsupported module */
+	ETHTOOL_LINK_I2C_BUS_ERR, /* I2C bus error */
+	ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
+	ETHTOOL_LINK_OVERTEMP, /* Over temperature */
+	ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
+	ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
+
+	ETHTOOL_LINK_REASONS_COUNT
+};
+
 #endif /* _UAPI_LINUX_ETHTOOL_H */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 03111a2..b818ad4 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2523,6 +2523,26 @@ static int set_phy_tunable(struct net_device *dev, void __user *useraddr)
 	return ret;
 }
 
+static int get_link_down_reason(struct net_device *dev, void __user *useraddr)
+{
+	struct ethtool_link_down_reason ldr;
+	int ret;
+
+	if (!dev->ethtool_ops->get_link_down_reason)
+		return -EOPNOTSUPP;
+
+	memset(&ldr, 0, sizeof(ldr));
+	ldr.cmd = ETHTOOL_GLINK_DOWN_RSN;
+	ret = dev->ethtool_ops->get_link_down_reason(dev, &ldr);
+	if (ret)
+		return ret;
+
+	if (copy_to_user(useraddr, &ldr, sizeof(ldr)))
+		return -EFAULT;
+
+	return 0;
+}
+
 /* The main entry point in this file.  Called from net/core/dev_ioctl.c */
 
 int dev_ethtool(struct net *net, struct ifreq *ifr)
@@ -2582,6 +2602,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GTUNABLE:
 	case ETHTOOL_PHY_GTUNABLE:
 	case ETHTOOL_GLINKSETTINGS:
+	case ETHTOOL_GLINK_DOWN_RSN:
 		break;
 	default:
 		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
@@ -2793,6 +2814,9 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_PHY_STUNABLE:
 		rc = set_phy_tunable(dev, useraddr);
 		break;
+	case ETHTOOL_GLINK_DOWN_RSN:
+		rc = get_link_down_reason(dev, useraddr);
+		break;
 	default:
 		rc = -EOPNOTSUPP;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH net-next 2/3] net/mlx5: Add PDDR register infrastructure
  2017-06-21 13:04 [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting Gal Pressman
  2017-06-21 13:04 ` [RFC PATCH net-next 1/3] ethtool: Add link down reason callback Gal Pressman
@ 2017-06-21 13:04 ` Gal Pressman
  2017-06-21 13:04 ` [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool Gal Pressman
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: Gal Pressman @ 2017-06-21 13:04 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team,
	Gal Pressman

PDDR (Port Diagnostics Database Register) is used to read the physical
layer debug database, which contains helpful troubleshooting information
regarding the state of the link.

PDDR register can only be queried when PCAM register reports it as
supported in its register mask. A new helper macro was added to
the MLX5_CAP_* infrastructure in order to access this mask.

Expose query functions for PDDR register which will be used in the
following patch.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    | 29 ++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/port.c     | 14 ++++++
 include/linux/mlx5/device.h                        |  3 ++
 include/linux/mlx5/driver.h                        |  1 +
 include/linux/mlx5/mlx5_ifc.h                      | 51 ++++++++++++++++++++++
 5 files changed, 98 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 5ccdf43..cbb6a0e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -79,6 +79,35 @@ enum {
 	MLX5_DRIVER_SYND = 0xbadd00de,
 };
 
+enum mlx5_pddr_page_select {
+	MLX5_PDDR_OPERATIONAL_INFO_PAGE            = 0x0,
+	MLX5_PDDR_TROUBLESHOOTING_INFO_PAGE        = 0x1,
+	MLX5_PDDR_MODULE_INFO_PAGE                 = 0x3,
+};
+
+enum mlx5_pddr_monitor_opcodes {
+	MLX5_LINK_NO_ISSUE_OBSERVED                = 0x0,
+	MLX5_LINK_PORT_CLOSED                      = 0x1,
+	MLX5_LINK_AN_FAILURE                       = 0x2,
+	MLX5_LINK_TRAINING_FAILURE                 = 0x5,
+	MLX5_LINK_LOGICAL_MISMATCH                 = 0x9,
+	MLX5_LINK_REMOTE_FAULT_INDICATION          = 0xe,
+	MLX5_LINK_BAD_SIGNAL_INTEGRITY             = 0xf,
+	MLX5_LINK_CABLE_COMPLIANCE_CODE_MISMATCH   = 0x10,
+	MLX5_LINK_INTERNAL_ERR                     = 0x17,
+	MLX5_LINK_INFO_NOT_AVAIL                   = 0x3ff,
+	MLX5_LINK_CABLE_UNPLUGGED                  = 0x400,
+	MLX5_LINK_LONG_RANGE_FOR_NON_MLX_CABLE     = 0x401,
+	MLX5_LINK_BUS_STUCK                        = 0x402,
+	MLX5_LINK_UNSUPP_EEPROM                    = 0x403,
+	MLX5_LINK_PART_NUM_LIST                    = 0x404,
+	MLX5_LINK_UNSUPP_CABLE                     = 0x405,
+	MLX5_LINK_MODULE_TEMP_SHUTDOWN             = 0x406,
+	MLX5_LINK_SHORTED_CABLE                    = 0x407,
+	MLX5_LINK_POWER_BUDGET_EXCEEDED            = 0x408,
+	MLX5_LINK_MNG_FORCED_DOWN                  = 0x409,
+};
+
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
 int mlx5_query_board_id(struct mlx5_core_dev *dev);
 int mlx5_cmd_init_hca(struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 1975d43..38e97b2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -789,6 +789,20 @@ int mlx5_query_port_wol(struct mlx5_core_dev *mdev, u8 *wol_mode)
 }
 EXPORT_SYMBOL_GPL(mlx5_query_port_wol);
 
+static int mlx5_query_pddr(struct mlx5_core_dev *mdev,
+			   int page_select, u32 *out, int outlen)
+{
+	u32 in[MLX5_ST_SZ_DW(pddr_reg)] = {0};
+
+	if (!MLX5_CAP_PCAM_REG(mdev, pddr))
+		return -EOPNOTSUPP;
+
+	MLX5_SET(pddr_reg, in, local_port, 1);
+	MLX5_SET(pddr_reg, in, page_select, page_select);
+
+	return mlx5_core_access_reg(mdev, in, sizeof(in), out, outlen, MLX5_REG_PDDR, 0, 0);
+}
+
 static int mlx5_query_ports_check(struct mlx5_core_dev *mdev, u32 *out,
 				  int outlen)
 {
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index b26a478..b70b283 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1094,6 +1094,9 @@ enum mlx5_mcam_feature_groups {
 #define MLX5_CAP_PCAM_FEATURE(mdev, fld) \
 	MLX5_GET(pcam_reg, (mdev)->caps.pcam, feature_cap_mask.enhanced_features.fld)
 
+#define MLX5_CAP_PCAM_REG(mdev, reg) \
+	MLX5_GET(pcam_reg, (mdev)->caps.pcam, port_access_reg_cap_mask.regs_5000_to_507f.reg)
+
 #define MLX5_CAP_MCAM_FEATURE(mdev, fld) \
 	MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_feature_cap_mask.enhanced_features.fld)
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index bf15e87..876056f 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -121,6 +121,7 @@ enum {
 	MLX5_REG_PMPE		 = 0x5010,
 	MLX5_REG_PELC		 = 0x500e,
 	MLX5_REG_PVLC		 = 0x500f,
+	MLX5_REG_PDDR		 = 0x5031,
 	MLX5_REG_PCMR		 = 0x5041,
 	MLX5_REG_PMLP		 = 0x5002,
 	MLX5_REG_PCAM		 = 0x507f,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index e86ef88..cdb2435 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -7465,6 +7465,45 @@ struct mlx5_ifc_ppcnt_reg_bits {
 	union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits counter_set;
 };
 
+struct mlx5_ifc_monitor_opcodes_layout_bits {
+	u8         reserved_at_0[0x10];
+	u8         monitor_opcode[0x10];
+};
+
+union mlx5_ifc_pddr_status_opcode_bits {
+	struct mlx5_ifc_monitor_opcodes_layout_bits monitor_opcodes;
+	u8         reserved_at_0[0x20];
+};
+
+struct mlx5_ifc_troubleshooting_info_page_layout_bits {
+	u8         reserved_at_0[0x10];
+	u8         group_opcode[0x10];
+
+	union mlx5_ifc_pddr_status_opcode_bits status_opcode;
+
+	u8         user_feedback_data[0x10];
+	u8         user_feedback_index[0x10];
+
+	u8         status_message[0x760];
+};
+
+union mlx5_ifc_pddr_page_data_bits {
+	struct mlx5_ifc_troubleshooting_info_page_layout_bits troubleshooting_info_page;
+	u8         reserved_at_0[0x7c0];
+};
+
+struct mlx5_ifc_pddr_reg_bits {
+	u8         reserved_at_0[0x8];
+	u8         local_port[0x8];
+	u8         pnat[0x2];
+	u8         reserved_at_12[0xe];
+
+	u8         reserved_at_20[0x18];
+	u8         page_select[0x8];
+
+	union mlx5_ifc_pddr_page_data_bits page_data;
+};
+
 struct mlx5_ifc_mpcnt_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         pcie_index[0x8];
@@ -7715,6 +7754,17 @@ struct mlx5_ifc_pcam_enhanced_features_bits {
 	u8         ppcnt_statistical_group[0x1];
 };
 
+struct mlx5_ifc_pcam_regs_5000_to_507f_bits {
+	u8         port_access_reg_cap_mask_127_to_96[0x20];
+	u8         port_access_reg_cap_mask_95_to_64[0x20];
+
+	u8         reserved_at_40[0xe];
+	u8         pddr[0x1];
+	u8         reserved_at_4f[0x11];
+
+	u8         port_access_reg_cap_mask_31_to_0[0x20];
+};
+
 struct mlx5_ifc_pcam_reg_bits {
 	u8         reserved_at_0[0x8];
 	u8         feature_group[0x8];
@@ -7724,6 +7774,7 @@ struct mlx5_ifc_pcam_reg_bits {
 	u8         reserved_at_20[0x20];
 
 	union {
+		struct mlx5_ifc_pcam_regs_5000_to_507f_bits regs_5000_to_507f;
 		u8         reserved_at_0[0x80];
 	} port_access_reg_cap_mask;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool
  2017-06-21 13:04 [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting Gal Pressman
  2017-06-21 13:04 ` [RFC PATCH net-next 1/3] ethtool: Add link down reason callback Gal Pressman
  2017-06-21 13:04 ` [RFC PATCH net-next 2/3] net/mlx5: Add PDDR register infrastructure Gal Pressman
@ 2017-06-21 13:04 ` Gal Pressman
  2017-06-21 14:03   ` Andrew Lunn
  2017-06-22  4:33   ` Jakub Kicinski
  2017-06-21 15:44 ` [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting Stephen Hemminger
  2017-06-22  4:14 ` Jakub Kicinski
  4 siblings, 2 replies; 30+ messages in thread
From: Gal Pressman @ 2017-06-21 13:04 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team,
	Gal Pressman

Use the new ethtool link down reason api, and expose troubleshooting
info regarding the link status.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 107 +++++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   4 +
 drivers/net/ethernet/mellanox/mlx5/core/port.c     |  25 +++++
 3 files changed, 136 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index ab46061..a5e9f1f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -135,6 +135,88 @@ void mlx5e_build_ptys2ethtool_map(void)
 				       ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT);
 }
 
+static const struct {
+	u16 pddr;
+	u32 ethtool;
+} pddr2ethtool_table[] = {
+	{
+		.pddr    = MLX5_LINK_NO_ISSUE_OBSERVED,
+		.ethtool = ETHTOOL_LINK_NO_ISSUE,
+	},
+	{
+		.pddr    = MLX5_LINK_PORT_CLOSED,
+		.ethtool = ETHTOOL_LINK_ADMIN_DOWN,
+	},
+	{
+		.pddr    = MLX5_LINK_AN_FAILURE,
+		.ethtool = ETHTOOL_LINK_AN_FAILED,
+	},
+	{
+		.pddr    = MLX5_LINK_TRAINING_FAILURE,
+		.ethtool = ETHTOOL_LINK_TRAINING_FAILED,
+	},
+	{
+		.pddr    = MLX5_LINK_REMOTE_FAULT_INDICATION,
+		.ethtool = ETHTOOL_LINK_RMT_FAULT,
+	},
+	{
+		.pddr    = MLX5_LINK_BAD_SIGNAL_INTEGRITY,
+		.ethtool = ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY,
+	},
+	{
+		.pddr    = MLX5_LINK_CABLE_COMPLIANCE_CODE_MISMATCH,
+		.ethtool = ETHTOOL_LINK_CABLE_MISMATCH,
+	},
+	{
+		.pddr    = MLX5_LINK_INTERNAL_ERR,
+		.ethtool = ETHTOOL_LINK_INTERNAL_ERR,
+	},
+	{
+		.pddr    = MLX5_LINK_INFO_NOT_AVAIL,
+		.ethtool = ETHTOOL_LINK_REASON_UNKNOWN,
+	},
+	{
+		.pddr    = MLX5_LINK_CABLE_UNPLUGGED,
+		.ethtool = ETHTOOL_LINK_CABLE_UNPLUGGED,
+	},
+	{
+		.pddr    = MLX5_LINK_LONG_RANGE_FOR_NON_MLX_CABLE,
+		.ethtool = ETHTOOL_LINK_UNSUPP_MODULE,
+	},
+	{
+		.pddr    = MLX5_LINK_BUS_STUCK,
+		.ethtool = ETHTOOL_LINK_I2C_BUS_ERR,
+	},
+	{
+		.pddr    = MLX5_LINK_UNSUPP_EEPROM,
+		.ethtool = ETHTOOL_LINK_UNSUPP_EEPROM,
+	},
+	{
+		.pddr    = MLX5_LINK_MODULE_TEMP_SHUTDOWN,
+		.ethtool = ETHTOOL_LINK_OVERTEMP,
+	},
+	{
+		.pddr    = MLX5_LINK_POWER_BUDGET_EXCEEDED,
+		.ethtool = ETHTOOL_LINK_PWR_BUDGET_EXC,
+	},
+	{
+		.pddr    = MLX5_LINK_MNG_FORCED_DOWN,
+		.ethtool = ETHTOOL_LINK_MODULE_ADMIN_DOWN,
+	},
+};
+
+static u32 mlx5e_pddr2ethtool(u16 pddr)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(pddr2ethtool_table); i++) {
+		if (pddr2ethtool_table[i].pddr == pddr)
+			return pddr2ethtool_table[i].ethtool;
+	}
+
+	return ETHTOOL_LINK_VENDOR_SPECIFIC;
+}
+
 static unsigned long mlx5e_query_pfc_combined(struct mlx5e_priv *priv)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
@@ -1795,6 +1877,30 @@ static int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
 	return err;
 }
 
+static int mlx5e_get_link_down_reason(struct net_device *netdev,
+				      struct ethtool_link_down_reason *ldr)
+{
+	struct mlx5e_priv *priv = netdev_priv(netdev);
+	u16 monitor_opcode;
+	int err;
+
+	if (!netif_running(netdev)) {
+		ldr->reason = ETHTOOL_LINK_NETDEV_CARRIER_DOWN;
+		return 0;
+	}
+
+	err = mlx5_query_pddr_troubleshooting_info(priv->mdev,
+						   &monitor_opcode, NULL);
+	if (err)
+		return err;
+
+	ldr->reason = mlx5e_pddr2ethtool(monitor_opcode);
+	if (ldr->reason == ETHTOOL_LINK_VENDOR_SPECIFIC)
+		ldr->vendor_reason = monitor_opcode;
+
+	return 0;
+}
+
 const struct ethtool_ops mlx5e_ethtool_ops = {
 	.get_drvinfo       = mlx5e_get_drvinfo,
 	.get_link          = ethtool_op_get_link,
@@ -1828,4 +1934,5 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
 	.get_priv_flags    = mlx5e_get_priv_flags,
 	.set_priv_flags    = mlx5e_set_priv_flags,
 	.self_test         = mlx5e_self_test,
+	.get_link_down_reason = mlx5e_get_link_down_reason,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index cbb6a0e..b4cbaa0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -147,6 +147,10 @@ int mlx5_query_pcam_reg(struct mlx5_core_dev *dev, u32 *pcam, u8 feature_group,
 int mlx5_query_mcam_reg(struct mlx5_core_dev *dev, u32 *mcap, u8 feature_group,
 			u8 access_reg_group);
 
+int mlx5_query_pddr_troubleshooting_info(struct mlx5_core_dev *mdev,
+					 u16 *monitor_opcode,
+					 u8 *status_message);
+
 void mlx5_lag_add(struct mlx5_core_dev *dev, struct net_device *netdev);
 void mlx5_lag_remove(struct mlx5_core_dev *dev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 38e97b2..b9e053c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -803,6 +803,31 @@ static int mlx5_query_pddr(struct mlx5_core_dev *mdev,
 	return mlx5_core_access_reg(mdev, in, sizeof(in), out, outlen, MLX5_REG_PDDR, 0, 0);
 }
 
+int mlx5_query_pddr_troubleshooting_info(struct mlx5_core_dev *mdev,
+					 u16 *monitor_opcode,
+					 u8 *status_message)
+{
+	int outlen = MLX5_ST_SZ_BYTES(pddr_reg);
+	u32 out[MLX5_ST_SZ_DW(pddr_reg)] = {0};
+	int err;
+
+	err = mlx5_query_pddr(mdev, MLX5_PDDR_TROUBLESHOOTING_INFO_PAGE,
+			      out, outlen);
+	if (err)
+		return err;
+
+	if (monitor_opcode)
+		*monitor_opcode = MLX5_GET(pddr_reg, out,
+					   page_data.troubleshooting_info_page.status_opcode.monitor_opcodes);
+
+	if (status_message)
+		memcpy(status_message,
+		       MLX5_ADDR_OF(pddr_reg, out, page_data.troubleshooting_info_page.status_message),
+		       ETH_GSTRING_LEN);
+
+	return 0;
+}
+
 static int mlx5_query_ports_check(struct mlx5_core_dev *mdev, u32 *out,
 				  int outlen)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-21 13:04 ` [RFC PATCH net-next 1/3] ethtool: Add link down reason callback Gal Pressman
@ 2017-06-21 13:58   ` Andrew Lunn
  2017-06-22  8:09     ` Gal Pressman
  2017-06-21 15:43   ` Stephen Hemminger
  2017-06-26 13:28   ` Andrew Lunn
  2 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2017-06-21 13:58 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

> +enum {
> +	ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in vendor_reason */
> +	ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
> +	ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */

I think OTHER would be better that UNKNOWN. 

> +	ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
> +	ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */

These two are interesting. We have that information already. Why do we
want it again?

> +	ETHTOOL_LINK_AN_FAILED, /* Auto negotiation failed */
> +	ETHTOOL_LINK_TRAINING_FAILED, /* Link training failed */
> +	ETHTOOL_LINK_RMT_FAULT, /* Remote fault indication */
> +	ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY, /* Bad signal integrity */
> +	ETHTOOL_LINK_CABLE_MISMATCH, /* Cable protocol mismatch */
> +	ETHTOOL_LINK_INTERNAL_ERR, /* Internal error */

How does an internet error differ from an UNKNOWN?

> +	ETHTOOL_LINK_CABLE_UNPLUGGED, /* Cable unplugged */
> +	ETHTOOL_LINK_UNSUPP_MODULE, /* Unsupported module */
> +	ETHTOOL_LINK_I2C_BUS_ERR, /* I2C bus error */

Which I2C bus? What about MDIO BUS errors?

> +	ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */

Which EEPROM?

> +	ETHTOOL_LINK_OVERTEMP, /* Over temperature */
> +	ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
> +	ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */

It seems like these last 6 are all SFP issues? How about putting SFP
into the name?

     Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool
  2017-06-21 13:04 ` [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool Gal Pressman
@ 2017-06-21 14:03   ` Andrew Lunn
  2017-06-22  8:17     ` Gal Pressman
  2017-06-22  4:33   ` Jakub Kicinski
  1 sibling, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2017-06-21 14:03 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

> +static int mlx5e_get_link_down_reason(struct net_device *netdev,
> +				      struct ethtool_link_down_reason *ldr)
> +{
> +	struct mlx5e_priv *priv = netdev_priv(netdev);
> +	u16 monitor_opcode;
> +	int err;
> +
> +	if (!netif_running(netdev)) {
> +		ldr->reason = ETHTOOL_LINK_NETDEV_CARRIER_DOWN;
> +		return 0;
> +	}

This is generic, will work for any interface. The same is true for
ADMIN_DOWN. Either it is not required at all, since the information is
available via other means, or it should be in the core.

	  Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-21 13:04 ` [RFC PATCH net-next 1/3] ethtool: Add link down reason callback Gal Pressman
  2017-06-21 13:58   ` Andrew Lunn
@ 2017-06-21 15:43   ` Stephen Hemminger
  2017-06-22 10:33     ` Gal Pressman
  2017-06-26 13:28   ` Andrew Lunn
  2 siblings, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2017-06-21 15:43 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

On Wed, 21 Jun 2017 16:04:44 +0300
Gal Pressman <galp@mellanox.com> wrote:

> +
> +enum {
> +	ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in vendor_reason */
> +	ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
> +	ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
> +	ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
> +	ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
> +	ETHTOOL_LINK_AN_FAILED, /* Auto negotiation failed */
> +	ETHTOOL_LINK_TRAINING_FAILED, /* Link training failed */
> +	ETHTOOL_LINK_RMT_FAULT, /* Remote fault indication */
> +	ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY, /* Bad signal integrity */
> +	ETHTOOL_LINK_CABLE_MISMATCH, /* Cable protocol mismatch */
> +	ETHTOOL_LINK_INTERNAL_ERR, /* Internal error */
> +	ETHTOOL_LINK_CABLE_UNPLUGGED, /* Cable unplugged */
> +	ETHTOOL_LINK_UNSUPP_MODULE, /* Unsupported module */
> +	ETHTOOL_LINK_I2C_BUS_ERR, /* I2C bus error */
> +	ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
> +	ETHTOOL_LINK_OVERTEMP, /* Over temperature */
> +	ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
> +	ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
> +
> +	ETHTOOL_LINK_REASONS_COUNT
> +};

Any enumerated list is going to get changed too often.
Could the API just return a string?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting
  2017-06-21 13:04 [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting Gal Pressman
                   ` (2 preceding siblings ...)
  2017-06-21 13:04 ` [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool Gal Pressman
@ 2017-06-21 15:44 ` Stephen Hemminger
  2017-06-22 11:13   ` Gal Pressman
  2017-06-22  4:14 ` Jakub Kicinski
  4 siblings, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2017-06-21 15:44 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

On Wed, 21 Jun 2017 16:04:43 +0300
Gal Pressman <galp@mellanox.com> wrote:

> Hi All,
> 
> Currently, drivers can only tell whether the link is up/down through
> ETHTOOL_GLINK callback, but no additional information is given in case
> of link down/failure.
> 
> This series provides an infrastructure to ethtool that allows 
> netdevice drivers to hint the user with the reason why the link is down,
> in order to ease the debug process.
> 
> In addition two more mlx5 patches to demonstrate the usage.
> 
> Reasons are separated to two types, generic and vendor specific.
> Drivers can reply with a generic reason using the predefined enums 
> (and the ones that will be added in the future), which will be
> translated to strings by the userspace ethtool.
> In case of a vendor specific reason, drivers can reply with
> ETHTOOL_VENDOR_SPECIFIC reason, in this case the vendor_reason field
> should be filled with a vendor specific status code which will be parsed
> by the vendor specific userspace parser if one is available.
> 
> This kind of information can save system administrators precious time,
> which will not be wasted trying to understand why the link won't go up.
>     
> For example, when the cable is unplugged:
>     $ ethtool ethXX
>     ...
>     Link detected: no (Cable unplugged)
> 
> Thanks,
> Gal.

Ethtool doesn't work well as a monitoring/event interface.
Maybe this would be better as extended information with the  netlink UP/DOW event.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting
  2017-06-21 13:04 [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting Gal Pressman
                   ` (3 preceding siblings ...)
  2017-06-21 15:44 ` [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting Stephen Hemminger
@ 2017-06-22  4:14 ` Jakub Kicinski
  2017-06-22 11:37   ` Gal Pressman
  4 siblings, 1 reply; 30+ messages in thread
From: Jakub Kicinski @ 2017-06-22  4:14 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

On Wed, 21 Jun 2017 16:04:43 +0300, Gal Pressman wrote:
> Hi All,
> 
> Currently, drivers can only tell whether the link is up/down through
> ETHTOOL_GLINK callback, but no additional information is given in case
> of link down/failure.
> 
> This series provides an infrastructure to ethtool that allows 
> netdevice drivers to hint the user with the reason why the link is down,
> in order to ease the debug process.
> 
> In addition two more mlx5 patches to demonstrate the usage.
> 
> Reasons are separated to two types, generic and vendor specific.
> Drivers can reply with a generic reason using the predefined enums 
> (and the ones that will be added in the future), which will be
> translated to strings by the userspace ethtool.
> In case of a vendor specific reason, drivers can reply with
> ETHTOOL_VENDOR_SPECIFIC reason, in this case the vendor_reason field
> should be filled with a vendor specific status code which will be parsed
> by the vendor specific userspace parser if one is available.
> 
> This kind of information can save system administrators precious time,
> which will not be wasted trying to understand why the link won't go up.
>     
> For example, when the cable is unplugged:
>     $ ethtool ethXX
>     ...
>     Link detected: no (Cable unplugged)

Any particular reason for implementing this ABI in ethtool rather than
via some netlink-based interface?  Devlink naturally comes to mind,
given that cabling problems are not really related to the L2 and netdev
shouldn't be required for diagnostics..

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool
  2017-06-21 13:04 ` [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool Gal Pressman
  2017-06-21 14:03   ` Andrew Lunn
@ 2017-06-22  4:33   ` Jakub Kicinski
  2017-06-22  8:33     ` Gal Pressman
  1 sibling, 1 reply; 30+ messages in thread
From: Jakub Kicinski @ 2017-06-22  4:33 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

On Wed, 21 Jun 2017 16:04:46 +0300, Gal Pressman wrote:
> Use the new ethtool link down reason api, and expose troubleshooting
> info regarding the link status.
> 
> Signed-off-by: Gal Pressman <galp@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Is my reading correct that in case the reason is not in the
pddr2ethtool_table opaque binary data will be passed from the firmware
to user space?  Is there any particular reason to allow for this?  If
it's just for the rare scenario where a new error code needs to be
added perhaps it would be enough to dump the FW-provided message to the
logs?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-21 13:58   ` Andrew Lunn
@ 2017-06-22  8:09     ` Gal Pressman
  2017-06-22 13:34       ` Andrew Lunn
  2017-06-24 19:04       ` Andrew Lunn
  0 siblings, 2 replies; 30+ messages in thread
From: Gal Pressman @ 2017-06-22  8:09 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team


>> +enum {
>> +	ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in vendor_reason */
>> +	ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
>> +	ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
> I think OTHER would be better that UNKNOWN. 

Fine with me.
>> +	ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
>> +	ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
> These two are interesting. We have that information already. Why do we
> want it again?

My goal is to gather all link issue reasons in one place.
I don't want the user to gather pieces of other reports just to get a sense of what's wrong.
>> +	ETHTOOL_LINK_AN_FAILED, /* Auto negotiation failed */
>> +	ETHTOOL_LINK_TRAINING_FAILED, /* Link training failed */
>> +	ETHTOOL_LINK_RMT_FAULT, /* Remote fault indication */
>> +	ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY, /* Bad signal integrity */
>> +	ETHTOOL_LINK_CABLE_MISMATCH, /* Cable protocol mismatch */
>> +	ETHTOOL_LINK_INTERNAL_ERR, /* Internal error */
> How does an internet error differ from an UNKNOWN?

Internal error means I know what happened but I can't tell you, your vendor support might be
able to provide a better analysis.
Unknown means that the driver doesn't have any useful information that can help in this case.
>> +	ETHTOOL_LINK_CABLE_UNPLUGGED, /* Cable unplugged */
>> +	ETHTOOL_LINK_UNSUPP_MODULE, /* Unsupported module */
>> +	ETHTOOL_LINK_I2C_BUS_ERR, /* I2C bus error */
> Which I2C bus? What about MDIO BUS errors?

The I2C bus that's connected to this module (interface).
We can add another reason for MDIO BUS errors or merge to one BUS error reason.
>> +	ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
> Which EEPROM?

Module EEPROM.
>> +	ETHTOOL_LINK_OVERTEMP, /* Over temperature */
>> +	ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
>> +	ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
> It seems like these last 6 are all SFP issues? How about putting SFP
> into the name?

Might be a QSFP issue for example, we can put module in the name though.
>
>      Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool
  2017-06-21 14:03   ` Andrew Lunn
@ 2017-06-22  8:17     ` Gal Pressman
  0 siblings, 0 replies; 30+ messages in thread
From: Gal Pressman @ 2017-06-22  8:17 UTC (permalink / raw)
  To: Andrew Lunn, Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team


>> +	if (!netif_running(netdev)) {
>> +		ldr->reason = ETHTOOL_LINK_NETDEV_CARRIER_DOWN;
>> +		return 0;
>> +	}
> This is generic, will work for any interface. The same is true for
> ADMIN_DOWN. Either it is not required at all, since the information is
> available via other means, or it should be in the core.

I agree, will move it to the core.

>
> 	  Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool
  2017-06-22  4:33   ` Jakub Kicinski
@ 2017-06-22  8:33     ` Gal Pressman
  2017-06-22 22:38       ` Jakub Kicinski
  0 siblings, 1 reply; 30+ messages in thread
From: Gal Pressman @ 2017-06-22  8:33 UTC (permalink / raw)
  To: Jakub Kicinski, Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team


> Is my reading correct that in case the reason is not in the
> pddr2ethtool_table opaque binary data will be passed from the firmware
> to user space?  Is there any particular reason to allow for this?  If
> it's just for the rare scenario where a new error code needs to be
> added perhaps it would be enough to dump the FW-provided message to the
> logs?

No binary data is passed in this patch, the monitor_opcode is simply a vendor specific
16 bit id that is used when the reason  is not generic enough to have it's own enum.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-21 15:43   ` Stephen Hemminger
@ 2017-06-22 10:33     ` Gal Pressman
  0 siblings, 0 replies; 30+ messages in thread
From: Gal Pressman @ 2017-06-22 10:33 UTC (permalink / raw)
  To: Stephen Hemminger, Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team


>> +	ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
>> +	ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
>> +
>> +	ETHTOOL_LINK_REASONS_COUNT
>> +};
> Any enumerated list is going to get changed too often.
> Could the API just return a string?

The motivation for the enumerated list is to make this API as generic as possible and compatible
with all ethernet drivers.

Returning a string is a good idea, maybe change the vendor specific opcode to a string?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting
  2017-06-21 15:44 ` [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting Stephen Hemminger
@ 2017-06-22 11:13   ` Gal Pressman
  0 siblings, 0 replies; 30+ messages in thread
From: Gal Pressman @ 2017-06-22 11:13 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

> Ethtool doesn't work well as a monitoring/event interface.
> Maybe this would be better as extended information with the  netlink UP/DOW event.

This API is not meant for monitoring events.
The main use case here is a new interface that's down and won't go up for some reason.
Nothing will magically change until someone fixes the issue, logging the reason as an event on
netdev creation seems less convenient to me.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting
  2017-06-22  4:14 ` Jakub Kicinski
@ 2017-06-22 11:37   ` Gal Pressman
  2017-06-22 21:53     ` Jakub Kicinski
  0 siblings, 1 reply; 30+ messages in thread
From: Gal Pressman @ 2017-06-22 11:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team


> Any particular reason for implementing this ABI in ethtool rather than
> via some netlink-based interface?  Devlink naturally comes to mind,
> given that cabling problems are not really related to the L2 and netdev
> shouldn't be required for diagnostics..

ethtool is already used for reporting and handling of link related stuff through get/set_link_ksettings.
How is reporting the link modes/port type/speed/etc different from this?
This feature is only an extension of the already existent "link detected" field.

Implementing this ABI in devlink is a good idea, but it shouldn't be instead of ethtool but in addition to ethtool.
Many users already use ethtool for this kind of info, we can tell them to use devlink to check why their link is down,
but I think it's nicer to have it all in one place.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-22  8:09     ` Gal Pressman
@ 2017-06-22 13:34       ` Andrew Lunn
  2017-06-23  8:23         ` Gal Pressman
  2017-06-24 19:04       ` Andrew Lunn
  1 sibling, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2017-06-22 13:34 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

> The I2C bus that's connected to this module (interface).
> We can add another reason for MDIO BUS errors or merge to one BUS error reason.
> >> +	ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
> > Which EEPROM?
> 
> Module EEPROM.

Which module? This is all very vague. Some of the Marvell 10G PHYs
have an EEPROM to boot from, for example. Would that count? Or are you
talking about the SFP 'EEPROM', which is not actually an EEPROM, in
that it is not Electrically Erasable, not is it a ROM, since things
like temperature changes with time.

> >> +	ETHTOOL_LINK_OVERTEMP, /* Over temperature */
> >> +	ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
> >> +	ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
> > It seems like these last 6 are all SFP issues? How about putting SFP
> > into the name?
> 
> Might be a QSFP issue for example, we can put module in the name though.

What is the generic name of SFP, SFP+ QSFP, SFF?

     Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting
  2017-06-22 11:37   ` Gal Pressman
@ 2017-06-22 21:53     ` Jakub Kicinski
  0 siblings, 0 replies; 30+ messages in thread
From: Jakub Kicinski @ 2017-06-22 21:53 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

On Thu, 22 Jun 2017 14:37:26 +0300, Gal Pressman wrote:
> > Any particular reason for implementing this ABI in ethtool rather than
> > via some netlink-based interface?  Devlink naturally comes to mind,
> > given that cabling problems are not really related to the L2 and netdev
> > shouldn't be required for diagnostics..  
> 
> ethtool is already used for reporting and handling of link related stuff through get/set_link_ksettings.
> How is reporting the link modes/port type/speed/etc different from this?
> This feature is only an extension of the already existent "link detected" field.
> 
> Implementing this ABI in devlink is a good idea, but it shouldn't be instead of ethtool but in addition to ethtool.
> Many users already use ethtool for this kind of info, we can tell them to use devlink to check why their link is down,
> but I think it's nicer to have it all in one place.

I understand where you're coming from, but it's a matter of user space
tooling.  If ethtool has to invoke devlink or know a little of netlink
to get this info, so be it.  Ethtool implements a lot of features, I'm
worried that if we select where things are added based on where the
similar features already live, we will be stuck adding ioctls for
ever :(

But I'm happy to let this go if others don't feel the same way ;)  

In general thanks for working on this feature, it's very useful!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool
  2017-06-22  8:33     ` Gal Pressman
@ 2017-06-22 22:38       ` Jakub Kicinski
  2017-06-23  8:52         ` Gal Pressman
  0 siblings, 1 reply; 30+ messages in thread
From: Jakub Kicinski @ 2017-06-22 22:38 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Gal Pressman, netdev, David S. Miller, John W. Linville,
	Saeed Mahameed, Vidya Sagar Ravipati, Jiri Pirko,
	David Decotigny, kernel-team

On Thu, 22 Jun 2017 11:33:39 +0300, Gal Pressman wrote:
> > Is my reading correct that in case the reason is not in the
> > pddr2ethtool_table opaque binary data will be passed from the firmware
> > to user space?  Is there any particular reason to allow for this?  If
> > it's just for the rare scenario where a new error code needs to be
> > added perhaps it would be enough to dump the FW-provided message to the
> > logs?  
> 
> No binary data is passed in this patch, the monitor_opcode is simply a vendor specific
> 16 bit id that is used when the reason  is not generic enough to have it's own enum.

Sorry if I'm wrong, I thought this would potentially copy
ETH_GSTRING_LEN bytes to userspace:

+	if (status_message)
+		memcpy(status_message,
+		       MLX5_ADDR_OF(pddr_reg, out, page_data.troubleshooting_info_page.status_message),
+		       ETH_GSTRING_LEN);

I'm also still not sure why a reason would not be generic enough for
the enum, if it fits in the 16bit vendor enum... 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-22 13:34       ` Andrew Lunn
@ 2017-06-23  8:23         ` Gal Pressman
  2017-06-23 13:52           ` Andrew Lunn
  0 siblings, 1 reply; 30+ messages in thread
From: Gal Pressman @ 2017-06-23  8:23 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

>> The I2C bus that's connected to this module (interface).
>> We can add another reason for MDIO BUS errors or merge to one BUS error reason.
>>>> +	ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
>>> Which EEPROM?
>> Module EEPROM.
> Which module? This is all very vague. Some of the Marvell 10G PHYs
> have an EEPROM to boot from, for example. Would that count? Or are you
> talking about the SFP 'EEPROM', which is not actually an EEPROM, in
> that it is not Electrically Erasable, not is it a ROM, since things
> like temperature changes with time.

I am referring to the optical/electrical module EEPROM which is exposed through standard
interface such as SFF 8472. Might not be an actual EEPROM but that's how the SFF committee decided
to refer to it :).

>
>>>> +	ETHTOOL_LINK_OVERTEMP, /* Over temperature */
>>>> +	ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
>>>> +	ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
>>> It seems like these last 6 are all SFP issues? How about putting SFP
>>> into the name?
>> Might be a QSFP issue for example, we can put module in the name though.
> What is the generic name of SFP, SFP+ QSFP, SFF?

AFAIK, the name is module.
>
>      Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool
  2017-06-22 22:38       ` Jakub Kicinski
@ 2017-06-23  8:52         ` Gal Pressman
  0 siblings, 0 replies; 30+ messages in thread
From: Gal Pressman @ 2017-06-23  8:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Gal Pressman, netdev, David S. Miller, John W. Linville,
	Saeed Mahameed, Vidya Sagar Ravipati, Jiri Pirko,
	David Decotigny, kernel-team


> On Thu, 22 Jun 2017 11:33:39 +0300, Gal Pressman wrote:
>>> Is my reading correct that in case the reason is not in the
>>> pddr2ethtool_table opaque binary data will be passed from the firmware
>>> to user space?  Is there any particular reason to allow for this?  If
>>> it's just for the rare scenario where a new error code needs to be
>>> added perhaps it would be enough to dump the FW-provided message to the
>>> logs?  
>> No binary data is passed in this patch, the monitor_opcode is simply a vendor specific
>> 16 bit id that is used when the reason  is not generic enough to have it's own enum.
> Sorry if I'm wrong, I thought this would potentially copy
> ETH_GSTRING_LEN bytes to userspace:
>
> +	if (status_message)
> +		memcpy(status_message,
> +		       MLX5_ADDR_OF(pddr_reg, out, page_data.troubleshooting_info_page.status_message),
> +		       ETH_GSTRING_LEN);
>
> I'm also still not sure why a reason would not be generic enough for
> the enum, if it fits in the 16bit vendor enum... 
Some reasons might be very specific, and only relevant to my hardware for example.
In that case I'd like to avoid an enum entry that is not wide enough to be used by anyone else but me,
and still have a way to pass it to the user.

The userspace patches will allow for a vendor parser (similar to ethtool regdump parsers) that can
convert this 16bit id to a string.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-23  8:23         ` Gal Pressman
@ 2017-06-23 13:52           ` Andrew Lunn
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Lunn @ 2017-06-23 13:52 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

On Fri, Jun 23, 2017 at 11:23:17AM +0300, Gal Pressman wrote:
> >> The I2C bus that's connected to this module (interface).
> >> We can add another reason for MDIO BUS errors or merge to one BUS error reason.
> >>>> +	ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
> >>> Which EEPROM?
> >> Module EEPROM.
> > Which module? This is all very vague. Some of the Marvell 10G PHYs
> > have an EEPROM to boot from, for example. Would that count? Or are you
> > talking about the SFP 'EEPROM', which is not actually an EEPROM, in
> > that it is not Electrically Erasable, not is it a ROM, since things
> > like temperature changes with time.
> 

> I am referring to the optical/electrical module EEPROM which is
> exposed through standard interface such as SFF 8472. Might not be an
> actual EEPROM but that's how the SFF committee decided to refer to
> it :).

Right, so at a minimum, put a comment: The following properties
referring to the optical/electrical module EEPROM which is exposed
through standard interface such as SFF 8472.

That makes it a lot less ambiguous.

> 
> >
> >>>> +	ETHTOOL_LINK_OVERTEMP, /* Over temperature */
> >>>> +	ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
> >>>> +	ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
> >>> It seems like these last 6 are all SFP issues? How about putting SFP
> >>> into the name?
> >> Might be a QSFP issue for example, we can put module in the name though.
> > What is the generic name of SFP, SFP+ QSFP, SFF?
> 
> AFAIK, the name is module.

And as a term, module is overloaded. If the standard is called SFF
8472, then i would suggest putting SFF in these macros.

This is all assuming we actually decide to expose this information
this way....

     Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-22  8:09     ` Gal Pressman
  2017-06-22 13:34       ` Andrew Lunn
@ 2017-06-24 19:04       ` Andrew Lunn
  2017-06-25 11:59         ` Gal Pressman
  1 sibling, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2017-06-24 19:04 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
> 
> >> +enum {
> >> +	ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in vendor_reason */
> >> +	ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
> >> +	ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
> > I think OTHER would be better that UNKNOWN. 
> 
> Fine with me.
> >> +	ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
> >> +	ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
> > These two are interesting. We have that information already. Why do we
> > want it again?
> 
> My goal is to gather all link issue reasons in one place.

I'm actually wondering if this is a user space problem. Nearly
everything you list is already available. Some you get from ip link,
others from ethtool or ethtool --module-info, including I2C bus
error, since you would expect EIO or ETIMEOUT.

If you were to write a user space tool using the information what is
currently available, what would be missing?

	  Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-24 19:04       ` Andrew Lunn
@ 2017-06-25 11:59         ` Gal Pressman
  2017-06-25 15:19           ` Andrew Lunn
  0 siblings, 1 reply; 30+ messages in thread
From: Gal Pressman @ 2017-06-25 11:59 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team


> On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
>>>> +enum {
>>>> +	ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in vendor_reason */
>>>> +	ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
>>>> +	ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
>>> I think OTHER would be better that UNKNOWN. 
>> Fine with me.
>>>> +	ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
>>>> +	ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
>>> These two are interesting. We have that information already. Why do we
>>> want it again?
>> My goal is to gather all link issue reasons in one place.
> I'm actually wondering if this is a user space problem. Nearly
> everything you list is already available. Some you get from ip link,
> others from ethtool or ethtool --module-info, including I2C bus
> error, since you would expect EIO or ETIMEOUT.
>
> If you were to write a user space tool using the information what is
> currently available, what would be missing?
>
> 	  Andrew

I think most of the reasons in this list would be missing.
Auto negotiation failure, link training failure, remote fault indication, bad signal integrity, cable protocol mismatch, cable unplugged,
over temperature, power budget exceeded..

Keep in mind that this is just an initial list, not to mention the vendor reasons which are not part of any existing API.
I don't see how a user space tool that expects ETIMEOUT/EIO is better than this suggestion.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-25 11:59         ` Gal Pressman
@ 2017-06-25 15:19           ` Andrew Lunn
  2017-06-26 11:52             ` Gal Pressman
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2017-06-25 15:19 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

On Sun, Jun 25, 2017 at 02:59:24PM +0300, Gal Pressman wrote:
> 
> > On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
> >>>> +enum {
> >>>> +	ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in vendor_reason */
> >>>> +	ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
> >>>> +	ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
> >>> I think OTHER would be better that UNKNOWN. 
> >> Fine with me.
> >>>> +	ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
> >>>> +	ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
> >>> These two are interesting. We have that information already. Why do we
> >>> want it again?
> >> My goal is to gather all link issue reasons in one place.
> > I'm actually wondering if this is a user space problem. Nearly
> > everything you list is already available. Some you get from ip link,
> > others from ethtool or ethtool --module-info, including I2C bus
> > error, since you would expect EIO or ETIMEOUT.
> >
> > If you were to write a user space tool using the information what is
> > currently available, what would be missing?
> >
> > 	  Andrew
> 
> I think most of the reasons in this list would be missing.
> Auto negotiation failure,

You can probably get that from the PHY layer. You get both the local
and remote AN capabilities.

> unplugged, over temperature, power budget exceeded..

Don't you get over temperature from the SFF data? Also power budget?

And what does cable unplugged actually mean? Do you have a micro
switch inside the socket? So that is maybe a gpio-key?

Another thing to remember is that your device is the exception to the
rule. You have some firmware doing a lot of the work bringing this all
together. But nearly every other Ethernet interface has a discrete MAC
and PHY, I2C bus driver, EEPROM driver, generic SFF decoder, HWMON
temperature sensor, etc. How does your call work in this normal
situation? How do you make calls into all these subsystems to get the
information? You want a generic solution which can be made to work for
everybody.

	Andrew




> 
> Keep in mind that this is just an initial list, not to mention the vendor reasons which are not part of any existing API.
> I don't see how a user space tool that expects ETIMEOUT/EIO is better than this suggestion.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-25 15:19           ` Andrew Lunn
@ 2017-06-26 11:52             ` Gal Pressman
  2017-06-26 13:34               ` Andrew Lunn
  0 siblings, 1 reply; 30+ messages in thread
From: Gal Pressman @ 2017-06-26 11:52 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team


> On Sun, Jun 25, 2017 at 02:59:24PM +0300, Gal Pressman wrote:
>>> On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
>>>>>> +enum {
>>>>>> +	ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in vendor_reason */
>>>>>> +	ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
>>>>>> +	ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
>>>>> I think OTHER would be better that UNKNOWN. 
>>>> Fine with me.
>>>>>> +	ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
>>>>>> +	ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
>>>>> These two are interesting. We have that information already. Why do we
>>>>> want it again?
>>>> My goal is to gather all link issue reasons in one place.
>>> I'm actually wondering if this is a user space problem. Nearly
>>> everything you list is already available. Some you get from ip link,
>>> others from ethtool or ethtool --module-info, including I2C bus
>>> error, since you would expect EIO or ETIMEOUT.
>>>
>>> If you were to write a user space tool using the information what is
>>> currently available, what would be missing?
>>>
>>> 	  Andrew
>> I think most of the reasons in this list would be missing.
>> Auto negotiation failure,
> You can probably get that from the PHY layer. You get both the local
> and remote AN capabilities.
>
>> unplugged, over temperature, power budget exceeded..
> Don't you get over temperature from the SFF data? Also power budget?
You are right, but it depends on other resources that might fail such as BUS failure, invalid EEPROM, etc..

> And what does cable unplugged actually mean? Do you have a micro
> switch inside the socket? So that is maybe a gpio-key?
No, some hardware devices can sense this state.
We would like to expose this information when it's available.

> Another thing to remember is that your device is the exception to the
> rule. You have some firmware doing a lot of the work bringing this all
> together. But nearly every other Ethernet interface has a discrete MAC
> and PHY, I2C bus driver, EEPROM driver, generic SFF decoder, HWMON
> temperature sensor, etc. How does your call work in this normal
> situation? How do you make calls into all these subsystems to get the
> information? You want a generic solution which can be made to work for
> everybody.
The driver has a good intimate information of his device implementation, and hence an analysis done by the device vendor is favorable.
The driver provider can perform the analysis inside the device (firmware) or in the driver according to his preferences.
We believe that since devices are becoming smarter, more analysis will be done by the device itself, which has more
information and faster access.
Smart NICs/SoCs are very popular this days and this API takes into account the different architectures.

Since this callback is optional, a user space analysis tool can be added in the future providing more generic analysis approach for
unsupported devices.

> 	Andrew
>
>> Keep in mind that this is just an initial list, not to mention the vendor reasons which are not part of any existing API.
>> I don't see how a user space tool that expects ETIMEOUT/EIO is better than this suggestion.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-21 13:04 ` [RFC PATCH net-next 1/3] ethtool: Add link down reason callback Gal Pressman
  2017-06-21 13:58   ` Andrew Lunn
  2017-06-21 15:43   ` Stephen Hemminger
@ 2017-06-26 13:28   ` Andrew Lunn
  2017-06-26 15:48     ` Gal Pressman
  2 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2017-06-26 13:28 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

On Wed, Jun 21, 2017 at 04:04:44PM +0300, Gal Pressman wrote:
> Currently, drivers can only tell whether the link is up/down through
> ETHTOOL_GLINK callback, but no additional information is given in case
> of link down.
> This patch provides an infrastructure that allows drivers to hint
> the user with the reason why the link is down, in order to ease the
> debug process.
> 
> Reasons are separated to two types, generic and vendor specific.
> Drivers can reply with a generic reason using the enums provided in this
> patch (and the ones that will be added in the future), which will be
> translated to strings by the userspace ethtool.
> In case of a vendor specific reason (not suitable for most vendors),
> drivers can reply with ETHTOOL_VENDOR_SPECIFIC reason, in this case the
> vendor_reason field should be filled with a vendor specific status code
> which will be parsed by the vendor specific userspace parser if one is
> available.
> 
> This kind of information can save system administrators precious time
> which will not be wasted trying to understand why the link won't go
> up.
> 
> For example, when the cable is unplugged:
> $ ethtool ethXX
> ...
> Link detected: no (Cable unplugged)
> 
> Signed-off-by: Gal Pressman <galp@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  include/linux/ethtool.h      |  2 ++
>  include/uapi/linux/ethtool.h | 33 +++++++++++++++++++++++++++++++++
>  net/core/ethtool.c           | 24 ++++++++++++++++++++++++
>  3 files changed, 59 insertions(+)
> 
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 83cc986..d472047 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -374,5 +374,7 @@ struct ethtool_ops {
>  				      struct ethtool_link_ksettings *);
>  	int	(*set_link_ksettings)(struct net_device *,
>  				      const struct ethtool_link_ksettings *);
> +	int	(*get_link_down_reason)(struct net_device *,
> +					struct ethtool_link_down_reason *);
>  };
>  #endif /* _LINUX_ETHTOOL_H */
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index 7d4a594..8cf9d2c 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -550,6 +550,13 @@ struct ethtool_pauseparam {
>  
>  #define ETH_GSTRING_LEN		32
>  
> +struct ethtool_link_down_reason {
> +	__u32	cmd;
> +	__u32	reason;
> +	__u32	vendor_reason;
> +	__u32	reserved[4];
> +};

Shouldn't this be a list? The device is over its power budget,
overheating and does not have a cable plugged in. There can be
multiple reasons it is down.

	 Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-26 11:52             ` Gal Pressman
@ 2017-06-26 13:34               ` Andrew Lunn
  2017-06-27 19:40                 ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2017-06-26 13:34 UTC (permalink / raw)
  To: Gal Pressman
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team

> The driver has a good intimate information of his device
> implementation, and hence an analysis done by the device vendor is
> favorable.

For a firmware based NIC, maybe. For a discrete NIC, making use of all
the Linux subsystems, this is going to be hard from within the kernel.

> The driver provider can perform the analysis inside the device (firmware) or in the driver according to his preferences.
> We believe that since devices are becoming smarter, more analysis will be done by the device itself, which has more
> information and faster access.
> Smart NICs/SoCs are very popular this days and this API takes into account the different architectures.
> 
> Since this callback is optional, a user space analysis tool can be added in the future providing more generic analysis approach for
> unsupported devices.

I still fear this is going to be an ethtool call with only one user.

  Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-26 13:28   ` Andrew Lunn
@ 2017-06-26 15:48     ` Gal Pressman
  0 siblings, 0 replies; 30+ messages in thread
From: Gal Pressman @ 2017-06-26 15:48 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, David S. Miller, John W. Linville, Saeed Mahameed,
	Vidya Sagar Ravipati, Jiri Pirko, David Decotigny, kernel-team


> On Wed, Jun 21, 2017 at 04:04:44PM +0300, Gal Pressman wrote:
>> Currently, drivers can only tell whether the link is up/down through
>> ETHTOOL_GLINK callback, but no additional information is given in case
>> of link down.
>> This patch provides an infrastructure that allows drivers to hint
>> the user with the reason why the link is down, in order to ease the
>> debug process.
>>
>> Reasons are separated to two types, generic and vendor specific.
>> Drivers can reply with a generic reason using the enums provided in this
>> patch (and the ones that will be added in the future), which will be
>> translated to strings by the userspace ethtool.
>> In case of a vendor specific reason (not suitable for most vendors),
>> drivers can reply with ETHTOOL_VENDOR_SPECIFIC reason, in this case the
>> vendor_reason field should be filled with a vendor specific status code
>> which will be parsed by the vendor specific userspace parser if one is
>> available.
>>
>> This kind of information can save system administrators precious time
>> which will not be wasted trying to understand why the link won't go
>> up.
>>
>> For example, when the cable is unplugged:
>> $ ethtool ethXX
>> ...
>> Link detected: no (Cable unplugged)
>>
>> Signed-off-by: Gal Pressman <galp@mellanox.com>
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>> ---
>>  include/linux/ethtool.h      |  2 ++
>>  include/uapi/linux/ethtool.h | 33 +++++++++++++++++++++++++++++++++
>>  net/core/ethtool.c           | 24 ++++++++++++++++++++++++
>>  3 files changed, 59 insertions(+)
>>
>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>> index 83cc986..d472047 100644
>> --- a/include/linux/ethtool.h
>> +++ b/include/linux/ethtool.h
>> @@ -374,5 +374,7 @@ struct ethtool_ops {
>>  				      struct ethtool_link_ksettings *);
>>  	int	(*set_link_ksettings)(struct net_device *,
>>  				      const struct ethtool_link_ksettings *);
>> +	int	(*get_link_down_reason)(struct net_device *,
>> +					struct ethtool_link_down_reason *);
>>  };
>>  #endif /* _LINUX_ETHTOOL_H */
>> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
>> index 7d4a594..8cf9d2c 100644
>> --- a/include/uapi/linux/ethtool.h
>> +++ b/include/uapi/linux/ethtool.h
>> @@ -550,6 +550,13 @@ struct ethtool_pauseparam {
>>  
>>  #define ETH_GSTRING_LEN		32
>>  
>> +struct ethtool_link_down_reason {
>> +	__u32	cmd;
>> +	__u32	reason;
>> +	__u32	vendor_reason;
>> +	__u32	reserved[4];
>> +};
> Shouldn't this be a list? The device is over its power budget,
> overheating and does not have a cable plugged in. There can be
> multiple reasons it is down.
Current implementation will report one issue at a time, fix it to reveal the next one :).

Reporting more than one issue is possible, for example:
Add another callback to return the number of issues and allocate a buffer accordingly, sounds like an overkill to me.
Alternatively, use a bitmap instead of the enum, which will probably get too big very quickly.
I don't think it's worth it in this particular case since in most cases one reason should suffice.

>
> 	 Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback
  2017-06-26 13:34               ` Andrew Lunn
@ 2017-06-27 19:40                 ` David Miller
  0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2017-06-27 19:40 UTC (permalink / raw)
  To: andrew; +Cc: galp, netdev, linville, saeedm, vidya, jiri, decot, kernel-team

From: Andrew Lunn <andrew@lunn.ch>
Date: Mon, 26 Jun 2017 15:34:39 +0200

> I still fear this is going to be an ethtool call with only one user.

That is my fear as well.

We are also in a sort-of moratorium for adding new major ethtool
features until the conversion of ethtool over to netlink occurs.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2017-06-27 19:40 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-21 13:04 [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting Gal Pressman
2017-06-21 13:04 ` [RFC PATCH net-next 1/3] ethtool: Add link down reason callback Gal Pressman
2017-06-21 13:58   ` Andrew Lunn
2017-06-22  8:09     ` Gal Pressman
2017-06-22 13:34       ` Andrew Lunn
2017-06-23  8:23         ` Gal Pressman
2017-06-23 13:52           ` Andrew Lunn
2017-06-24 19:04       ` Andrew Lunn
2017-06-25 11:59         ` Gal Pressman
2017-06-25 15:19           ` Andrew Lunn
2017-06-26 11:52             ` Gal Pressman
2017-06-26 13:34               ` Andrew Lunn
2017-06-27 19:40                 ` David Miller
2017-06-21 15:43   ` Stephen Hemminger
2017-06-22 10:33     ` Gal Pressman
2017-06-26 13:28   ` Andrew Lunn
2017-06-26 15:48     ` Gal Pressman
2017-06-21 13:04 ` [RFC PATCH net-next 2/3] net/mlx5: Add PDDR register infrastructure Gal Pressman
2017-06-21 13:04 ` [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool Gal Pressman
2017-06-21 14:03   ` Andrew Lunn
2017-06-22  8:17     ` Gal Pressman
2017-06-22  4:33   ` Jakub Kicinski
2017-06-22  8:33     ` Gal Pressman
2017-06-22 22:38       ` Jakub Kicinski
2017-06-23  8:52         ` Gal Pressman
2017-06-21 15:44 ` [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting Stephen Hemminger
2017-06-22 11:13   ` Gal Pressman
2017-06-22  4:14 ` Jakub Kicinski
2017-06-22 11:37   ` Gal Pressman
2017-06-22 21:53     ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.