All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v0 00/12] mlxsw thermal monitoring amendments
@ 2018-06-21 15:27 Vadim Pasternak
  2018-06-21 15:27 ` [PATCH v0 01/12] mlxsw: spectrum: Move QSFP EEPROM defenitons to common location Vadim Pasternak
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 15:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, jiri, Vadim Pasternak

This patchset extends mlxsw hwmon and thermal modules with ports
temperature reading and adds new hwmon attributes for FAN and
temperature.

Ports temperatures are most critical component in system thermal control
and should be considered by thermal algorithm.

New hwmon attributes, such as FAN faults, port temperature fault will
improve system monitoring abilities.

Vadim Pasternak (12):
  mlxsw: spectrum: Move QSFP EEPROM defenitons to common location
  mlxsw: reg: Add MTBR register
  mlxsw: core: Add core environment module for port temperature reading
  mlxsw: core: Extend hwmon interface with FAN fault attribute
  mlxsw: core: Extend hwmon interface with port temperature attributes
  mlxsw: core: Add bus frequency capability flag for the bus type
  mlxsw: core: Set different thermal polling time based on bus type
  mlxsw: core: Modify thermal zone definition
  mlxsw: core: Extend thermal zone operations with get_trend method
  mlxsw: core: Extend cooling device with cooling levels
  mlxsw: core: Rename cooling device
  mlxsw: core: Add ports temperature measurement to thermal algorithm

 drivers/net/ethernet/mellanox/mlxsw/Makefile       |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/core.h         |   1 +
 drivers/net/ethernet/mellanox/mlxsw/core_env.c     | 316 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core_env.h     |  63 ++++
 drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c   | 164 ++++++++++-
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 231 +++++++++++++--
 drivers/net/ethernet/mellanox/mlxsw/i2c.c          |   1 +
 drivers/net/ethernet/mellanox/mlxsw/reg.h          | 101 ++++++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |  62 ++--
 9 files changed, 865 insertions(+), 76 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_env.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_env.h

-- 
2.1.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v0 01/12] mlxsw: spectrum: Move QSFP EEPROM defenitons to common location
  2018-06-21 15:27 [PATCH v0 00/12] mlxsw thermal monitoring amendments Vadim Pasternak
@ 2018-06-21 15:27 ` Vadim Pasternak
  2018-06-21 15:27 ` [PATCH v0 02/12] mlxsw: reg: Add MTBR register Vadim Pasternak
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 15:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, jiri, Vadim Pasternak

Move QSFP EEPROM definitions to common location from the spectrum
driver in order to make them available for other mlxsw modules. They
are common for all kind of chips and have relation to SFF
specifications 8024, 8436, 8472, 8636, rather then to chip type.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h      | 32 ++++++++++++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 62 +++++++++-----------------
 2 files changed, 52 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 1877d9f..6a41c48 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -6757,13 +6757,41 @@ MLXSW_ITEM32(reg, mcia, device_address, 0x04, 0, 16);
  */
 MLXSW_ITEM32(reg, mcia, size, 0x08, 0, 16);
 
-#define MLXSW_SP_REG_MCIA_EEPROM_SIZE 48
+#define MLXSW_REG_MCIA_EEPROM_PAGE_LENGTH	256
+#define MLXSW_REG_MCIA_EEPROM_SIZE		48
+#define MLXSW_REG_MCIA_I2C_ADDR_LOW		0x50
+#define MLXSW_REG_MCIA_I2C_ADDR_HIGH		0x51
+#define MLXSW_REG_MCIA_PAGE0_LO_OFF		0xa0
+#define MLXSW_REG_MCIA_TH_SIZE			8
+#define MLXSW_REG_MCIA_TH_PAGE_NUM		3
+#define MLXSW_REG_MCIA_PAGE0_LO			0
+#define MLXSW_REG_MCIA_TH_PAGE_OFF		0x80
+
+enum mlxsw_reg_mcia_eeprom_module_info_rev_id {
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_REV_ID_UNSPC	= 0x00,
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_REV_ID_8436	= 0x01,
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_REV_ID_8636	= 0x03,
+};
+
+enum mlxsw_reg_mcia_eeprom_module_info_id {
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_SFP	= 0x03,
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP	= 0x0C,
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_PLUS	= 0x0D,
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP28	= 0x11,
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_DD	= 0x18,
+};
+
+enum mlxsw_reg_mcia_eeprom_module_info {
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID,
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_REV_ID,
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_SIZE,
+};
 
 /* reg_mcia_eeprom
  * Bytes to read/write.
  * Access: RW
  */
-MLXSW_ITEM_BUF(reg, mcia, eeprom, 0x10, MLXSW_SP_REG_MCIA_EEPROM_SIZE);
+MLXSW_ITEM_BUF(reg, mcia, eeprom, 0x10, MLXSW_REG_MCIA_EEPROM_SIZE);
 
 static inline void mlxsw_reg_mcia_pack(char *payload, u8 module, u8 lock,
 				       u8 page_number, u16 device_addr,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 968b88a..1b0d1bc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -2481,23 +2481,23 @@ static int mlxsw_sp_query_module_eeprom(struct mlxsw_sp_port *mlxsw_sp_port,
 					unsigned int *p_read_size)
 {
 	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
-	char eeprom_tmp[MLXSW_SP_REG_MCIA_EEPROM_SIZE];
+	char eeprom_tmp[MLXSW_REG_MCIA_EEPROM_SIZE];
 	char mcia_pl[MLXSW_REG_MCIA_LEN];
 	u16 i2c_addr;
 	int status;
 	int err;
 
-	size = min_t(u16, size, MLXSW_SP_REG_MCIA_EEPROM_SIZE);
+	size = min_t(u16, size, MLXSW_REG_MCIA_EEPROM_SIZE);
 
-	if (offset < MLXSW_SP_EEPROM_PAGE_LENGTH &&
-	    offset + size > MLXSW_SP_EEPROM_PAGE_LENGTH)
+	if (offset < MLXSW_REG_MCIA_EEPROM_PAGE_LENGTH &&
+	    offset + size > MLXSW_REG_MCIA_EEPROM_PAGE_LENGTH)
 		/* Cross pages read, read until offset 256 in low page */
-		size = MLXSW_SP_EEPROM_PAGE_LENGTH - offset;
+		size = MLXSW_REG_MCIA_EEPROM_PAGE_LENGTH - offset;
 
-	i2c_addr = MLXSW_SP_I2C_ADDR_LOW;
-	if (offset >= MLXSW_SP_EEPROM_PAGE_LENGTH) {
-		i2c_addr = MLXSW_SP_I2C_ADDR_HIGH;
-		offset -= MLXSW_SP_EEPROM_PAGE_LENGTH;
+	i2c_addr = MLXSW_REG_MCIA_I2C_ADDR_LOW;
+	if (offset >= MLXSW_REG_MCIA_EEPROM_PAGE_LENGTH) {
+		i2c_addr = MLXSW_REG_MCIA_I2C_ADDR_HIGH;
+		offset -= MLXSW_REG_MCIA_EEPROM_PAGE_LENGTH;
 	}
 
 	mlxsw_reg_mcia_pack(mcia_pl, mlxsw_sp_port->mapping.module,
@@ -2518,55 +2518,37 @@ static int mlxsw_sp_query_module_eeprom(struct mlxsw_sp_port *mlxsw_sp_port,
 	return 0;
 }
 
-enum mlxsw_sp_eeprom_module_info_rev_id {
-	MLXSW_SP_EEPROM_MODULE_INFO_REV_ID_UNSPC      = 0x00,
-	MLXSW_SP_EEPROM_MODULE_INFO_REV_ID_8436       = 0x01,
-	MLXSW_SP_EEPROM_MODULE_INFO_REV_ID_8636       = 0x03,
-};
-
-enum mlxsw_sp_eeprom_module_info_id {
-	MLXSW_SP_EEPROM_MODULE_INFO_ID_SFP              = 0x03,
-	MLXSW_SP_EEPROM_MODULE_INFO_ID_QSFP             = 0x0C,
-	MLXSW_SP_EEPROM_MODULE_INFO_ID_QSFP_PLUS        = 0x0D,
-	MLXSW_SP_EEPROM_MODULE_INFO_ID_QSFP28           = 0x11,
-};
-
-enum mlxsw_sp_eeprom_module_info {
-	MLXSW_SP_EEPROM_MODULE_INFO_ID,
-	MLXSW_SP_EEPROM_MODULE_INFO_REV_ID,
-	MLXSW_SP_EEPROM_MODULE_INFO_SIZE,
-};
-
 static int mlxsw_sp_get_module_info(struct net_device *netdev,
 				    struct ethtool_modinfo *modinfo)
 {
 	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(netdev);
-	u8 module_info[MLXSW_SP_EEPROM_MODULE_INFO_SIZE];
+	u8 module_info[MLXSW_REG_MCIA_EEPROM_MODULE_INFO_SIZE];
 	u8 module_rev_id, module_id;
 	unsigned int read_size;
 	int err;
 
 	err = mlxsw_sp_query_module_eeprom(mlxsw_sp_port, 0,
-					   MLXSW_SP_EEPROM_MODULE_INFO_SIZE,
-					   module_info, &read_size);
+				MLXSW_REG_MCIA_EEPROM_MODULE_INFO_SIZE,
+				module_info, &read_size);
 	if (err)
 		return err;
 
-	if (read_size < MLXSW_SP_EEPROM_MODULE_INFO_SIZE)
+	if (read_size < MLXSW_REG_MCIA_EEPROM_MODULE_INFO_SIZE)
 		return -EIO;
 
-	module_rev_id = module_info[MLXSW_SP_EEPROM_MODULE_INFO_REV_ID];
-	module_id = module_info[MLXSW_SP_EEPROM_MODULE_INFO_ID];
+	module_rev_id = module_info[MLXSW_REG_MCIA_EEPROM_MODULE_INFO_REV_ID];
+	module_id = module_info[MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID];
 
 	switch (module_id) {
-	case MLXSW_SP_EEPROM_MODULE_INFO_ID_QSFP:
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP:
 		modinfo->type       = ETH_MODULE_SFF_8436;
 		modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
 		break;
-	case MLXSW_SP_EEPROM_MODULE_INFO_ID_QSFP_PLUS:
-	case MLXSW_SP_EEPROM_MODULE_INFO_ID_QSFP28:
-		if (module_id  == MLXSW_SP_EEPROM_MODULE_INFO_ID_QSFP28 ||
-		    module_rev_id >= MLXSW_SP_EEPROM_MODULE_INFO_REV_ID_8636) {
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_PLUS:
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP28:
+		if (module_id == MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP28 ||
+		    module_rev_id >=
+		    MLXSW_REG_MCIA_EEPROM_MODULE_INFO_REV_ID_8636) {
 			modinfo->type       = ETH_MODULE_SFF_8636;
 			modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
 		} else {
@@ -2574,7 +2556,7 @@ static int mlxsw_sp_get_module_info(struct net_device *netdev,
 			modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
 		}
 		break;
-	case MLXSW_SP_EEPROM_MODULE_INFO_ID_SFP:
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_SFP:
 		modinfo->type       = ETH_MODULE_SFF_8472;
 		modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
 		break;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v0 02/12] mlxsw: reg: Add MTBR register
  2018-06-21 15:27 [PATCH v0 00/12] mlxsw thermal monitoring amendments Vadim Pasternak
  2018-06-21 15:27 ` [PATCH v0 01/12] mlxsw: spectrum: Move QSFP EEPROM defenitons to common location Vadim Pasternak
@ 2018-06-21 15:27 ` Vadim Pasternak
  2018-06-21 15:27 ` [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading Vadim Pasternak
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 15:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, jiri, Vadim Pasternak

Add MTBR (Management Temperature Bulk Register), which is used for port
temperature reading in a bulk mode.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 69 +++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 6a41c48..cfe6bde 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -6703,6 +6703,74 @@ static inline void mlxsw_reg_mtmp_unpack(char *payload, unsigned int *p_temp,
 		mlxsw_reg_mtmp_sensor_name_memcpy_from(payload, sensor_name);
 }
 
+/* MTBR - Management Temperature Bulk Register
+ * -------------------------------------------
+ * This register is used for bulk temperature reading.
+ */
+#define MLXSW_REG_MTBR_ID		0x900F
+#define MLXSW_REG_MTBR_LEN		0xCC
+#define MLXSW_REG_MTBR_REC_MAX_COUNT	47
+
+MLXSW_REG_DEFINE(mtbr, MLXSW_REG_MTBR_ID, MLXSW_REG_MTBR_LEN);
+
+/* reg_mtbr_base_sensor_index
+ * Base sensors index to access (0 - ASIC sensor, 1-63 - ambient sensors,
+ * 64-127 are mapped to the SFP+/QSFP modules sequentially).
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mtbr, base_sensor_index, 0x00, 0, 7);
+
+/* reg_mtbr_num_rec
+ * Request: Number of records to read
+ * Response: Number of records read
+ * See above description for more details.
+ * Ranges 0..64
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mtbr, num_rec, 0x04, 0, 8);
+
+/* reg_mtbr_temp
+ * Temperature reading from the sensor. Reading is in 0.125 Celsius
+ * degrees units.
+ * Access: RO
+ */
+MLXSW_ITEM32_INDEXED(reg, mtbr, temp, 0x10, 0, 16, 0x04, 0x00, false);
+
+/* reg_mtbr_max_temp
+ * The highest measured temperature from the sensor.
+ * When the bit mte is cleared, the field max_temperature is reserved.
+ * Access: RO
+ */
+MLXSW_ITEM32_INDEXED(reg, mtbr, max_temp, 0x10, 16, 16, 0x04, 0x00, false);
+
+static inline void mlxsw_reg_mtbr_pack(char *payload, u8 base_sensor_index,
+				       u8 num_rec)
+{
+	MLXSW_REG_ZERO(mtbr, payload);
+	mlxsw_reg_mtbr_base_sensor_index_set(payload, base_sensor_index);
+	mlxsw_reg_mtbr_num_rec_set(payload, num_rec);
+}
+
+/* Error codes from temperatute reading */
+enum mlxsw_reg_mtbr_temp_status {
+	MLXSW_REG_MTBR_NO_CONN		= 0x8000,
+	MLXSW_REG_MTBR_NO_TEMP_SENS	= 0x8001,
+	MLXSW_REG_MTBR_INDEX_NA		= 0x8002,
+	MLXSW_REG_MTBR_BAD_SENS_INFO	= 0x8003,
+};
+
+/* Base index for reading ports temperature */
+#define MLXSW_REG_MTBR_BASE_PORT_INDEX		64
+
+static inline void mlxsw_reg_mtbr_temp_unpack(char *payload, int rec_index,
+					      u16 *p_temp, u16 *p_max_temp)
+{
+	if (p_temp)
+		*p_temp = mlxsw_reg_mtbr_temp_get(payload, rec_index);
+	if (p_max_temp)
+		*p_max_temp = mlxsw_reg_mtbr_max_temp_get(payload, rec_index);
+}
+
 /* MCIA - Management Cable Info Access
  * -----------------------------------
  * MCIA register is used to access the SFP+ and QSFP connector's EPROM.
@@ -7945,6 +8013,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
 	MLXSW_REG(mfsc),
 	MLXSW_REG(mfsm),
 	MLXSW_REG(mfsl),
+	MLXSW_REG(mtbr),
 	MLXSW_REG(mtcap),
 	MLXSW_REG(mtmp),
 	MLXSW_REG(mcia),
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
  2018-06-21 15:27 [PATCH v0 00/12] mlxsw thermal monitoring amendments Vadim Pasternak
  2018-06-21 15:27 ` [PATCH v0 01/12] mlxsw: spectrum: Move QSFP EEPROM defenitons to common location Vadim Pasternak
  2018-06-21 15:27 ` [PATCH v0 02/12] mlxsw: reg: Add MTBR register Vadim Pasternak
@ 2018-06-21 15:27 ` Vadim Pasternak
  2018-06-21 17:11   ` Andrew Lunn
  2018-06-21 15:27 ` [PATCH v0 04/12] mlxsw: core: Extend hwmon interface with FAN fault attribute Vadim Pasternak
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 15:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, jiri, Vadim Pasternak

Add new core_env module to allow port temperature reading. This
information has most critical impact on system's thermal monitoring and
is to be used by core_hwmon and core_thermal modules.

New internal API reads the temperature from all the modules, which are
equipped with the thermal sensor and exposes temperature according to
the worst measure. All individual temperature values are normalized to
pre-defined range.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/core_env.c | 316 +++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core_env.h |  63 +++++
 3 files changed, 380 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_env.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_env.h

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 0cadcab..9f1dc0b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_MLXSW_CORE)	+= mlxsw_core.o
 mlxsw_core-objs			:= core.o core_acl_flex_keys.o \
-				   core_acl_flex_actions.o
+				   core_acl_flex_actions.o core_env.o
 mlxsw_core-$(CONFIG_MLXSW_CORE_HWMON) += core_hwmon.o
 mlxsw_core-$(CONFIG_MLXSW_CORE_THERMAL) += core_thermal.o
 obj-$(CONFIG_MLXSW_PCI)		+= mlxsw_pci.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.c b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
new file mode 100644
index 0000000..fb6394d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
@@ -0,0 +1,316 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/core_env.c
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/err.h>
+
+#include "core.h"
+#include "core_env.h"
+#include "item.h"
+
+union mlxsw_env_port_thresh {
+	u8 buf[MLXSW_REG_MCIA_TH_SIZE];
+	struct mlxsw_env_port_temp_th {
+		u16 temp_alarm_hi;
+		u16 temp_alarm_lo;
+		u16 temp_warn_hi;
+		u16 temp_warn_low;
+	} t;
+};
+
+static int mlxsw_env_bulk_get(struct mlxsw_core *core,
+			      int *ports_temp_cache, int port_count,
+			      bool *untrusted_sensor)
+{
+	char mtbr_pl[MLXSW_REG_MTBR_LEN];
+	int i, j, count, off;
+	u16 temp;
+	int err;
+
+	/* Read ports temperature. */
+	if (untrusted_sensor)
+		*untrusted_sensor = false;
+	count = 0;
+	while (count < port_count) {
+		off = min_t(u8, MLXSW_REG_MTBR_REC_MAX_COUNT,
+			    port_count - count);
+		mlxsw_reg_mtbr_pack(mtbr_pl, MLXSW_REG_MTBR_BASE_PORT_INDEX +
+				    count, off);
+		err = mlxsw_reg_query(core, MLXSW_REG(mtbr), mtbr_pl);
+		if (err)
+			return err;
+
+		for (i = 0, j = count; i < off; i++, j++) {
+			mlxsw_reg_mtbr_temp_unpack(mtbr_pl, i, &temp, NULL);
+
+			/* Update status and temperature cache. */
+			switch (temp) {
+			case MLXSW_REG_MTBR_NO_CONN:
+			case MLXSW_REG_MTBR_NO_TEMP_SENS:
+			case MLXSW_REG_MTBR_INDEX_NA:
+				ports_temp_cache[j] = 0;
+				break;
+			case MLXSW_REG_MTBR_BAD_SENS_INFO:
+				/* Untrusted cable is connected. It means that
+				 * reading temperature from its sensor is
+				 * unreliable and thermal control should
+				 * consider increasing system's FAN speed
+				 * according to the system requirements.
+				 * The presence of untrusted cable is exposed
+				 * to hwmon through temp1_fault attribute.
+				 */
+				ports_temp_cache[j] = 0;
+				if (untrusted_sensor)
+					*untrusted_sensor = false;
+				break;
+			default:
+				ports_temp_cache[j] =
+					MLXSW_REG_MTMP_TEMP_TO_MC(temp);
+				break;
+			}
+		}
+		count += off;
+	}
+
+	return 0;
+}
+
+static void mlxsw_env_scale_temp(int hot, int crit, int tdelta, u8 mask,
+				 int *temp)
+{
+	int twindow;
+
+	/* Scale port temperature thresholds window to the based window: do
+	 * nothong, if windows are equal, shrink window if it exceeds, expand
+	 * in other case. Set delta according this scale.
+	 */
+	twindow = crit - hot;
+	if (twindow > MLXSW_ENV_TEMP_WINDOW)
+		tdelta /= DIV_ROUND_CLOSEST(twindow, MLXSW_ENV_TEMP_WINDOW);
+	else if (twindow < MLXSW_ENV_TEMP_WINDOW)
+		tdelta *= DIV_ROUND_CLOSEST(MLXSW_ENV_TEMP_WINDOW, twindow);
+
+	switch (mask) {
+	case MLXSW_ENV_CRIT_MASK:
+		*temp = clamp_val(MLXSW_ENV_TEMP_HOT + tdelta,
+				  MLXSW_ENV_TEMP_HOT, MLXSW_ENV_TEMP_CRIT);
+		break;
+	case MLXSW_ENV_HOT_MASK:
+		*temp = clamp_val(MLXSW_ENV_TEMP_NORM + tdelta,
+				  MLXSW_ENV_TEMP_NORM, MLXSW_ENV_TEMP_HOT);
+		break;
+	default:
+		/* Don't set temperature below nominal value. */
+		tdelta %= MLXSW_ENV_TEMP_NORM;
+		*temp = clamp_val(MLXSW_ENV_TEMP_NORM - tdelta, *temp,
+				  MLXSW_ENV_TEMP_NORM);
+		break;
+	}
+}
+
+static void mlxsw_env_process_temp(int temp,
+				   struct mlxsw_env_temp_thresh *port,
+				   struct mlxsw_env_temp_thresh *delta,
+				   struct mlxsw_env_temp_multi *multi)
+{
+	int tdelta;
+
+	/* Compare each port temperature sensors values, with warning and
+	 * threshold values for this port. Find the worst delta for the all,
+	 * sensors which is defined as following:
+	 * - if value is below the warning threshold - the closest value to the
+	 *   warning threshold;
+	 * - if value is between the warning and alarm thresholds - the closet
+	 *   value to the alarm threshold;
+	 * - if value is above the alarm threshold - the value with the biggest
+	 *   delta.
+	 * The temperature value should be set according to the worst delta
+	 * with the next priority:
+	 * - if any sensor above alarm threshold - from the alarm;
+	 * - if any sensor above warning threshold - from the hot;
+	 * - from norm in other case.
+	 */
+	if (!multi->mask && temp < port->hot) {
+		tdelta = port->hot - temp;
+		mlxsw_env_scale_temp(port->hot, port->crit, tdelta, 0, &temp);
+		if (tdelta < delta->normal) {
+			multi->thresh.normal = temp;
+			delta->normal = tdelta;
+		}
+	} else if (temp >= port->crit) {
+		tdelta = temp - port->crit;
+		mlxsw_env_scale_temp(port->hot, port->crit, tdelta,
+				     MLXSW_ENV_CRIT_MASK, &temp);
+		if (tdelta > delta->crit) {
+			multi->thresh.crit = temp;
+			delta->crit = tdelta;
+		}
+		multi->mask |= MLXSW_ENV_CRIT_MASK;
+	} else if (!(multi->mask & MLXSW_ENV_CRIT_MASK)) {
+		tdelta = temp - port->hot;
+		mlxsw_env_scale_temp(port->hot, port->crit, tdelta,
+				     MLXSW_ENV_HOT_MASK, &temp);
+		if (tdelta > delta->hot) {
+			multi->thresh.hot = temp;
+			delta->hot = tdelta;
+		}
+		multi->mask |= MLXSW_ENV_HOT_MASK;
+	}
+}
+
+static void
+mlxsw_env_finalize_temp(struct mlxsw_env_temp_thresh *delta,
+			struct mlxsw_env_temp_multi *multi, int *temp)
+{
+	/* If the values from the all temperature sensors are:
+	 * - above temperature warning threshold - pick for the temperature the
+	 *   value with biggest delta between the temperature alarm threshold;
+	 * - between the temperature warning threshold and the temperature
+	 *   alarm threshold - pick as the temperature the closest value to the
+	 *   the temperature warning threshold;
+	 * - below the temperature warning threshold - pick as the temperature
+	 *   the closest to the temperature warning threshold.
+	 */
+	if (multi->mask & MLXSW_ENV_CRIT_MASK)
+		*temp = multi->thresh.crit;
+	else if (multi->mask & MLXSW_ENV_HOT_MASK)
+		*temp = multi->thresh.hot;
+	else
+		*temp = multi->thresh.normal;
+}
+
+static int mlxsw_env_validate_cable_ident(struct mlxsw_core *core, int id,
+					  bool *qsfp)
+{
+	char eeprom_tmp[MLXSW_REG_MCIA_EEPROM_SIZE];
+	char mcia_pl[MLXSW_REG_MCIA_LEN];
+	u8 ident;
+	int err;
+
+	mlxsw_reg_mcia_pack(mcia_pl, id, 0, MLXSW_REG_MCIA_PAGE0_LO_OFF, 0, 1,
+			    MLXSW_REG_MCIA_I2C_ADDR_LOW);
+	err = mlxsw_reg_query(core, MLXSW_REG(mcia), mcia_pl);
+	if (err)
+		return err;
+	mlxsw_reg_mcia_eeprom_memcpy_from(mcia_pl, eeprom_tmp);
+	ident = eeprom_tmp[0];
+	switch (ident) {
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_SFP:
+		*qsfp = false;
+		break;
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP:
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_PLUS:
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP28:
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_DD:
+		*qsfp = true;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int mlxsw_env_collect_port_temp(struct mlxsw_core *core, int *ports_temp_cache,
+				int port_count,
+				struct mlxsw_env_temp_multi *multi,
+				struct mlxsw_env_temp_thresh *delta,
+				bool *untrusted_sensor, int *temp)
+{
+	char eeprom_tmp[MLXSW_REG_MCIA_EEPROM_SIZE];
+	union mlxsw_env_port_thresh thresh;
+	char mcia_pl[MLXSW_REG_MCIA_LEN];
+	struct mlxsw_env_temp_thresh curr;
+	int port_temp, i;
+	bool qsfp;
+	int err;
+
+	memset(&curr, 0, sizeof(struct mlxsw_env_temp_thresh));
+	/* Read ports temperature. */
+	err = mlxsw_env_bulk_get(core, ports_temp_cache, port_count,
+				 untrusted_sensor);
+	if (err)
+		return err;
+
+	for (i = 0; i < port_count; i++) {
+		/* Skip port with no temperature sensor */
+		if (!ports_temp_cache[i])
+			continue;
+
+		/* Read Free Side Device Temperature Thresholds from page 03h
+		 * (MSB at lower byte address).
+		 * Bytes:
+		 * 128-129 - Temp High Alarm
+		 * 130-131 - Temp Low Alarm
+		 * 132-133 - Temp High Warning
+		 * 134-135 - Temp Low Warning
+		 */
+
+		/* Validate module identifier value. */
+		err = mlxsw_env_validate_cable_ident(core, i, &qsfp);
+		if (err)
+			return err;
+
+		if (qsfp)
+			mlxsw_reg_mcia_pack(mcia_pl, i, 0,
+					    MLXSW_REG_MCIA_TH_PAGE_NUM,
+					    MLXSW_REG_MCIA_TH_PAGE_OFF,
+					    MLXSW_REG_MCIA_TH_SIZE,
+					    MLXSW_REG_MCIA_I2C_ADDR_LOW);
+		else
+			mlxsw_reg_mcia_pack(mcia_pl, i, 0,
+					    MLXSW_REG_MCIA_PAGE0_LO, 0,
+					    MLXSW_REG_MCIA_TH_SIZE,
+					    MLXSW_REG_MCIA_I2C_ADDR_HIGH);
+
+		err = mlxsw_reg_query(core, MLXSW_REG(mcia), mcia_pl);
+		if (err)
+			return err;
+
+		mlxsw_reg_mcia_eeprom_memcpy_from(mcia_pl, eeprom_tmp);
+		memcpy(thresh.buf, eeprom_tmp, MLXSW_REG_MCIA_TH_SIZE);
+		/* Skip sensor with no threshold info. */
+		if (!thresh.t.temp_warn_hi || !thresh.t.temp_warn_hi)
+			continue;
+
+		port_temp = ports_temp_cache[i];
+		curr.hot = thresh.t.temp_warn_hi * 1000;
+		curr.crit = thresh.t.temp_alarm_hi * 1000;
+		mlxsw_env_process_temp(port_temp, &curr, delta, multi);
+	}
+
+	mlxsw_env_finalize_temp(delta, multi, temp);
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.h b/drivers/net/ethernet/mellanox/mlxsw/core_env.h
new file mode 100644
index 0000000..a239d5b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.h
@@ -0,0 +1,63 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/core_env.h
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_CORE_ENV_H
+#define _MLXSW_CORE_ENV_H
+
+#define MLXSW_ENV_TEMP_UNREACHABLE	150000	/* 150C */
+#define MLXSW_ENV_HOT_MASK		BIT(0)
+#define MLXSW_ENV_CRIT_MASK		BIT(1)
+#define MLXSW_ENV_TEMP_NORM		75000	/* 75C */
+#define MLXSW_ENV_TEMP_HIGH		85000	/* 85C */
+#define MLXSW_ENV_TEMP_HOT		105000	/* 105C */
+#define MLXSW_ENV_TEMP_CRIT		110000	/* 110C */
+#define MLXSW_ENV_TEMP_WINDOW		(MLXSW_ENV_TEMP_HOT - \
+					 MLXSW_ENV_TEMP_NORM)
+
+struct mlxsw_env_temp_thresh {
+	int normal;
+	int hot;
+	int crit;
+};
+
+struct mlxsw_env_temp_multi {
+	struct mlxsw_env_temp_thresh thresh;
+	u8 mask;
+};
+
+int mlxsw_env_collect_port_temp(struct mlxsw_core *core, int *ports_temp_cache,
+				int port_count,
+				struct mlxsw_env_temp_multi *multi,
+				struct mlxsw_env_temp_thresh *delta,
+				bool *untrusted_sensor, int *temp);
+#endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v0 04/12] mlxsw: core: Extend hwmon interface with FAN fault attribute
  2018-06-21 15:27 [PATCH v0 00/12] mlxsw thermal monitoring amendments Vadim Pasternak
                   ` (2 preceding siblings ...)
  2018-06-21 15:27 ` [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading Vadim Pasternak
@ 2018-06-21 15:27 ` Vadim Pasternak
  2018-06-21 15:27 ` [PATCH v0 05/12] mlxsw: core: Extend hwmon interface with port temperature attributes Vadim Pasternak
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 15:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, jiri, Vadim Pasternak

Add new FAN hwmon attribute for exposing FAN faults (fault is set in
case FAN tachometer is below allowed minimum).

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c | 62 +++++++++++++++++++++++-
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c b/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c
index 84185f8..dfd7adc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c
@@ -44,6 +44,7 @@
 #define MLXSW_HWMON_TEMP_SENSOR_MAX_COUNT 127
 #define MLXSW_HWMON_ATTR_COUNT (MLXSW_HWMON_TEMP_SENSOR_MAX_COUNT * 4 + \
 				MLXSW_MFCR_TACHOS_MAX + MLXSW_MFCR_PWMS_MAX)
+#define MLXSW_HWMON_SPEED_MAX 50000	/* RPM */
 
 struct mlxsw_hwmon_attr {
 	struct device_attribute dev_attr;
@@ -61,6 +62,7 @@ struct mlxsw_hwmon {
 	struct attribute *attrs[MLXSW_HWMON_ATTR_COUNT + 1];
 	struct mlxsw_hwmon_attr hwmon_attrs[MLXSW_HWMON_ATTR_COUNT];
 	unsigned int attrs_count;
+	u16 tach_min;
 };
 
 static ssize_t mlxsw_hwmon_temp_show(struct device *dev,
@@ -152,6 +154,28 @@ static ssize_t mlxsw_hwmon_fan_rpm_show(struct device *dev,
 	return sprintf(buf, "%u\n", mlxsw_reg_mfsm_rpm_get(mfsm_pl));
 }
 
+static ssize_t mlxsw_hwmon_fan_fault_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct mlxsw_hwmon_attr *mlwsw_hwmon_attr =
+			container_of(attr, struct mlxsw_hwmon_attr, dev_attr);
+	struct mlxsw_hwmon *mlxsw_hwmon = mlwsw_hwmon_attr->hwmon;
+	char mfsm_pl[MLXSW_REG_MFSM_LEN];
+	u16 tach;
+	int err;
+
+	mlxsw_reg_mfsm_pack(mfsm_pl, mlwsw_hwmon_attr->type_index);
+	err = mlxsw_reg_query(mlxsw_hwmon->core, MLXSW_REG(mfsm), mfsm_pl);
+	if (err) {
+		dev_err(mlxsw_hwmon->bus_info->dev, "Failed to query fan\n");
+		return err;
+	}
+	tach = mlxsw_reg_mfsm_rpm_get(mfsm_pl);
+
+	return sprintf(buf, "%u\n", (tach < mlxsw_hwmon->tach_min) ? 1 : 0);
+}
+
 static ssize_t mlxsw_hwmon_pwm_show(struct device *dev,
 				    struct device_attribute *attr,
 				    char *buf)
@@ -203,6 +227,7 @@ enum mlxsw_hwmon_attr_type {
 	MLXSW_HWMON_ATTR_TYPE_TEMP_MAX,
 	MLXSW_HWMON_ATTR_TYPE_TEMP_RST,
 	MLXSW_HWMON_ATTR_TYPE_FAN_RPM,
+	MLXSW_HWMON_ATTR_TYPE_FAN_FAULT,
 	MLXSW_HWMON_ATTR_TYPE_PWM,
 };
 
@@ -240,6 +265,12 @@ static void mlxsw_hwmon_attr_add(struct mlxsw_hwmon *mlxsw_hwmon,
 		snprintf(mlxsw_hwmon_attr->name, sizeof(mlxsw_hwmon_attr->name),
 			 "fan%u_input", num + 1);
 		break;
+	case MLXSW_HWMON_ATTR_TYPE_FAN_FAULT:
+		mlxsw_hwmon_attr->dev_attr.show = mlxsw_hwmon_fan_fault_show;
+		mlxsw_hwmon_attr->dev_attr.attr.mode = 0444;
+		snprintf(mlxsw_hwmon_attr->name, sizeof(mlxsw_hwmon_attr->name),
+			 "fan%u_fault", num + 1);
+		break;
 	case MLXSW_HWMON_ATTR_TYPE_PWM:
 		mlxsw_hwmon_attr->dev_attr.show = mlxsw_hwmon_pwm_show;
 		mlxsw_hwmon_attr->dev_attr.store = mlxsw_hwmon_pwm_store;
@@ -297,9 +328,9 @@ static int mlxsw_hwmon_fans_init(struct mlxsw_hwmon *mlxsw_hwmon)
 {
 	char mfcr_pl[MLXSW_REG_MFCR_LEN] = {0};
 	enum mlxsw_reg_mfcr_pwm_frequency freq;
+	u16 tacho_active, tach_min;
 	unsigned int type_index;
 	unsigned int num;
-	u16 tacho_active;
 	u8 pwm_active;
 	int err;
 
@@ -310,11 +341,38 @@ static int mlxsw_hwmon_fans_init(struct mlxsw_hwmon *mlxsw_hwmon)
 	}
 	mlxsw_reg_mfcr_unpack(mfcr_pl, &freq, &tacho_active, &pwm_active);
 	num = 0;
+	/* Set tachometer to maximum value as the initial seed. */
+	mlxsw_hwmon->tach_min = MLXSW_HWMON_SPEED_MAX;
 	for (type_index = 0; type_index < MLXSW_MFCR_TACHOS_MAX; type_index++) {
-		if (tacho_active & BIT(type_index))
+		if (tacho_active & BIT(type_index)) {
+			char mfsl_pl[MLXSW_REG_MFSL_LEN] = {0};
+
 			mlxsw_hwmon_attr_add(mlxsw_hwmon,
 					     MLXSW_HWMON_ATTR_TYPE_FAN_RPM,
+					     type_index, num);
+			mlxsw_hwmon_attr_add(mlxsw_hwmon,
+					     MLXSW_HWMON_ATTR_TYPE_FAN_FAULT,
 					     type_index, num++);
+			/* Get tachometer minimum value. */
+			mlxsw_reg_mfsl_pack(mfsl_pl, type_index, 0, 0);
+			err = mlxsw_reg_query(mlxsw_hwmon->core,
+					      MLXSW_REG(mfsl), mfsl_pl);
+			if (err) {
+				dev_err(mlxsw_hwmon->bus_info->dev, "Failed to query tachometer %d\n",
+					type_index);
+				return err;
+			}
+
+			tach_min = mlxsw_reg_mfsl_tach_min_get(mfsl_pl);
+			/* Store absolute minimal value of all tachometers for
+			 * alarm indication, because forward FANs could be
+			 * replaced with reversed and wise versa and in such
+			 * case the minimum values could be flipped.
+			 */
+			mlxsw_hwmon->tach_min = min_t(u16,
+						      mlxsw_hwmon->tach_min,
+						      tach_min);
+		}
 	}
 	num = 0;
 	for (type_index = 0; type_index < MLXSW_MFCR_PWMS_MAX; type_index++) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v0 05/12] mlxsw: core: Extend hwmon interface with port temperature attributes
  2018-06-21 15:27 [PATCH v0 00/12] mlxsw thermal monitoring amendments Vadim Pasternak
                   ` (3 preceding siblings ...)
  2018-06-21 15:27 ` [PATCH v0 04/12] mlxsw: core: Extend hwmon interface with FAN fault attribute Vadim Pasternak
@ 2018-06-21 15:27 ` Vadim Pasternak
  2018-06-21 15:28 ` [PATCH v0 06/12] mlxsw: core: Add bus frequency capability flag for the bus type Vadim Pasternak
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 15:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, jiri, Vadim Pasternak

Add new attributes to hwmon object for exposing accumulative ports
temperature input and accumulative port temperature fault (if one of
sensors in untrusted - fault is set.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c | 102 +++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c b/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c
index dfd7adc..ac28e6c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c
@@ -40,6 +40,7 @@
 #include <linux/err.h>
 
 #include "core.h"
+#include "core_env.h"
 
 #define MLXSW_HWMON_TEMP_SENSOR_MAX_COUNT 127
 #define MLXSW_HWMON_ATTR_COUNT (MLXSW_HWMON_TEMP_SENSOR_MAX_COUNT * 4 + \
@@ -63,6 +64,9 @@ struct mlxsw_hwmon {
 	struct mlxsw_hwmon_attr hwmon_attrs[MLXSW_HWMON_ATTR_COUNT];
 	unsigned int attrs_count;
 	u16 tach_min;
+	int *ports_temp_cache;
+	int count;
+	bool untrusted_sensor;
 };
 
 static ssize_t mlxsw_hwmon_temp_show(struct device *dev,
@@ -222,6 +226,47 @@ static ssize_t mlxsw_hwmon_pwm_store(struct device *dev,
 	return len;
 }
 
+static ssize_t mlxsw_hwmon_port_temp_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct mlxsw_hwmon_attr *mlwsw_hwmon_attr =
+			container_of(attr, struct mlxsw_hwmon_attr, dev_attr);
+	struct mlxsw_hwmon *mlxsw_hwmon = mlwsw_hwmon_attr->hwmon;
+	struct mlxsw_env_temp_multi multi;
+	struct mlxsw_env_temp_thresh delta;
+	int temp;
+	int err;
+
+	memset(&multi, 0, sizeof(struct mlxsw_env_temp_multi));
+	memset(&delta, 0, sizeof(struct mlxsw_env_temp_thresh));
+	/* Set initial value for normal temperature to unreachable value. */
+	delta.normal = MLXSW_ENV_TEMP_UNREACHABLE;
+	/* Collect ports temperature */
+	err = mlxsw_env_collect_port_temp(mlxsw_hwmon->core,
+					  mlxsw_hwmon->ports_temp_cache,
+					  mlxsw_hwmon->count, &multi, &delta,
+					  &mlxsw_hwmon->untrusted_sensor,
+					  &temp);
+	if (err) {
+		dev_err(mlxsw_hwmon->bus_info->dev, "Failed to query port temp\n");
+		return err;
+	}
+
+	return sprintf(buf, "%u\n", temp);
+}
+
+static ssize_t mlxsw_hwmon_port_temp_fault_show(struct device *dev,
+						struct device_attribute *attr,
+						char *buf)
+{
+	struct mlxsw_hwmon_attr *mlwsw_hwmon_attr =
+			container_of(attr, struct mlxsw_hwmon_attr, dev_attr);
+	struct mlxsw_hwmon *mlxsw_hwmon = mlwsw_hwmon_attr->hwmon;
+
+	return sprintf(buf, "%u\n", mlxsw_hwmon->untrusted_sensor ? 1 : 0);
+}
+
 enum mlxsw_hwmon_attr_type {
 	MLXSW_HWMON_ATTR_TYPE_TEMP,
 	MLXSW_HWMON_ATTR_TYPE_TEMP_MAX,
@@ -229,6 +274,8 @@ enum mlxsw_hwmon_attr_type {
 	MLXSW_HWMON_ATTR_TYPE_FAN_RPM,
 	MLXSW_HWMON_ATTR_TYPE_FAN_FAULT,
 	MLXSW_HWMON_ATTR_TYPE_PWM,
+	MLXSW_HWMON_ATTR_TYPE_TEMP_PORT,
+	MLXSW_HWMON_ATTR_TYPE_TEMP_PORT_FAULT,
 };
 
 static void mlxsw_hwmon_attr_add(struct mlxsw_hwmon *mlxsw_hwmon,
@@ -278,6 +325,19 @@ static void mlxsw_hwmon_attr_add(struct mlxsw_hwmon *mlxsw_hwmon,
 		snprintf(mlxsw_hwmon_attr->name, sizeof(mlxsw_hwmon_attr->name),
 			 "pwm%u", num + 1);
 		break;
+	case MLXSW_HWMON_ATTR_TYPE_TEMP_PORT:
+		mlxsw_hwmon_attr->dev_attr.show = mlxsw_hwmon_port_temp_show;
+		mlxsw_hwmon_attr->dev_attr.attr.mode = 0444;
+		snprintf(mlxsw_hwmon_attr->name, sizeof(mlxsw_hwmon_attr->name),
+			 "temp%u_input", num + 1);
+		break;
+	case MLXSW_HWMON_ATTR_TYPE_TEMP_PORT_FAULT:
+		mlxsw_hwmon_attr->dev_attr.show =
+					mlxsw_hwmon_port_temp_fault_show;
+		mlxsw_hwmon_attr->dev_attr.attr.mode = 0444;
+		snprintf(mlxsw_hwmon_attr->name, sizeof(mlxsw_hwmon_attr->name),
+			 "temp%u_fault", num + 1);
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -384,6 +444,43 @@ static int mlxsw_hwmon_fans_init(struct mlxsw_hwmon *mlxsw_hwmon)
 	return 0;
 }
 
+static int mlxsw_hwmon_port_init(struct mlxsw_hwmon *mlxsw_hwmon)
+{
+	unsigned int max_ports = mlxsw_core_max_ports(mlxsw_hwmon->core);
+	struct device *dev = mlxsw_hwmon->bus_info->dev;
+	char mtcap_pl[MLXSW_REG_MTCAP_LEN] = {0};
+	u8 sensor_count;
+	int err;
+
+	mlxsw_hwmon->ports_temp_cache = devm_kmalloc_array(dev, max_ports,
+							   sizeof(int),
+							   GFP_KERNEL);
+	if (!mlxsw_hwmon->ports_temp_cache)
+		return -ENOMEM;
+	mlxsw_hwmon->count = max_ports;
+
+	err = mlxsw_reg_query(mlxsw_hwmon->core, MLXSW_REG(mtcap), mtcap_pl);
+	if (err) {
+		dev_err(mlxsw_hwmon->bus_info->dev, "Failed to get number of temp sensors\n");
+		return err;
+	}
+	/* Add extra attributes for port temperature - one attribute for the
+	 * cumulative temperature measurement and one attribute for the
+	 * cumulative temperature fault status. Sensor index will be assigned
+	 * to sensor_count value, while all indexed before sensor_count are
+	 * already utilized by the sensors connected through mtmp register by
+	 * mlxsw_hwmon_temp_init().
+	 */
+	sensor_count = mlxsw_reg_mtcap_sensor_count_get(mtcap_pl);
+	mlxsw_hwmon_attr_add(mlxsw_hwmon, MLXSW_HWMON_ATTR_TYPE_TEMP_PORT,
+			     sensor_count, sensor_count);
+	mlxsw_hwmon_attr_add(mlxsw_hwmon,
+			     MLXSW_HWMON_ATTR_TYPE_TEMP_PORT_FAULT,
+			     sensor_count, sensor_count);
+
+	return 0;
+}
+
 int mlxsw_hwmon_init(struct mlxsw_core *mlxsw_core,
 		     const struct mlxsw_bus_info *mlxsw_bus_info,
 		     struct mlxsw_hwmon **p_hwmon)
@@ -407,6 +504,10 @@ int mlxsw_hwmon_init(struct mlxsw_core *mlxsw_core,
 	if (err)
 		goto err_fans_init;
 
+	err = mlxsw_hwmon_port_init(mlxsw_hwmon);
+	if (err)
+		goto err_temp_port_init;
+
 	mlxsw_hwmon->groups[0] = &mlxsw_hwmon->group;
 	mlxsw_hwmon->group.attrs = mlxsw_hwmon->attrs;
 
@@ -424,6 +525,7 @@ int mlxsw_hwmon_init(struct mlxsw_core *mlxsw_core,
 	return 0;
 
 err_hwmon_register:
+err_temp_port_init:
 err_fans_init:
 err_temp_init:
 	return err;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v0 06/12] mlxsw: core: Add bus frequency capability flag for the bus type
  2018-06-21 15:27 [PATCH v0 00/12] mlxsw thermal monitoring amendments Vadim Pasternak
                   ` (4 preceding siblings ...)
  2018-06-21 15:27 ` [PATCH v0 05/12] mlxsw: core: Extend hwmon interface with port temperature attributes Vadim Pasternak
@ 2018-06-21 15:28 ` Vadim Pasternak
  2018-06-21 15:28 ` [PATCH v0 07/12] mlxsw: core: Set different thermal polling time based on " Vadim Pasternak
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 15:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, jiri, Vadim Pasternak

Add low frequency bus capability in order to allow core functionality
separation based on bus type. Driver could run over PCIe, which is
considered as high frequency bus or I2C , which is considered as low
frequency bus. In the last case time setting, for example, for thermal
polling interval, should be increased.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.h | 1 +
 drivers/net/ethernet/mellanox/mlxsw/i2c.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 552cfa2..95e6190 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -371,6 +371,7 @@ struct mlxsw_bus_info {
 	struct mlxsw_fw_rev fw_rev;
 	u8 vsd[MLXSW_CMD_BOARDINFO_VSD_LEN];
 	u8 psid[MLXSW_CMD_BOARDINFO_PSID_LEN];
+	bool low_frequency;
 };
 
 struct mlxsw_hwmon;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/i2c.c b/drivers/net/ethernet/mellanox/mlxsw/i2c.c
index 25f9915..384b337 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/i2c.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/i2c.c
@@ -536,6 +536,7 @@ static int mlxsw_i2c_probe(struct i2c_client *client,
 	mlxsw_i2c->bus_info.device_kind = id->name;
 	mlxsw_i2c->bus_info.device_name = client->name;
 	mlxsw_i2c->bus_info.dev = &client->dev;
+	mlxsw_i2c->bus_info.low_frequency = true;
 	mlxsw_i2c->dev = &client->dev;
 
 	err = mlxsw_core_bus_device_register(&mlxsw_i2c->bus_info,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v0 07/12] mlxsw: core: Set different thermal polling time based on bus type
  2018-06-21 15:27 [PATCH v0 00/12] mlxsw thermal monitoring amendments Vadim Pasternak
                   ` (5 preceding siblings ...)
  2018-06-21 15:28 ` [PATCH v0 06/12] mlxsw: core: Add bus frequency capability flag for the bus type Vadim Pasternak
@ 2018-06-21 15:28 ` Vadim Pasternak
  2018-06-21 15:28 ` [PATCH v0 08/12] mlxsw: core: Modify thermal zone definition Vadim Pasternak
  2018-06-21 15:28 ` [PATCH v0 09/12] mlxsw: core: Extend thermal zone operations with get_trend method Vadim Pasternak
  8 siblings, 0 replies; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 15:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, jiri, Vadim Pasternak

Use different thermal monitoring based on bus type.
For I2C bus time is set to 20 seconds, while for PCIe 1 second polling
interval is used.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index d866c98..152591d8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -41,6 +41,7 @@
 #include "core.h"
 
 #define MLXSW_THERMAL_POLL_INT	1000	/* ms */
+#define MLXSW_THERMAL_SLOW_POLL_INT	20000	/* ms */
 #define MLXSW_THERMAL_MAX_TEMP	110000	/* 110C */
 #define MLXSW_THERMAL_MAX_STATE	10
 #define MLXSW_THERMAL_MAX_DUTY	255
@@ -95,6 +96,7 @@ struct mlxsw_thermal {
 	struct mlxsw_core *core;
 	const struct mlxsw_bus_info *bus_info;
 	struct thermal_zone_device *tzdev;
+	int polling_delay;
 	struct thermal_cooling_device *cdevs[MLXSW_MFCR_PWMS_MAX];
 	struct mlxsw_thermal_trip trips[MLXSW_THERMAL_NUM_TRIPS];
 	enum thermal_device_mode mode;
@@ -190,7 +192,7 @@ static int mlxsw_thermal_set_mode(struct thermal_zone_device *tzdev,
 	mutex_lock(&tzdev->lock);
 
 	if (mode == THERMAL_DEVICE_ENABLED)
-		tzdev->polling_delay = MLXSW_THERMAL_POLL_INT;
+		tzdev->polling_delay = thermal->polling_delay;
 	else
 		tzdev->polling_delay = 0;
 
@@ -397,13 +399,18 @@ int mlxsw_thermal_init(struct mlxsw_core *core,
 		}
 	}
 
+	if (bus_info->low_frequency)
+		thermal->polling_delay = MLXSW_THERMAL_SLOW_POLL_INT;
+	else
+		thermal->polling_delay = MLXSW_THERMAL_POLL_INT;
+
 	thermal->tzdev = thermal_zone_device_register("mlxsw",
 						      MLXSW_THERMAL_NUM_TRIPS,
 						      MLXSW_THERMAL_TRIP_MASK,
 						      thermal,
 						      &mlxsw_thermal_ops,
 						      NULL, 0,
-						      MLXSW_THERMAL_POLL_INT);
+						      thermal->polling_delay);
 	if (IS_ERR(thermal->tzdev)) {
 		err = PTR_ERR(thermal->tzdev);
 		dev_err(dev, "Failed to register thermal zone\n");
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v0 08/12] mlxsw: core: Modify thermal zone definition
  2018-06-21 15:27 [PATCH v0 00/12] mlxsw thermal monitoring amendments Vadim Pasternak
                   ` (6 preceding siblings ...)
  2018-06-21 15:28 ` [PATCH v0 07/12] mlxsw: core: Set different thermal polling time based on " Vadim Pasternak
@ 2018-06-21 15:28 ` Vadim Pasternak
  2018-06-21 15:28 ` [PATCH v0 09/12] mlxsw: core: Extend thermal zone operations with get_trend method Vadim Pasternak
  8 siblings, 0 replies; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 15:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, jiri, Vadim Pasternak

Thermal zone trip points setting is modified for better alignment with
modified thermal algorithm.
The hysteresis thresholds for thermal trips are added in order to avoid
throttling around thermal trip point. If hysteresis temperature is not
considered PWM can have side effect of flip up/down on thermal trip
point boundary.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 63 ++++++++++++++--------
 1 file changed, 41 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index 152591d8..91c4946 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -39,16 +39,18 @@
 #include <linux/err.h>
 
 #include "core.h"
+#include "core_env.h"
 
-#define MLXSW_THERMAL_POLL_INT	1000	/* ms */
+#define MLXSW_THERMAL_POLL_INT		1000	/* ms */
 #define MLXSW_THERMAL_SLOW_POLL_INT	20000	/* ms */
-#define MLXSW_THERMAL_MAX_TEMP	110000	/* 110C */
-#define MLXSW_THERMAL_MAX_STATE	10
-#define MLXSW_THERMAL_MAX_DUTY	255
+#define MLXSW_THERMAL_HYSTERESIS_TEMP	5000	/* 5C */
+#define MLXSW_THERMAL_MAX_STATE		10
+#define MLXSW_THERMAL_MAX_DUTY		255
 
 struct mlxsw_thermal_trip {
 	int	type;
 	int	temp;
+	int	hyst;
 	int	min_state;
 	int	max_state;
 };
@@ -56,32 +58,29 @@ struct mlxsw_thermal_trip {
 static const struct mlxsw_thermal_trip default_thermal_trips[] = {
 	{	/* In range - 0-40% PWM */
 		.type		= THERMAL_TRIP_ACTIVE,
-		.temp		= 75000,
+		.temp		= MLXSW_ENV_TEMP_NORM,
+		.hyst		= MLXSW_THERMAL_HYSTERESIS_TEMP,
 		.min_state	= 0,
 		.max_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
 	},
-	{	/* High - 40-100% PWM */
-		.type		= THERMAL_TRIP_ACTIVE,
-		.temp		= 80000,
-		.min_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
-		.max_state	= MLXSW_THERMAL_MAX_STATE,
-	},
 	{
-		/* Very high - 100% PWM */
+		/* In range - 40-100% PWM */
 		.type		= THERMAL_TRIP_ACTIVE,
-		.temp		= 85000,
-		.min_state	= MLXSW_THERMAL_MAX_STATE,
+		.temp		= MLXSW_ENV_TEMP_HIGH,
+		.hyst		= MLXSW_THERMAL_HYSTERESIS_TEMP,
+		.min_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
 		.max_state	= MLXSW_THERMAL_MAX_STATE,
 	},
 	{	/* Warning */
 		.type		= THERMAL_TRIP_HOT,
-		.temp		= 105000,
+		.temp		= MLXSW_ENV_TEMP_HOT,
+		.hyst		= MLXSW_THERMAL_HYSTERESIS_TEMP,
 		.min_state	= MLXSW_THERMAL_MAX_STATE,
 		.max_state	= MLXSW_THERMAL_MAX_STATE,
 	},
 	{	/* Critical - soft poweroff */
 		.type		= THERMAL_TRIP_CRITICAL,
-		.temp		= MLXSW_THERMAL_MAX_TEMP,
+		.temp		= MLXSW_ENV_TEMP_CRIT,
 		.min_state	= MLXSW_THERMAL_MAX_STATE,
 		.max_state	= MLXSW_THERMAL_MAX_STATE,
 	}
@@ -257,22 +256,42 @@ static int mlxsw_thermal_set_trip_temp(struct thermal_zone_device *tzdev,
 	struct mlxsw_thermal *thermal = tzdev->devdata;
 
 	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS ||
-	    temp > MLXSW_THERMAL_MAX_TEMP)
+	    temp > MLXSW_ENV_TEMP_CRIT)
 		return -EINVAL;
 
 	thermal->trips[trip].temp = temp;
 	return 0;
 }
 
+static int mlxsw_thermal_get_trip_hyst(struct thermal_zone_device *tzdev,
+				       int trip, int *p_hyst)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	*p_hyst = thermal->trips[trip].hyst;
+	return 0;
+}
+
+static int mlxsw_thermal_set_trip_hyst(struct thermal_zone_device *tzdev,
+				       int trip, int hyst)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	thermal->trips[trip].hyst = hyst;
+	return 0;
+}
+
 static struct thermal_zone_device_ops mlxsw_thermal_ops = {
-	.bind = mlxsw_thermal_bind,
-	.unbind = mlxsw_thermal_unbind,
-	.get_mode = mlxsw_thermal_get_mode,
-	.set_mode = mlxsw_thermal_set_mode,
-	.get_temp = mlxsw_thermal_get_temp,
+	.bind		= mlxsw_thermal_bind,
+	.unbind		= mlxsw_thermal_unbind,
+	.get_mode	= mlxsw_thermal_get_mode,
+	.set_mode	= mlxsw_thermal_set_mode,
+	.get_temp	= mlxsw_thermal_get_temp,
 	.get_trip_type	= mlxsw_thermal_get_trip_type,
 	.get_trip_temp	= mlxsw_thermal_get_trip_temp,
 	.set_trip_temp	= mlxsw_thermal_set_trip_temp,
+	.get_trip_hyst	= mlxsw_thermal_get_trip_hyst,
+	.set_trip_hyst	= mlxsw_thermal_set_trip_hyst,
 };
 
 static int mlxsw_thermal_get_max_state(struct thermal_cooling_device *cdev,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v0 09/12] mlxsw: core: Extend thermal zone operations with get_trend method
  2018-06-21 15:27 [PATCH v0 00/12] mlxsw thermal monitoring amendments Vadim Pasternak
                   ` (7 preceding siblings ...)
  2018-06-21 15:28 ` [PATCH v0 08/12] mlxsw: core: Modify thermal zone definition Vadim Pasternak
@ 2018-06-21 15:28 ` Vadim Pasternak
  8 siblings, 0 replies; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 15:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, jiri, Vadim Pasternak

Thermal get_trend method is added in order to notify user in case of
fast temperature downgrade. It could happen in case one or few very hot
port cables are removed. In such situation temperature trend could go
down once, and then could stay in a stable state, while PWM state will
be decreased only once and could stay in not optimal high state.
Notification will allow user to take an appropriate action if
necessary.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 27 ++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index 91c4946..1587820 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -281,6 +281,32 @@ static int mlxsw_thermal_set_trip_hyst(struct thermal_zone_device *tzdev,
 	return 0;
 }
 
+static int mlxsw_thermal_get_trend(struct thermal_zone_device *tzdev,
+				   int trip, enum thermal_trend *trend)
+{
+	int delta;
+
+	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS)
+		return -EINVAL;
+
+	delta = tzdev->last_temperature - tzdev->temperature;
+	if (delta > MLXSW_ENV_TEMP_WINDOW) {
+		/* Notify user about fast temperature decreasing by sending
+		 * hwmon uevent. Decreasing could happen in case one or few
+		 * very hot port cables have been removed. In this situation
+		 * temperature trend could go down once, and then could stay
+		 * in a stable state, while PWM state will be decreased only
+		 * once. As a side effect PWM could be not at optimal speed.
+		 * Notification will allow user to handle such case, if user
+		 * supposes to optimize PWM state.
+		 */
+		kobject_uevent(&tzdev->device.kobj, KOBJ_CHANGE);
+	}
+
+	/* Return non-zero value to pass control to get_tz_trend() routine. */
+	return 1;
+}
+
 static struct thermal_zone_device_ops mlxsw_thermal_ops = {
 	.bind		= mlxsw_thermal_bind,
 	.unbind		= mlxsw_thermal_unbind,
@@ -292,6 +318,7 @@ static struct thermal_zone_device_ops mlxsw_thermal_ops = {
 	.set_trip_temp	= mlxsw_thermal_set_trip_temp,
 	.get_trip_hyst	= mlxsw_thermal_get_trip_hyst,
 	.set_trip_hyst	= mlxsw_thermal_set_trip_hyst,
+	.get_trend	= mlxsw_thermal_get_trend,
 };
 
 static int mlxsw_thermal_get_max_state(struct thermal_cooling_device *cdev,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
  2018-06-21 15:27 ` [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading Vadim Pasternak
@ 2018-06-21 17:11   ` Andrew Lunn
  2018-06-21 18:14     ` Vadim Pasternak
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Lunn @ 2018-06-21 17:11 UTC (permalink / raw)
  To: Vadim Pasternak; +Cc: davem, netdev, jiri

> New internal API reads the temperature from all the modules, which are
> equipped with the thermal sensor and exposes temperature according to
> the worst measure. All individual temperature values are normalized to
> pre-defined range.

Hi Vadim

Could you explain this normalization process. Why are you not just
expose each sensors temperature in millidegrees C, which is the normal
for HWMON.

    Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
  2018-06-21 17:11   ` Andrew Lunn
@ 2018-06-21 18:14     ` Vadim Pasternak
  2018-06-21 18:34       ` Andrew Lunn
  0 siblings, 1 reply; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 18:14 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: davem, netdev, jiri



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Thursday, June 21, 2018 8:11 PM
> To: Vadim Pasternak <vadimp@mellanox.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; jiri@resnulli.us
> Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for
> port temperature reading
> 
> > New internal API reads the temperature from all the modules, which are
> > equipped with the thermal sensor and exposes temperature according to
> > the worst measure. All individual temperature values are normalized to
> > pre-defined range.
> 
> Hi Vadim
> 
> Could you explain this normalization process. Why are you not just expose each
> sensors temperature in millidegrees C, which is the normal for HWMON.

Hi Andrew,

The temperature of each individual module can be obtained
through ethtool.
The worst temperature is necessary for the system cooling
control decision.

Up to 64 SFP/QSFP modules could be connected to the system.
Some of them could cooper modules, which doesn't provide
temperature measurement.
Some of them could be optical modules, providing untrusted
temperature measurement, which could impact thermal
control of the system.
Also optical modules could be from the different vendors,  and
this is real situation, when, f.e. one module has the warning and
critical thresholds 75C and 85C, while another 70C and 80C.
In such case  the first module temperature 72C is better, then the
second module temperature 71C.

And deltas  between warning and critical thresholds, could be
different as well. It could be 5C, 10C, etc.  

So, nominal temperature is not the case here, we should know the
"worst" value for the thermal control decision.

Thanks,
Vadim.

> 
>     Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
  2018-06-21 18:14     ` Vadim Pasternak
@ 2018-06-21 18:34       ` Andrew Lunn
  2018-06-21 19:17         ` Vadim Pasternak
  2018-06-21 22:06         ` Guenter Roeck
  0 siblings, 2 replies; 19+ messages in thread
From: Andrew Lunn @ 2018-06-21 18:34 UTC (permalink / raw)
  To: Vadim Pasternak, Guenter Roeck; +Cc: davem, netdev, jiri

> Hi Andrew,

Adding Guenter Roeck, the HWMON maintainer.
 
> The temperature of each individual module can be obtained
> through ethtool.

You mean via --module-info?

FYI: I plan to add hwmon support to the kernel SFP code. So if you
ever decide to swap to the kernel SFP code, not your own, the raw
temperatures will be exported.

> The worst temperature is necessary for the system cooling
> control decision.

I would expect the system cooling would understand that.

> Up to 64 SFP/QSFP modules could be connected to the system.
> Some of them could cooper modules, which doesn't provide
> temperature measurement.

SFP modules are hot-plugable. So i would also expect the hwmon devices
to hotplug. If there is no sensor, then there is no hwmon device... If
there is no hwmon device, it plays no part in the thermal control
loop.

> Some of them could be optical modules, providing untrusted
> temperature measurement, which could impact thermal
> control of the system.

Why would you not trust it? Are you saying some modules simply have
broken temperature sensors? Do you have a whitelist/blacklist of
modules?

> Also optical modules could be from the different vendors,  and
> this is real situation, when, f.e. one module has the warning and
> critical thresholds 75C and 85C, while another 70C and 80C.

But hwmon exports both the actual temperature and the alarm
temperatures. I would expect the thermal control code to use all this
information when making its decisions, not just the current
temperature.

> So, nominal temperature is not the case here, we should know the
> "worst" value for the thermal control decision.

What it sounds like to me is you are working around problems in the
thermal control by fudging the raw temperatures. That is the wrong
thing to do. hwmon should export the raw data, and you should fix the
thermal control code to use it correctly.

	Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
  2018-06-21 18:34       ` Andrew Lunn
@ 2018-06-21 19:17         ` Vadim Pasternak
  2018-06-21 19:49           ` Andrew Lunn
  2018-06-21 22:06         ` Guenter Roeck
  1 sibling, 1 reply; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 19:17 UTC (permalink / raw)
  To: Andrew Lunn, Guenter Roeck; +Cc: davem, netdev, jiri



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Thursday, June 21, 2018 9:35 PM
> To: Vadim Pasternak <vadimp@mellanox.com>; Guenter Roeck <linux@roeck-
> us.net>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; jiri@resnulli.us
> Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for
> port temperature reading
> 
> > Hi Andrew,
> 
> Adding Guenter Roeck, the HWMON maintainer.
> 
> > The temperature of each individual module can be obtained through
> > ethtool.
> 
> You mean via --module-info?

Yes.

> 
> FYI: I plan to add hwmon support to the kernel SFP code. So if you ever decide to
> swap to the kernel SFP code, not your own, the raw temperatures will be
> exported.
> 

Not sure it'll work for us, since we read SFP/QSFP ports through our SW/FW
interface.
But would be nice if you can provide some reference to this code.

> > The worst temperature is necessary for the system cooling control
> > decision.
> 
> I would expect the system cooling would understand that.
> 

In thermal zone infrastructure there is one temperature input.
How you can consider 64+ different inputs?

> > Up to 64 SFP/QSFP modules could be connected to the system.
> > Some of them could cooper modules, which doesn't provide temperature
> > measurement.
> 
> SFP modules are hot-plugable. So i would also expect the hwmon devices to
> hotplug. If there is no sensor, then there is no hwmon device... If there is no
> hwmon device, it plays no part in the thermal control loop.
> 
> > Some of them could be optical modules, providing untrusted temperature
> > measurement, which could impact thermal control of the system.
> 
> Why would you not trust it? Are you saying some modules simply have broken
> temperature sensors? Do you have a whitelist/blacklist of modules?
> 

We are reading temperature info through the firmware.
In case of "broken" module (module is supposed to be capable of
reading temperature, but returns some non-valid code), we'll get
some error code.

> > Also optical modules could be from the different vendors,  and this is
> > real situation, when, f.e. one module has the warning and critical
> > thresholds 75C and 85C, while another 70C and 80C.
> 
> But hwmon exports both the actual temperature and the alarm temperatures. I
> would expect the thermal control code to use all this information when making
> its decisions, not just the current temperature.
> 

All information is used, but the decision to increase FAN speed is taken
based on the worst measure, which is logical.

> > So, nominal temperature is not the case here, we should know the
> > "worst" value for the thermal control decision.
> 
> What it sounds like to me is you are working around problems in the thermal
> control by fudging the raw temperatures. That is the wrong thing to do. hwmon
> should export the raw data, and you should fix the thermal control code to use it
> correctly.

By default we are using kernel step-wise thermal algorithm, considering
all the module and ASIC ambient sensors temperature. This is not working
around. In thermal zone we have one PWM control and cumulative temperature
from the modules and ASIC. And it gives stable and correct results.

> 
> 	Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
  2018-06-21 19:17         ` Vadim Pasternak
@ 2018-06-21 19:49           ` Andrew Lunn
  2018-06-21 20:02             ` Vadim Pasternak
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Lunn @ 2018-06-21 19:49 UTC (permalink / raw)
  To: Vadim Pasternak; +Cc: Guenter Roeck, davem, netdev, jiri

On Thu, Jun 21, 2018 at 07:17:03PM +0000, Vadim Pasternak wrote:
> 
> 
> > -----Original Message-----
> > From: Andrew Lunn [mailto:andrew@lunn.ch]
> > Sent: Thursday, June 21, 2018 9:35 PM
> > To: Vadim Pasternak <vadimp@mellanox.com>; Guenter Roeck <linux@roeck-
> > us.net>
> > Cc: davem@davemloft.net; netdev@vger.kernel.org; jiri@resnulli.us
> > Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for
> > port temperature reading
> > 
> > > Hi Andrew,
> > 
> > Adding Guenter Roeck, the HWMON maintainer.
> > 
> > > The temperature of each individual module can be obtained through
> > > ethtool.
> > 
> > You mean via --module-info?
> 
> Yes.
> 
> > 
> > FYI: I plan to add hwmon support to the kernel SFP code. So if you ever decide to
> > swap to the kernel SFP code, not your own, the raw temperatures will be
> > exported.
> > 
> 
> Not sure it'll work for us, since we read SFP/QSFP ports through our SW/FW
> interface.

Can you make fake i2c busses? Pass the i2c transactions to the
firmware?

> But would be nice if you can provide some reference to this code.

drivers/net/phy/sfp.c

> 
> > > The worst temperature is necessary for the system cooling control
> > > decision.
> > 
> > I would expect the system cooling would understand that.
> > 
> 
> In thermal zone infrastructure there is one temperature input.
> How you can consider 64+ different inputs?

I've never used the thermal zone code. But i've used boards with 4
sensors spread around it. If the thermal zone code could not support
that, i would be surprised.

[Goes away and reads https://www.kernel.org/doc/Documentation/thermal/sysfs-api.txt]

So it sounds like, one zone is one sensor. So you actually have
hot-plugable zones, up to 64 of them. It also looks like you can bind
a zone to a cooling device. There does not seem to be a 1:1
mapping. So you should be able to bind 64 zones to one fan. Or if you
have multiple fans, bind a zone to the nearest fan.

But as i said, i'm no expert on this. You really should be posting
these patches on the hwmon list and the linux-pm list. The netdev list
does not have the needed specialist. Once Rui Zhang, Eduardo Valentin,
and Guenter Roack have given them Acked-by, David Miller can then
merge them.

      Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
  2018-06-21 19:49           ` Andrew Lunn
@ 2018-06-21 20:02             ` Vadim Pasternak
  2018-06-21 20:18               ` Andrew Lunn
  0 siblings, 1 reply; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-21 20:02 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Guenter Roeck, davem, netdev, jiri



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Thursday, June 21, 2018 10:49 PM
> To: Vadim Pasternak <vadimp@mellanox.com>
> Cc: Guenter Roeck <linux@roeck-us.net>; davem@davemloft.net;
> netdev@vger.kernel.org; jiri@resnulli.us
> Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for
> port temperature reading
> 
> On Thu, Jun 21, 2018 at 07:17:03PM +0000, Vadim Pasternak wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Lunn [mailto:andrew@lunn.ch]
> > > Sent: Thursday, June 21, 2018 9:35 PM
> > > To: Vadim Pasternak <vadimp@mellanox.com>; Guenter Roeck
> > > <linux@roeck- us.net>
> > > Cc: davem@davemloft.net; netdev@vger.kernel.org; jiri@resnulli.us
> > > Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment
> > > module for port temperature reading
> > >
> > > > Hi Andrew,
> > >
> > > Adding Guenter Roeck, the HWMON maintainer.
> > >
> > > > The temperature of each individual module can be obtained through
> > > > ethtool.
> > >
> > > You mean via --module-info?
> >
> > Yes.
> >
> > >
> > > FYI: I plan to add hwmon support to the kernel SFP code. So if you
> > > ever decide to swap to the kernel SFP code, not your own, the raw
> > > temperatures will be exported.
> > >
> >
> > Not sure it'll work for us, since we read SFP/QSFP ports through our
> > SW/FW interface.
> 
> Can you make fake i2c busses? Pass the i2c transactions to the firmware?

Theoretically yes.
But have well-defined SW/FW interface, working over PCI and FW at
ASIC end implements I2C master.
> 
> > But would be nice if you can provide some reference to this code.
> 
> drivers/net/phy/sfp.c
> 

Thank you.

> >
> > > > The worst temperature is necessary for the system cooling control
> > > > decision.
> > >
> > > I would expect the system cooling would understand that.
> > >
> >
> > In thermal zone infrastructure there is one temperature input.
> > How you can consider 64+ different inputs?
> 
> I've never used the thermal zone code. But i've used boards with 4 sensors
> spread around it. If the thermal zone code could not support that, i would be
> surprised.
> 
> [Goes away and reads
> https://www.kernel.org/doc/Documentation/thermal/sysfs-api.txt]
> 
> So it sounds like, one zone is one sensor. So you actually have hot-plugable
> zones, up to 64 of them. It also looks like you can bind a zone to a cooling
> device. There does not seem to be a 1:1 mapping. So you should be able to bind
> 64 zones to one fan. Or if you have multiple fans, bind a zone to the nearest fan.
> 

It means I will have 64 thermal zones for each module (actually for the
some coming new systems will support 128 modules, plus thermal zone
for ASIC ambient temperatures.
And each zone will try to control same PWM.
As I result PWM will be extremely jumpy and non-effective.

> But as i said, i'm no expert on this. You really should be posting these patches on
> the hwmon list and the linux-pm list. The netdev list does not have the needed
> specialist. Once Rui Zhang, Eduardo Valentin, and Guenter Roack have given
> them Acked-by, David Miller can then merge them.

Thanks,
Vadim.

> 
>       Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
  2018-06-21 20:02             ` Vadim Pasternak
@ 2018-06-21 20:18               ` Andrew Lunn
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Lunn @ 2018-06-21 20:18 UTC (permalink / raw)
  To: Vadim Pasternak; +Cc: Guenter Roeck, davem, netdev, jiri

> It means I will have 64 thermal zones for each module (actually for the
> some coming new systems will support 128 modules, plus thermal zone
> for ASIC ambient temperatures.
> And each zone will try to control same PWM.
> As I result PWM will be extremely jumpy and non-effective.

Hi Vadim

Please repost to the correct lists, and ask the experts how this
should be done. netdev is not the right place to discuss this.

       Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
  2018-06-21 18:34       ` Andrew Lunn
  2018-06-21 19:17         ` Vadim Pasternak
@ 2018-06-21 22:06         ` Guenter Roeck
  2018-06-22  9:00           ` Vadim Pasternak
  1 sibling, 1 reply; 19+ messages in thread
From: Guenter Roeck @ 2018-06-21 22:06 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Vadim Pasternak, davem, netdev, jiri

On Thu, Jun 21, 2018 at 08:34:40PM +0200, Andrew Lunn wrote:
> > Hi Andrew,
> 
> Adding Guenter Roeck, the HWMON maintainer.
>  
> > The temperature of each individual module can be obtained
> > through ethtool.
> 
> You mean via --module-info?
> 
> FYI: I plan to add hwmon support to the kernel SFP code. So if you
> ever decide to swap to the kernel SFP code, not your own, the raw
> temperatures will be exported.
> 
As should be. Unless adjustments are made by the hardware (eg a thermal
diode temperature offset register), all adjustments should be made in
userspace.

> > The worst temperature is necessary for the system cooling
> > control decision.
> 
> I would expect the system cooling would understand that.
> 
> > Up to 64 SFP/QSFP modules could be connected to the system.
> > Some of them could cooper modules, which doesn't provide
> > temperature measurement.
> 
> SFP modules are hot-plugable. So i would also expect the hwmon devices
> to hotplug. If there is no sensor, then there is no hwmon device... If
> there is no hwmon device, it plays no part in the thermal control
> loop.
> 
One hardware monitoring device per SFP, and I would assume that the
hwmon device for an SFP is only instantiated if a thermal sensor
is present.

> > Some of them could be optical modules, providing untrusted
> > temperature measurement, which could impact thermal
> > control of the system.
> 
> Why would you not trust it? Are you saying some modules simply have
> broken temperature sensors? Do you have a whitelist/blacklist of
> modules?
> 
> > Also optical modules could be from the different vendors,  and
> > this is real situation, when, f.e. one module has the warning and
> > critical thresholds 75C and 85C, while another 70C and 80C.
> 
> But hwmon exports both the actual temperature and the alarm
> temperatures. I would expect the thermal control code to use all this
> information when making its decisions, not just the current
> temperature.
> 
The respective information would either be provided by hardware
and reported to userspace, or userspace needs to determine the limits
and write them into the hardware. Either case, that is only relevant
if the hardware has limit registers. Otherwise all limits can and
should be handled in the thermal subsystem.

> > So, nominal temperature is not the case here, we should know the
> > "worst" value for the thermal control decision.
> 
> What it sounds like to me is you are working around problems in the
> thermal control by fudging the raw temperatures. That is the wrong
> thing to do. hwmon should export the raw data, and you should fix the
> thermal control code to use it correctly.
> 
Agreed. From the context it sounds like there should be some kind of
temperature aggregator which would probably reside in the thermal
subsystem (definitely not in hwmon).

I have not seen any hwmon specific patches. For new drivers,
please use [devm_]hwmon_device_register_with_info().

Guenter

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
  2018-06-21 22:06         ` Guenter Roeck
@ 2018-06-22  9:00           ` Vadim Pasternak
  0 siblings, 0 replies; 19+ messages in thread
From: Vadim Pasternak @ 2018-06-22  9:00 UTC (permalink / raw)
  To: Guenter Roeck, Andrew Lunn; +Cc: davem, netdev, jiri



> -----Original Message-----
> From: Guenter Roeck [mailto:linux@roeck-us.net]
> Sent: Friday, June 22, 2018 1:07 AM
> To: Andrew Lunn <andrew@lunn.ch>
> Cc: Vadim Pasternak <vadimp@mellanox.com>; davem@davemloft.net;
> netdev@vger.kernel.org; jiri@resnulli.us
> Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for
> port temperature reading
> 
> On Thu, Jun 21, 2018 at 08:34:40PM +0200, Andrew Lunn wrote:
> > > Hi Andrew,
> >
> > Adding Guenter Roeck, the HWMON maintainer.

Hi Guenter,

Thank you for reply.
We are going to re-post this patchset and add more people
for review, as Andrew suggested.

> >
> > > The temperature of each individual module can be obtained through
> > > ethtool.
> >
> > You mean via --module-info?
> >
> > FYI: I plan to add hwmon support to the kernel SFP code. So if you
> > ever decide to swap to the kernel SFP code, not your own, the raw
> > temperatures will be exported.
> >
> As should be. Unless adjustments are made by the hardware (eg a thermal diode
> temperature offset register), all adjustments should be made in userspace.
> 

>From hardware we read all module temperature in one request.
The summary of temperature is going to thermal module.

> > > The worst temperature is necessary for the system cooling control
> > > decision.
> >
> > I would expect the system cooling would understand that.
> >
> > > Up to 64 SFP/QSFP modules could be connected to the system.
> > > Some of them could cooper modules, which doesn't provide temperature
> > > measurement.
> >
> > SFP modules are hot-plugable. So i would also expect the hwmon devices
> > to hotplug. If there is no sensor, then there is no hwmon device... If
> > there is no hwmon device, it plays no part in the thermal control
> > loop.
> >
> One hardware monitoring device per SFP, and I would assume that the hwmon
> device for an SFP is only instantiated if a thermal sensor is present.
> 
> > > Some of them could be optical modules, providing untrusted
> > > temperature measurement, which could impact thermal control of the
> > > system.
> >
> > Why would you not trust it? Are you saying some modules simply have
> > broken temperature sensors? Do you have a whitelist/blacklist of
> > modules?
> >
> > > Also optical modules could be from the different vendors,  and this
> > > is real situation, when, f.e. one module has the warning and
> > > critical thresholds 75C and 85C, while another 70C and 80C.
> >
> > But hwmon exports both the actual temperature and the alarm
> > temperatures. I would expect the thermal control code to use all this
> > information when making its decisions, not just the current
> > temperature.
> >
> The respective information would either be provided by hardware and reported
> to userspace, or userspace needs to determine the limits and write them into the
> hardware. Either case, that is only relevant if the hardware has limit registers.
> Otherwise all limits can and should be handled in the thermal subsystem.
> 

This is the case. Limits, modules temperatures and thresholds are handled in
the thermal subsystem.

> > > So, nominal temperature is not the case here, we should know the
> > > "worst" value for the thermal control decision.
> >
> > What it sounds like to me is you are working around problems in the
> > thermal control by fudging the raw temperatures. That is the wrong
> > thing to do. hwmon should export the raw data, and you should fix the
> > thermal control code to use it correctly.
> >
> Agreed. From the context it sounds like there should be some kind of
> temperature aggregator which would probably reside in the thermal subsystem
> (definitely not in hwmon).

Exactly this is the kind of temperature aggregation, provided to the thermal
subsystem as the thermal zone temperature.

> 
> I have not seen any hwmon specific patches. For new drivers, please use
> [devm_]hwmon_device_register_with_info().

We already have hwmon object. This is the reason I didn't use this
interface. This existing object has been extend with a few new
attributes for FANs.
Also I added aggregated FAULT status of the modules. In case even
one module is considered as "untrusted" (this is getting from HW),
this attribute is set to true.
This indication could be used by the user.
In our systems such fault will impact the allowed PWM minimum,
which could be assigned to thermal zone cooling device.
For example, if in normal situation for some particular system
minimum could be set to 20%, then in case any untrusted module
is found, this minimum should be increased to 40% (these percent
are depend on system type).
It doesn't matter how many untrusted modules are inserted. If there
is even one, it could have very bad impact on system thermal flow. 

Thanks,
Vadim.

> 
> Guenter

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-06-22  9:00 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-21 15:27 [PATCH v0 00/12] mlxsw thermal monitoring amendments Vadim Pasternak
2018-06-21 15:27 ` [PATCH v0 01/12] mlxsw: spectrum: Move QSFP EEPROM defenitons to common location Vadim Pasternak
2018-06-21 15:27 ` [PATCH v0 02/12] mlxsw: reg: Add MTBR register Vadim Pasternak
2018-06-21 15:27 ` [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading Vadim Pasternak
2018-06-21 17:11   ` Andrew Lunn
2018-06-21 18:14     ` Vadim Pasternak
2018-06-21 18:34       ` Andrew Lunn
2018-06-21 19:17         ` Vadim Pasternak
2018-06-21 19:49           ` Andrew Lunn
2018-06-21 20:02             ` Vadim Pasternak
2018-06-21 20:18               ` Andrew Lunn
2018-06-21 22:06         ` Guenter Roeck
2018-06-22  9:00           ` Vadim Pasternak
2018-06-21 15:27 ` [PATCH v0 04/12] mlxsw: core: Extend hwmon interface with FAN fault attribute Vadim Pasternak
2018-06-21 15:27 ` [PATCH v0 05/12] mlxsw: core: Extend hwmon interface with port temperature attributes Vadim Pasternak
2018-06-21 15:28 ` [PATCH v0 06/12] mlxsw: core: Add bus frequency capability flag for the bus type Vadim Pasternak
2018-06-21 15:28 ` [PATCH v0 07/12] mlxsw: core: Set different thermal polling time based on " Vadim Pasternak
2018-06-21 15:28 ` [PATCH v0 08/12] mlxsw: core: Modify thermal zone definition Vadim Pasternak
2018-06-21 15:28 ` [PATCH v0 09/12] mlxsw: core: Extend thermal zone operations with get_trend method Vadim Pasternak

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.