All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/2] aqc111: Thermal throttling feature
@ 2018-12-12 13:50 Igor Russkikh
  2018-12-12 13:50   ` [net,1/2] " Igor Russkikh
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 13:50 UTC (permalink / raw)
  To: linux-usb, davem, netdev; +Cc: Dmitry Bezrukov, Igor Russkikh

This patches introduce the thermal throttling feature to prevent possible
heat damage to the hardware.

Dmitry Bezrukov (2):
  net: usb: aqc111: Add read_mdio operation
  net: usb: aqc111: Support for thermal throttling feature

 drivers/net/usb/aqc111.c | 83 ++++++++++++++++++++++++++++++++++++++++
 drivers/net/usb/aqc111.h | 19 +++++++++
 2 files changed, 102 insertions(+)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation
@ 2018-12-12 13:50   ` Igor Russkikh
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 13:50 UTC (permalink / raw)
  To: linux-usb, davem, netdev; +Cc: Dmitry Bezrukov, Igor Russkikh

From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>

Add necessary defines to read phy registers and temparature

Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/usb/aqc111.c |  5 +++++
 drivers/net/usb/aqc111.h | 11 +++++++++++
 2 files changed, 16 insertions(+)

diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
index 57f1c94fca0b..8867f6a3eaa7 100644
--- a/drivers/net/usb/aqc111.c
+++ b/drivers/net/usb/aqc111.c
@@ -193,6 +193,11 @@ static int aqc111_write16_cmd_async(struct usbnet *dev, u8 cmd, u16 value,
 				      sizeof(tmp), &tmp);
 }
 
+static int aqc111_mdio_read(struct usbnet *dev, u16 value, u16 index, u16 *data)
+{
+	return aqc111_read16_cmd(dev, AQ_PHY_CMD, value, index, data);
+}
+
 static void aqc111_get_drvinfo(struct net_device *net,
 			       struct ethtool_drvinfo *info)
 {
diff --git a/drivers/net/usb/aqc111.h b/drivers/net/usb/aqc111.h
index 4d68b3a6067c..359663635b49 100644
--- a/drivers/net/usb/aqc111.h
+++ b/drivers/net/usb/aqc111.h
@@ -18,9 +18,20 @@
 #define AQ_ACCESS_MAC			0x01
 #define AQ_FLASH_PARAMETERS		0x20
 #define AQ_PHY_POWER			0x31
+#define AQ_PHY_CMD			0x32
 #define AQ_WOL_CFG			0x60
 #define AQ_PHY_OPS			0x61
 
+#define AQC111_PHY_ID			0x00
+#define AQ_PHY_ADDR(mmd)		((AQC111_PHY_ID << 8) | (mmd))
+
+#define AQ_PHY_GLOBAL_MMD		0x1E
+#define AQ_PHY_GLOBAL_ADDR		AQ_PHY_ADDR(AQ_PHY_GLOBAL_MMD)
+
+#define AQ_GLB_THERMAL_STATUS1		0xC820
+#define AQ_GLB_THERMAL_STATUS2		0xC821
+	#define AQ_THERM_ST_READY		0x0001
+
 #define AQ_USB_PHY_SET_TIMEOUT		10000
 #define AQ_USB_SET_TIMEOUT		4000
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net,1/2] net: usb: aqc111: Add read_mdio operation
@ 2018-12-12 13:50   ` Igor Russkikh
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 13:50 UTC (permalink / raw)
  To: linux-usb, davem, netdev; +Cc: Dmitry Bezrukov, Igor Russkikh

From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>

Add necessary defines to read phy registers and temparature

Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/usb/aqc111.c |  5 +++++
 drivers/net/usb/aqc111.h | 11 +++++++++++
 2 files changed, 16 insertions(+)

diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
index 57f1c94fca0b..8867f6a3eaa7 100644
--- a/drivers/net/usb/aqc111.c
+++ b/drivers/net/usb/aqc111.c
@@ -193,6 +193,11 @@ static int aqc111_write16_cmd_async(struct usbnet *dev, u8 cmd, u16 value,
 				      sizeof(tmp), &tmp);
 }
 
+static int aqc111_mdio_read(struct usbnet *dev, u16 value, u16 index, u16 *data)
+{
+	return aqc111_read16_cmd(dev, AQ_PHY_CMD, value, index, data);
+}
+
 static void aqc111_get_drvinfo(struct net_device *net,
 			       struct ethtool_drvinfo *info)
 {
diff --git a/drivers/net/usb/aqc111.h b/drivers/net/usb/aqc111.h
index 4d68b3a6067c..359663635b49 100644
--- a/drivers/net/usb/aqc111.h
+++ b/drivers/net/usb/aqc111.h
@@ -18,9 +18,20 @@
 #define AQ_ACCESS_MAC			0x01
 #define AQ_FLASH_PARAMETERS		0x20
 #define AQ_PHY_POWER			0x31
+#define AQ_PHY_CMD			0x32
 #define AQ_WOL_CFG			0x60
 #define AQ_PHY_OPS			0x61
 
+#define AQC111_PHY_ID			0x00
+#define AQ_PHY_ADDR(mmd)		((AQC111_PHY_ID << 8) | (mmd))
+
+#define AQ_PHY_GLOBAL_MMD		0x1E
+#define AQ_PHY_GLOBAL_ADDR		AQ_PHY_ADDR(AQ_PHY_GLOBAL_MMD)
+
+#define AQ_GLB_THERMAL_STATUS1		0xC820
+#define AQ_GLB_THERMAL_STATUS2		0xC821
+	#define AQ_THERM_ST_READY		0x0001
+
 #define AQ_USB_PHY_SET_TIMEOUT		10000
 #define AQ_USB_SET_TIMEOUT		4000
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature
@ 2018-12-12 13:50   ` Igor Russkikh
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 13:50 UTC (permalink / raw)
  To: linux-usb, davem, netdev; +Cc: Dmitry Bezrukov, Igor Russkikh

From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>

Lab testing shows that chip may get overheated when
network link is connected on high speed (2.5G/5G).

Default hardware design uses only passive heatsink without a cooler,
and that makes things worse.

To prevent possible chip damage here we add throttling mechanism.

There is a worker that monitors the PHY's temperature via reading
PHY registers. When PHY's temperature reaches the critical threshold
(it is 106 degrees of Celsius) it changes the link speed to the lowest
regardless user/default link speed settings. It should reduce the PHY's
temperature to normal numbers.

When the PHY's temparature is back to normal (low threshold, it is
85 degrees) it restores user/default link speed settings.

Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/usb/aqc111.c | 78 ++++++++++++++++++++++++++++++++++++++++
 drivers/net/usb/aqc111.h |  8 +++++
 2 files changed, 86 insertions(+)

diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
index 8867f6a3eaa7..fa6b9dfc2a6f 100644
--- a/drivers/net/usb/aqc111.c
+++ b/drivers/net/usb/aqc111.c
@@ -17,6 +17,7 @@
 #include <linux/usb/cdc.h>
 #include <linux/usb/usbnet.h>
 #include <linux/linkmode.h>
+#include <linux/workqueue.h>
 
 #include "aqc111.h"
 
@@ -334,6 +335,9 @@ static void aqc111_set_phy_speed(struct usbnet *dev, u8 autoneg, u16 speed)
 	aqc111_data->phy_cfg |= (3 << AQ_DSH_RETRIES_SHIFT) &
 				AQ_DSH_RETRIES_MASK;
 
+	if (aqc111_data->thermal_throttling)
+		speed = SPEED_100;
+
 	if (autoneg == AUTONEG_ENABLE) {
 		switch (speed) {
 		case SPEED_5000:
@@ -714,6 +718,8 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf)
 	/* store aqc111_data pointer in device data field */
 	dev->driver_priv = aqc111_data;
 
+	aqc111_data->dev = dev;
+
 	/* Init the MAC address */
 	ret = aqc111_read_perm_mac(dev);
 	if (ret)
@@ -991,6 +997,71 @@ static int aqc111_link_reset(struct usbnet *dev)
 	return 0;
 }
 
+int aqc111_get_temperature(struct usbnet *dev, u32 *temperature)
+{
+	u16 reg16 = 0;
+
+	if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS2, AQ_PHY_GLOBAL_ADDR,
+			     &reg16) < 0)
+		goto err;
+
+	if (!(reg16 & AQ_THERM_ST_READY))
+		goto err;
+
+	if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS1, AQ_PHY_GLOBAL_ADDR,
+			     &reg16) < 0)
+		goto err;
+
+	/*convert from 1/256 to 1/100 degrees of Celsius*/
+	*temperature = (u32)reg16 * 100 / 256;
+	return 0;
+
+err:
+	*temperature = 0;
+	return -1;
+}
+
+void aqc111_thermal_work_cb(struct work_struct *w)
+{
+	struct delayed_work *dw = to_delayed_work(w);
+	struct aqc111_data *aqc111_data = container_of(dw, struct aqc111_data,
+						       therm_work);
+	unsigned long timeout = msecs_to_jiffies(AQ_THERMAL_TIMER_MS);
+	struct usbnet *dev = aqc111_data->dev;
+	u32 temperature = 0;
+	u8 reset_speed = 0;
+
+	if (!aqc111_data->link)
+		/* poll not so frequently */
+		timeout *= 2;
+
+	if (aqc111_get_temperature(dev, &temperature) != 0)
+		goto end;
+
+	if (aqc111_data->thermal_throttling &&
+	    temperature <= AQ_NORMAL_TEMP_THRESHOLD) {
+		netdev_info(dev->net, "The temperature is back to normal(%u)",
+			    AQ_NORMAL_TEMP_THRESHOLD / 100);
+		aqc111_data->thermal_throttling = 0;
+		reset_speed = 1;
+	}
+
+	if (!aqc111_data->thermal_throttling &&
+	    temperature >= AQ_CRITICAL_TEMP_THRESHOLD) {
+		netdev_warn(dev->net, "Critical temperature(%u) is reached",
+			    AQ_CRITICAL_TEMP_THRESHOLD / 100);
+		aqc111_data->thermal_throttling = 1;
+		reset_speed = 1;
+	}
+
+	if (reset_speed)
+		aqc111_set_phy_speed(dev, aqc111_data->autoneg,
+				     aqc111_data->advertised_speed);
+
+end:
+	schedule_delayed_work(&aqc111_data->therm_work, timeout);
+}
+
 static int aqc111_reset(struct usbnet *dev)
 {
 	struct aqc111_data *aqc111_data = dev->driver_priv;
@@ -1032,6 +1103,10 @@ static int aqc111_reset(struct usbnet *dev)
 	aqc111_set_phy_speed(dev, aqc111_data->autoneg,
 			     aqc111_data->advertised_speed);
 
+	INIT_DELAYED_WORK(&aqc111_data->therm_work, &aqc111_thermal_work_cb);
+	schedule_delayed_work(&aqc111_data->therm_work,
+			      msecs_to_jiffies(AQ_THERMAL_TIMER_MS));
+
 	return 0;
 }
 
@@ -1040,6 +1115,9 @@ static int aqc111_stop(struct usbnet *dev)
 	struct aqc111_data *aqc111_data = dev->driver_priv;
 	u16 reg16 = 0;
 
+	cancel_delayed_work_sync(&aqc111_data->therm_work);
+	aqc111_data->thermal_throttling = 0;
+
 	aqc111_read16_cmd(dev, AQ_ACCESS_MAC, SFR_MEDIUM_STATUS_MODE,
 			  2, &reg16);
 	reg16 &= ~SFR_MEDIUM_RECEIVE_EN;
diff --git a/drivers/net/usb/aqc111.h b/drivers/net/usb/aqc111.h
index 359663635b49..7b50e3cd217b 100644
--- a/drivers/net/usb/aqc111.h
+++ b/drivers/net/usb/aqc111.h
@@ -35,6 +35,11 @@
 #define AQ_USB_PHY_SET_TIMEOUT		10000
 #define AQ_USB_SET_TIMEOUT		4000
 
+#define AQ_THERMAL_TIMER_MS		500
+/* Temperature thresholds in units 1/100 degree of Celsius */
+#define AQ_NORMAL_TEMP_THRESHOLD	8500
+#define AQ_CRITICAL_TEMP_THRESHOLD	10600
+
 /* Feature. ********************************************/
 #define AQ_SUPPORT_FEATURE	(NETIF_F_SG | NETIF_F_IP_CSUM |\
 				 NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM |\
@@ -183,6 +188,9 @@ struct aqc111_data {
 	} fw_ver;
 	u32 phy_cfg;
 	u8 wol_flags;
+	struct delayed_work therm_work;
+	u8 thermal_throttling;
+	struct usbnet *dev;
 };
 
 #define AQ_LS_MASK		0x8000
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net,2/2] net: usb: aqc111: Support for thermal throttling feature
@ 2018-12-12 13:50   ` Igor Russkikh
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 13:50 UTC (permalink / raw)
  To: linux-usb, davem, netdev; +Cc: Dmitry Bezrukov, Igor Russkikh

From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>

Lab testing shows that chip may get overheated when
network link is connected on high speed (2.5G/5G).

Default hardware design uses only passive heatsink without a cooler,
and that makes things worse.

To prevent possible chip damage here we add throttling mechanism.

There is a worker that monitors the PHY's temperature via reading
PHY registers. When PHY's temperature reaches the critical threshold
(it is 106 degrees of Celsius) it changes the link speed to the lowest
regardless user/default link speed settings. It should reduce the PHY's
temperature to normal numbers.

When the PHY's temparature is back to normal (low threshold, it is
85 degrees) it restores user/default link speed settings.

Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/usb/aqc111.c | 78 ++++++++++++++++++++++++++++++++++++++++
 drivers/net/usb/aqc111.h |  8 +++++
 2 files changed, 86 insertions(+)

diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
index 8867f6a3eaa7..fa6b9dfc2a6f 100644
--- a/drivers/net/usb/aqc111.c
+++ b/drivers/net/usb/aqc111.c
@@ -17,6 +17,7 @@
 #include <linux/usb/cdc.h>
 #include <linux/usb/usbnet.h>
 #include <linux/linkmode.h>
+#include <linux/workqueue.h>
 
 #include "aqc111.h"
 
@@ -334,6 +335,9 @@ static void aqc111_set_phy_speed(struct usbnet *dev, u8 autoneg, u16 speed)
 	aqc111_data->phy_cfg |= (3 << AQ_DSH_RETRIES_SHIFT) &
 				AQ_DSH_RETRIES_MASK;
 
+	if (aqc111_data->thermal_throttling)
+		speed = SPEED_100;
+
 	if (autoneg == AUTONEG_ENABLE) {
 		switch (speed) {
 		case SPEED_5000:
@@ -714,6 +718,8 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf)
 	/* store aqc111_data pointer in device data field */
 	dev->driver_priv = aqc111_data;
 
+	aqc111_data->dev = dev;
+
 	/* Init the MAC address */
 	ret = aqc111_read_perm_mac(dev);
 	if (ret)
@@ -991,6 +997,71 @@ static int aqc111_link_reset(struct usbnet *dev)
 	return 0;
 }
 
+int aqc111_get_temperature(struct usbnet *dev, u32 *temperature)
+{
+	u16 reg16 = 0;
+
+	if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS2, AQ_PHY_GLOBAL_ADDR,
+			     &reg16) < 0)
+		goto err;
+
+	if (!(reg16 & AQ_THERM_ST_READY))
+		goto err;
+
+	if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS1, AQ_PHY_GLOBAL_ADDR,
+			     &reg16) < 0)
+		goto err;
+
+	/*convert from 1/256 to 1/100 degrees of Celsius*/
+	*temperature = (u32)reg16 * 100 / 256;
+	return 0;
+
+err:
+	*temperature = 0;
+	return -1;
+}
+
+void aqc111_thermal_work_cb(struct work_struct *w)
+{
+	struct delayed_work *dw = to_delayed_work(w);
+	struct aqc111_data *aqc111_data = container_of(dw, struct aqc111_data,
+						       therm_work);
+	unsigned long timeout = msecs_to_jiffies(AQ_THERMAL_TIMER_MS);
+	struct usbnet *dev = aqc111_data->dev;
+	u32 temperature = 0;
+	u8 reset_speed = 0;
+
+	if (!aqc111_data->link)
+		/* poll not so frequently */
+		timeout *= 2;
+
+	if (aqc111_get_temperature(dev, &temperature) != 0)
+		goto end;
+
+	if (aqc111_data->thermal_throttling &&
+	    temperature <= AQ_NORMAL_TEMP_THRESHOLD) {
+		netdev_info(dev->net, "The temperature is back to normal(%u)",
+			    AQ_NORMAL_TEMP_THRESHOLD / 100);
+		aqc111_data->thermal_throttling = 0;
+		reset_speed = 1;
+	}
+
+	if (!aqc111_data->thermal_throttling &&
+	    temperature >= AQ_CRITICAL_TEMP_THRESHOLD) {
+		netdev_warn(dev->net, "Critical temperature(%u) is reached",
+			    AQ_CRITICAL_TEMP_THRESHOLD / 100);
+		aqc111_data->thermal_throttling = 1;
+		reset_speed = 1;
+	}
+
+	if (reset_speed)
+		aqc111_set_phy_speed(dev, aqc111_data->autoneg,
+				     aqc111_data->advertised_speed);
+
+end:
+	schedule_delayed_work(&aqc111_data->therm_work, timeout);
+}
+
 static int aqc111_reset(struct usbnet *dev)
 {
 	struct aqc111_data *aqc111_data = dev->driver_priv;
@@ -1032,6 +1103,10 @@ static int aqc111_reset(struct usbnet *dev)
 	aqc111_set_phy_speed(dev, aqc111_data->autoneg,
 			     aqc111_data->advertised_speed);
 
+	INIT_DELAYED_WORK(&aqc111_data->therm_work, &aqc111_thermal_work_cb);
+	schedule_delayed_work(&aqc111_data->therm_work,
+			      msecs_to_jiffies(AQ_THERMAL_TIMER_MS));
+
 	return 0;
 }
 
@@ -1040,6 +1115,9 @@ static int aqc111_stop(struct usbnet *dev)
 	struct aqc111_data *aqc111_data = dev->driver_priv;
 	u16 reg16 = 0;
 
+	cancel_delayed_work_sync(&aqc111_data->therm_work);
+	aqc111_data->thermal_throttling = 0;
+
 	aqc111_read16_cmd(dev, AQ_ACCESS_MAC, SFR_MEDIUM_STATUS_MODE,
 			  2, &reg16);
 	reg16 &= ~SFR_MEDIUM_RECEIVE_EN;
diff --git a/drivers/net/usb/aqc111.h b/drivers/net/usb/aqc111.h
index 359663635b49..7b50e3cd217b 100644
--- a/drivers/net/usb/aqc111.h
+++ b/drivers/net/usb/aqc111.h
@@ -35,6 +35,11 @@
 #define AQ_USB_PHY_SET_TIMEOUT		10000
 #define AQ_USB_SET_TIMEOUT		4000
 
+#define AQ_THERMAL_TIMER_MS		500
+/* Temperature thresholds in units 1/100 degree of Celsius */
+#define AQ_NORMAL_TEMP_THRESHOLD	8500
+#define AQ_CRITICAL_TEMP_THRESHOLD	10600
+
 /* Feature. ********************************************/
 #define AQ_SUPPORT_FEATURE	(NETIF_F_SG | NETIF_F_IP_CSUM |\
 				 NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM |\
@@ -183,6 +188,9 @@ struct aqc111_data {
 	} fw_ver;
 	u32 phy_cfg;
 	u8 wol_flags;
+	struct delayed_work therm_work;
+	u8 thermal_throttling;
+	struct usbnet *dev;
 };
 
 #define AQ_LS_MASK		0x8000

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net 0/2] aqc111: Thermal throttling feature
  2018-12-12 13:50 [PATCH net 0/2] aqc111: Thermal throttling feature Igor Russkikh
  2018-12-12 13:50   ` [net,1/2] " Igor Russkikh
  2018-12-12 13:50   ` [net,2/2] " Igor Russkikh
@ 2018-12-12 13:54 ` Igor Russkikh
  2018-12-12 18:15   ` Florian Fainelli
  2018-12-13  0:18 ` David Miller
  3 siblings, 1 reply; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 13:54 UTC (permalink / raw)
  To: davem, netdev; +Cc: Dmitry Bezrukov



On 12.12.2018 16:50, Igor Russkikh wrote:
> This patches introduce the thermal throttling feature to prevent possible
> heat damage to the hardware.
> 
> Dmitry Bezrukov (2):
>   net: usb: aqc111: Add read_mdio operation
>   net: usb: aqc111: Support for thermal throttling feature
> 
>  drivers/net/usb/aqc111.c | 83 ++++++++++++++++++++++++++++++++++++++++
>  drivers/net/usb/aqc111.h | 19 +++++++++
>  2 files changed, 102 insertions(+)
> 

Hi David, all,

This is of course designated for the "net-next" tree.

I'll resubmit this after review.

Regards,
  Igor

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature
@ 2018-12-12 16:08     ` Andrew Lunn
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Lunn @ 2018-12-12 16:08 UTC (permalink / raw)
  To: Igor Russkikh; +Cc: linux-usb, davem, netdev, Dmitry Bezrukov

On Wed, Dec 12, 2018 at 01:50:10PM +0000, Igor Russkikh wrote:
> From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> 
> Lab testing shows that chip may get overheated when
> network link is connected on high speed (2.5G/5G).
> 
> Default hardware design uses only passive heatsink without a cooler,
> and that makes things worse.
> 
> To prevent possible chip damage here we add throttling mechanism.
> 
> There is a worker that monitors the PHY's temperature via reading
> PHY registers. When PHY's temperature reaches the critical threshold
> (it is 106 degrees of Celsius) it changes the link speed to the lowest
> regardless user/default link speed settings. It should reduce the PHY's
> temperature to normal numbers.
> 
> When the PHY's temparature is back to normal (low threshold, it is
> 85 degrees) it restores user/default link speed settings.

Hi Igor

Please could you also export the temperature using HWMON. The Marvell
PHY drivers are good examples.

I also have a patch for driver/net/phy/aquantia.c which adds HWMON
support to that PHY.

> 
> Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
> ---
>  drivers/net/usb/aqc111.c | 78 ++++++++++++++++++++++++++++++++++++++++
>  drivers/net/usb/aqc111.h |  8 +++++
>  2 files changed, 86 insertions(+)
> 
> diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
> index 8867f6a3eaa7..fa6b9dfc2a6f 100644
> --- a/drivers/net/usb/aqc111.c
> +++ b/drivers/net/usb/aqc111.c
> @@ -17,6 +17,7 @@
>  #include <linux/usb/cdc.h>
>  #include <linux/usb/usbnet.h>
>  #include <linux/linkmode.h>
> +#include <linux/workqueue.h>
>  
>  #include "aqc111.h"
>  
> @@ -334,6 +335,9 @@ static void aqc111_set_phy_speed(struct usbnet *dev, u8 autoneg, u16 speed)
>  	aqc111_data->phy_cfg |= (3 << AQ_DSH_RETRIES_SHIFT) &
>  				AQ_DSH_RETRIES_MASK;
>  
> +	if (aqc111_data->thermal_throttling)
> +		speed = SPEED_100;
> +

That is a big jump. Do you need to cool is down quickly, or would
1Gbps be sufficient? I think as a user, i would prefer it ping-pongs
between 5G and 1G, not 5G and 100M.

>  	if (autoneg == AUTONEG_ENABLE) {
>  		switch (speed) {
>  		case SPEED_5000:

It looks like this only works when auto-neg is enabled. If i've fixed
configured it i don't think this works?

> @@ -714,6 +718,8 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf)
>  	/* store aqc111_data pointer in device data field */
>  	dev->driver_priv = aqc111_data;
>  
> +	aqc111_data->dev = dev;
> +
>  	/* Init the MAC address */
>  	ret = aqc111_read_perm_mac(dev);
>  	if (ret)
> @@ -991,6 +997,71 @@ static int aqc111_link_reset(struct usbnet *dev)
>  	return 0;
>  }
>  
> +int aqc111_get_temperature(struct usbnet *dev, u32 *temperature)
> +{
> +	u16 reg16 = 0;
> +
> +	if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS2, AQ_PHY_GLOBAL_ADDR,
> +			     &reg16) < 0)
> +		goto err;
> +
> +	if (!(reg16 & AQ_THERM_ST_READY))
> +		goto err;
> +
> +	if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS1, AQ_PHY_GLOBAL_ADDR,
> +			     &reg16) < 0)
> +		goto err;
> +
> +	/*convert from 1/256 to 1/100 degrees of Celsius*/
> +	*temperature = (u32)reg16 * 100 / 256;
> +	return 0;
> +
> +err:
> +	*temperature = 0;
> +	return -1;
> +}
> +
> +void aqc111_thermal_work_cb(struct work_struct *w)
> +{
> +	struct delayed_work *dw = to_delayed_work(w);
> +	struct aqc111_data *aqc111_data = container_of(dw, struct aqc111_data,
> +						       therm_work);
> +	unsigned long timeout = msecs_to_jiffies(AQ_THERMAL_TIMER_MS);
> +	struct usbnet *dev = aqc111_data->dev;
> +	u32 temperature = 0;
> +	u8 reset_speed = 0;
> +
> +	if (!aqc111_data->link)
> +		/* poll not so frequently */
> +		timeout *= 2;
> +
> +	if (aqc111_get_temperature(dev, &temperature) != 0)
> +		goto end;
> +
> +	if (aqc111_data->thermal_throttling &&
> +	    temperature <= AQ_NORMAL_TEMP_THRESHOLD) {
> +		netdev_info(dev->net, "The temperature is back to normal(%u)",
> +			    AQ_NORMAL_TEMP_THRESHOLD / 100);

How often do you see these messages? I'm wondering if they need to be
rate limited?

> +		aqc111_data->thermal_throttling = 0;
> +		reset_speed = 1;
> +	}
> +
> +	if (!aqc111_data->thermal_throttling &&
> +	    temperature >= AQ_CRITICAL_TEMP_THRESHOLD) {

Should there be some hysteresis in here? In fact, if temperature is
AQ_CRITICAL_TEMP_THRESHOLD it is both back to normal, and throttled at
the same time!

> +		netdev_warn(dev->net, "Critical temperature(%u) is reached",
> +			    AQ_CRITICAL_TEMP_THRESHOLD / 100);
> +		aqc111_data->thermal_throttling = 1;
> +		reset_speed = 1;

update_speed might be a better name, since you are not always
resetting it.

	  Andrew

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [net,2/2] net: usb: aqc111: Support for thermal throttling feature
@ 2018-12-12 16:08     ` Andrew Lunn
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Lunn @ 2018-12-12 16:08 UTC (permalink / raw)
  To: Igor Russkikh; +Cc: linux-usb, davem, netdev, Dmitry Bezrukov

On Wed, Dec 12, 2018 at 01:50:10PM +0000, Igor Russkikh wrote:
> From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> 
> Lab testing shows that chip may get overheated when
> network link is connected on high speed (2.5G/5G).
> 
> Default hardware design uses only passive heatsink without a cooler,
> and that makes things worse.
> 
> To prevent possible chip damage here we add throttling mechanism.
> 
> There is a worker that monitors the PHY's temperature via reading
> PHY registers. When PHY's temperature reaches the critical threshold
> (it is 106 degrees of Celsius) it changes the link speed to the lowest
> regardless user/default link speed settings. It should reduce the PHY's
> temperature to normal numbers.
> 
> When the PHY's temparature is back to normal (low threshold, it is
> 85 degrees) it restores user/default link speed settings.

Hi Igor

Please could you also export the temperature using HWMON. The Marvell
PHY drivers are good examples.

I also have a patch for driver/net/phy/aquantia.c which adds HWMON
support to that PHY.

> 
> Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
> ---
>  drivers/net/usb/aqc111.c | 78 ++++++++++++++++++++++++++++++++++++++++
>  drivers/net/usb/aqc111.h |  8 +++++
>  2 files changed, 86 insertions(+)
> 
> diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c
> index 8867f6a3eaa7..fa6b9dfc2a6f 100644
> --- a/drivers/net/usb/aqc111.c
> +++ b/drivers/net/usb/aqc111.c
> @@ -17,6 +17,7 @@
>  #include <linux/usb/cdc.h>
>  #include <linux/usb/usbnet.h>
>  #include <linux/linkmode.h>
> +#include <linux/workqueue.h>
>  
>  #include "aqc111.h"
>  
> @@ -334,6 +335,9 @@ static void aqc111_set_phy_speed(struct usbnet *dev, u8 autoneg, u16 speed)
>  	aqc111_data->phy_cfg |= (3 << AQ_DSH_RETRIES_SHIFT) &
>  				AQ_DSH_RETRIES_MASK;
>  
> +	if (aqc111_data->thermal_throttling)
> +		speed = SPEED_100;
> +

That is a big jump. Do you need to cool is down quickly, or would
1Gbps be sufficient? I think as a user, i would prefer it ping-pongs
between 5G and 1G, not 5G and 100M.

>  	if (autoneg == AUTONEG_ENABLE) {
>  		switch (speed) {
>  		case SPEED_5000:

It looks like this only works when auto-neg is enabled. If i've fixed
configured it i don't think this works?

> @@ -714,6 +718,8 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf)
>  	/* store aqc111_data pointer in device data field */
>  	dev->driver_priv = aqc111_data;
>  
> +	aqc111_data->dev = dev;
> +
>  	/* Init the MAC address */
>  	ret = aqc111_read_perm_mac(dev);
>  	if (ret)
> @@ -991,6 +997,71 @@ static int aqc111_link_reset(struct usbnet *dev)
>  	return 0;
>  }
>  
> +int aqc111_get_temperature(struct usbnet *dev, u32 *temperature)
> +{
> +	u16 reg16 = 0;
> +
> +	if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS2, AQ_PHY_GLOBAL_ADDR,
> +			     &reg16) < 0)
> +		goto err;
> +
> +	if (!(reg16 & AQ_THERM_ST_READY))
> +		goto err;
> +
> +	if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS1, AQ_PHY_GLOBAL_ADDR,
> +			     &reg16) < 0)
> +		goto err;
> +
> +	/*convert from 1/256 to 1/100 degrees of Celsius*/
> +	*temperature = (u32)reg16 * 100 / 256;
> +	return 0;
> +
> +err:
> +	*temperature = 0;
> +	return -1;
> +}
> +
> +void aqc111_thermal_work_cb(struct work_struct *w)
> +{
> +	struct delayed_work *dw = to_delayed_work(w);
> +	struct aqc111_data *aqc111_data = container_of(dw, struct aqc111_data,
> +						       therm_work);
> +	unsigned long timeout = msecs_to_jiffies(AQ_THERMAL_TIMER_MS);
> +	struct usbnet *dev = aqc111_data->dev;
> +	u32 temperature = 0;
> +	u8 reset_speed = 0;
> +
> +	if (!aqc111_data->link)
> +		/* poll not so frequently */
> +		timeout *= 2;
> +
> +	if (aqc111_get_temperature(dev, &temperature) != 0)
> +		goto end;
> +
> +	if (aqc111_data->thermal_throttling &&
> +	    temperature <= AQ_NORMAL_TEMP_THRESHOLD) {
> +		netdev_info(dev->net, "The temperature is back to normal(%u)",
> +			    AQ_NORMAL_TEMP_THRESHOLD / 100);

How often do you see these messages? I'm wondering if they need to be
rate limited?

> +		aqc111_data->thermal_throttling = 0;
> +		reset_speed = 1;
> +	}
> +
> +	if (!aqc111_data->thermal_throttling &&
> +	    temperature >= AQ_CRITICAL_TEMP_THRESHOLD) {

Should there be some hysteresis in here? In fact, if temperature is
AQ_CRITICAL_TEMP_THRESHOLD it is both back to normal, and throttled at
the same time!

> +		netdev_warn(dev->net, "Critical temperature(%u) is reached",
> +			    AQ_CRITICAL_TEMP_THRESHOLD / 100);
> +		aqc111_data->thermal_throttling = 1;
> +		reset_speed = 1;

update_speed might be a better name, since you are not always
resetting it.

	  Andrew

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation
@ 2018-12-12 16:11     ` Andrew Lunn
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Lunn @ 2018-12-12 16:11 UTC (permalink / raw)
  To: Igor Russkikh; +Cc: linux-usb, davem, netdev, Dmitry Bezrukov

On Wed, Dec 12, 2018 at 01:50:08PM +0000, Igor Russkikh wrote:
> From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> 
> Add necessary defines to read phy registers and temparature

Hi Igor

[Puts tongue in cheek]

I thought the firmware was supposed to manage the PHY.

So maybe the better fix is to add code to allow firmware upgrade? And
issue new firmware to linux-firmware?

      Andrew

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [net,1/2] net: usb: aqc111: Add read_mdio operation
@ 2018-12-12 16:11     ` Andrew Lunn
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Lunn @ 2018-12-12 16:11 UTC (permalink / raw)
  To: Igor Russkikh; +Cc: linux-usb, davem, netdev, Dmitry Bezrukov

On Wed, Dec 12, 2018 at 01:50:08PM +0000, Igor Russkikh wrote:
> From: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
> 
> Add necessary defines to read phy registers and temparature

Hi Igor

[Puts tongue in cheek]

I thought the firmware was supposed to manage the PHY.

So maybe the better fix is to add code to allow firmware upgrade? And
issue new firmware to linux-firmware?

      Andrew

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 0/2] aqc111: Thermal throttling feature
  2018-12-12 13:54 ` [PATCH net 0/2] aqc111: Thermal " Igor Russkikh
@ 2018-12-12 18:15   ` Florian Fainelli
  2018-12-12 20:08     ` Igor Russkikh
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Fainelli @ 2018-12-12 18:15 UTC (permalink / raw)
  To: Igor Russkikh, davem, netdev; +Cc: Dmitry Bezrukov, andrew

On 12/12/18 5:54 AM, Igor Russkikh wrote:
> 
> 
> On 12.12.2018 16:50, Igor Russkikh wrote:
>> This patches introduce the thermal throttling feature to prevent possible
>> heat damage to the hardware.
>>
>> Dmitry Bezrukov (2):
>>   net: usb: aqc111: Add read_mdio operation
>>   net: usb: aqc111: Support for thermal throttling feature
>>
>>  drivers/net/usb/aqc111.c | 83 ++++++++++++++++++++++++++++++++++++++++
>>  drivers/net/usb/aqc111.h | 19 +++++++++
>>  2 files changed, 102 insertions(+)
>>
> 
> Hi David, all,
> 
> This is of course designated for the "net-next" tree.
> 
> I'll resubmit this after review.

The idea of having the PHY/network device as a cooling agent is
something valuable, but as Andrew pointed out, you need to expose this
as a standard HWMON device, and you need to let user-space implement the
appropriate thermal policy, not do that in the network driver underneath
the user's feet with no feedback other than link dropped, got
re-negotiated at a different speed. How would one be able to
differentiate those events from a faulty link partner for instance?

None of what you are doing here is specific to your device driver and
the policy of downgrading the link speed to lower the thermal budget is
something that is nearly universally applicable to all network
equipments because higher speeds just require higher power.
-- 
Florian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature
@ 2018-12-12 19:34       ` Igor Russkikh
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 19:34 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: linux-usb, davem, netdev, Dmitry Bezrukov

Hi Andrew,

>> When the PHY's temparature is back to normal (low threshold, it is
>> 85 degrees) it restores user/default link speed settings.
> 
> Hi Igor
> 
> Please could you also export the temperature using HWMON. The Marvell
> PHY drivers are good examples.
> 
> I also have a patch for driver/net/phy/aquantia.c which adds HWMON
> support to that PHY.

We actually have almost ready patches to submit hwmon support separately
for both AQC USB and AQC PCI MAC drivers.
Will do that in near time.

>> +	if (aqc111_data->thermal_throttling)
>> +		speed = SPEED_100;
>> +
> 
> That is a big jump. Do you need to cool is down quickly, or would
> 1Gbps be sufficient? I think as a user, i would prefer it ping-pongs
> between 5G and 1G, not 5G and 100M.

In case overheat happens that really means something bad is happening outside,
or heatsink is bad/detached. I'll consult our HW team once again, but 100M was
the original request from our phy/hw team. It seems it really much less on power and
1G may not be enough.

> 
>>  	if (autoneg == AUTONEG_ENABLE) {
>>  		switch (speed) {
>>  		case SPEED_5000:
> 
> It looks like this only works when auto-neg is enabled. If i've fixed
> configured it i don't think this works?

Looks you are right, will check this.


>> +	if (aqc111_data->thermal_throttling &&
>> +	    temperature <= AQ_NORMAL_TEMP_THRESHOLD) {
>> +		netdev_info(dev->net, "The temperature is back to normal(%u)",
>> +			    AQ_NORMAL_TEMP_THRESHOLD / 100);
> 
> How often do you see these messages? I'm wondering if they need to be
> rate limited?

In worst case that will be at least limited by link down/up internal,
which in case of 5G normally takes 3-5secs.

>> +	if (!aqc111_data->thermal_throttling &&
>> +	    temperature >= AQ_CRITICAL_TEMP_THRESHOLD) {
> 
> Should there be some hysteresis in here? In fact, if temperature is
> AQ_CRITICAL_TEMP_THRESHOLD it is both back to normal, and throttled at
> the same time!

NORMAL_TEMP and CRITICAL_TEMP values are actually a hysteresis.
In cool down case only after TEMP_NORMAL temperature link will be flipped back again.

> 
>> +		netdev_warn(dev->net, "Critical temperature(%u) is reached",
>> +			    AQ_CRITICAL_TEMP_THRESHOLD / 100);
>> +		aqc111_data->thermal_throttling = 1;
>> +		reset_speed = 1;
> 
> update_speed might be a better name, since you are not always
> resetting it.

Agreed.

Regards,
  Igor

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [net,2/2] net: usb: aqc111: Support for thermal throttling feature
@ 2018-12-12 19:34       ` Igor Russkikh
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 19:34 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: linux-usb, davem, netdev, Dmitry Bezrukov

Hi Andrew,

>> When the PHY's temparature is back to normal (low threshold, it is
>> 85 degrees) it restores user/default link speed settings.
> 
> Hi Igor
> 
> Please could you also export the temperature using HWMON. The Marvell
> PHY drivers are good examples.
> 
> I also have a patch for driver/net/phy/aquantia.c which adds HWMON
> support to that PHY.

We actually have almost ready patches to submit hwmon support separately
for both AQC USB and AQC PCI MAC drivers.
Will do that in near time.

>> +	if (aqc111_data->thermal_throttling)
>> +		speed = SPEED_100;
>> +
> 
> That is a big jump. Do you need to cool is down quickly, or would
> 1Gbps be sufficient? I think as a user, i would prefer it ping-pongs
> between 5G and 1G, not 5G and 100M.

In case overheat happens that really means something bad is happening outside,
or heatsink is bad/detached. I'll consult our HW team once again, but 100M was
the original request from our phy/hw team. It seems it really much less on power and
1G may not be enough.

> 
>>  	if (autoneg == AUTONEG_ENABLE) {
>>  		switch (speed) {
>>  		case SPEED_5000:
> 
> It looks like this only works when auto-neg is enabled. If i've fixed
> configured it i don't think this works?

Looks you are right, will check this.


>> +	if (aqc111_data->thermal_throttling &&
>> +	    temperature <= AQ_NORMAL_TEMP_THRESHOLD) {
>> +		netdev_info(dev->net, "The temperature is back to normal(%u)",
>> +			    AQ_NORMAL_TEMP_THRESHOLD / 100);
> 
> How often do you see these messages? I'm wondering if they need to be
> rate limited?

In worst case that will be at least limited by link down/up internal,
which in case of 5G normally takes 3-5secs.

>> +	if (!aqc111_data->thermal_throttling &&
>> +	    temperature >= AQ_CRITICAL_TEMP_THRESHOLD) {
> 
> Should there be some hysteresis in here? In fact, if temperature is
> AQ_CRITICAL_TEMP_THRESHOLD it is both back to normal, and throttled at
> the same time!

NORMAL_TEMP and CRITICAL_TEMP values are actually a hysteresis.
In cool down case only after TEMP_NORMAL temperature link will be flipped back again.

> 
>> +		netdev_warn(dev->net, "Critical temperature(%u) is reached",
>> +			    AQ_CRITICAL_TEMP_THRESHOLD / 100);
>> +		aqc111_data->thermal_throttling = 1;
>> +		reset_speed = 1;
> 
> update_speed might be a better name, since you are not always
> resetting it.

Agreed.

Regards,
  Igor

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation
@ 2018-12-12 19:38       ` Igor Russkikh
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 19:38 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: linux-usb, davem, netdev, Dmitry Bezrukov


>>
>> Add necessary defines to read phy registers and temparature
> 
> Hi Igor
> 
> [Puts tongue in cheek]
> 
> I thought the firmware was supposed to manage the PHY.

FW team due to various reasons do not want to have thermal throttling in their control,
Thus at this moment we are trying to release the product which
will not burn out the table under it ;-)

> So maybe the better fix is to add code to allow firmware upgrade? And
> issue new firmware to linux-firmware?

I would say thats our long term future.

On your request for linux-firmware, I've pushed the question to Phy team,
will inform you on any news.

Regards,
  Igor

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [net,1/2] net: usb: aqc111: Add read_mdio operation
@ 2018-12-12 19:38       ` Igor Russkikh
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 19:38 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: linux-usb, davem, netdev, Dmitry Bezrukov

>>
>> Add necessary defines to read phy registers and temparature
> 
> Hi Igor
> 
> [Puts tongue in cheek]
> 
> I thought the firmware was supposed to manage the PHY.

FW team due to various reasons do not want to have thermal throttling in their control,
Thus at this moment we are trying to release the product which
will not burn out the table under it ;-)

> So maybe the better fix is to add code to allow firmware upgrade? And
> issue new firmware to linux-firmware?

I would say thats our long term future.

On your request for linux-firmware, I've pushed the question to Phy team,
will inform you on any news.

Regards,
  Igor

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 0/2] aqc111: Thermal throttling feature
  2018-12-12 18:15   ` Florian Fainelli
@ 2018-12-12 20:08     ` Igor Russkikh
  2018-12-12 20:28       ` Florian Fainelli
  2018-12-12 20:38       ` Andrew Lunn
  0 siblings, 2 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-12 20:08 UTC (permalink / raw)
  To: Florian Fainelli, davem, netdev; +Cc: Dmitry Bezrukov, andrew


> 
> The idea of having the PHY/network device as a cooling agent is
> something valuable, but as Andrew pointed out, you need to expose this
> as a standard HWMON device, and you need to let user-space implement the
> appropriate thermal policy, not do that in the network driver underneath
> the user's feet with no feedback other than link dropped, got
> re-negotiated at a different speed. How would one be able to
> differentiate those events from a faulty link partner for instance?

> 
> None of what you are doing here is specific to your device driver and
> the policy of downgrading the link speed to lower the thermal budget is
> something that is nearly universally applicable to all network
> equipments because higher speeds just require higher power.
> 

Hi Florian,
Partially agreed with you, but as far as I know there is no much of
ready to use infrastructure for this to use right now?

IMHO that could be a both-way solution, where short term driver patch
will secure against hardware burn out right now, and long term hwmon
based infrastructure could handle that on userspace level.

A whole separate concern is how much userspace should be involved here.
It could be a very device specific (and therefore driver specific) logic
on how to do device's thermal control.

Regards,
  Igor

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature
@ 2018-12-12 20:28         ` Andrew Lunn
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Lunn @ 2018-12-12 20:28 UTC (permalink / raw)
  To: Igor Russkikh; +Cc: linux-usb, davem, netdev, Dmitry Bezrukov

> NORMAL_TEMP and CRITICAL_TEMP values are actually a hysteresis.  In
> cool down case only after TEMP_NORMAL temperature link will be
> flipped back again.

Ah, yes, sorry, i read that wrong.

    Andrew

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [net,2/2] net: usb: aqc111: Support for thermal throttling feature
@ 2018-12-12 20:28         ` Andrew Lunn
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Lunn @ 2018-12-12 20:28 UTC (permalink / raw)
  To: Igor Russkikh; +Cc: linux-usb, davem, netdev, Dmitry Bezrukov

> NORMAL_TEMP and CRITICAL_TEMP values are actually a hysteresis.  In
> cool down case only after TEMP_NORMAL temperature link will be
> flipped back again.

Ah, yes, sorry, i read that wrong.

    Andrew

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 0/2] aqc111: Thermal throttling feature
  2018-12-12 20:08     ` Igor Russkikh
@ 2018-12-12 20:28       ` Florian Fainelli
  2018-12-12 20:38       ` Andrew Lunn
  1 sibling, 0 replies; 23+ messages in thread
From: Florian Fainelli @ 2018-12-12 20:28 UTC (permalink / raw)
  To: Igor Russkikh, davem, netdev; +Cc: Dmitry Bezrukov, andrew

On 12/12/18 12:08 PM, Igor Russkikh wrote:
> 
>>
>> The idea of having the PHY/network device as a cooling agent is
>> something valuable, but as Andrew pointed out, you need to expose this
>> as a standard HWMON device, and you need to let user-space implement the
>> appropriate thermal policy, not do that in the network driver underneath
>> the user's feet with no feedback other than link dropped, got
>> re-negotiated at a different speed. How would one be able to
>> differentiate those events from a faulty link partner for instance?
> 
>>
>> None of what you are doing here is specific to your device driver and
>> the policy of downgrading the link speed to lower the thermal budget is
>> something that is nearly universally applicable to all network
>> equipments because higher speeds just require higher power.
>>
> 
> Hi Florian,
> Partially agreed with you, but as far as I know there is no much of
> ready to use infrastructure for this to use right now?

If you use programs like thermald, I am quite positive you could script
and action which involves re-negotiation of the link at a lower speed
and that would be something applicable to a variety of network devices.

> 
> IMHO that could be a both-way solution, where short term driver patch
> will secure against hardware burn out right now, and long term hwmon
> based infrastructure could handle that on userspace level.

The short term and most effective solution would be to have the firmware
running on the device do the thermal throttling, that way, if the host
CPU is crashed/unresponsive, you can still take corrective actions. Your
response to Andrew seems to suggest this is not possible, so if we are
reaching the critical junction temperature of your chip and that in
turn, causes the enclosure to melt down, then clearly the runaway
solution is not good.

> 
> A whole separate concern is how much userspace should be involved here.
> It could be a very device specific (and therefore driver specific) logic
> on how to do device's thermal control.

My problem with your approach is people doing the same thing to each and
every one of their driver and building policy, as opposed to mechanisms
in the kernel. If the argument is "user space may not be running a
thermal solution", then clearly you need a hardware driven (or firmware
driven) approach) which works across all possible use cases, including
those where appropriate SW is not there.

If you look at how your desktop PC likely manages the fans in the
chassis, they can be SW controlled, or ACPI controlled, for the same
reasons.
-- 
Florian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 0/2] aqc111: Thermal throttling feature
  2018-12-12 20:08     ` Igor Russkikh
  2018-12-12 20:28       ` Florian Fainelli
@ 2018-12-12 20:38       ` Andrew Lunn
  2018-12-13  0:43         ` David Miller
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Lunn @ 2018-12-12 20:38 UTC (permalink / raw)
  To: Igor Russkikh; +Cc: Florian Fainelli, davem, netdev, Dmitry Bezrukov

> A whole separate concern is how much userspace should be involved here.
> It could be a very device specific (and therefore driver specific) logic
> on how to do device's thermal control.

Hi Igor

Well, if you fully expose the PHY to Linux using a PHY driver, it
would not be device specific at all. The PHY layer knows how to ask
the PHY to drop to lower speeds. The Marvell PHYs with their
temperature sensors could also use this core code.

You are running into trouble because you want to both to hide the PHY
from Linux, but also have Linux control the PHY to avoid it melting.
This is why i actually think you should be doing this in firmware.

     Andrew

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 0/2] aqc111: Thermal throttling feature
  2018-12-12 13:50 [PATCH net 0/2] aqc111: Thermal throttling feature Igor Russkikh
                   ` (2 preceding siblings ...)
  2018-12-12 13:54 ` [PATCH net 0/2] aqc111: Thermal " Igor Russkikh
@ 2018-12-13  0:18 ` David Miller
  3 siblings, 0 replies; 23+ messages in thread
From: David Miller @ 2018-12-13  0:18 UTC (permalink / raw)
  To: Igor.Russkikh; +Cc: linux-usb, netdev, Dmitry.Bezrukov

From: Igor Russkikh <Igor.Russkikh@aquantia.com>
Date: Wed, 12 Dec 2018 13:50:06 +0000

> This patches introduce the thermal throttling feature to prevent possible
> heat damage to the hardware.

I see what seems to be a bit of a conflict here, maybe you can explain
the situation better to me.

Andrew suggested that the firmware manage the thermal aspects of the
PHY since it manages all other aspects of the PHY too.

And then the feedback was that the firmware folks don't want to do
that right now.

But then it was also stated that the long term goal is to support
what Andrew asked for, firmware update in the driver and updated
firmwares submitted to linux-firmware.

If the firwmare will eventually have support for thermal management
added, then the code in this series is going to be not used and
just taking up space.

Please explain how all of this is going to fit together, and how we'll
not end up with having to keep this thermal code around forever.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 0/2] aqc111: Thermal throttling feature
  2018-12-12 20:38       ` Andrew Lunn
@ 2018-12-13  0:43         ` David Miller
  2018-12-14 11:43           ` Igor Russkikh
  0 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2018-12-13  0:43 UTC (permalink / raw)
  To: andrew; +Cc: Igor.Russkikh, f.fainelli, netdev, Dmitry.Bezrukov

From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 12 Dec 2018 21:38:46 +0100

> You are running into trouble because you want to both to hide the PHY
> from Linux, but also have Linux control the PHY to avoid it melting.
> This is why i actually think you should be doing this in firmware.

I completely agree with Andrew.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net 0/2] aqc111: Thermal throttling feature
  2018-12-13  0:43         ` David Miller
@ 2018-12-14 11:43           ` Igor Russkikh
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Russkikh @ 2018-12-14 11:43 UTC (permalink / raw)
  To: David Miller, andrew; +Cc: f.fainelli, netdev, Dmitry Bezrukov



On 13.12.2018 3:43, David Miller wrote:
> From: Andrew Lunn <andrew@lunn.ch>
> Date: Wed, 12 Dec 2018 21:38:46 +0100
> 
>> You are running into trouble because you want to both to hide the PHY
>> from Linux, but also have Linux control the PHY to avoid it melting.
>> This is why i actually think you should be doing this in firmware.
> 
> I completely agree with Andrew.
> 

Hi David, all,

Thank you all for your feedback. I basically agree with your arguments - 
at this stage it's really a better idea to put such a logic into HW/FW.

I've passed on that info to our FW team.

We are still relatively safe because I've got an info the FW already has
a thermal safety trigger. Now it just powers off the chip to eliminate the risk
of burnout.
In addition to that they've agreed to integrate the same hysteresis speeddown logic into FW.

I will resubmit the patch, but without the throttling logic, just
hwmon temperature sensor interface.

Regards,
  Igor

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-12-14 11:43 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-12 13:50 [PATCH net 0/2] aqc111: Thermal throttling feature Igor Russkikh
2018-12-12 13:50 ` [PATCH net 1/2] net: usb: aqc111: Add read_mdio operation Igor Russkikh
2018-12-12 13:50   ` [net,1/2] " Igor Russkikh
2018-12-12 16:11   ` [PATCH net 1/2] " Andrew Lunn
2018-12-12 16:11     ` [net,1/2] " Andrew Lunn
2018-12-12 19:38     ` [PATCH net 1/2] " Igor Russkikh
2018-12-12 19:38       ` [net,1/2] " Igor Russkikh
2018-12-12 13:50 ` [PATCH net 2/2] net: usb: aqc111: Support for thermal throttling feature Igor Russkikh
2018-12-12 13:50   ` [net,2/2] " Igor Russkikh
2018-12-12 16:08   ` [PATCH net 2/2] " Andrew Lunn
2018-12-12 16:08     ` [net,2/2] " Andrew Lunn
2018-12-12 19:34     ` [PATCH net 2/2] " Igor Russkikh
2018-12-12 19:34       ` [net,2/2] " Igor Russkikh
2018-12-12 20:28       ` [PATCH net 2/2] " Andrew Lunn
2018-12-12 20:28         ` [net,2/2] " Andrew Lunn
2018-12-12 13:54 ` [PATCH net 0/2] aqc111: Thermal " Igor Russkikh
2018-12-12 18:15   ` Florian Fainelli
2018-12-12 20:08     ` Igor Russkikh
2018-12-12 20:28       ` Florian Fainelli
2018-12-12 20:38       ` Andrew Lunn
2018-12-13  0:43         ` David Miller
2018-12-14 11:43           ` Igor Russkikh
2018-12-13  0:18 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.