* Issues with mt7915 thermal throttling @ 2022-01-18 9:08 Nicolas Cavallari 2022-02-07 17:37 ` [PATCH RFC 1/3] mt76: mt7915e: Fix degraded performance after temporary overheat Nicolas Cavallari 0 siblings, 1 reply; 4+ messages in thread From: Nicolas Cavallari @ 2022-01-18 9:08 UTC (permalink / raw) To: linux-wireless; +Cc: Ryder Lee, Lorenzo Bianconi, Felix Fietkau I noticed some strange issues with mt7915's thermal throttling, particularly the cooling_device. First, it seems that the thermal subsystem expect that higher cooling_device states provide more cooling, but mt7915e apparently does the opposite and use it as a duty cycle, where state=1 does severe throttling/max cooling (iperf throughput basically go down to near zero) and state=100 is full power. Also, state=0, from the comments, apparently disable thermal management, except that in practice, it does not change the throttle state, since throughput stays low when switching from state=1 to state=0, and stays high when switching from 100 to 0. As a result, as soon as the default thermal zone runs a little hot, the performance of mt7915e is destroyed and does not recover much when the temperature drops down. I can come up with a patch to fix the first issue, but not the state=0 one, and i would like some pointers/confirmation. ^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH RFC 1/3] mt76: mt7915e: Fix degraded performance after temporary overheat 2022-01-18 9:08 Issues with mt7915 thermal throttling Nicolas Cavallari @ 2022-02-07 17:37 ` Nicolas Cavallari 2022-02-07 17:37 ` [PATCH RFC 2/3] mt76: mt7915e: Add a hwmon attribute to get the actual throttle state Nicolas Cavallari 2022-02-07 17:37 ` [PATCH RFC 3/3] mt76: mt7915e: Enable thermal management by default Nicolas Cavallari 0 siblings, 2 replies; 4+ messages in thread From: Nicolas Cavallari @ 2022-02-07 17:37 UTC (permalink / raw) To: Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Shayne Chen, Sean Wang Cc: Kalle Valo, linux-wireless mt7915e registers a cooling_device with wrong semantics: 1. cooling_device expect that higher states values should cool more, but mt7915e did the opposite... with the exception of state == 0, which should "disable thermal management", but does not seem to have any effect since the previous state is kept. The result is that when the thermal zone heats up a bit and bumps the cooling_device state from 0 to 1 to cool a bit, the performance is destroyed, and when going back from 1 to 0, the performance stays bad. 2. Reading the cooling_device state does not always return the last written state, but can return the actual hardware throttle state, which is different. This is a problem because the mt7915 firmware actually implement the equivalent of a thermal zone with trip points. Setting the cooling device state actually changes the throttles at each trip point, so the following could occur if the first issue is fixed: - thermal subsystem set state to 100% power (state=0) - mt7915e driver set trip throttles to [100%, 50%, 25%, 12%] - hardware heats up and decides to switch to 50% power - thermal subsystem see that power is 50% (state=50), decide to increase it to 60% (state=40) because the rest of the system is cool. - mt7915e driver set trip throttle to [60%, 30%, 15%, 7%] - hardware thus switches to 30% power [race to the bottom continues...] This patch corrects the semantics of the cooling_device to the one that the thermal subsystem expect it. Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> --- drivers/net/wireless/mediatek/mt76/mt7915/init.c | 16 ++++++++++------ .../net/wireless/mediatek/mt76/mt7915/mt7915.h | 2 ++ 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/init.c b/drivers/net/wireless/mediatek/mt76/mt7915/init.c index 2bc9097c5214..d6efbf1a2724 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7915/init.c +++ b/drivers/net/wireless/mediatek/mt76/mt7915/init.c @@ -97,7 +97,7 @@ static int mt7915_thermal_get_max_throttle_state(struct thermal_cooling_device *cdev, unsigned long *state) { - *state = MT7915_THERMAL_THROTTLE_MAX; + *state = MT7915_CDEV_THROTTLE_MAX; return 0; } @@ -108,7 +108,7 @@ mt7915_thermal_get_cur_throttle_state(struct thermal_cooling_device *cdev, { struct mt7915_phy *phy = cdev->devdata; - *state = phy->throttle_state; + *state = phy->cdev_state; return 0; } @@ -120,20 +120,24 @@ mt7915_thermal_set_cur_throttle_state(struct thermal_cooling_device *cdev, struct mt7915_phy *phy = cdev->devdata; int ret; - if (state > MT7915_THERMAL_THROTTLE_MAX) + if (state > MT7915_CDEV_THROTTLE_MAX) return -EINVAL; if (phy->throttle_temp[0] > phy->throttle_temp[1]) return 0; - if (state == phy->throttle_state) + if (state == phy->cdev_state) return 0; - ret = mt7915_mcu_set_thermal_throttling(phy, state); + // cooling_device convention: 0 = no cooling, more = more cooling + // mcu convention: 1 = max cooling, more = less cooling + ret = mt7915_mcu_set_thermal_throttling(phy, + MT7915_THERMAL_THROTTLE_MAX + - state); if (ret) return ret; - phy->throttle_state = state; + phy->cdev_state = state; return 0; } diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/mt7915.h b/drivers/net/wireless/mediatek/mt76/mt7915/mt7915.h index 0403912a521d..cf4c8d2dcc60 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7915/mt7915.h +++ b/drivers/net/wireless/mediatek/mt76/mt7915/mt7915.h @@ -49,6 +49,7 @@ #define MT7915_CFEND_RATE_11B 0x03 /* 11B LP, 11M */ #define MT7915_THERMAL_THROTTLE_MAX 100 +#define MT7915_CDEV_THROTTLE_MAX 99 #define MT7915_SKU_RATE_NUM 161 @@ -218,6 +219,7 @@ struct mt7915_phy { struct ieee80211_vif *monitor_vif; struct thermal_cooling_device *cdev; + u8 cdev_state; u8 throttle_state; u32 throttle_temp[2]; /* 0: critical high, 1: maximum */ -- 2.34.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH RFC 2/3] mt76: mt7915e: Add a hwmon attribute to get the actual throttle state. 2022-02-07 17:37 ` [PATCH RFC 1/3] mt76: mt7915e: Fix degraded performance after temporary overheat Nicolas Cavallari @ 2022-02-07 17:37 ` Nicolas Cavallari 2022-02-07 17:37 ` [PATCH RFC 3/3] mt76: mt7915e: Enable thermal management by default Nicolas Cavallari 1 sibling, 0 replies; 4+ messages in thread From: Nicolas Cavallari @ 2022-02-07 17:37 UTC (permalink / raw) To: Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Shayne Chen, Sean Wang Cc: Kalle Valo, linux-wireless The firmware-controlled actual throttle state was previously available by reading the cooling_device, but this confused the thermal subsystem. Add a hwmon attribute to get it instead. Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> --- .../net/wireless/mediatek/mt76/mt7915/init.c | 27 ++++++++++++------- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/init.c b/drivers/net/wireless/mediatek/mt76/mt7915/init.c index d6efbf1a2724..71ca4b0641b6 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7915/init.c +++ b/drivers/net/wireless/mediatek/mt76/mt7915/init.c @@ -50,15 +50,22 @@ static ssize_t mt7915_thermal_temp_show(struct device *dev, int i = to_sensor_dev_attr(attr)->index; int temperature; - if (i) - return sprintf(buf, "%u\n", phy->throttle_temp[i - 1] * 1000); - - temperature = mt7915_mcu_get_temperature(phy); - if (temperature < 0) - return temperature; - - /* display in millidegree celcius */ - return sprintf(buf, "%u\n", temperature * 1000); + switch (i) { + case 0: + temperature = mt7915_mcu_get_temperature(phy); + if (temperature < 0) + return temperature; + /* display in millidegree celcius */ + return sprintf(buf, "%u\n", temperature * 1000); + case 1: + case 2: + return sprintf(buf, "%u\n", + phy->throttle_temp[i - 1] * 1000); + case 3: + return sprintf(buf, "%hhu\n", phy->throttle_state); + default: + return -EINVAL; + } } static ssize_t mt7915_thermal_temp_store(struct device *dev, @@ -84,11 +91,13 @@ static ssize_t mt7915_thermal_temp_store(struct device *dev, static SENSOR_DEVICE_ATTR_RO(temp1_input, mt7915_thermal_temp, 0); static SENSOR_DEVICE_ATTR_RW(temp1_crit, mt7915_thermal_temp, 1); static SENSOR_DEVICE_ATTR_RW(temp1_max, mt7915_thermal_temp, 2); +static SENSOR_DEVICE_ATTR_RO(throttle1, mt7915_thermal_temp, 3); static struct attribute *mt7915_hwmon_attrs[] = { &sensor_dev_attr_temp1_input.dev_attr.attr, &sensor_dev_attr_temp1_crit.dev_attr.attr, &sensor_dev_attr_temp1_max.dev_attr.attr, + &sensor_dev_attr_throttle1.dev_attr.attr, NULL, }; ATTRIBUTE_GROUPS(mt7915_hwmon); -- 2.34.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH RFC 3/3] mt76: mt7915e: Enable thermal management by default 2022-02-07 17:37 ` [PATCH RFC 1/3] mt76: mt7915e: Fix degraded performance after temporary overheat Nicolas Cavallari 2022-02-07 17:37 ` [PATCH RFC 2/3] mt76: mt7915e: Add a hwmon attribute to get the actual throttle state Nicolas Cavallari @ 2022-02-07 17:37 ` Nicolas Cavallari 1 sibling, 0 replies; 4+ messages in thread From: Nicolas Cavallari @ 2022-02-07 17:37 UTC (permalink / raw) To: Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Shayne Chen, Sean Wang Cc: Kalle Valo, linux-wireless By default, mt7915e does not enable thermal management until the default thermal zone does not reach a trip point. If the rest of the system have better cooling than the wireless hardware, then it is possible that the wireless chip can overheat while the rest of the system is fine. Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> --- drivers/net/wireless/mediatek/mt76/mt7915/init.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/init.c b/drivers/net/wireless/mediatek/mt76/mt7915/init.c index 71ca4b0641b6..4fa5da9195a2 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7915/init.c +++ b/drivers/net/wireless/mediatek/mt76/mt7915/init.c @@ -199,7 +199,8 @@ static int mt7915_thermal_init(struct mt7915_phy *phy) phy->throttle_temp[0] = 110; phy->throttle_temp[1] = 120; - return 0; + return mt7915_mcu_set_thermal_throttling(phy, + MT7915_THERMAL_THROTTLE_MAX); } static void mt7915_led_set_config(struct led_classdev *led_cdev, -- 2.34.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-02-07 17:53 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-01-18 9:08 Issues with mt7915 thermal throttling Nicolas Cavallari 2022-02-07 17:37 ` [PATCH RFC 1/3] mt76: mt7915e: Fix degraded performance after temporary overheat Nicolas Cavallari 2022-02-07 17:37 ` [PATCH RFC 2/3] mt76: mt7915e: Add a hwmon attribute to get the actual throttle state Nicolas Cavallari 2022-02-07 17:37 ` [PATCH RFC 3/3] mt76: mt7915e: Enable thermal management by default Nicolas Cavallari
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).