All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] Add driver for NVIDIA Tegra30 SoC Thermal sensor
@ 2021-05-29 17:09 Dmitry Osipenko
  2021-05-29 17:09 ` [PATCH v3 1/7] dt-bindings: thermal: Add binding for Tegra30 thermal sensor Dmitry Osipenko
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2021-05-29 17:09 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Zhang Rui, Daniel Lezcano,
	Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis
  Cc: devicetree, linux-kernel, linux-tegra, linux-pm

Hi,

This series adds support for the thermal sensor that is found on NVIDIA
Tegra30 SoC. Sensor monitors temperature and voltage of the SoC, it also
emits signals to the power management and clock controllers that are
performing the emergency shut down and the CPU frequency throttling
when a pre-programmed temperature levels are reached.

Please note that this series is made on top of ACMTON patches! [1].
Otherwise tegra30.dtsi will fail to compile.

[1] https://patchwork.ozlabs.org/project/linux-tegra/list/?series=243115

Changelog:

v3: - No code changes. CC'ed linux-pm, which was previously missed by accident.
      Not sure how much that is important for the thermal patches, but won't
      hurt to re-send since only DT binding was reviewed so far.

v2: - Made a very minor improvement to one error message, it now prints
      number of channel at which error occurred.

    - Added r-b from Rob Herring to the binding.

Dmitry Osipenko (7):
  dt-bindings: thermal: Add binding for Tegra30 thermal sensor
  thermal: thermal_of: Stop zone device before unregistering it
  thermal/core: Export thermal_cooling_device_stats_update()
  thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  ARM: tegra_defconfig: Enable CONFIG_TEGRA30_TSENSOR
  ARM: multi_v7_defconfig: Enable CONFIG_TEGRA30_TSENSOR
  ARM: tegra: Add SoC thermal sensor to Tegra30 device-trees

 .../thermal/nvidia,tegra30-tsensor.yaml       |  78 ++
 arch/arm/boot/dts/tegra30-ouya.dts            |  16 +
 arch/arm/boot/dts/tegra30.dtsi                |  93 ++-
 arch/arm/configs/multi_v7_defconfig           |   1 +
 arch/arm/configs/tegra_defconfig              |   1 +
 drivers/thermal/tegra/Kconfig                 |   7 +
 drivers/thermal/tegra/Makefile                |   1 +
 drivers/thermal/tegra/tegra30-tsensor.c       | 736 ++++++++++++++++++
 drivers/thermal/thermal_of.c                  |   3 +
 drivers/thermal/thermal_sysfs.c               |   1 +
 10 files changed, 933 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/thermal/nvidia,tegra30-tsensor.yaml
 create mode 100644 drivers/thermal/tegra/tegra30-tsensor.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/7] dt-bindings: thermal: Add binding for Tegra30 thermal sensor
  2021-05-29 17:09 [PATCH v3 0/7] Add driver for NVIDIA Tegra30 SoC Thermal sensor Dmitry Osipenko
@ 2021-05-29 17:09 ` Dmitry Osipenko
  2021-05-29 17:09 ` [PATCH v3 2/7] thermal: thermal_of: Stop zone device before unregistering it Dmitry Osipenko
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2021-05-29 17:09 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Zhang Rui, Daniel Lezcano,
	Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis
  Cc: devicetree, linux-kernel, linux-tegra, linux-pm

All NVIDIA Tegra30 SoCs have on-chip sensors which monitor temperature
and voltage of the SoC. Sensors also controls CPU x2 freq throttle and
emits emergency shutdown signal. TSENSOR has has two separate channels
for each sensor placed in a different parts of the SoC. Add binding for
the sensor hardware.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 .../thermal/nvidia,tegra30-tsensor.yaml       | 78 +++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/thermal/nvidia,tegra30-tsensor.yaml

diff --git a/Documentation/devicetree/bindings/thermal/nvidia,tegra30-tsensor.yaml b/Documentation/devicetree/bindings/thermal/nvidia,tegra30-tsensor.yaml
new file mode 100644
index 000000000000..6182090d313c
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/nvidia,tegra30-tsensor.yaml
@@ -0,0 +1,78 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/thermal/nvidia,tegra30-tsensor.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NVIDIA Tegra30 Thermal Sensor
+
+maintainers:
+  - Dmitry Osipenko <digetx@gmail.com>
+  - Jon Hunter <jonathanh@nvidia.com>
+  - Thierry Reding <thierry.reding@gmail.com>
+
+description: |
+  TSENSOR provides thermal and voltage sensors which monitor temperature
+  and voltage of the chip. Sensors are placed across the die to gauge the
+  temperature of the whole chip. The TSENSOR module:
+
+    Generates an interrupt to SW to lower temperature via DVFS on reaching
+    a certain thermal/voltage threshold.
+
+    Generates a signal to the CAR to reduce CPU frequency by half on reaching
+    a certain thermal/voltage threshold.
+
+    Generates a signal to the PMC when the temperature reaches dangerously high
+    levels to reset the chip and sets a flag in the PMC.
+
+  TSENSOR has two channels which monitor two different spots of the SoC.
+
+properties:
+  compatible:
+    const: nvidia,tegra30-tsensor
+
+  reg:
+    maxItems: 1
+
+  clocks:
+    maxItems: 1
+
+  resets:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  "#thermal-sensor-cells":
+    const: 1
+
+  "#cooling-cells":
+    const: 2
+
+  assigned-clock-parents: true
+  assigned-clock-rates: true
+  assigned-clocks: true
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - resets
+  - interrupts
+  - "#thermal-sensor-cells"
+  - "#cooling-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+    thermal-sensor@70014000 {
+      compatible = "nvidia,tegra30-tsensor";
+      reg = <0x70014000 0x500>;
+      interrupts = <0 102 4>;
+      clocks = <&clk 100>;
+      resets = <&rst 100>;
+
+      #thermal-sensor-cells = <1>;
+      #cooling-cells = <2>;
+    };
-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 2/7] thermal: thermal_of: Stop zone device before unregistering it
  2021-05-29 17:09 [PATCH v3 0/7] Add driver for NVIDIA Tegra30 SoC Thermal sensor Dmitry Osipenko
  2021-05-29 17:09 ` [PATCH v3 1/7] dt-bindings: thermal: Add binding for Tegra30 thermal sensor Dmitry Osipenko
@ 2021-05-29 17:09 ` Dmitry Osipenko
  2021-05-29 17:09 ` [PATCH v3 3/7] thermal/core: Export thermal_cooling_device_stats_update() Dmitry Osipenko
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2021-05-29 17:09 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Zhang Rui, Daniel Lezcano,
	Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis
  Cc: devicetree, linux-kernel, linux-tegra, linux-pm

Zone device is enabled after thermal_zone_of_sensor_register() completion,
but it's not disabled before senors if unregistered, leaving temperature
polling active. This results in accessing a disabled zone device and
produces a warning about this problem. Stop zone device before
unregistering it in order to fix this "use-after-free" problem.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/thermal/thermal_of.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index 5b76f9a1280d..6379f26a335f 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -559,6 +559,9 @@ void thermal_zone_of_sensor_unregister(struct device *dev,
 	if (!tz)
 		return;
 
+	/* stop temperature polling */
+	thermal_zone_device_disable(tzd);
+
 	mutex_lock(&tzd->lock);
 	tzd->ops->get_temp = NULL;
 	tzd->ops->get_trend = NULL;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 3/7] thermal/core: Export thermal_cooling_device_stats_update()
  2021-05-29 17:09 [PATCH v3 0/7] Add driver for NVIDIA Tegra30 SoC Thermal sensor Dmitry Osipenko
  2021-05-29 17:09 ` [PATCH v3 1/7] dt-bindings: thermal: Add binding for Tegra30 thermal sensor Dmitry Osipenko
  2021-05-29 17:09 ` [PATCH v3 2/7] thermal: thermal_of: Stop zone device before unregistering it Dmitry Osipenko
@ 2021-05-29 17:09 ` Dmitry Osipenko
  2021-05-29 17:09 ` [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor Dmitry Osipenko
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2021-05-29 17:09 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Zhang Rui, Daniel Lezcano,
	Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis
  Cc: devicetree, linux-kernel, linux-tegra, linux-pm

NVIDIA Tegra30 thermal sensor driver has a hardware-controlled CPU cooling
feature that halves CPU frequency once a specified trip point is breached.
In order to account the hardware state transitions, which are reported by
interrupt, the sensor driver needs to report the cooling state transition
and this is done by thermal_cooling_device_stats_update(). The sensor
driver could be compiled as a loadable driver module, but this API
function isn't exported, hence export it.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/thermal/thermal_sysfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 1c4aac8464a7..ab373280f853 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -690,6 +690,7 @@ void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
 unlock:
 	spin_unlock(&stats->lock);
 }
+EXPORT_SYMBOL_GPL(thermal_cooling_device_stats_update);
 
 static ssize_t total_trans_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-05-29 17:09 [PATCH v3 0/7] Add driver for NVIDIA Tegra30 SoC Thermal sensor Dmitry Osipenko
                   ` (2 preceding siblings ...)
  2021-05-29 17:09 ` [PATCH v3 3/7] thermal/core: Export thermal_cooling_device_stats_update() Dmitry Osipenko
@ 2021-05-29 17:09 ` Dmitry Osipenko
  2021-05-31 12:51   ` Thierry Reding
  2021-06-15 10:03   ` Daniel Lezcano
  2021-05-29 17:09 ` [PATCH v3 5/7] ARM: tegra_defconfig: Enable CONFIG_TEGRA30_TSENSOR Dmitry Osipenko
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2021-05-29 17:09 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Zhang Rui, Daniel Lezcano,
	Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis
  Cc: devicetree, linux-kernel, linux-tegra, linux-pm

All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
monitors temperature and voltage of the SoC. Sensors control CPU frequency
throttling, which is activated by hardware once preprogrammed temperature
level is breached, they also send signal to Power Management controller to
perform emergency shutdown on a critical overheat of the SoC die. Add
driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
and a cooling device.

Tested-by: Andreas Westman Dorcsak <hedmoo@yahoo.com> # Asus TF700T
Tested-by: Maxim Schwalm <maxim.schwalm@gmail.com> # Asus TF700T
Tested-by: Svyatoslav Ryhel <clamor95@gmail.com> # Asus TF201T
Tested-by: Ihor Didenko <tailormoon@rambler.ru> # Asus TF300T
Tested-by: Ion Agorria <ion@agorria.com> # Asus TF201T
Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya
Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/thermal/tegra/Kconfig           |   7 +
 drivers/thermal/tegra/Makefile          |   1 +
 drivers/thermal/tegra/tegra30-tsensor.c | 736 ++++++++++++++++++++++++
 3 files changed, 744 insertions(+)
 create mode 100644 drivers/thermal/tegra/tegra30-tsensor.c

diff --git a/drivers/thermal/tegra/Kconfig b/drivers/thermal/tegra/Kconfig
index 46c2215867cd..019e3a2eb69e 100644
--- a/drivers/thermal/tegra/Kconfig
+++ b/drivers/thermal/tegra/Kconfig
@@ -18,4 +18,11 @@ config TEGRA_BPMP_THERMAL
 	  Enable this option for support for sensing system temperature of NVIDIA
 	  Tegra systems-on-chip with the BPMP coprocessor (Tegra186).
 
+config TEGRA30_TSENSOR
+	tristate "Tegra30 Thermal Sensor"
+	depends on ARCH_TEGRA_3x_SOC || COMPILE_TEST
+	help
+	  Enable this option to support thermal management of NVIDIA Tegra30
+	  system-on-chip.
+
 endmenu
diff --git a/drivers/thermal/tegra/Makefile b/drivers/thermal/tegra/Makefile
index 0f2b66edf0d2..eb27d194c583 100644
--- a/drivers/thermal/tegra/Makefile
+++ b/drivers/thermal/tegra/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_TEGRA_SOCTHERM)		+= tegra-soctherm.o
 obj-$(CONFIG_TEGRA_BPMP_THERMAL)	+= tegra-bpmp-thermal.o
+obj-$(CONFIG_TEGRA30_TSENSOR)		+= tegra30-tsensor.o
 
 tegra-soctherm-y				:= soctherm.o soctherm-fuse.o
 tegra-soctherm-$(CONFIG_ARCH_TEGRA_124_SOC)	+= tegra124-soctherm.o
diff --git a/drivers/thermal/tegra/tegra30-tsensor.c b/drivers/thermal/tegra/tegra30-tsensor.c
new file mode 100644
index 000000000000..f7562eac01d6
--- /dev/null
+++ b/drivers/thermal/tegra/tegra30-tsensor.c
@@ -0,0 +1,736 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Tegra30 SoC Thermal Sensor driver
+ *
+ * Based on downstream HWMON driver from NVIDIA.
+ * Copyright (C) 2011 NVIDIA Corporation
+ *
+ * Author: Dmitry Osipenko <digetx@gmail.com>
+ * Copyright (C) 2021 GRATE-DRIVER project
+ */
+
+#include <linux/bitfield.h>
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/math.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/platform_device.h>
+#include <linux/pm.h>
+#include <linux/reset.h>
+#include <linux/slab.h>
+#include <linux/thermal.h>
+#include <linux/types.h>
+
+#include <soc/tegra/fuse.h>
+
+#include "../thermal_core.h"
+#include "../thermal_hwmon.h"
+
+#define TSENSOR_SENSOR0_CONFIG0				0x0
+#define TSENSOR_SENSOR0_CONFIG0_SENSOR_STOP		BIT(0)
+#define TSENSOR_SENSOR0_CONFIG0_HW_FREQ_DIV_EN		BIT(1)
+#define TSENSOR_SENSOR0_CONFIG0_THERMAL_RST_EN		BIT(2)
+#define TSENSOR_SENSOR0_CONFIG0_DVFS_EN			BIT(3)
+#define TSENSOR_SENSOR0_CONFIG0_INTR_OVERFLOW_EN	BIT(4)
+#define TSENSOR_SENSOR0_CONFIG0_INTR_HW_FREQ_DIV_EN	BIT(5)
+#define TSENSOR_SENSOR0_CONFIG0_INTR_THERMAL_RST_EN	BIT(6)
+#define TSENSOR_SENSOR0_CONFIG0_M			GENMASK(23,  8)
+#define TSENSOR_SENSOR0_CONFIG0_N			GENMASK(31, 24)
+
+#define TSENSOR_SENSOR0_CONFIG1				0x8
+#define TSENSOR_SENSOR0_CONFIG1_TH1			GENMASK(15,  0)
+#define TSENSOR_SENSOR0_CONFIG1_TH2			GENMASK(31, 16)
+
+#define TSENSOR_SENSOR0_CONFIG2				0xc
+#define TSENSOR_SENSOR0_CONFIG2_TH3			GENMASK(15,  0)
+
+#define TSENSOR_SENSOR0_STATUS0				0x18
+#define TSENSOR_SENSOR0_STATUS0_STATE			GENMASK(2, 0)
+#define TSENSOR_SENSOR0_STATUS0_INTR			BIT(8)
+#define TSENSOR_SENSOR0_STATUS0_CURRENT_VALID		BIT(9)
+
+#define TSENSOR_SENSOR0_TS_STATUS1			0x1c
+#define TSENSOR_SENSOR0_TS_STATUS1_CURRENT_COUNT	GENMASK(31, 16)
+
+#define TEGRA30_FUSE_TEST_PROG_VER			0x28
+
+#define TEGRA30_FUSE_TSENSOR_CALIB			0x98
+#define TEGRA30_FUSE_TSENSOR_CALIB_LOW			GENMASK(15,  0)
+#define TEGRA30_FUSE_TSENSOR_CALIB_HIGH			GENMASK(31, 16)
+
+#define TEGRA30_FUSE_SPARE_BIT				0x144
+
+struct tegra_tsensor;
+
+struct tegra_tsensor_calibration_data {
+	int a, b, m, n, p, r;
+};
+
+struct tegra_tsensor_channel {
+	void __iomem *regs;
+	unsigned int id;
+	struct tegra_tsensor *ts;
+	struct thermal_zone_device *tzd;
+};
+
+struct tegra_tsensor {
+	void __iomem *regs;
+	bool swap_channels;
+	struct clk *clk;
+	struct device *dev;
+	struct reset_control *rst;
+	struct tegra_tsensor_channel ch[2];
+	struct thermal_cooling_device *cdev;
+	struct tegra_tsensor_calibration_data calib;
+};
+
+static int tegra_tsensor_hw_enable(const struct tegra_tsensor *ts)
+{
+	u32 val;
+	int err;
+
+	err = reset_control_assert(ts->rst);
+	if (err) {
+		dev_err(ts->dev, "failed to assert hardware reset: %d\n", err);
+		return err;
+	}
+
+	err = clk_prepare_enable(ts->clk);
+	if (err) {
+		dev_err(ts->dev, "failed to enable clock: %d\n", err);
+		return err;
+	}
+
+	fsleep(1000);
+
+	err = reset_control_deassert(ts->rst);
+	if (err) {
+		dev_err(ts->dev, "failed to deassert hardware reset: %d\n", err);
+		goto disable_clk;
+	}
+
+	/*
+	 * Sensors are enabled after reset by default, but not gauging
+	 * until clock counter is programmed.
+	 *
+	 * M: number of reference clock pulses after which every
+	 *    temperature / voltage measurement is made
+	 *
+	 * N: number of reference clock counts for which the counter runs
+	 */
+	val  = FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_M, 12500);
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_N, 255);
+
+	/* apply the same configuration to both channels */
+	writel_relaxed(val, ts->regs + 0x40 + TSENSOR_SENSOR0_CONFIG0);
+	writel_relaxed(val, ts->regs + 0x80 + TSENSOR_SENSOR0_CONFIG0);
+
+	return 0;
+
+disable_clk:
+	clk_disable_unprepare(ts->clk);
+
+	return err;
+}
+
+static int tegra_tsensor_hw_disable(const struct tegra_tsensor *ts)
+{
+	int err;
+
+	err = reset_control_assert(ts->rst);
+	if (err) {
+		dev_err(ts->dev, "failed to assert hardware reset: %d\n", err);
+		return err;
+	}
+
+	clk_disable_unprepare(ts->clk);
+
+	return 0;
+}
+
+static void devm_tegra_tsensor_hw_disable(void *data)
+{
+	const struct tegra_tsensor *ts = data;
+
+	tegra_tsensor_hw_disable(ts);
+}
+
+static int tegra_tsensor_get_temp(void *data, int *temp)
+{
+	const struct tegra_tsensor_channel *tsc = data;
+	const struct tegra_tsensor *ts = tsc->ts;
+	int err, c1, c2, c3, c4, counter;
+	u32 val;
+
+	/*
+	 * Counter will be invalid if hardware is misprogrammed or not enough
+	 * time passed since the time when sensor was enabled.
+	 */
+	err = readl_relaxed_poll_timeout(tsc->regs + TSENSOR_SENSOR0_STATUS0, val,
+					 val & TSENSOR_SENSOR0_STATUS0_CURRENT_VALID,
+					 21 * USEC_PER_MSEC,
+					 21 * USEC_PER_MSEC * 50);
+	if (err) {
+		dev_err_once(ts->dev, "ch%u: counter invalid\n", tsc->id);
+		return err;
+	}
+
+	val = readl_relaxed(tsc->regs + TSENSOR_SENSOR0_TS_STATUS1);
+	counter = FIELD_GET(TSENSOR_SENSOR0_TS_STATUS1_CURRENT_COUNT, val);
+
+	/*
+	 * This shouldn't happen with a valid counter status, nevertheless
+	 * lets verify the value since it's in a separate (from status)
+	 * register.
+	 */
+	if (counter == 0xffff) {
+		dev_err_once(ts->dev, "ch%u: counter overflow\n", tsc->id);
+		return -EINVAL;
+	}
+
+	/*
+	 * temperature = a * counter + b
+	 * temperature = m * (temperature ^ 2) + n * temperature + p
+	 */
+	c1 = DIV_ROUND_CLOSEST(ts->calib.a * counter + ts->calib.b, 1000000);
+	c1 = c1 ?: 1;
+	c2 = DIV_ROUND_CLOSEST(ts->calib.p, c1);
+	c3 = c1 * ts->calib.m;
+	c4 = ts->calib.n;
+
+	*temp = DIV_ROUND_CLOSEST(c1 * (c2 + c3 + c4), 1000);
+
+	return 0;
+}
+
+static int tegra_tsensor_temp_to_counter(const struct tegra_tsensor *ts, int temp)
+{
+	int c1, c2;
+
+	c1 = DIV_ROUND_CLOSEST(ts->calib.p - temp * 1000, ts->calib.m);
+	c2 = -ts->calib.r - int_sqrt(ts->calib.r * ts->calib.r - c1);
+
+	return DIV_ROUND_CLOSEST(c2 * 1000000 - ts->calib.b, ts->calib.a);
+}
+
+static int tegra_tsensor_set_trips(void *data, int low, int high)
+{
+	const struct tegra_tsensor_channel *tsc = data;
+	const struct tegra_tsensor *ts = tsc->ts;
+	u32 val;
+
+	/*
+	 * TSENSOR doesn't trigger interrupt on the "low" temperature breach,
+	 * hence bail out if high temperature is unspecified.
+	 */
+	if (high == INT_MAX)
+		return 0;
+
+	val = readl_relaxed(tsc->regs + TSENSOR_SENSOR0_CONFIG1);
+	val &= ~TSENSOR_SENSOR0_CONFIG1_TH1;
+
+	high = tegra_tsensor_temp_to_counter(ts, high);
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG1_TH1, high);
+	writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_CONFIG1);
+
+	return 0;
+}
+
+static const struct thermal_zone_of_device_ops ops = {
+	.get_temp = tegra_tsensor_get_temp,
+	.set_trips = tegra_tsensor_set_trips,
+};
+
+static int tegra_tsensor_get_max_state(struct thermal_cooling_device *cdev,
+				       unsigned long *state)
+{
+	*state = 1;
+
+	return 0;
+}
+
+static bool tegra_tsensor_channel_div2_active(const struct tegra_tsensor *ts,
+					      unsigned int id)
+{
+	u32 val;
+
+	val = readl_relaxed(ts->ch[id].regs + TSENSOR_SENSOR0_CONFIG0);
+	if (!FIELD_GET(TSENSOR_SENSOR0_CONFIG0_HW_FREQ_DIV_EN, val))
+		return false;
+
+	val = readl_relaxed(ts->ch[id].regs + TSENSOR_SENSOR0_STATUS0);
+
+	/* CPU frequency is halved when LEVEL2 is breached */
+	return FIELD_GET(TSENSOR_SENSOR0_STATUS0_STATE, val) > 2;
+}
+
+static int tegra_tsensor_get_cur_state(struct thermal_cooling_device *cdev,
+				       unsigned long *state)
+{
+	const struct tegra_tsensor *ts = cdev->devdata;
+	unsigned int i, div2_state = 0;
+
+	for (i = 0; i < ARRAY_SIZE(ts->ch); i++)
+		div2_state |= tegra_tsensor_channel_div2_active(ts, i);
+
+	*state = div2_state;
+
+	return 0;
+}
+
+static int tegra_tsensor_set_cur_state(struct thermal_cooling_device *cdev,
+				       unsigned long state)
+{
+	/* state is controlled by hardware and can't be changed by software */
+	return -EOPNOTSUPP;
+}
+
+static const struct thermal_cooling_device_ops tegra_tsensor_cpu_cooling_ops = {
+	.get_max_state = tegra_tsensor_get_max_state,
+	.get_cur_state = tegra_tsensor_get_cur_state,
+	.set_cur_state = tegra_tsensor_set_cur_state,
+};
+
+static bool
+tegra_tsensor_handle_channel_interrupt(const struct tegra_tsensor *ts,
+				       unsigned int id)
+{
+	const struct tegra_tsensor_channel *tsc = &ts->ch[id];
+	u32 val;
+
+	val = readl_relaxed(tsc->regs + TSENSOR_SENSOR0_STATUS0);
+	writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_STATUS0);
+
+	if (FIELD_GET(TSENSOR_SENSOR0_STATUS0_STATE, val) == 5)
+		dev_err_ratelimited(ts->dev, "ch%u: counter overflowed\n", id);
+
+	if (!FIELD_GET(TSENSOR_SENSOR0_STATUS0_INTR, val))
+		return false;
+
+	thermal_zone_device_update(tsc->tzd, THERMAL_EVENT_UNSPECIFIED);
+
+	return true;
+}
+
+static irqreturn_t tegra_tsensor_isr(int irq, void *data)
+{
+	const struct tegra_tsensor *ts = data;
+	bool div2_state = false;
+	bool handled = false;
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(ts->ch); i++) {
+		div2_state |= tegra_tsensor_channel_div2_active(ts, i);
+		handled    |= tegra_tsensor_handle_channel_interrupt(ts, i);
+	}
+
+	thermal_cooling_device_stats_update(ts->cdev, div2_state);
+
+	return handled ? IRQ_HANDLED : IRQ_NONE;
+}
+
+static int tegra_tsensor_disable_hw_channel(const struct tegra_tsensor *ts,
+					    unsigned int id)
+{
+	const struct tegra_tsensor_channel *tsc = &ts->ch[id];
+	struct thermal_zone_device *tzd = tsc->tzd;
+	u32 val;
+	int err;
+
+	if (!tzd)
+		goto stop_channel;
+
+	err = thermal_zone_device_disable(tzd);
+	if (err) {
+		dev_err(ts->dev, "ch%u: failed to disable zone: %d\n", id, err);
+		return err;
+	}
+
+stop_channel:
+	/* stop channel gracefully */
+	val = readl_relaxed(tsc->regs + TSENSOR_SENSOR0_CONFIG0);
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_SENSOR_STOP, 1);
+	writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_CONFIG0);
+
+	return 0;
+}
+
+static void tegra_tsensor_get_hw_channel_trips(struct thermal_zone_device *tzd,
+					       int *hot_trip, int *crit_trip)
+{
+	unsigned int i;
+
+	/*
+	 * 90C is the maximal critical temperature of all Tegra30 SoC variants,
+	 * use it for the default trip if unspecified in a device-tree.
+	 */
+	*hot_trip  = 85000;
+	*crit_trip = 90000;
+
+	for (i = 0; i < tzd->trips; i++) {
+		enum thermal_trip_type type;
+		int trip_temp;
+
+		tzd->ops->get_trip_temp(tzd, i, &trip_temp);
+		tzd->ops->get_trip_type(tzd, i, &type);
+
+		if (type == THERMAL_TRIP_HOT)
+			*hot_trip = trip_temp;
+
+		if (type == THERMAL_TRIP_CRITICAL)
+			*crit_trip = trip_temp;
+	}
+
+	/* clamp hardware trips to the calibration limits */
+	*hot_trip = clamp(*hot_trip, 25000, 90000);
+
+	/*
+	 * Kernel will perform a normal system shut down if it will
+	 * see that critical temperature is breached, hence set the
+	 * hardware limit by 5C higher in order to allow system to
+	 * shut down gracefully before sending signal to the Power
+	 * Management controller.
+	 */
+	*crit_trip = clamp(*crit_trip + 5000, 25000, 90000);
+}
+
+static int tegra_tsensor_enable_hw_channel(const struct tegra_tsensor *ts,
+					   unsigned int id)
+{
+	const struct tegra_tsensor_channel *tsc = &ts->ch[id];
+	struct thermal_zone_device *tzd = tsc->tzd;
+	int err, hot_trip = 0, crit_trip = 0;
+	u32 val;
+
+	if (!tzd) {
+		val = readl_relaxed(tsc->regs + TSENSOR_SENSOR0_CONFIG0);
+		val &= ~TSENSOR_SENSOR0_CONFIG0_SENSOR_STOP;
+		writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_CONFIG0);
+
+		return 0;
+	}
+
+	tegra_tsensor_get_hw_channel_trips(tzd, &hot_trip, &crit_trip);
+
+	/* prevent potential racing with tegra_tsensor_set_trips() */
+	mutex_lock(&tzd->lock);
+
+	dev_info_once(ts->dev, "ch%u: CPU freq div2 throttle trip set to %dC\n",
+		      id, DIV_ROUND_CLOSEST(hot_trip, 1000));
+
+	dev_info_once(ts->dev, "ch%u: PMC emergency shutdown trip set to %dC\n",
+		      id, DIV_ROUND_CLOSEST(crit_trip, 1000));
+
+	hot_trip  = tegra_tsensor_temp_to_counter(ts, hot_trip);
+	crit_trip = tegra_tsensor_temp_to_counter(ts, crit_trip);
+
+	/* program LEVEL2 counter threshold */
+	val = readl_relaxed(tsc->regs + TSENSOR_SENSOR0_CONFIG1);
+	val &= ~TSENSOR_SENSOR0_CONFIG1_TH2;
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG1_TH2, hot_trip);
+	writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_CONFIG1);
+
+	/* program LEVEL3 counter threshold */
+	val = readl_relaxed(tsc->regs + TSENSOR_SENSOR0_CONFIG2);
+	val &= ~TSENSOR_SENSOR0_CONFIG2_TH3;
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG2_TH3, crit_trip);
+	writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_CONFIG2);
+
+	/*
+	 * Enable sensor, DIV2 throttle, emergency shutdown, interrupts
+	 * for level 1/2/3 breaches and counter overflow condition.
+	 *
+	 * Thermal levels supported by hardware:
+	 *
+	 *     Level 0 = cold
+	 *     Level 1 = passive cooling (cpufreq DVFS)
+	 *     Level 2 = passive cooling assisted by hardware (DIV2)
+	 *     Level 3 = emergency shutdown assisted by hardware (PMC)
+	 */
+	val = readl_relaxed(tsc->regs + TSENSOR_SENSOR0_CONFIG0);
+	val &= ~TSENSOR_SENSOR0_CONFIG0_SENSOR_STOP;
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_DVFS_EN, 1);
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_HW_FREQ_DIV_EN, 1);
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_THERMAL_RST_EN, 1);
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_INTR_OVERFLOW_EN, 1);
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_INTR_HW_FREQ_DIV_EN, 1);
+	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_INTR_THERMAL_RST_EN, 1);
+	writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_CONFIG0);
+
+	mutex_unlock(&tzd->lock);
+
+	err = thermal_zone_device_enable(tzd);
+	if (err) {
+		dev_err(ts->dev, "ch%u: failed to enable zone: %d\n", id, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static bool tegra_tsensor_fuse_read_spare(unsigned int spare)
+{
+	u32 val = 0;
+
+	tegra_fuse_readl(TEGRA30_FUSE_SPARE_BIT + spare * 4, &val);
+
+	return !!val;
+}
+
+static int tegra_tsensor_nvmem_setup(struct tegra_tsensor *ts)
+{
+	u32 i, ate_ver = 0, cal = 0, t1_25C = 0, t2_90C = 0;
+	int err, c1_25C, c2_90C;
+
+	err = tegra_fuse_readl(TEGRA30_FUSE_TEST_PROG_VER, &ate_ver);
+	if (err) {
+		dev_err_probe(ts->dev, err, "failed to get ATE version\n");
+		return err;
+	}
+
+	if (ate_ver < 8) {
+		dev_info(ts->dev, "unsupported ATE version: %u\n", ate_ver);
+		return -ENODEV;
+	}
+
+	/*
+	 * We have two TSENSOR channels in a two different spots on SoC.
+	 * Second channel provides more accurate data on older SoC versions,
+	 * use it as a primary channel.
+	 */
+	if (ate_ver <= 21) {
+		dev_info_once(ts->dev,
+			      "older ATE version detected, channels remapped\n");
+		ts->swap_channels = true;
+	}
+
+	err = tegra_fuse_readl(TEGRA30_FUSE_TSENSOR_CALIB, &cal);
+	if (err) {
+		dev_err(ts->dev, "failed to get calibration data: %d\n", err);
+		return err;
+	}
+
+	/* get calibrated counter values for 25C/90C thresholds */
+	c1_25C = FIELD_GET(TEGRA30_FUSE_TSENSOR_CALIB_LOW, cal);
+	c2_90C = FIELD_GET(TEGRA30_FUSE_TSENSOR_CALIB_HIGH, cal);
+
+	/* and calibrated temperatures corresponding to the counter values */
+	for (i = 0; i < 7; i++) {
+		t1_25C |= tegra_tsensor_fuse_read_spare(14 + i) << i;
+		t1_25C |= tegra_tsensor_fuse_read_spare(21 + i) << i;
+
+		t2_90C |= tegra_tsensor_fuse_read_spare(0 + i) << i;
+		t2_90C |= tegra_tsensor_fuse_read_spare(7 + i) << i;
+	}
+
+	if (c2_90C - c1_25C <= t2_90C - t1_25C) {
+		dev_err(ts->dev, "invalid calibration data: %d %d %u %u\n",
+			c2_90C, c1_25C, t2_90C, t1_25C);
+		return -EINVAL;
+	}
+
+	/* all calibration coefficients are premultiplied by 1000000 */
+
+	ts->calib.a = DIV_ROUND_CLOSEST((t2_90C - t1_25C) * 1000000,
+					(c2_90C - c1_25C));
+
+	ts->calib.b = t1_25C * 1000000 - ts->calib.a * c1_25C;
+
+	if (tegra_sku_info.revision == TEGRA_REVISION_A01) {
+		ts->calib.m =     -2775;
+		ts->calib.n =   1338811;
+		ts->calib.p =  -7300000;
+	} else {
+		ts->calib.m =     -3512;
+		ts->calib.n =   1528943;
+		ts->calib.p = -11100000;
+	}
+
+	/* except the coefficient of a reduced quadratic equation */
+	ts->calib.r = DIV_ROUND_CLOSEST(ts->calib.n, ts->calib.m * 2);
+
+	dev_info_once(ts->dev,
+		      "calibration: %d %d %u %u ATE ver: %u SoC rev: %u\n",
+		      c2_90C, c1_25C, t2_90C, t1_25C, ate_ver,
+		      tegra_sku_info.revision);
+
+	return 0;
+}
+
+static int tegra_tsensor_register_channel(struct tegra_tsensor *ts,
+					  unsigned int id)
+{
+	struct tegra_tsensor_channel *tsc = &ts->ch[id];
+	unsigned int hw_id = ts->swap_channels ? !id : id;
+
+	tsc->ts = ts;
+	tsc->id = id;
+	tsc->regs = ts->regs + 0x40 * (hw_id + 1);
+
+	tsc->tzd = devm_thermal_zone_of_sensor_register(ts->dev, id, tsc, &ops);
+	if (IS_ERR(tsc->tzd)) {
+		if (PTR_ERR(tsc->tzd) != -ENODEV)
+			return dev_err_probe(ts->dev, PTR_ERR(tsc->tzd),
+					     "failed to register thermal zone\n");
+
+		/*
+		 * It's okay if sensor isn't assigned to any thermal zone
+		 * in a device-tree.
+		 */
+		tsc->tzd = NULL;
+		return 0;
+	}
+
+	if (devm_thermal_add_hwmon_sysfs(tsc->tzd))
+		dev_warn(ts->dev, "failed to add hwmon sysfs attributes\n");
+
+	return 0;
+}
+
+static int tegra_tsensor_probe(struct platform_device *pdev)
+{
+	struct tegra_tsensor *ts;
+	unsigned int i;
+	int err, irq;
+
+	ts = devm_kzalloc(&pdev->dev, sizeof(*ts), GFP_KERNEL);
+	if (!ts)
+		return -ENOMEM;
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0)
+		return irq;
+
+	ts->dev = &pdev->dev;
+	platform_set_drvdata(pdev, ts);
+
+	ts->regs = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(ts->regs))
+		return PTR_ERR(ts->regs);
+
+	ts->clk = devm_clk_get(&pdev->dev, NULL);
+	if (IS_ERR(ts->clk))
+		return dev_err_probe(&pdev->dev, PTR_ERR(ts->clk),
+				     "failed to get clock\n");
+
+	ts->rst = devm_reset_control_get_exclusive(&pdev->dev, NULL);
+	if (IS_ERR(ts->rst))
+		return dev_err_probe(&pdev->dev, PTR_ERR(ts->rst),
+				     "failed to get reset control\n");
+
+	err = tegra_tsensor_nvmem_setup(ts);
+	if (err)
+		return err;
+
+	err = tegra_tsensor_hw_enable(ts);
+	if (err)
+		return err;
+
+	err = devm_add_action_or_reset(&pdev->dev,
+				       devm_tegra_tsensor_hw_disable,
+				       ts);
+	if (err)
+		return err;
+
+	for (i = 0; i < ARRAY_SIZE(ts->ch); i++) {
+		err = tegra_tsensor_register_channel(ts, i);
+		if (err)
+			return err;
+	}
+
+	ts->cdev = devm_thermal_of_cooling_device_register(&pdev->dev,
+				pdev->dev.of_node, "tegra-cpu-div2-throttle",
+				ts, &tegra_tsensor_cpu_cooling_ops);
+	if (IS_ERR(ts->cdev))
+		return dev_err_probe(&pdev->dev, PTR_ERR(ts->cdev),
+				     "failed to register cooling device\n");
+
+	err = devm_request_threaded_irq(&pdev->dev, irq, NULL,
+					tegra_tsensor_isr, IRQF_ONESHOT,
+					"tegra_tsensor", ts);
+	if (err)
+		return dev_err_probe(&pdev->dev, err,
+				     "failed to request interrupt\n");
+
+	for (i = 0; i < ARRAY_SIZE(ts->ch); i++) {
+		err = tegra_tsensor_enable_hw_channel(ts, i);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int __maybe_unused tegra_tsensor_suspend(struct device *dev)
+{
+	struct tegra_tsensor *ts = dev_get_drvdata(dev);
+	unsigned int i;
+	int err;
+
+	for (i = 0; i < ARRAY_SIZE(ts->ch); i++) {
+		err = tegra_tsensor_disable_hw_channel(ts, i);
+		if (err)
+			goto enable_channel;
+	}
+
+	err = tegra_tsensor_hw_disable(ts);
+	if (err)
+		goto enable_channel;
+
+	return 0;
+
+enable_channel:
+	while (i--)
+		tegra_tsensor_enable_hw_channel(ts, i);
+
+	return err;
+}
+
+static int __maybe_unused tegra_tsensor_resume(struct device *dev)
+{
+	struct tegra_tsensor *ts = dev_get_drvdata(dev);
+	unsigned int i;
+	int err;
+
+	err = tegra_tsensor_hw_enable(ts);
+	if (err)
+		return err;
+
+	for (i = 0; i < ARRAY_SIZE(ts->ch); i++) {
+		err = tegra_tsensor_enable_hw_channel(ts, i);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static const struct dev_pm_ops tegra_tsensor_pm_ops = {
+	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(tegra_tsensor_suspend,
+				      tegra_tsensor_resume)
+};
+
+static const struct of_device_id tegra_tsensor_of_match[] = {
+	{ .compatible = "nvidia,tegra30-tsensor", },
+	{},
+};
+MODULE_DEVICE_TABLE(of, tegra_tsensor_of_match);
+
+static struct platform_driver tegra_tsensor_driver = {
+	.probe = tegra_tsensor_probe,
+	.driver = {
+		.name = "tegra30-tsensor",
+		.of_match_table = tegra_tsensor_of_match,
+		.pm = &tegra_tsensor_pm_ops,
+	},
+};
+module_platform_driver(tegra_tsensor_driver);
+
+MODULE_DESCRIPTION("NVIDIA Tegra30 Thermal Sensor driver");
+MODULE_AUTHOR("Dmitry Osipenko <digetx@gmail.com>");
+MODULE_LICENSE("GPL");
-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 5/7] ARM: tegra_defconfig: Enable CONFIG_TEGRA30_TSENSOR
  2021-05-29 17:09 [PATCH v3 0/7] Add driver for NVIDIA Tegra30 SoC Thermal sensor Dmitry Osipenko
                   ` (3 preceding siblings ...)
  2021-05-29 17:09 ` [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor Dmitry Osipenko
@ 2021-05-29 17:09 ` Dmitry Osipenko
  2021-05-29 17:09 ` [PATCH v3 6/7] ARM: multi_v7_defconfig: " Dmitry Osipenko
  2021-05-29 17:09 ` [PATCH v3 7/7] ARM: tegra: Add SoC thermal sensor to Tegra30 device-trees Dmitry Osipenko
  6 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2021-05-29 17:09 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Zhang Rui, Daniel Lezcano,
	Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis
  Cc: devicetree, linux-kernel, linux-tegra, linux-pm

Enable NVIDIA Tegra30 SoC thermal sensor driver in tegra_defconfig.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 arch/arm/configs/tegra_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/tegra_defconfig b/arch/arm/configs/tegra_defconfig
index 3d8d8af9524d..63e7f3261e6d 100644
--- a/arch/arm/configs/tegra_defconfig
+++ b/arch/arm/configs/tegra_defconfig
@@ -169,6 +169,7 @@ CONFIG_THERMAL_STATISTICS=y
 CONFIG_CPU_THERMAL=y
 CONFIG_DEVFREQ_THERMAL=y
 CONFIG_TEGRA_SOCTHERM=m
+CONFIG_TEGRA30_TSENSOR=m
 CONFIG_WATCHDOG=y
 CONFIG_MAX77620_WATCHDOG=y
 CONFIG_TEGRA_WATCHDOG=y
-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 6/7] ARM: multi_v7_defconfig: Enable CONFIG_TEGRA30_TSENSOR
  2021-05-29 17:09 [PATCH v3 0/7] Add driver for NVIDIA Tegra30 SoC Thermal sensor Dmitry Osipenko
                   ` (4 preceding siblings ...)
  2021-05-29 17:09 ` [PATCH v3 5/7] ARM: tegra_defconfig: Enable CONFIG_TEGRA30_TSENSOR Dmitry Osipenko
@ 2021-05-29 17:09 ` Dmitry Osipenko
  2021-05-29 17:09 ` [PATCH v3 7/7] ARM: tegra: Add SoC thermal sensor to Tegra30 device-trees Dmitry Osipenko
  6 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2021-05-29 17:09 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Zhang Rui, Daniel Lezcano,
	Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis
  Cc: devicetree, linux-kernel, linux-tegra, linux-pm

Enable NVIDIA Tegra30 SoC thermal sensor driver in multi_v7_defconfig.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 arch/arm/configs/multi_v7_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
index 52a0400fdd92..fc346f87d7f9 100644
--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -523,6 +523,7 @@ CONFIG_BRCMSTB_THERMAL=m
 CONFIG_GENERIC_ADC_THERMAL=m
 CONFIG_ST_THERMAL_MEMMAP=y
 CONFIG_TEGRA_SOCTHERM=m
+CONFIG_TEGRA30_TSENSOR=m
 CONFIG_UNIPHIER_THERMAL=y
 CONFIG_DA9063_WATCHDOG=m
 CONFIG_XILINX_WATCHDOG=y
-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 7/7] ARM: tegra: Add SoC thermal sensor to Tegra30 device-trees
  2021-05-29 17:09 [PATCH v3 0/7] Add driver for NVIDIA Tegra30 SoC Thermal sensor Dmitry Osipenko
                   ` (5 preceding siblings ...)
  2021-05-29 17:09 ` [PATCH v3 6/7] ARM: multi_v7_defconfig: " Dmitry Osipenko
@ 2021-05-29 17:09 ` Dmitry Osipenko
  6 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2021-05-29 17:09 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Zhang Rui, Daniel Lezcano,
	Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis
  Cc: devicetree, linux-kernel, linux-tegra, linux-pm

Add the on-chip SoC thermal sensor to Tegra30 device-trees. Now CPU
temperature reporting and thermal throttling is available on all Tegra30
devices universally.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 arch/arm/boot/dts/tegra30-ouya.dts | 16 +++++
 arch/arm/boot/dts/tegra30.dtsi     | 93 ++++++++++++++++++++++++++++--
 2 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/tegra30-ouya.dts b/arch/arm/boot/dts/tegra30-ouya.dts
index e767ac227a81..d792ce5c64c0 100644
--- a/arch/arm/boot/dts/tegra30-ouya.dts
+++ b/arch/arm/boot/dts/tegra30-ouya.dts
@@ -468,6 +468,22 @@ map1 {
 				};
 			};
 		};
+
+		tsensor-channel0 {
+			trips {
+				dvfs-alert {
+					temperature = <70000>;
+				};
+
+				cpu-div2-throttle {
+					temperature = <75000>;
+				};
+
+				soc-critical {
+					temperature = <90000>;
+				};
+			};
+		};
 	};
 
 	vdd_12v_in: vdd_12v_in {
diff --git a/arch/arm/boot/dts/tegra30.dtsi b/arch/arm/boot/dts/tegra30.dtsi
index c577c191be4b..6becbadef210 100644
--- a/arch/arm/boot/dts/tegra30.dtsi
+++ b/arch/arm/boot/dts/tegra30.dtsi
@@ -5,6 +5,7 @@
 #include <dt-bindings/pinctrl/pinctrl-tegra.h>
 #include <dt-bindings/interrupt-controller/arm-gic.h>
 #include <dt-bindings/soc/tegra-pmc.h>
+#include <dt-bindings/thermal/thermal.h>
 
 #include "tegra30-peripherals-opp.dtsi"
 
@@ -800,6 +801,21 @@ fuse@7000f800 {
 		reset-names = "fuse";
 	};
 
+	tsensor: tsensor@70014000 {
+		compatible = "nvidia,tegra30-tsensor";
+		reg = <0x70014000 0x500>;
+		interrupts = <GIC_SPI 102 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&tegra_car TEGRA30_CLK_TSENSOR>;
+		resets = <&tegra_car TEGRA30_CLK_TSENSOR>;
+
+		assigned-clocks = <&tegra_car TEGRA30_CLK_TSENSOR>;
+		assigned-clock-parents = <&tegra_car TEGRA30_CLK_CLK_M>;
+		assigned-clock-rates = <500000>;
+
+		#thermal-sensor-cells = <1>;
+		#cooling-cells = <2>;
+	};
+
 	hda@70030000 {
 		compatible = "nvidia,tegra30-hda";
 		reg = <0x70030000 0x10000>;
@@ -1062,32 +1078,36 @@ cpus {
 		#address-cells = <1>;
 		#size-cells = <0>;
 
-		cpu@0 {
+		cpu0: cpu@0 {
 			device_type = "cpu";
 			compatible = "arm,cortex-a9";
 			reg = <0>;
 			clocks = <&tegra_car TEGRA30_CLK_CCLK_G>;
+			#cooling-cells = <2>;
 		};
 
-		cpu@1 {
+		cpu1: cpu@1 {
 			device_type = "cpu";
 			compatible = "arm,cortex-a9";
 			reg = <1>;
 			clocks = <&tegra_car TEGRA30_CLK_CCLK_G>;
+			#cooling-cells = <2>;
 		};
 
-		cpu@2 {
+		cpu2: cpu@2 {
 			device_type = "cpu";
 			compatible = "arm,cortex-a9";
 			reg = <2>;
 			clocks = <&tegra_car TEGRA30_CLK_CCLK_G>;
+			#cooling-cells = <2>;
 		};
 
-		cpu@3 {
+		cpu3: cpu@3 {
 			device_type = "cpu";
 			compatible = "arm,cortex-a9";
 			reg = <3>;
 			clocks = <&tegra_car TEGRA30_CLK_CCLK_G>;
+			#cooling-cells = <2>;
 		};
 	};
 
@@ -1102,4 +1122,69 @@ pmu {
 				     <&{/cpus/cpu@2}>,
 				     <&{/cpus/cpu@3}>;
 	};
+
+	thermal-zones {
+		tsensor-channel0 {
+			polling-delay-passive = <1000>; /* milliseconds */
+			polling-delay = <5000>; /* milliseconds */
+
+			thermal-sensors = <&tsensor 0>;
+
+			trips {
+				level1_trip: dvfs-alert {
+					/* throttle at 67C until temperature drops to 66.8C */
+					temperature = <67000>;
+					hysteresis = <200>;
+					type = "passive";
+				};
+
+				level2_trip: cpu-div2-throttle {
+					/* hardware CPU x2 freq throttle at 70C */
+					temperature = <70000>;
+					hysteresis = <200>;
+					type = "hot";
+				};
+
+				soc-critical {
+					/* hardware shut down at 80C */
+					temperature = <80000>;
+					hysteresis = <2000>;
+					type = "critical";
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&level1_trip>;
+					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&cpu3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
+							 <&actmon THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+				};
+
+				map1 {
+					trip = <&level2_trip>;
+					cooling-device = <&tsensor THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+				};
+			};
+		};
+
+		tsensor-channel1 {
+			status = "disabled";
+
+			polling-delay-passive = <1000>; /* milliseconds */
+			polling-delay = <0>; /* milliseconds */
+
+			thermal-sensors = <&tsensor 1>;
+
+			trips {
+				dvfs-alert {
+					temperature = <80000>;
+					hysteresis = <200>;
+					type = "passive";
+				};
+			};
+		};
+	};
 };
-- 
2.30.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-05-29 17:09 ` [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor Dmitry Osipenko
@ 2021-05-31 12:51   ` Thierry Reding
  2021-06-15 10:03   ` Daniel Lezcano
  1 sibling, 0 replies; 22+ messages in thread
From: Thierry Reding @ 2021-05-31 12:51 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Jonathan Hunter, Zhang Rui, Daniel Lezcano, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis, devicetree,
	linux-kernel, linux-tegra, linux-pm

[-- Attachment #1: Type: text/plain, Size: 1380 bytes --]

On Sat, May 29, 2021 at 08:09:52PM +0300, Dmitry Osipenko wrote:
> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
> monitors temperature and voltage of the SoC. Sensors control CPU frequency
> throttling, which is activated by hardware once preprogrammed temperature
> level is breached, they also send signal to Power Management controller to
> perform emergency shutdown on a critical overheat of the SoC die. Add
> driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
> and a cooling device.
> 
> Tested-by: Andreas Westman Dorcsak <hedmoo@yahoo.com> # Asus TF700T
> Tested-by: Maxim Schwalm <maxim.schwalm@gmail.com> # Asus TF700T
> Tested-by: Svyatoslav Ryhel <clamor95@gmail.com> # Asus TF201T
> Tested-by: Ihor Didenko <tailormoon@rambler.ru> # Asus TF300T
> Tested-by: Ion Agorria <ion@agorria.com> # Asus TF201T
> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya
> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/thermal/tegra/Kconfig           |   7 +
>  drivers/thermal/tegra/Makefile          |   1 +
>  drivers/thermal/tegra/tegra30-tsensor.c | 736 ++++++++++++++++++++++++
>  3 files changed, 744 insertions(+)
>  create mode 100644 drivers/thermal/tegra/tegra30-tsensor.c

Acked-by: Thierry Reding <treding@nvidia.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-05-29 17:09 ` [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor Dmitry Osipenko
  2021-05-31 12:51   ` Thierry Reding
@ 2021-06-15 10:03   ` Daniel Lezcano
  2021-06-15 10:26     ` Viresh Kumar
  1 sibling, 1 reply; 22+ messages in thread
From: Daniel Lezcano @ 2021-06-15 10:03 UTC (permalink / raw)
  To: Dmitry Osipenko, Thierry Reding, Jonathan Hunter, Zhang Rui,
	Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis
  Cc: devicetree, linux-kernel, linux-tegra, linux-pm, Viresh Kumar


[Cc Viresh]

On 29/05/2021 19:09, Dmitry Osipenko wrote:
> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
> monitors temperature and voltage of the SoC. Sensors control CPU frequency
> throttling, which is activated by hardware once preprogrammed temperature
> level is breached, they also send signal to Power Management controller to
> perform emergency shutdown on a critical overheat of the SoC die. Add
> driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
> and a cooling device.

IMO it does not make sense to expose the hardware throttling mechanism
as a cooling device because it is not usable anywhere from the thermal
framework.

Moreover, that will collide with the thermal / cpufreq framework
mitigation (hardware sets the frequency but the software thinks the freq
is different), right ?

The hardware limiter should let know the cpufreq framework about the
frequency change.

	https://lkml.org/lkml/2021/6/8/1792

May be post the sensor without the hw limiter for now and address that
in a separate series ?

  -- Daniel

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-15 10:03   ` Daniel Lezcano
@ 2021-06-15 10:26     ` Viresh Kumar
  2021-06-15 13:01       ` Dmitry Osipenko
  0 siblings, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2021-06-15 10:26 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Dmitry Osipenko, Thierry Reding, Jonathan Hunter, Zhang Rui,
	Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis, devicetree, linux-kernel, linux-tegra, linux-pm

On 15-06-21, 12:03, Daniel Lezcano wrote:
> 
> [Cc Viresh]
> 
> On 29/05/2021 19:09, Dmitry Osipenko wrote:
> > All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
> > monitors temperature and voltage of the SoC. Sensors control CPU frequency
> > throttling, which is activated by hardware once preprogrammed temperature
> > level is breached, they also send signal to Power Management controller to
> > perform emergency shutdown on a critical overheat of the SoC die. Add
> > driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
> > and a cooling device.
> 
> IMO it does not make sense to expose the hardware throttling mechanism
> as a cooling device because it is not usable anywhere from the thermal
> framework.
> 
> Moreover, that will collide with the thermal / cpufreq framework
> mitigation (hardware sets the frequency but the software thinks the freq
> is different), right ?

I am not even sure what the cooling device is doing here:

tegra_tsensor_set_cur_state() is not implemented and it says hardware
changed it by itself. What is the benefit you are getting out of the
cooling device here ?

> The hardware limiter should let know the cpufreq framework about the
> frequency change.
> 
> 	https://lkml.org/lkml/2021/6/8/1792
> 
> May be post the sensor without the hw limiter for now and address that
> in a separate series ?

-- 
viresh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-15 10:26     ` Viresh Kumar
@ 2021-06-15 13:01       ` Dmitry Osipenko
  2021-06-15 16:18         ` Daniel Lezcano
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Osipenko @ 2021-06-15 13:01 UTC (permalink / raw)
  To: Viresh Kumar, Daniel Lezcano
  Cc: Thierry Reding, Jonathan Hunter, Zhang Rui, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis, devicetree,
	linux-kernel, linux-tegra, linux-pm

15.06.2021 13:26, Viresh Kumar пишет:
> On 15-06-21, 12:03, Daniel Lezcano wrote:
>>
>> [Cc Viresh]
>>
>> On 29/05/2021 19:09, Dmitry Osipenko wrote:
>>> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
>>> monitors temperature and voltage of the SoC. Sensors control CPU frequency
>>> throttling, which is activated by hardware once preprogrammed temperature
>>> level is breached, they also send signal to Power Management controller to
>>> perform emergency shutdown on a critical overheat of the SoC die. Add
>>> driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
>>> and a cooling device.
>>
>> IMO it does not make sense to expose the hardware throttling mechanism
>> as a cooling device because it is not usable anywhere from the thermal
>> framework.
>>
>> Moreover, that will collide with the thermal / cpufreq framework
>> mitigation (hardware sets the frequency but the software thinks the freq
>> is different), right ?

H/w mitigation is additional and should be transparent to the software
mitigation. The software mitigation is much more flexible, but it has
latency. Software also could crash and hang.

Hardware mitigation doesn't have latency and it will continue to work
regardless of the software state.

The CCF driver is aware about the h/w cooling status [1], hence software
sees the actual frequency.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=344d5df34f5abd468267daa98f041abf90b2f660

> I am not even sure what the cooling device is doing here:
> 
> tegra_tsensor_set_cur_state() is not implemented and it says hardware
> changed it by itself. What is the benefit you are getting out of the
> cooling device here ?

It allows userspace to check whether hardware cooling is active via the
cooling_device sysfs. Otherwise we don't have ability to check whether
h/w cooling is active, I think it's a useful information. It's also
interesting to see the cooling_device stats, showing how many times h/w
mitigation was active.

>> The hardware limiter should let know the cpufreq framework about the
>> frequency change.
>>
>> 	https://lkml.org/lkml/2021/6/8/1792
>>
>> May be post the sensor without the hw limiter for now and address that
>> in a separate series ?
> 

I wasn't aware about existence of the thermal pressure, thank you for
pointing at it. At a quick glance it should be possible to benefit from
the information about the additional pressure.

Seems the current thermal pressure API assumes that there is only one
user of the API. So it's impossible to aggregate the pressure from
different sources, like software cpufreq pressure + h/w freq pressure.
Correct? If yes, then please let me know yours thoughts about the best
approach of supporting the aggregation.

I'll factor out the h/w limiter from this patchset and prepare the v4.
Thank you all for taking a look at the patches.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-15 13:01       ` Dmitry Osipenko
@ 2021-06-15 16:18         ` Daniel Lezcano
  2021-06-15 19:32           ` Dmitry Osipenko
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Lezcano @ 2021-06-15 16:18 UTC (permalink / raw)
  To: Dmitry Osipenko, Viresh Kumar, Thara Gopinath
  Cc: Thierry Reding, Jonathan Hunter, Zhang Rui, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis, devicetree,
	linux-kernel, linux-tegra, linux-pm

On 15/06/2021 15:01, Dmitry Osipenko wrote:
> 15.06.2021 13:26, Viresh Kumar пишет:
>> On 15-06-21, 12:03, Daniel Lezcano wrote:
>>>
>>> [Cc Viresh]
>>>
>>> On 29/05/2021 19:09, Dmitry Osipenko wrote:
>>>> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
>>>> monitors temperature and voltage of the SoC. Sensors control CPU frequency
>>>> throttling, which is activated by hardware once preprogrammed temperature
>>>> level is breached, they also send signal to Power Management controller to
>>>> perform emergency shutdown on a critical overheat of the SoC die. Add
>>>> driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
>>>> and a cooling device.
>>>
>>> IMO it does not make sense to expose the hardware throttling mechanism
>>> as a cooling device because it is not usable anywhere from the thermal
>>> framework.
>>>
>>> Moreover, that will collide with the thermal / cpufreq framework
>>> mitigation (hardware sets the frequency but the software thinks the freq
>>> is different), right ?
> 
> H/w mitigation is additional and should be transparent to the software
> mitigation. The software mitigation is much more flexible, but it has
> latency. Software also could crash and hang.
> 
> Hardware mitigation doesn't have latency and it will continue to work
> regardless of the software state.

Yes, I agree. Both solutions have their pros and cons. However, I don't
think they can co-exist sanely.

> The CCF driver is aware about the h/w cooling status [1], hence software
> sees the actual frequency.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=344d5df34f5abd468267daa98f041abf90b2f660

Ah interesting, thanks for the pointer.

What I'm worried about is the consistency with cpufreq.

Probably cpufreq_update_limits() should be called from the interrupt
handler.

>> I am not even sure what the cooling device is doing here:
>>
>> tegra_tsensor_set_cur_state() is not implemented and it says hardware
>> changed it by itself. What is the benefit you are getting out of the
>> cooling device here ?
> 
> It allows userspace to check whether hardware cooling is active via the
> cooling_device sysfs. Otherwise we don't have ability to check whether
> h/w cooling is active, I think it's a useful information. It's also
> interesting to see the cooling_device stats, showing how many times h/w
> mitigation was active.

Actually the stats are for software mitigation. For the driver, create a
debugfs entry like what do the other drivers or a module parameter with
the stats.

>>> The hardware limiter should let know the cpufreq framework about the
>>> frequency change.
>>>
>>> 	https://lkml.org/lkml/2021/6/8/1792
>>>
>>> May be post the sensor without the hw limiter for now and address that
>>> in a separate series ?
>>
> 
> I wasn't aware about existence of the thermal pressure, thank you for
> pointing at it. At a quick glance it should be possible to benefit from
> the information about the additional pressure.
> 
> Seems the current thermal pressure API assumes that there is only one
> user of the API. So it's impossible to aggregate the pressure from
> different sources, like software cpufreq pressure + h/w freq pressure.
> Correct? If yes, then please let me know yours thoughts about the best
> approach of supporting the aggregation.

That is a good question. IMO, first step would be to call
cpufreq_update_limits().

[ Cc Thara who implemented the thermal pressure ]

May be Thara has an idea about how to aggregate both? There is another
series floating around with hardware limiter [1] and the same problematic.

 [1] https://lkml.org/lkml/2021/6/8/1791

> I'll factor out the h/w limiter from this patchset and prepare the v4.
> Thank you all for taking a look at the patches.



-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-15 16:18         ` Daniel Lezcano
@ 2021-06-15 19:32           ` Dmitry Osipenko
  2021-06-16  2:50             ` Thara Gopinath
  2021-06-16  8:03             ` Viresh Kumar
  0 siblings, 2 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2021-06-15 19:32 UTC (permalink / raw)
  To: Daniel Lezcano, Viresh Kumar, Thara Gopinath
  Cc: Thierry Reding, Jonathan Hunter, Zhang Rui, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis, devicetree,
	linux-kernel, linux-tegra, linux-pm

15.06.2021 19:18, Daniel Lezcano пишет:
> On 15/06/2021 15:01, Dmitry Osipenko wrote:
>> 15.06.2021 13:26, Viresh Kumar пишет:
>>> On 15-06-21, 12:03, Daniel Lezcano wrote:
>>>>
>>>> [Cc Viresh]
>>>>
>>>> On 29/05/2021 19:09, Dmitry Osipenko wrote:
>>>>> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
>>>>> monitors temperature and voltage of the SoC. Sensors control CPU frequency
>>>>> throttling, which is activated by hardware once preprogrammed temperature
>>>>> level is breached, they also send signal to Power Management controller to
>>>>> perform emergency shutdown on a critical overheat of the SoC die. Add
>>>>> driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
>>>>> and a cooling device.
>>>>
>>>> IMO it does not make sense to expose the hardware throttling mechanism
>>>> as a cooling device because it is not usable anywhere from the thermal
>>>> framework.
>>>>
>>>> Moreover, that will collide with the thermal / cpufreq framework
>>>> mitigation (hardware sets the frequency but the software thinks the freq
>>>> is different), right ?
>>
>> H/w mitigation is additional and should be transparent to the software
>> mitigation. The software mitigation is much more flexible, but it has
>> latency. Software also could crash and hang.
>>
>> Hardware mitigation doesn't have latency and it will continue to work
>> regardless of the software state.
> 
> Yes, I agree. Both solutions have their pros and cons. However, I don't
> think they can co-exist sanely.
> 
>> The CCF driver is aware about the h/w cooling status [1], hence software
>> sees the actual frequency.
>>
>> [1]
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=344d5df34f5abd468267daa98f041abf90b2f660
> 
> Ah interesting, thanks for the pointer.
> 
> What I'm worried about is the consistency with cpufreq.
> 
> Probably cpufreq_update_limits() should be called from the interrupt
> handler.

IIUC, the cpufreq already should be prepared for the case where firmware
may override frequency. Viresh, could you please clarify what are the
possible implications of the frequency overriding?

>>> I am not even sure what the cooling device is doing here:
>>>
>>> tegra_tsensor_set_cur_state() is not implemented and it says hardware
>>> changed it by itself. What is the benefit you are getting out of the
>>> cooling device here ?
>>
>> It allows userspace to check whether hardware cooling is active via the
>> cooling_device sysfs. Otherwise we don't have ability to check whether
>> h/w cooling is active, I think it's a useful information. It's also
>> interesting to see the cooling_device stats, showing how many times h/w
>> mitigation was active.
> 
> Actually the stats are for software mitigation. For the driver, create a
> debugfs entry like what do the other drivers or a module parameter with
> the stats.

Okay

>>>> The hardware limiter should let know the cpufreq framework about the
>>>> frequency change.
>>>>
>>>> 	https://lkml.org/lkml/2021/6/8/1792
>>>>
>>>> May be post the sensor without the hw limiter for now and address that
>>>> in a separate series ?
>>>
>>
>> I wasn't aware about existence of the thermal pressure, thank you for
>> pointing at it. At a quick glance it should be possible to benefit from
>> the information about the additional pressure.
>>
>> Seems the current thermal pressure API assumes that there is only one
>> user of the API. So it's impossible to aggregate the pressure from
>> different sources, like software cpufreq pressure + h/w freq pressure.
>> Correct? If yes, then please let me know yours thoughts about the best
>> approach of supporting the aggregation.
> 
> That is a good question. IMO, first step would be to call
> cpufreq_update_limits().

Right

> [ Cc Thara who implemented the thermal pressure ]
> 
> May be Thara has an idea about how to aggregate both? There is another
> series floating around with hardware limiter [1] and the same problematic.
> 
>  [1] https://lkml.org/lkml/2021/6/8/1791

Thanks, it indeed looks similar.

I guess the common thermal pressure update code could be moved out into
a new special cpufreq thermal QoS handler (policy->thermal_constraints),
where handler will select the frequency constraint and set up the
pressure accordingly. So there won't be any races in the code.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-15 19:32           ` Dmitry Osipenko
@ 2021-06-16  2:50             ` Thara Gopinath
  2021-06-16 10:47               ` Dmitry Osipenko
  2021-06-16  8:03             ` Viresh Kumar
  1 sibling, 1 reply; 22+ messages in thread
From: Thara Gopinath @ 2021-06-16  2:50 UTC (permalink / raw)
  To: Dmitry Osipenko, Daniel Lezcano, Viresh Kumar
  Cc: Thierry Reding, Jonathan Hunter, Zhang Rui, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis, devicetree,
	linux-kernel, linux-tegra, linux-pm



On 6/15/21 3:32 PM, Dmitry Osipenko wrote:
> 15.06.2021 19:18, Daniel Lezcano пишет:
>> On 15/06/2021 15:01, Dmitry Osipenko wrote:
>>> 15.06.2021 13:26, Viresh Kumar пишет:
>>>> On 15-06-21, 12:03, Daniel Lezcano wrote:
>>>>>
>>>>> [Cc Viresh]
>>>>>
>>>>> On 29/05/2021 19:09, Dmitry Osipenko wrote:
>>>>>> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
>>>>>> monitors temperature and voltage of the SoC. Sensors control CPU frequency
>>>>>> throttling, which is activated by hardware once preprogrammed temperature
>>>>>> level is breached, they also send signal to Power Management controller to
>>>>>> perform emergency shutdown on a critical overheat of the SoC die. Add
>>>>>> driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
>>>>>> and a cooling device.
>>>>>
>>>>> IMO it does not make sense to expose the hardware throttling mechanism
>>>>> as a cooling device because it is not usable anywhere from the thermal
>>>>> framework.
>>>>>
>>>>> Moreover, that will collide with the thermal / cpufreq framework
>>>>> mitigation (hardware sets the frequency but the software thinks the freq
>>>>> is different), right ?
>>>
>>> H/w mitigation is additional and should be transparent to the software
>>> mitigation. The software mitigation is much more flexible, but it has
>>> latency. Software also could crash and hang.
>>>
>>> Hardware mitigation doesn't have latency and it will continue to work
>>> regardless of the software state.
>>
>> Yes, I agree. Both solutions have their pros and cons. However, I don't
>> think they can co-exist sanely.
>>
>>> The CCF driver is aware about the h/w cooling status [1], hence software
>>> sees the actual frequency.
>>>
>>> [1]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=344d5df34f5abd468267daa98f041abf90b2f660
>>
>> Ah interesting, thanks for the pointer.
>>
>> What I'm worried about is the consistency with cpufreq.
>>
>> Probably cpufreq_update_limits() should be called from the interrupt
>> handler.
> 
> IIUC, the cpufreq already should be prepared for the case where firmware
> may override frequency. Viresh, could you please clarify what are the
> possible implications of the frequency overriding?
> 
>>>> I am not even sure what the cooling device is doing here:
>>>>
>>>> tegra_tsensor_set_cur_state() is not implemented and it says hardware
>>>> changed it by itself. What is the benefit you are getting out of the
>>>> cooling device here ?
>>>
>>> It allows userspace to check whether hardware cooling is active via the
>>> cooling_device sysfs. Otherwise we don't have ability to check whether
>>> h/w cooling is active, I think it's a useful information. It's also
>>> interesting to see the cooling_device stats, showing how many times h/w
>>> mitigation was active.
>>
>> Actually the stats are for software mitigation. For the driver, create a
>> debugfs entry like what do the other drivers or a module parameter with
>> the stats.
> 
> Okay
> 
>>>>> The hardware limiter should let know the cpufreq framework about the
>>>>> frequency change.
>>>>>
>>>>> 	https://lkml.org/lkml/2021/6/8/1792
>>>>>
>>>>> May be post the sensor without the hw limiter for now and address that
>>>>> in a separate series ?
>>>>
>>>
>>> I wasn't aware about existence of the thermal pressure, thank you for
>>> pointing at it. At a quick glance it should be possible to benefit from
>>> the information about the additional pressure.
>>>
>>> Seems the current thermal pressure API assumes that there is only one
>>> user of the API. So it's impossible to aggregate the pressure from
>>> different sources, like software cpufreq pressure + h/w freq pressure.
>>> Correct? If yes, then please let me know yours thoughts about the best
>>> approach of supporting the aggregation.

Hi,

Thermal pressure is letting scheduler know that the max capacity 
available for a cpu to schedule tasks is reduced due to a thermal event.
So you cannot have a h/w thermal pressure and s/w thermal pressure. 
There is eventually only one capping applied at h/w level and the 
frequency corresponding to this capping should be used for thermal pressure.

Ideally you should not be having both s/w and h/w trying to throttle at 
the same time. Why is this a scenario and what prevents you from 
disabling s/w throttling when h/w throttling is enabled. Now if there 
has to a aggregation for whatever reason this should be done at the 
thermal driver level and passed to scheduler.

>>
>> That is a good question. IMO, first step would be to call
>> cpufreq_update_limits().
> 
> Right
> 
>> [ Cc Thara who implemented the thermal pressure ]
>>
>> May be Thara has an idea about how to aggregate both? There is another
>> series floating around with hardware limiter [1] and the same problematic.
>>
>>   [1] https://lkml.org/lkml/2021/6/8/1791
> 
> Thanks, it indeed looks similar.
> 
> I guess the common thermal pressure update code could be moved out into
> a new special cpufreq thermal QoS handler (policy->thermal_constraints),
> where handler will select the frequency constraint and set up the
> pressure accordingly. So there won't be any races in the code.
> 
It was a conscious decision to keep thermal pressure update out of qos 
max freq update because there are platforms that don't use the qos 
framework. For eg acpi uses cpufreq_update_policy.
But you are right. We have two platforms now applying h/w throttling and 
cpufreq_cooling applying s/w throttling. So it does make sense to have 
one api doing all the computation to update thermal pressure. I am not 
sure how exactly/where exactly this will reside.

So for starters, I think you should replicate the update of thermal 
pressure in your h/w driver when you know that h/w is 
throttling/throttled the frequency. You can refer to cpufreq_cooling.c 
to see how it is done.

Moving to a common api can be done as a separate patch series.

-- 
Warm Regards
Thara (She/Her/Hers)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-15 19:32           ` Dmitry Osipenko
  2021-06-16  2:50             ` Thara Gopinath
@ 2021-06-16  8:03             ` Viresh Kumar
  2021-06-16  8:30               ` Vincent Guittot
  1 sibling, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2021-06-16  8:03 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Daniel Lezcano, vincent.guittot, Thara Gopinath, Thierry Reding,
	Jonathan Hunter, Zhang Rui, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis, devicetree,
	linux-kernel, linux-tegra, linux-pm

+Vincent.

On 15-06-21, 22:32, Dmitry Osipenko wrote:
> IIUC, the cpufreq already should be prepared for the case where firmware
> may override frequency. Viresh, could you please clarify what are the
> possible implications of the frequency overriding?

The only implication is software would think hardware is running at
some other frequency, while it is not. Not sure if something may break
as a result of this.

The scheduler's view of CPUs will not be same though, i.e. scheduler
will see capacity as X, while in reality it has changed to Y.

-- 
viresh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-16  8:03             ` Viresh Kumar
@ 2021-06-16  8:30               ` Vincent Guittot
  2021-06-16  8:39                 ` Dmitry Osipenko
  0 siblings, 1 reply; 22+ messages in thread
From: Vincent Guittot @ 2021-06-16  8:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Dmitry Osipenko, Daniel Lezcano, Thara Gopinath, Thierry Reding,
	Jonathan Hunter, Zhang Rui, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel, linux-tegra, open list:THERMAL

On Wed, 16 Jun 2021 at 10:03, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> +Vincent.
>
> On 15-06-21, 22:32, Dmitry Osipenko wrote:
> > IIUC, the cpufreq already should be prepared for the case where firmware
> > may override frequency. Viresh, could you please clarify what are the
> > possible implications of the frequency overriding?
>
> The only implication is software would think hardware is running at
> some other frequency, while it is not. Not sure if something may break
> as a result of this.
>
> The scheduler's view of CPUs will not be same though, i.e. scheduler
> will see capacity as X, while in reality it has changed to Y.

thermal_pressure is used by scheduler to balance the load between CPUs
according to the actual max frequency. If the thermal pressure doesn't
reflect reality, scheduler will end up enqueuing too many  tasks on a
throttle CPU.

>
> --
> viresh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-16  8:30               ` Vincent Guittot
@ 2021-06-16  8:39                 ` Dmitry Osipenko
  2021-06-16  8:51                   ` Vincent Guittot
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Osipenko @ 2021-06-16  8:39 UTC (permalink / raw)
  To: Vincent Guittot, Viresh Kumar
  Cc: Daniel Lezcano, Thara Gopinath, Thierry Reding, Jonathan Hunter,
	Zhang Rui, Amit Kucheria, Andreas Westman Dorcsak, Maxim Schwalm,
	Svyatoslav Ryhel, Ihor Didenko, Ion Agorria, Matt Merhar,
	Peter Geis,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel, linux-tegra, open list:THERMAL

16.06.2021 11:30, Vincent Guittot пишет:
> On Wed, 16 Jun 2021 at 10:03, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>
>> +Vincent.
>>
>> On 15-06-21, 22:32, Dmitry Osipenko wrote:
>>> IIUC, the cpufreq already should be prepared for the case where firmware
>>> may override frequency. Viresh, could you please clarify what are the
>>> possible implications of the frequency overriding?
>>
>> The only implication is software would think hardware is running at
>> some other frequency, while it is not. Not sure if something may break
>> as a result of this.
>>
>> The scheduler's view of CPUs will not be same though, i.e. scheduler
>> will see capacity as X, while in reality it has changed to Y.
> 
> thermal_pressure is used by scheduler to balance the load between CPUs
> according to the actual max frequency. If the thermal pressure doesn't
> reflect reality, scheduler will end up enqueuing too many  tasks on a
> throttle CPU.

What if all CPUs are throttled equally and running on the same
frequency, will throttling have any effect on the scheduler decisions?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-16  8:39                 ` Dmitry Osipenko
@ 2021-06-16  8:51                   ` Vincent Guittot
  2021-06-16 10:49                     ` Dmitry Osipenko
  0 siblings, 1 reply; 22+ messages in thread
From: Vincent Guittot @ 2021-06-16  8:51 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Viresh Kumar, Daniel Lezcano, Thara Gopinath, Thierry Reding,
	Jonathan Hunter, Zhang Rui, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel, linux-tegra, open list:THERMAL

On Wed, 16 Jun 2021 at 10:39, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 16.06.2021 11:30, Vincent Guittot пишет:
> > On Wed, 16 Jun 2021 at 10:03, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >>
> >> +Vincent.
> >>
> >> On 15-06-21, 22:32, Dmitry Osipenko wrote:
> >>> IIUC, the cpufreq already should be prepared for the case where firmware
> >>> may override frequency. Viresh, could you please clarify what are the
> >>> possible implications of the frequency overriding?
> >>
> >> The only implication is software would think hardware is running at
> >> some other frequency, while it is not. Not sure if something may break
> >> as a result of this.
> >>
> >> The scheduler's view of CPUs will not be same though, i.e. scheduler
> >> will see capacity as X, while in reality it has changed to Y.
> >
> > thermal_pressure is used by scheduler to balance the load between CPUs
> > according to the actual max frequency. If the thermal pressure doesn't
> > reflect reality, scheduler will end up enqueuing too many  tasks on a
> > throttle CPU.
>
> What if all CPUs are throttled equally and running on the same
> frequency, will throttling have any effect on the scheduler decisions?

Yes, the capacity is also used to detect when CPUs have spare capacity
or are already overloaded. We usually try to fill the spare capacity
of a CPU (CPU's max capacity - current utilization) but he max
capacity is reduced when the CPU is throttled, and the spare capacity
doesn't exist but scheduler could try to it

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-16  2:50             ` Thara Gopinath
@ 2021-06-16 10:47               ` Dmitry Osipenko
  2021-06-16 14:36                 ` Thara Gopinath
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Osipenko @ 2021-06-16 10:47 UTC (permalink / raw)
  To: Thara Gopinath, Daniel Lezcano, Viresh Kumar
  Cc: Thierry Reding, Jonathan Hunter, Zhang Rui, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis, devicetree,
	linux-kernel, linux-tegra, linux-pm

16.06.2021 05:50, Thara Gopinath пишет:
...
> 
> Hi,
> 
> Thermal pressure is letting scheduler know that the max capacity
> available for a cpu to schedule tasks is reduced due to a thermal event.
> So you cannot have a h/w thermal pressure and s/w thermal pressure.
> There is eventually only one capping applied at h/w level and the
> frequency corresponding to this capping should be used for thermal
> pressure.
> 
> Ideally you should not be having both s/w and h/w trying to throttle at
> the same time. Why is this a scenario and what prevents you from
> disabling s/w throttling when h/w throttling is enabled. Now if there
> has to a aggregation for whatever reason this should be done at the
> thermal driver level and passed to scheduler.

Hello,

The h/w mitigation is much more reactive than software, in the same time
it's much less flexible than software. It should provide additional
protection in a cases where software isn't doing a good job. Ideally h/w
mitigation should stay inactive all the time, nevertheless it should be
modeled properly by the driver.

>>>
>>> That is a good question. IMO, first step would be to call
>>> cpufreq_update_limits().
>>
>> Right
>>
>>> [ Cc Thara who implemented the thermal pressure ]
>>>
>>> May be Thara has an idea about how to aggregate both? There is another
>>> series floating around with hardware limiter [1] and the same
>>> problematic.
>>>
>>>   [1] https://lkml.org/lkml/2021/6/8/1791
>>
>> Thanks, it indeed looks similar.
>>
>> I guess the common thermal pressure update code could be moved out into
>> a new special cpufreq thermal QoS handler (policy->thermal_constraints),
>> where handler will select the frequency constraint and set up the
>> pressure accordingly. So there won't be any races in the code.
>>
> It was a conscious decision to keep thermal pressure update out of qos
> max freq update because there are platforms that don't use the qos
> framework. For eg acpi uses cpufreq_update_policy.
> But you are right. We have two platforms now applying h/w throttling and
> cpufreq_cooling applying s/w throttling. So it does make sense to have
> one api doing all the computation to update thermal pressure. I am not
> sure how exactly/where exactly this will reside.

The generic cpufreq_cooling already uses QoS for limiting the CPU
frequency. It could be okay to use QoS for the OF drivers, this needs a
closer look.

We have the case where CPU frequency is changed by the thermal event and
the thermal pressure equation is the same for both s/w cpufreq_cooling
and h/w thermal driver. The pressure is calculated based on the QoS
cpufreq constraint that is already aggregated.

Hence what we may need to do on the thermal event is:

1. Update the QoS request
2. Update the thermal pressure
3. Ensure that updates are not racing

> So for starters, I think you should replicate the update of thermal
> pressure in your h/w driver when you know that h/w is
> throttling/throttled the frequency. You can refer to cpufreq_cooling.c
> to see how it is done.
> 
> Moving to a common api can be done as a separate patch series.
> 

Thank you for the clarification and suggestion.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-16  8:51                   ` Vincent Guittot
@ 2021-06-16 10:49                     ` Dmitry Osipenko
  0 siblings, 0 replies; 22+ messages in thread
From: Dmitry Osipenko @ 2021-06-16 10:49 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Viresh Kumar, Daniel Lezcano, Thara Gopinath, Thierry Reding,
	Jonathan Hunter, Zhang Rui, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-kernel, linux-tegra, open list:THERMAL

16.06.2021 11:51, Vincent Guittot пишет:
> On Wed, 16 Jun 2021 at 10:39, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> 16.06.2021 11:30, Vincent Guittot пишет:
>>> On Wed, 16 Jun 2021 at 10:03, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>>
>>>> +Vincent.
>>>>
>>>> On 15-06-21, 22:32, Dmitry Osipenko wrote:
>>>>> IIUC, the cpufreq already should be prepared for the case where firmware
>>>>> may override frequency. Viresh, could you please clarify what are the
>>>>> possible implications of the frequency overriding?
>>>>
>>>> The only implication is software would think hardware is running at
>>>> some other frequency, while it is not. Not sure if something may break
>>>> as a result of this.
>>>>
>>>> The scheduler's view of CPUs will not be same though, i.e. scheduler
>>>> will see capacity as X, while in reality it has changed to Y.
>>>
>>> thermal_pressure is used by scheduler to balance the load between CPUs
>>> according to the actual max frequency. If the thermal pressure doesn't
>>> reflect reality, scheduler will end up enqueuing too many  tasks on a
>>> throttle CPU.
>>
>> What if all CPUs are throttled equally and running on the same
>> frequency, will throttling have any effect on the scheduler decisions?
> 
> Yes, the capacity is also used to detect when CPUs have spare capacity
> or are already overloaded. We usually try to fill the spare capacity
> of a CPU (CPU's max capacity - current utilization) but he max
> capacity is reduced when the CPU is throttled, and the spare capacity
> doesn't exist but scheduler could try to it
> 

Thank you.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
  2021-06-16 10:47               ` Dmitry Osipenko
@ 2021-06-16 14:36                 ` Thara Gopinath
  0 siblings, 0 replies; 22+ messages in thread
From: Thara Gopinath @ 2021-06-16 14:36 UTC (permalink / raw)
  To: Dmitry Osipenko, Daniel Lezcano, Viresh Kumar
  Cc: Thierry Reding, Jonathan Hunter, Zhang Rui, Amit Kucheria,
	Andreas Westman Dorcsak, Maxim Schwalm, Svyatoslav Ryhel,
	Ihor Didenko, Ion Agorria, Matt Merhar, Peter Geis, devicetree,
	linux-kernel, linux-tegra, linux-pm



On 6/16/21 6:47 AM, Dmitry Osipenko wrote:
> 16.06.2021 05:50, Thara Gopinath пишет:
> ...
>>
>> Hi,
>>
>> Thermal pressure is letting scheduler know that the max capacity
>> available for a cpu to schedule tasks is reduced due to a thermal event.
>> So you cannot have a h/w thermal pressure and s/w thermal pressure.
>> There is eventually only one capping applied at h/w level and the
>> frequency corresponding to this capping should be used for thermal
>> pressure.
>>
>> Ideally you should not be having both s/w and h/w trying to throttle at
>> the same time. Why is this a scenario and what prevents you from
>> disabling s/w throttling when h/w throttling is enabled. Now if there
>> has to a aggregation for whatever reason this should be done at the
>> thermal driver level and passed to scheduler.
> 
> Hello,
> 
> The h/w mitigation is much more reactive than software, in the same time
> it's much less flexible than software. It should provide additional
> protection in a cases where software isn't doing a good job. Ideally h/w
> mitigation should stay inactive all the time, nevertheless it should be
> modeled properly by the driver.

Ok. This is kind of opposite to what I am doing on the Qcom platform I 
am working on. The h/w throttling is the default since like you 
mentioned it is more reactive. And s/w does only critical trip management.

> 
>>>>
>>>> That is a good question. IMO, first step would be to call
>>>> cpufreq_update_limits().
>>>
>>> Right
>>>
>>>> [ Cc Thara who implemented the thermal pressure ]
>>>>
>>>> May be Thara has an idea about how to aggregate both? There is another
>>>> series floating around with hardware limiter [1] and the same
>>>> problematic.
>>>>
>>>>    [1] https://lkml.org/lkml/2021/6/8/1791
>>>
>>> Thanks, it indeed looks similar.
>>>
>>> I guess the common thermal pressure update code could be moved out into
>>> a new special cpufreq thermal QoS handler (policy->thermal_constraints),
>>> where handler will select the frequency constraint and set up the
>>> pressure accordingly. So there won't be any races in the code.
>>>
>> It was a conscious decision to keep thermal pressure update out of qos
>> max freq update because there are platforms that don't use the qos
>> framework. For eg acpi uses cpufreq_update_policy.
>> But you are right. We have two platforms now applying h/w throttling and
>> cpufreq_cooling applying s/w throttling. So it does make sense to have
>> one api doing all the computation to update thermal pressure. I am not
>> sure how exactly/where exactly this will reside.
> 
> The generic cpufreq_cooling already uses QoS for limiting the CPU
> frequency. It could be okay to use QoS for the OF drivers, this needs a
> closer look.
> 
> We have the case where CPU frequency is changed by the thermal event and
> the thermal pressure equation is the same for both s/w cpufreq_cooling
> and h/w thermal driver. The pressure is calculated based on the QoS
> cpufreq constraint that is already aggregated.
> 
> Hence what we may need to do on the thermal event is:
> 
> 1. Update the QoS request
> 2. Update the thermal pressure
> 3. Ensure that updates are not racing

Yes. So the first two steps you mentioned is exactly what 
cpufreq_cooling.c also does except for the fact that it is a s/w 
mitigation. Now if you have two sources that is updating the max 
frequency via qos, I think you can do either of the following before
calculating thermal pressure
1. Read the throttled frequency from h/w if  your h/w supports this feature.
	or
2. Use freq_qos_read_value to get the max frequency value.

Either way only the correct throttled capacity should be passed to 
scheduler.

-- 
Warm Regards
Thara (She/Her/Hers)
> 
>> So for starters, I think you should replicate the update of thermal
>> pressure in your h/w driver when you know that h/w is
>> throttling/throttled the frequency. You can refer to cpufreq_cooling.c
>> to see how it is done.
>>
>> Moving to a common api can be done as a separate patch series.
>>
> 
> Thank you for the clarification and suggestion.
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-06-16 14:36 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-29 17:09 [PATCH v3 0/7] Add driver for NVIDIA Tegra30 SoC Thermal sensor Dmitry Osipenko
2021-05-29 17:09 ` [PATCH v3 1/7] dt-bindings: thermal: Add binding for Tegra30 thermal sensor Dmitry Osipenko
2021-05-29 17:09 ` [PATCH v3 2/7] thermal: thermal_of: Stop zone device before unregistering it Dmitry Osipenko
2021-05-29 17:09 ` [PATCH v3 3/7] thermal/core: Export thermal_cooling_device_stats_update() Dmitry Osipenko
2021-05-29 17:09 ` [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor Dmitry Osipenko
2021-05-31 12:51   ` Thierry Reding
2021-06-15 10:03   ` Daniel Lezcano
2021-06-15 10:26     ` Viresh Kumar
2021-06-15 13:01       ` Dmitry Osipenko
2021-06-15 16:18         ` Daniel Lezcano
2021-06-15 19:32           ` Dmitry Osipenko
2021-06-16  2:50             ` Thara Gopinath
2021-06-16 10:47               ` Dmitry Osipenko
2021-06-16 14:36                 ` Thara Gopinath
2021-06-16  8:03             ` Viresh Kumar
2021-06-16  8:30               ` Vincent Guittot
2021-06-16  8:39                 ` Dmitry Osipenko
2021-06-16  8:51                   ` Vincent Guittot
2021-06-16 10:49                     ` Dmitry Osipenko
2021-05-29 17:09 ` [PATCH v3 5/7] ARM: tegra_defconfig: Enable CONFIG_TEGRA30_TSENSOR Dmitry Osipenko
2021-05-29 17:09 ` [PATCH v3 6/7] ARM: multi_v7_defconfig: " Dmitry Osipenko
2021-05-29 17:09 ` [PATCH v3 7/7] ARM: tegra: Add SoC thermal sensor to Tegra30 device-trees Dmitry Osipenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.