* [PATCH 0/4] powercap/dtpm: Add the DTPM framework @ 2020-10-06 12:20 Daniel Lezcano 2020-10-06 12:20 ` [PATCH 1/4] units: Add Watt units Daniel Lezcano ` (4 more replies) 0 siblings, 5 replies; 42+ messages in thread From: Daniel Lezcano @ 2020-10-06 12:20 UTC (permalink / raw) To: rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang The density of components has greatly increased over the last decade, bringing numerous heating sources which are monitored by more than 20 sensors on recent SoCs. The skin temperature, which is the case temperature of the device, must stay below approximately 45°C in order to comply with the legal requirements. The skin temperature is managed as a whole by a user space daemon which, based on the current application profile, allocates a power budget to the different components so that the resulting heating effect complies with the skin temperature constraint. This technique is called Dynamic Thermal Power Management. The Linux kernel does not provide any unified interface to act on the power of the different devices. Currently, the thermal framework is changed to artificially export the performance states of different devices via the cooling device software component, with opaque values. This change is done regardless of the in-kernel logic to mitigate the temperature. The user space daemon uses all the available knobs to act on the power limit, and those differ from one platform to another. This series provides a Dynamic Thermal Power Management framework giving a unified way to act on the power of the devices.
Daniel Lezcano (4): units: Add Watt units Documentation/powercap/dtpm: Add documentation for dtpm powercap/drivers/dtpm: Add API for dynamic thermal power management powercap/drivers/dtpm: Add CPU energy model based support Documentation/power/powercap/dtpm.rst | 222 +++++++++++++ drivers/powercap/Kconfig | 14 + drivers/powercap/Makefile | 2 + drivers/powercap/dtpm.c | 430 ++++++++++++++++++++++++++ drivers/powercap/dtpm_cpu.c | 242 +++++++++++++++ include/asm-generic/vmlinux.lds.h | 11 + include/linux/cpuhotplug.h | 1 + include/linux/dtpm.h | 76 +++++ include/linux/units.h | 4 + 9 files changed, 1002 insertions(+) create mode 100644 Documentation/power/powercap/dtpm.rst create mode 100644 drivers/powercap/dtpm.c create mode 100644 drivers/powercap/dtpm_cpu.c create mode 100644 include/linux/dtpm.h -- 2.17.1 ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 1/4] units: Add Watt units 2020-10-06 12:20 [PATCH 0/4] powercap/dtpm: Add the DTPM framework Daniel Lezcano @ 2020-10-06 12:20 ` Daniel Lezcano 2020-11-10 10:02 ` Lukasz Luba 2020-10-06 12:20 ` [PATCH 2/4] Documentation/powercap/dtpm: Add documentation for dtpm Daniel Lezcano ` (3 subsequent siblings) 4 siblings, 1 reply; 42+ messages in thread From: Daniel Lezcano @ 2020-10-06 12:20 UTC (permalink / raw) To: rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang, Andrew Morton, Andy Shevchenko, Akinobu Mita As we already have the temperature units here, let's add the Watt macro definitions. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> --- include/linux/units.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/linux/units.h b/include/linux/units.h index aaf716364ec3..92c234e71cab 100644 --- a/include/linux/units.h +++ b/include/linux/units.h @@ -4,6 +4,10 @@ #include <linux/kernel.h> +#define MILLIWATT_PER_WATT 1000L +#define MICROWATT_PER_MILLIWATT 1000L +#define MICROWATT_PER_WATT 1000000L + #define ABSOLUTE_ZERO_MILLICELSIUS -273150 static inline long milli_kelvin_to_millicelsius(long t) -- 2.17.1 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 1/4] units: Add Watt units 2020-10-06 12:20 ` [PATCH 1/4] units: Add Watt units Daniel Lezcano @ 2020-11-10 10:02 ` Lukasz Luba 0 siblings, 0 replies; 42+ messages in thread From: Lukasz Luba @ 2020-11-10 10:02 UTC (permalink / raw) To: Daniel Lezcano Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Andrew Morton, Andy Shevchenko, Akinobu Mita On 10/6/20 1:20 PM, Daniel Lezcano wrote: > As there are the temperature units, let's add the Watt macros definition. > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > --- > include/linux/units.h | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/include/linux/units.h b/include/linux/units.h > index aaf716364ec3..92c234e71cab 100644 > --- a/include/linux/units.h > +++ b/include/linux/units.h > @@ -4,6 +4,10 @@ > > #include <linux/kernel.h> > > +#define MILLIWATT_PER_WATT 1000L > +#define MICROWATT_PER_MILLIWATT 1000L > +#define MICROWATT_PER_WATT 1000000L > + > #define ABSOLUTE_ZERO_MILLICELSIUS -273150 > > static inline long milli_kelvin_to_millicelsius(long t) > Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Regards, Lukasz ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 2/4] Documentation/powercap/dtpm: Add documentation for dtpm 2020-10-06 12:20 [PATCH 0/4] powercap/dtpm: Add the DTPM framework Daniel Lezcano 2020-10-06 12:20 ` [PATCH 1/4] units: Add Watt units Daniel Lezcano @ 2020-10-06 12:20 ` Daniel Lezcano 2020-10-13 22:01 ` Ram Chandrasekar 2020-10-06 12:20 ` [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management Daniel Lezcano ` (2 subsequent siblings) 4 siblings, 1 reply; 42+ messages in thread From: Daniel Lezcano @ 2020-10-06 12:20 UTC (permalink / raw) To: rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Len Brown, Pavel Machek Dynamic thermal and power management is a technique to dynamically adjust the power consumption of different devices in order to meet a global thermal constraint. A userspace daemon usually monitors the temperature and the power to take immediate action on the device. The DTPM framework provides a unified API to userspace to act on the power. Document this framework. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> --- Documentation/power/powercap/dtpm.rst | 222 ++++++++++++++++++++++++++ 1 file changed, 222 insertions(+) create mode 100644 Documentation/power/powercap/dtpm.rst diff --git a/Documentation/power/powercap/dtpm.rst b/Documentation/power/powercap/dtpm.rst new file mode 100644 index 000000000000..ce11cf183994 --- /dev/null +++ b/Documentation/power/powercap/dtpm.rst @@ -0,0 +1,222 @@ +========================================== +Dynamic Thermal Power Management framework +========================================== + +In the embedded world, the complexity of the SoC leads to an +increasing number of hotspots which need to be monitored and mitigated +as a whole in order to prevent the temperature from going above the +normative and legally stated 'skin temperature'.
+ +Another aspect is to sustain the performance for a given power budget, +for example virtual reality, where the user can feel dizziness if the +performance is capped while a big CPU is processing something else, or +to reduce the battery charging rate because the dissipated power is too +high compared with the power consumed by other devices. + +The userspace is the most adequate place to dynamically act on the +different devices by limiting their power given an application +profile: it has the knowledge of the platform. + +Dynamic Thermal Power Management (DTPM) is a technique acting on +the device power by limiting and/or balancing a power budget among +different devices. + +The DTPM framework provides a unified interface to act on the +device power. + +=========== +1. Overview +=========== + +The DTPM framework relies on the powercap framework to create the +powercap entries in the sysfs directory, and on the backend +drivers to make the connection with the power manageable devices. + +The DTPM is a tree representation describing the power constraints +shared between devices, not their physical positions. + +The nodes of the tree are a virtual description aggregating the power +characteristics of the child nodes and their power limitations. + +The leaves of the tree are the real power manageable devices. + +For instance: + + SoC + | + `-- pkg + | + |-- pd0 (cpu0-3) + | + `-- pd1 (cpu4-5) + +* The pkg power will be the sum of the pd0 and pd1 power numbers. + + SoC (400mW - 3100mW) + | + `-- pkg (400mW - 3100mW) + | + |-- pd0 (100mW - 700mW) + | + `-- pd1 (300mW - 2400mW) + +* When the nodes are inserted in the tree, their power characteristics + are propagated to the parents. + + SoC (600mW - 5900mW) + | + |-- pkg (400mW - 3100mW) + | | + | |-- pd0 (100mW - 700mW) + | | + | `-- pd1 (300mW - 2400mW) + | + `-- pd2 (200mW - 2800mW) + +* Each node has a weight on a 2^10 basis reflecting its percentage of + the power consumption among its siblings.
+ + SoC (w=1024) + | + |-- pkg (w=538) + | | + | |-- pd0 (w=231) + | | + | `-- pd1 (w=794) + | + `-- pd2 (w=486) + + Note that the sum of the weights at the same level is equal to 1024. + +* When a power limitation is applied to a node, it is distributed + among the children given their weights. For example, if we set a + power limitation of 3200mW at the 'SoC' root node, the resulting + tree will be: + + SoC (w=1024) <--- power_limit = 3200mW + | + |-- pkg (w=538) --> power_limit = 1681mW + | | + | |-- pd0 (w=231) --> power_limit = 378mW + | | + | `-- pd1 (w=794) --> power_limit = 1303mW + | + `-- pd2 (w=486) --> power_limit = 1519mW + +==================== +1.1 Flat description +==================== + +A root node is created and it is the parent of all the nodes. This +description is the simplest one and gives +userspace a flat representation of all the devices supporting the +power limitation, without any power limitation distribution. + +============================ +1.2 Hierarchical description +============================ + +The different devices supporting the power limitation are represented +hierarchically. There is one root node; all intermediate nodes +group the child nodes, which can themselves be intermediate nodes or real +devices. + +The intermediate nodes aggregate the power information and allow +setting the power limit given the weights of the nodes. + +================ +2. Userspace API +================ + +As stated in the overview, the DTPM framework is built on top of the +powercap framework. Thus the sysfs interface is the same; please refer +to the powercap documentation for further details. + + * power_uw: Instantaneous power consumption. If the node is an + intermediate node, then the power consumption will be the sum of all + the children's power consumption. + + * max_power_range_uw: The power range resulting from the maximum power + minus the minimum power. + + * name: The name of the node. This is implementation dependent.
Although + it is not recommended, several nodes can have + the same name. + + * constraint_X_name: The name of the constraint. + + * constraint_X_max_power_uw: The maximum power limit that can be + applied to the node. + + * constraint_X_power_limit_uw: The power limit to be applied to the + node. If it is set to the value contained in constraint_X_max_power_uw + (or above), the constraint will be removed. + + * constraint_X_time_window_us: The meaning of this file will depend + on the constraint number. + +=============== +2.1 Constraints +=============== + + * Constraint 0: The power limitation is immediately applied, without + limitation in time. + +============= +3. Kernel API +============= + +============ +3.1 Overview +============ + +The DTPM framework has no power limiting backend support. It is +generic and provides a set of APIs to let the different drivers +implement the backend part for the power limitation and create the +power constraints tree. + +It is up to the platform to provide the initialization function to +allocate and link the different nodes of the tree. + +A special macro has the role of declaring a node and the corresponding +initialization function via a description structure. This structure contains +an optional parent field allowing different devices to be hooked to an +already existing tree at boot time. + +struct dtpm_descr my_descr = { + .name = "my_name", + .init = my_init_func, +}; + +DTPM_DECLARE(my_descr); + +The nodes of the DTPM tree are described with the dtpm structure. Adding +a new power limitable device is done in three steps: + + * Allocate the dtpm node + * Set the power numbers of the dtpm node + * Register the dtpm node + +The registration of the dtpm node is done with the powercap +ops. Basically, it must implement the callbacks to get and set the +power and the limit. + +Alternatively, if the node to be inserted is an intermediate one, then +a simple function to insert it as a future parent is available.
+ +If a device's power characteristics change, then the tree must +be updated with the new power numbers and weights. + +================ +3.2 Nomenclature +================ + + * dtpm_alloc() : Allocate and initialize a dtpm structure + + * dtpm_register() : Add the dtpm node to the tree + + * dtpm_register_parent() : Add an intermediate node + + * dtpm_unregister() : Remove the dtpm node from the tree + + * dtpm_update_power() : Update the power characteristics of the dtpm node -- 2.17.1 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 2/4] Documentation/powercap/dtpm: Add documentation for dtpm 2020-10-06 12:20 ` [PATCH 2/4] Documentation/powercap/dtpm: Add documentation for dtpm Daniel Lezcano @ 2020-10-13 22:01 ` Ram Chandrasekar 0 siblings, 0 replies; 42+ messages in thread From: Ram Chandrasekar @ 2020-10-13 22:01 UTC (permalink / raw) To: Daniel Lezcano, rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Len Brown, Pavel Machek On 10/6/2020 6:20 AM, Daniel Lezcano wrote: > The dynamic thermal and power management is a technique to dynamically > adjust the power consumption of different devices in order to ensure a > global thermal constraint. > > An userspace daemon is usually monitoring the temperature and the > power to take immediate action on the device. > > The DTPM framework provides an unified API to userspace to act on the > power. > > Document this framework. > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > --- > Documentation/power/powercap/dtpm.rst | 222 ++++++++++++++++++++++++++ > 1 file changed, 222 insertions(+) > create mode 100644 Documentation/power/powercap/dtpm.rst > > diff --git a/Documentation/power/powercap/dtpm.rst b/Documentation/power/powercap/dtpm.rst > new file mode 100644 > index 000000000000..ce11cf183994 > --- /dev/null > +++ b/Documentation/power/powercap/dtpm.rst > @@ -0,0 +1,222 @@ > +========================================== > +Dynamic Thermal Power Management framework > +========================================== > + > +On the embedded world, the complexity of the SoC leads to an > +increasing number of hotspots which need to be monitored and mitigated > +as a whole in order to prevent the temperature to go above the > +normative and legally stated 'skin temperature'. 
> + > +Another aspect is to sustain the performance for a given power budget, > +for example virtual reality where the user can feel dizziness if the > +performance is capped while a big CPU is processing something else. Or > +reduce the battery charging because the dissipated power is too high > +compared with the power consumed by other devices. > + > +The userspace is the most adequate place to dynamically act on the > +different devices by limiting their power given an application > +profile: it has the knowledge of the platform. > + > +The Dynamic Thermal Power Management (DTPM) is a technique acting on > +the device power by limiting and/or balancing a power budget among > +different devices. > + > +The DTPM framework provides an unified interface to act on the > +device power. > + > +=========== > +1. Overview > +=========== > + > +The DTPM framework relies on the powercap framework to create the > +powercap entries in the sysfs directory and implement the backend > +driver to do the connection with the power manageable device. > + > +The DTPM is a tree representation describing the power constraints > +shared between devices, not their physical positions. > + > +The nodes of the tree are a virtual description aggregating the power > +characteristics of the children nodes and their power limitations. > + > +The leaves of the tree are the real power manageable devices. > + > +For instance: > + > + SoC > + | > + `-- pkg > + | > + |-- pd0 (cpu0-3) > + | > + `-- pd1 (cpu4-5) > + > +* The pkg power will be the sum of pd0 and pd1 power numbers. > + > + SoC (400mW - 3100mW) > + | > + `-- pkg (400mW - 3100mW) > + | > + |-- pd0 (100mW - 700mW) > + | > + `-- pd1 (300mW - 2400mW) > + > +* When the nodes are inserted in the tree, their power characteristics > + are propagated to the parents. 
> + > + SoC (600mW - 5900mW) > + | > + |-- pkg (400mW - 3100mW) > + | | > + | |-- pd0 (100mW - 700mW) > + | | > + | `-- pd1 (300mW - 2400mW) > + | > + `-- pd2 (200mW - 2800mW) > + > +* Each node have a weight on a 2^10 basis reflecting the percentage of > + power consumption along the siblings. > + > + SoC (w=1024) > + | > + |-- pkg (w=538) > + | | > + | |-- pd0 (w=231) > + | | > + | `-- pd1 (w=794) > + | > + `-- pd2 (w=486) > + > + Note the sum of weights at the same level are equal to 1024. > + > +* When a power limitation is applied to a node, then it is distributed > + along the children given their weights. For example, if we set a > + power limitation of 3200mW at the 'SoC' root node, the resulting > + tree will be. > + > + SoC (w=1024) <--- power_limit = 3200mW > + | > + |-- pkg (w=538) --> power_limit = 1681mW > + | | > + | |-- pd0 (w=231) --> power_limit = 378mW > + | | > + | `-- pd1 (w=794) --> power_limit = 1303mW > + | > + `-- pd2 (w=486) --> power_limit = 1519mW > + > +==================== > +1.1 Flat description > +==================== > + > +A root node is created and it is the parent of all the nodes. This > +description is the simplest one and it is supposed to give to > +userspace a flat representation of all the devices supporting the > +power limitation without any power limitation distribution. > + > +============================ > +1.2 Hierarchical description > +============================ > + > +The different devices supporting the power limitation are represented > +hierarchically. There is one root node, all intermediate nodes are > +grouping the child nodes which can be intermediate nodes also or real > +devices. > + > +The intermediate nodes aggregate the power information and allows to > +set the power limit given the weight of the nodes. > + > +================ > +2. Userspace API > +================ > + > +As stated in the overview, the DTPM framework is built on top of the > +powercap framework. 
Thus the sysfs interface is the same, please refer > +to the powercap documentation for further details. > + > + * power_uw: Instantaneous power consumption. If the node is an > + intermediate node, then the power consumption will be the sum of all > + children power consumption. > + > + * max_power_range_uw: The power range resulting of the maximum power > + minus the minimum power. > + > + * name: The name of the node. This is implementation dependant. Even > + if it is not recommended for the userspace, several nodes can have > + the same name. > + > + * constraint_X_name: The name of the constraint. > + > + * constraint_X_max_power_uw: The maximum power limit to be applicable > + to the node. > + > + * constraint_X_power_limit_uw: The power limit to be applied to the > + node. If the value contained in constraint_X_max_power_uw is set, > + the constraint will be removed. How is power_limit_uW different from max_power_uW? > + > + * constraint_X_time_window_us: The meaning of this file will depend > + on the constraint number. > + > +=============== > +2.1 Constraints > +=============== > + > + * Constraint 0: The power limitation is immediately applied, without > + limitation in time. > + > +============= > +3. Kernel API > +============= > + > +============ > +3.1 Overview > +============ > + > +The DTPM framework has no power limiting backend support. It is > +generic and provides a set of API to let the different drivers to > +implement the backend part for the power limitation and create a the create a the -> create a > +power constraints tree. > + > +It is up to the platform to provide the initialization function to > +allocate and link the different nodes of the tree. > + > +A special macro has the role of declaring a node and the corresponding > +initialization function via a description structure. This one contains > +an optional parent field allowing to hook different devices to an > +already existing tree at boot time. 
> + > +struct dtpm_descr my_descr = { > + .name = "my_name", > + .init = my_init_func, > +}; > + > +DTPM_DECLARE(my_descr); > + > +The nodes of the DTPM tree are described with dtpm structure. The > +steps to add a new power limitable device is done in three steps: > + > + * Allocate the dtpm node > + * Set the power number of the dtpm node > + * Register the dtpm node > + > +The registration of the dtpm node is done with the powercap > +ops. Basically, it must implements the callbacks to get and set the implements -> implement > +power and the limit. > + > +Alternatively, if the node to be inserted is an intermediate one, then > +a simple function to insert it as a future parent is available. > + > +If a device has its power characteristics changing, then the tree must > +be updated with the new power numbers and weights. > + > +================ > +3.2 Nomenclature > +================ > + > + * dtpm_alloc() : Allocate and initialize a dtpm structure > + > + * dtpm_register() : Add the dtpm node to the tree > + > + * dtpm_register_parent() : Add an intermediate node > + > + * dtpm_unregister() : Remove the dtpm node from the tree > + > + * dtpm_update_power() : Update the power characteristics of the dtpm node > ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management 2020-10-06 12:20 [PATCH 0/4] powercap/dtpm: Add the DTPM framework Daniel Lezcano 2020-10-06 12:20 ` [PATCH 1/4] units: Add Watt units Daniel Lezcano 2020-10-06 12:20 ` [PATCH 2/4] Documentation/powercap/dtpm: Add documentation for dtpm Daniel Lezcano @ 2020-10-06 12:20 ` Daniel Lezcano 2020-10-06 16:42 ` kernel test robot ` (3 more replies) 2020-10-06 12:20 ` [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support Daniel Lezcano 2020-10-07 10:43 ` [PATCH 0/4] powercap/dtpm: Add the DTPM framework Hans de Goede 4 siblings, 4 replies; 42+ messages in thread From: Daniel Lezcano @ 2020-10-06 12:20 UTC (permalink / raw) To: rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Arnd Bergmann, open list:GENERIC INCLUDE/ASM HEADER FILES In the embedded world, the complexity of the SoC leads to an increasing number of hotspots which need to be monitored and mitigated as a whole in order to prevent the temperature from going above the normative and legally stated 'skin temperature'. Another aspect is to sustain the performance for a given power budget, for example virtual reality, where the user can feel dizziness if the GPU performance is capped while a big CPU is processing something else, or to reduce the battery charging rate because the dissipated power is too high compared with the power consumed by other devices. The userspace is the most adequate place to dynamically act on the different devices by limiting their power given an application profile: it has the knowledge of the platform. These userspace daemons are in charge of the Dynamic Thermal Power Management (DTPM). Nowadays, the dtpm daemons are abusing the thermal framework, as they act on the cooling device state to force a specific and arbitrary state without regard for the governor decisions; given the closed loop of some governors, that can confuse the logic or directly lead to a decision conflict. As cooling device support is limited today to the CPU and the GPU, the dtpm daemons have little control over the power dissipation of the system. The out-of-tree solutions hack around here and there in the drivers and frameworks to gain control of the devices. The common solution is to declare them as cooling devices. There is no unification of the power limitation unit; opaque states are used. This patch provides a way to create a hierarchy of constraints using the powercap framework. The devices which are registered as power limit-able devices are represented in this hierarchy as a tree. They are linked together with intermediate nodes which are just there to propagate the constraint to the children. The leaves of the tree are the real devices; the intermediate nodes are virtual, aggregating the children constraints and power characteristics. Each node has a weight on a 2^10 basis, in order to reflect the percentage of power distribution of the children nodes. This percentage is used to dispatch the power limit to the children. The weight is computed against the max power of the siblings. This simple approach allows a fair distribution of the power limit. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> --- drivers/powercap/Kconfig | 6 + drivers/powercap/Makefile | 1 + drivers/powercap/dtpm.c | 430 ++++++++++++++++++++++++++++++ include/asm-generic/vmlinux.lds.h | 11 + include/linux/dtpm.h | 73 +++++ 5 files changed, 521 insertions(+) create mode 100644 drivers/powercap/dtpm.c create mode 100644 include/linux/dtpm.h diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig index ebc4d4578339..777cf60300b8 100644 --- a/drivers/powercap/Kconfig +++ b/drivers/powercap/Kconfig @@ -43,4 +43,10 @@ config IDLE_INJECT CPUs for power capping.
Idle period can be injected synchronously on a set of specified CPUs or alternatively on a per CPU basis. + +config DTPM + bool "Power capping for dynamic thermal power management" + help + This enables support for the power capping for the dynamic + thermal management userspace engine. endif diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile index 7255c94ec61c..6482ac52054d 100644 --- a/drivers/powercap/Makefile +++ b/drivers/powercap/Makefile @@ -1,4 +1,5 @@ # SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_DTPM) += dtpm.o obj-$(CONFIG_POWERCAP) += powercap_sys.o obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o diff --git a/drivers/powercap/dtpm.c b/drivers/powercap/dtpm.c new file mode 100644 index 000000000000..6df1e51a2c1c --- /dev/null +++ b/drivers/powercap/dtpm.c @@ -0,0 +1,430 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright 2020 Linaro Limited + * + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> + * + * The powercap based Dynamic Thermal Power Management framework + * provides the userspace with a consistent API to set the power limit + * on some devices. + * + * DTPM defines the functions to create a tree of constraints. Each + * parent node is a virtual description of the aggregation of the + * children. It propagates the constraints set at its level to its + * children and collects the children's power information. The leaves of + * the tree are the real devices which have the ability to get their + * current power consumption and set their power limit.
+ */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/dtpm.h> +#include <linux/init.h> +#include <linux/kernel.h> +#include <linux/powercap.h> +#include <linux/slab.h> +#include <linux/spinlock.h> + +static const char *constraint_name[] = { + "Instantaneous power constraint", +}; + +static struct powercap_control_type *pct; +static struct dtpm *root; + +static int get_time_window_us(struct powercap_zone *pcz, int cid, u64 *window) +{ + return -ENOSYS; +} + +static int set_time_window_us(struct powercap_zone *pcz, int cid, u64 window) +{ + return -ENOSYS; +} + +static int get_max_power_range_uw(struct powercap_zone *pcz, u64 *max_power_uw) +{ + struct dtpm *dtpm = to_dtpm(pcz); + + spin_lock(&dtpm->lock); + *max_power_uw = dtpm->power_max - dtpm->power_min; + spin_unlock(&dtpm->lock); + + return 0; +} + +static int get_children_power_uw(struct powercap_zone *pcz, u64 *power_uw) +{ + struct dtpm *dtpm = to_dtpm(pcz); + struct dtpm *child; + u64 power; + int ret = 0; + + *power_uw = 0; + + spin_lock(&dtpm->lock); + list_for_each_entry(child, &dtpm->children, sibling) { + ret = child->zone.ops->get_power_uw(&child->zone, &power); + if (ret) + break; + *power_uw += power; + } + spin_unlock(&dtpm->lock); + + return ret; +} + +static void __dtpm_rebalance_weight(struct dtpm *dtpm) +{ + struct dtpm *child; + + spin_lock(&dtpm->lock); + list_for_each_entry(child, &dtpm->children, sibling) { + + child->weight = DIV_ROUND_CLOSEST(child->power_max * 1024, + dtpm->power_max); + + pr_debug("Setting weight '%d' for '%s'\n", + child->weight, child->zone.name); + + __dtpm_rebalance_weight(child); + } + spin_unlock(&dtpm->lock); +} + +static void dtpm_rebalance_weight(void) +{ + __dtpm_rebalance_weight(root); +} + +static void dtpm_sub_power(struct dtpm *dtpm) +{ + struct dtpm *parent = dtpm->parent; + + while (parent) { + spin_lock(&parent->lock); + parent->power_min -= dtpm->power_min; + parent->power_max -= dtpm->power_max; + spin_unlock(&parent->lock); + parent = 
parent->parent; + } + + dtpm_rebalance_weight(); +} + +static void dtpm_add_power(struct dtpm *dtpm) +{ + struct dtpm *parent = dtpm->parent; + + while (parent) { + spin_lock(&parent->lock); + parent->power_min += dtpm->power_min; + parent->power_max += dtpm->power_max; + spin_unlock(&parent->lock); + parent = parent->parent; + } + + dtpm_rebalance_weight(); +} + +/** + * dtpm_update_power - Update the power on the dtpm + * @dtpm: a pointer to a dtpm structure to update + * @power_min: a u64 representing the new power_min value + * @power_max: a u64 representing the new power_max value + * + * Function to update the power values of the dtpm node specified in + * parameter. These new values will be propagated to the tree. + * + * Return: zero on success, -EINVAL if the values are inconsistent + */ +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max) +{ + if (power_min == dtpm->power_min && power_max == dtpm->power_max) + return 0; + + if (power_max < power_min) + return -EINVAL; + + dtpm_sub_power(dtpm); + spin_lock(&dtpm->lock); + dtpm->power_min = power_min; + dtpm->power_max = power_max; + spin_unlock(&dtpm->lock); + dtpm_add_power(dtpm); + + return 0; +} + +/** + * dtpm_release_zone - Cleanup when the node is released + * @pcz: a pointer to a powercap_zone structure + * + * Do some housecleaning and update the weight on the tree. The + * release will be denied if the node has children. This function must + * be called by the specific release callback of the different + * backends. 
+ * + * Return: 0 on success, -EBUSY if there are children + */ +int dtpm_release_zone(struct powercap_zone *pcz) +{ + struct dtpm *dtpm = to_dtpm(pcz); + struct dtpm *parent = dtpm->parent; + + if (!list_empty(&dtpm->children)) + return -EBUSY; + + if (parent) { + spin_lock(&parent->lock); + list_del(&dtpm->sibling); + spin_unlock(&parent->lock); + } + + dtpm_sub_power(dtpm); + + kfree(dtpm); + + return 0; +} + +/* + * Set the power limit on the nodes, the power limit is distributed + * given the weight of the children. + */ +static int set_children_power_limit(struct powercap_zone *pcz, int cid, + u64 power_limit) +{ + struct dtpm *dtpm = to_dtpm(pcz); + struct dtpm *child; + u64 power; + int ret = 0; + + /* + * Don't allow values outside of the power range previously + * set when initializing the power numbers. + */ + power_limit = clamp_val(power_limit, dtpm->power_min, dtpm->power_max); + + spin_lock(&dtpm->lock); + list_for_each_entry(child, &dtpm->children, sibling) { + + /* + * Integer division rounding will inevitably lead to a + * different max value when set several times. In + * order to restore the initial value, we force the + * child's max power every time if the constraint is + * removed by setting a value greater than or equal to the + * max power.
+ */ + if (power_limit == dtpm->power_max) + power = child->power_max; + else + power = DIV_ROUND_CLOSEST( + power_limit * child->weight, 1024); + + pr_debug("Setting power limit for '%s': %llu uW\n", + child->zone.name, power); + + ret = child->zone.constraints->ops->set_power_limit_uw( + &child->zone, cid, power); + if (ret) + break; + } + spin_unlock(&dtpm->lock); + + return ret; +} + +static int get_children_power_limit(struct powercap_zone *pcz, int cid, + u64 *power_limit) +{ + struct dtpm *dtpm = to_dtpm(pcz); + struct dtpm *child; + u64 power; + int ret = 0; + + *power_limit = 0; + + spin_lock(&dtpm->lock); + list_for_each_entry(child, &dtpm->children, sibling) { + ret = child->zone.constraints->ops->get_power_limit_uw( + &child->zone, cid, &power); + if (ret) + break; + *power_limit += power; + } + spin_unlock(&dtpm->lock); + + return ret; +} + +static const char *get_constraint_name(struct powercap_zone *pcz, int cid) +{ + return constraint_name[cid]; +} + +static int get_max_power_uw(struct powercap_zone *pcz, int id, u64 *max_power) +{ + struct dtpm *dtpm = to_dtpm(pcz); + + spin_lock(&dtpm->lock); + *max_power = dtpm->power_max; + spin_unlock(&dtpm->lock); + + return 0; +} + +static struct powercap_zone_constraint_ops constraint_ops = { + .set_power_limit_uw = set_children_power_limit, + .get_power_limit_uw = get_children_power_limit, + .set_time_window_us = set_time_window_us, + .get_time_window_us = get_time_window_us, + .get_max_power_uw = get_max_power_uw, + .get_name = get_constraint_name, +}; + +static struct powercap_zone_ops zone_ops = { + .get_max_power_range_uw = get_max_power_range_uw, + .get_power_uw = get_children_power_uw, + .release = dtpm_release_zone, +}; + +/** + * dtpm_alloc - Allocate and initialize a dtpm struct + * + * Return: a struct dtpm pointer, NULL in case of error + */ +struct dtpm *dtpm_alloc(void) +{ + struct dtpm *dtpm; + + dtpm = kzalloc(sizeof(*dtpm), GFP_KERNEL); + 
if (dtpm) { + INIT_LIST_HEAD(&dtpm->children); + INIT_LIST_HEAD(&dtpm->sibling); + spin_lock_init(&dtpm->lock); + dtpm->weight = 1024; + } + + return dtpm; +} + +/** + * dtpm_unregister - Unregister a dtpm node from the hierarchy tree + * @dtpm: a pointer to a dtpm structure corresponding to the node to be removed + * + * Call the underlying powercap unregister function. That will call + * the release callback of the powercap zone. + */ +void dtpm_unregister(struct dtpm *dtpm) +{ + powercap_unregister_zone(pct, &dtpm->zone); +} + +/** + * dtpm_register - Register a dtpm node in the hierarchy tree + * @name: a string specifying the name of the node + * @dtpm: a pointer to a dtpm structure corresponding to the new node + * @parent: a pointer to a dtpm structure corresponding to the parent node + * @ops: a pointer to a powercap_zone_ops structure + * @nr_constraints: an integer giving the number of constraints supported + * @const_ops: a pointer to a powercap_zone_constraint_ops structure + * + * Create a dtpm node in the tree. If no parent is specified, the node + * is the root node of the hierarchy. If the root node already exists, + * then the registration will fail. The powercap controller must be + * initialized before calling this function. + * + * The dtpm structure must be initialized with the power numbers + * before calling this function. + * + * Return: zero on success, a negative value in case of error: + * -EAGAIN: the function is called before the framework is initialized. 
+ * -EBUSY: the root node is already inserted + * -EINVAL: there is no root node yet and @parent is specified + * Other negative values are reported back from the powercap framework + */ +int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, + struct powercap_zone_ops *ops, int nr_constraints, + struct powercap_zone_constraint_ops *const_ops) +{ + struct powercap_zone *pcz; + + if (!pct) + return -EAGAIN; + + if (root && !parent) + return -EBUSY; + + if (!root && parent) + return -EINVAL; + + const_ops->get_name = get_constraint_name; + const_ops->get_max_power_uw = get_max_power_uw; + const_ops->set_time_window_us = set_time_window_us; + const_ops->get_time_window_us = get_time_window_us; + + ops->get_max_power_range_uw = get_max_power_range_uw; + + pcz = powercap_register_zone(&dtpm->zone, pct, name, + parent ? &parent->zone : NULL, + ops, nr_constraints, const_ops); + if (IS_ERR(pcz)) + return PTR_ERR(pcz); + + if (parent) { + spin_lock(&parent->lock); + list_add_tail(&dtpm->sibling, &parent->children); + spin_unlock(&parent->lock); + dtpm->parent = parent; + } else { + root = dtpm; + } + + dtpm_add_power(dtpm); + + return 0; +} + +/** + * dtpm_register_parent - Register an intermediate node in the tree + * @name: a string specifying the name of the node + * @dtpm: a pointer to a dtpm structure corresponding to the new node + * @parent: a pointer to a dtpm structure corresponding to the parent of the new node + * + * The function will add an intermediate node acting as a parent for + * one or more nodes. Its purpose is to aggregate the children's + * characteristics and dispatch the constraints. If the @parent + * parameter is NULL, then this node becomes the root node of the tree + * if there is no root node yet. + * + * Return: zero on success, a negative value in case of error: + * -EAGAIN: the function is called before the framework is initialized. 
+ * -EBUSY: the root node is already inserted + * -EINVAL: there is no root node yet and @parent is specified + * Other negative values are reported back from the powercap framework + */ +int dtpm_register_parent(const char *name, struct dtpm *dtpm, + struct dtpm *parent) +{ + return dtpm_register(name, dtpm, parent, &zone_ops, + MAX_DTPM_CONSTRAINTS, &constraint_ops); +} + +static int __init dtpm_init(void) +{ + struct dtpm_descr **dtpm_descr; + + pct = powercap_register_control_type(NULL, "dtpm", NULL); + if (!pct) { + pr_err("Failed to register control type\n"); + return -EINVAL; + } + + for_each_dtpm_table(dtpm_descr) + (*dtpm_descr)->init(*dtpm_descr); + + return 0; +} +late_initcall(dtpm_init); diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 5430febd34be..29b30976ea02 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -315,6 +315,16 @@ #define THERMAL_TABLE(name) #endif +#ifdef CONFIG_DTPM +#define DTPM_TABLE() \ + . 
= ALIGN(8); \ + __dtpm_table = .; \ + KEEP(*(__dtpm_table)) \ + __dtpm_table_end = .; +#else +#define DTPM_TABLE() +#endif + #define KERNEL_DTB() \ STRUCT_ALIGN(); \ __dtb_start = .; \ @@ -715,6 +725,7 @@ ACPI_PROBE_TABLE(irqchip) \ ACPI_PROBE_TABLE(timer) \ THERMAL_TABLE(governor) \ + DTPM_TABLE() \ EARLYCON_TABLE() \ LSM_TABLE() \ EARLY_LSM_TABLE() diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h new file mode 100644 index 000000000000..6696bdcfdb87 --- /dev/null +++ b/include/linux/dtpm.h @@ -0,0 +1,73 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2020 Linaro Ltd + * + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> + */ +#ifndef ___DTPM_H__ +#define ___DTPM_H__ + +#include <linux/of.h> +#include <linux/powercap.h> + +#define MAX_DTPM_DESCR 8 +#define MAX_DTPM_CONSTRAINTS 1 + +struct dtpm { + struct powercap_zone zone; + struct dtpm *parent; + struct list_head sibling; + struct list_head children; + spinlock_t lock; + u64 power_limit; + u64 power_max; + u64 power_min; + int weight; + void *private; +}; + +struct dtpm_descr; + +typedef int (*dtpm_init_t)(struct dtpm_descr *); + +struct dtpm_descr { + struct dtpm *parent; + const char *name; + dtpm_init_t init; +}; + +/* Init section thermal table */ +extern struct dtpm_descr *__dtpm_table[]; +extern struct dtpm_descr *__dtpm_table_end[]; + +#define DTPM_TABLE_ENTRY(name) \ + static typeof(name) *__dtpm_table_entry_##name \ + __used __section(__dtpm_table) = &name + +#define DTPM_DECLARE(name) DTPM_TABLE_ENTRY(name) + +#define for_each_dtpm_table(__dtpm) \ + for (__dtpm = __dtpm_table; \ + __dtpm < __dtpm_table_end; \ + __dtpm++) + +static inline struct dtpm *to_dtpm(struct powercap_zone *zone) +{ + return container_of(zone, struct dtpm, zone); +} + +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max); + +int dtpm_release_zone(struct powercap_zone *pcz); + +struct dtpm *dtpm_alloc(void); + +void dtpm_unregister(struct dtpm *dtpm); + +int 
dtpm_register_parent(const char *name, struct dtpm *dtpm, + struct dtpm *parent); + +int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, + struct powercap_zone_ops *ops, int nr_constraints, + struct powercap_zone_constraint_ops *const_ops); +#endif -- 2.17.1
* Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management 2020-10-06 12:20 ` [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management Daniel Lezcano @ 2020-10-06 16:42 ` kernel test robot 2020-10-06 18:05 ` kernel test robot ` (2 subsequent siblings) 3 siblings, 0 replies; 42+ messages in thread From: kernel test robot @ 2020-10-06 16:42 UTC (permalink / raw) To: Daniel Lezcano, rafael, srinivas.pandruvada Cc: kbuild-all, lukasz.luba, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Arnd Bergmann [-- Attachment #1: Type: text/plain, Size: 1736 bytes --] Hi Daniel, I love your patch! Yet something to improve: [auto build test ERROR on pm/linux-next] [also build test ERROR on linux/master linus/master v5.9-rc8 next-20201006] [cannot apply to asm-generic/master] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Daniel-Lezcano/powercap-dtpm-Add-the-DTPM-framework/20201006-202317 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next config: m68k-allmodconfig (attached as .config) compiler: m68k-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/a3c92086239fccbe4523d59537de2a4c805561d9 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Daniel-Lezcano/powercap-dtpm-Add-the-DTPM-framework/20201006-202317 git checkout a3c92086239fccbe4523d59537de2a4c805561d9 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=m68k If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new 
ones prefixed by >>): m68k-linux-ld: drivers/powercap/dtpm.o: in function `__dtpm_rebalance_weight': >> dtpm.c:(.text+0x298): undefined reference to `__udivdi3' --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 57818 bytes --]
* Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management 2020-10-06 12:20 ` [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management Daniel Lezcano 2020-10-06 16:42 ` kernel test robot @ 2020-10-06 18:05 ` kernel test robot 2020-10-23 10:29 ` Lukasz Luba 2020-11-10 9:59 ` Lukasz Luba 3 siblings, 0 replies; 42+ messages in thread From: kernel test robot @ 2020-10-06 18:05 UTC (permalink / raw) To: Daniel Lezcano, rafael, srinivas.pandruvada Cc: kbuild-all, lukasz.luba, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Arnd Bergmann [-- Attachment #1: Type: text/plain, Size: 3024 bytes --] Hi Daniel, I love your patch! Yet something to improve: [auto build test ERROR on pm/linux-next] [also build test ERROR on linux/master linus/master v5.9-rc8 next-20201006] [cannot apply to asm-generic/master] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Daniel-Lezcano/powercap-dtpm-Add-the-DTPM-framework/20201006-202317 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next config: mips-allyesconfig (attached as .config) compiler: mips-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/a3c92086239fccbe4523d59537de2a4c805561d9 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Daniel-Lezcano/powercap-dtpm-Add-the-DTPM-framework/20201006-202317 git checkout a3c92086239fccbe4523d59537de2a4c805561d9 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test 
robot <lkp@intel.com> All errors (new ones prefixed by >>): arch/mips/kernel/head.o: in function `dtb_found': (.ref.text+0xe0): relocation truncated to fit: R_MIPS_26 against `start_kernel' init/main.o: in function `set_reset_devices': main.c:(.init.text+0x20): relocation truncated to fit: R_MIPS_26 against `_mcount' main.c:(.init.text+0x30): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc' init/main.o: in function `debug_kernel': main.c:(.init.text+0x9c): relocation truncated to fit: R_MIPS_26 against `_mcount' main.c:(.init.text+0xac): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc' init/main.o: in function `quiet_kernel': main.c:(.init.text+0x118): relocation truncated to fit: R_MIPS_26 against `_mcount' main.c:(.init.text+0x128): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc' init/main.o: in function `init_setup': main.c:(.init.text+0x1a4): relocation truncated to fit: R_MIPS_26 against `_mcount' main.c:(.init.text+0x1c8): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc' main.c:(.init.text+0x1e8): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc' main.c:(.init.text+0x1fc): additional relocation overflows omitted from the output mips-linux-ld: drivers/powercap/dtpm.o: in function `__dtpm_rebalance_weight': >> dtpm.c:(.text.__dtpm_rebalance_weight+0xec): undefined reference to `__udivdi3' --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 67616 bytes --]
* Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management 2020-10-06 12:20 ` [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management Daniel Lezcano 2020-10-06 16:42 ` kernel test robot 2020-10-06 18:05 ` kernel test robot @ 2020-10-23 10:29 ` Lukasz Luba 2020-11-03 18:42 ` Daniel Lezcano 2020-11-10 9:59 ` Lukasz Luba 3 siblings, 1 reply; 42+ messages in thread From: Lukasz Luba @ 2020-10-23 10:29 UTC (permalink / raw) To: Daniel Lezcano Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Arnd Bergmann, open list:GENERIC INCLUDE/ASM HEADER FILES Hi Daniel, On 10/6/20 1:20 PM, Daniel Lezcano wrote: > On the embedded world, the complexity of the SoC leads to an > increasing number of hotspots which need to be monitored and mitigated > as a whole in order to prevent the temperature to go above the > normative and legally stated 'skin temperature'. > > Another aspect is to sustain the performance for a given power budget, > for example virtual reality where the user can feel dizziness if the > GPU performance is capped while a big CPU is processing something > else. Or reduce the battery charging because the dissipated power is > too high compared with the power consumed by other devices. > > The userspace is the most adequate place to dynamically act on the > different devices by limiting their power given an application > profile: it has the knowledge of the platform. > > These userspace daemons are in charge of the Dynamic Thermal Power > Management (DTPM). > > Nowadays, the dtpm daemons are abusing the thermal framework as they > act on the cooling device state to force a specific and arbitraty > state without taking care of the governor decisions. Given the closed > loop of some governors that can confuse the logic or directly enter in > a decision conflict. 
> > As the number of cooling device support is limited today to the CPU > and the GPU, the dtpm daemons have little control on the power > dissipation of the system. The out of tree solutions are hacking > around here and there in the drivers, in the frameworks to have > control on the devices. The common solution is to declare them as > cooling devices. > > There is no unification of the power limitation unit, opaque states > are used. > > This patch provides a way to create a hierarchy of constraints using > the powercap framework. The devices which are registered as power > limit-able devices are represented in this hierarchy as a tree. They > are linked together with intermediate nodes which are just there to > propagate the constraint to the children. > > The leaves of the tree are the real devices, the intermediate nodes > are virtual, aggregating the children constraints and power > characteristics. > > Each node have a weight on a 2^10 basis, in order to reflect the > percentage of power distribution of the children's node. This > percentage is used to dispatch the power limit to the children. > > The weight is computed against the max power of the siblings. > > This simple approach allows to do a fair distribution of the power > limit. > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > --- > drivers/powercap/Kconfig | 6 + > drivers/powercap/Makefile | 1 + > drivers/powercap/dtpm.c | 430 ++++++++++++++++++++++++++++++ > include/asm-generic/vmlinux.lds.h | 11 + > include/linux/dtpm.h | 73 +++++ > 5 files changed, 521 insertions(+) > create mode 100644 drivers/powercap/dtpm.c > create mode 100644 include/linux/dtpm.h > > diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig > index ebc4d4578339..777cf60300b8 100644 > --- a/drivers/powercap/Kconfig > +++ b/drivers/powercap/Kconfig > @@ -43,4 +43,10 @@ config IDLE_INJECT > CPUs for power capping. 
Idle period can be injected > synchronously on a set of specified CPUs or alternatively > on a per CPU basis. > + > +config DTPM > + bool "Power capping for dynamic thermal power management" Maybe starting with capital letters: Dynamic Thermal Power Management? > + help > + This enables support for the power capping for the dynamic > + thermal management userspace engine. > endif > diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile > index 7255c94ec61c..6482ac52054d 100644 > --- a/drivers/powercap/Makefile > +++ b/drivers/powercap/Makefile > @@ -1,4 +1,5 @@ > # SPDX-License-Identifier: GPL-2.0-only > +obj-$(CONFIG_DTPM) += dtpm.o > obj-$(CONFIG_POWERCAP) += powercap_sys.o > obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o > obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o > diff --git a/drivers/powercap/dtpm.c b/drivers/powercap/dtpm.c > new file mode 100644 > index 000000000000..6df1e51a2c1c > --- /dev/null > +++ b/drivers/powercap/dtpm.c > @@ -0,0 +1,430 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright 2020 Linaro Limited > + * > + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> > + * > + * The powercap based Dynamic Thermal Power Management framework > + * provides to the userspace a consistent API to set the power limit > + * on some devices. > + * > + * DTPM defines the functions to create a tree of constraints. Each > + * parent node is a virtual description of the aggregation of the > + * children. It propagates the constraints set at its level to its > + * children and collect the children power infomation. The leaves of s/infomation/information/ > + * the tree are the real devices which have the ability to get their > + * current power consumption and set their power limit. 
> + */ > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > +#include <linux/dtpm.h> > +#include <linux/init.h> > +#include <linux/kernel.h> > +#include <linux/powercap.h> > +#include <linux/slab.h> > +#include <linux/spinlock.h> > + > +static const char *constraint_name[] = { > + "Instantaneous power constraint", > +}; > + > +static struct powercap_control_type *pct; > +static struct dtpm *root; I wonder if it safe to have the tree without a global lock for it, like mutex tree_lock ? I have put some comments below when the code traverses the tree. > + > +static int get_time_window_us(struct powercap_zone *pcz, int cid, u64 *window) > +{ > + return -ENOSYS; > +} > + > +static int set_time_window_us(struct powercap_zone *pcz, int cid, u64 window) > +{ > + return -ENOSYS; > +} > + > +static int get_max_power_range_uw(struct powercap_zone *pcz, u64 *max_power_uw) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + > + spin_lock(&dtpm->lock); > + *max_power_uw = dtpm->power_max - dtpm->power_min; > + spin_unlock(&dtpm->lock); > + > + return 0; > +} > + > +static int get_children_power_uw(struct powercap_zone *pcz, u64 *power_uw) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm *child; > + u64 power; > + int ret = 0; > + > + *power_uw = 0; > + > + spin_lock(&dtpm->lock); > + list_for_each_entry(child, &dtpm->children, sibling) { > + ret = child->zone.ops->get_power_uw(&child->zone, &power); > + if (ret) > + break; > + *power_uw += power; > + } > + spin_unlock(&dtpm->lock); > + > + return ret; > +} > + > +static void __dtpm_rebalance_weight(struct dtpm *dtpm) > +{ > + struct dtpm *child; > + > + spin_lock(&dtpm->lock); > + list_for_each_entry(child, &dtpm->children, sibling) { > + > + pr_debug("Setting weight '%d' for '%s'\n", > + child->weight, child->zone.name); > + > + child->weight = DIV_ROUND_CLOSEST(child->power_max * 1024, > + dtpm->power_max); > + > + __dtpm_rebalance_weight(child); > + } > + spin_unlock(&dtpm->lock); > +} > + > +static void 
dtpm_rebalance_weight(void) > +{ > + __dtpm_rebalance_weight(root); > +} > + > +static void dtpm_sub_power(struct dtpm *dtpm) > +{ > + struct dtpm *parent = dtpm->parent; > + > + while (parent) { I am not sure if it is safe for a corner case when the nodes are removing from bottom to top. We don't hold a tree lock, so these two (above line and below) operations might be split/preempted and 'parent' freed before taking the lock. Is it possible? (Note: I might missed something like double locking using this local node spinlock). > + spin_lock(&parent->lock); > + parent->power_min -= dtpm->power_min; > + parent->power_max -= dtpm->power_max; > + spin_unlock(&parent->lock); > + parent = parent->parent; > + } > + > + dtpm_rebalance_weight(); > +} > + > +static void dtpm_add_power(struct dtpm *dtpm) > +{ > + struct dtpm *parent = dtpm->parent; > + > + while (parent) { Similar here? > + spin_lock(&parent->lock); > + parent->power_min += dtpm->power_min; > + parent->power_max += dtpm->power_max; > + spin_unlock(&parent->lock); > + parent = parent->parent; > + } > + > + dtpm_rebalance_weight(); > +} > + > +/** > + * dtpm_update_power - Update the power on the dtpm > + * @dtpm: a pointer to a dtpm structure to update > + * @power_min: a u64 representing the new power_min value > + * @power_max: a u64 representing the new power_max value > + * > + * Function to update the power values of the dtpm node specified in > + * parameter. These new values will be propagated to the tree. 
> + * > + * Return: zero on success, -EINVAL if the values are inconsistent > + */ > +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max) > +{ > + if (power_min == dtpm->power_min && power_max == dtpm->power_max) > + return 0; > + > + if (power_max < power_min) > + return -EINVAL; > + > + dtpm_sub_power(dtpm); > + spin_lock(&dtpm->lock); > + dtpm->power_min = power_min; > + dtpm->power_max = power_max; > + spin_unlock(&dtpm->lock); > + dtpm_add_power(dtpm); > + > + return 0; > +} > + > +/** > + * dtpm_release_zone - Cleanup when the node is released > + * @pcz: a pointer to a powercap_zone structure > + * > + * Do some housecleaning and update the weight on the tree. The > + * release will be denied if the node has children. This function must > + * be called by the specific release callback of the different > + * backends. > + * > + * Return: 0 on success, -EBUSY if there are children > + */ > +int dtpm_release_zone(struct powercap_zone *pcz) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm *parent = dtpm->parent; > + I would lock the whole tree, just to play safe. What do you think? > + if (!list_empty(&dtpm->children)) > + return -EBUSY; > + > + if (parent) { > + spin_lock(&parent->lock); > + list_del(&dtpm->sibling); > + spin_unlock(&parent->lock); > + } > + > + dtpm_sub_power(dtpm); > + > + kfree(dtpm); > + > + return 0; > +} > + > +/* > + * Set the power limit on the nodes, the power limit is distributed > + * given the weight of the children. > + */ > +static int set_children_power_limit(struct powercap_zone *pcz, int cid, > + u64 power_limit) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm *child; > + u64 power; > + int ret = 0; > + > + /* > + * Don't allow values outside of the power range previously > + * set when initiliazing the power numbers. 
> + */ > + power_limit = clamp_val(power_limit, dtpm->power_min, dtpm->power_max); > + > + spin_lock(&dtpm->lock); > + list_for_each_entry(child, &dtpm->children, sibling) { > + > + /* > + * Integer division rounding will inevitably lead to a > + * different max value when set several times. In > + * order to restore the initial value, we force the > + * child's max power every time if the constraint is > + * removed by setting a value greater or equal to the > + * max power. > + */ > + if (power_limit == dtpm->power_max) > + power = child->power_max; > + else > + power = DIV_ROUND_CLOSEST( > + power_limit * child->weight, 1024); > + > + pr_debug("Setting power limit for '%s': %llu uW\n", > + child->zone.name, power); > + > + ret = child->zone.constraints->ops->set_power_limit_uw( > + &child->zone, cid, power); > + if (ret) > + break; > + } > + spin_unlock(&dtpm->lock); > + > + return ret; > +} > + > +static int get_children_power_limit(struct powercap_zone *pcz, int cid, > + u64 *power_limit) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm *child; > + u64 power; > + int ret = 0; > + > + *power_limit = 0; > + > + spin_lock(&dtpm->lock); > + list_for_each_entry(child, &dtpm->children, sibling) { > + ret = child->zone.constraints->ops->get_power_limit_uw( > + &child->zone, cid, &power); > + if (ret) > + break; > + *power_limit += power; > + } > + spin_unlock(&dtpm->lock); > + > + return ret; > +} > + > +static const char *get_constraint_name(struct powercap_zone *pcz, int cid) > +{ > + return constraint_name[cid]; > +} > + > +static int get_max_power_uw(struct powercap_zone *pcz, int id, u64 *max_power) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + > + spin_lock(&dtpm->lock); > + *max_power = dtpm->power_max; > + spin_unlock(&dtpm->lock); > + > + return 0; > +} > + > +static struct powercap_zone_constraint_ops constraint_ops = { > + .set_power_limit_uw = set_children_power_limit, > + .get_power_limit_uw = get_children_power_limit, > + .set_time_window_us 
= set_time_window_us, > + .get_time_window_us = get_time_window_us, > + .get_max_power_uw = get_max_power_uw, > + .get_name = get_constraint_name, > +}; > + > +static struct powercap_zone_ops zone_ops = { > + .get_max_power_range_uw = get_max_power_range_uw, > + .get_power_uw = get_children_power_uw, > + .release = dtpm_release_zone, > +}; > + > +/** > + * dtpm_alloc - Allocate and initialize a dtpm struct > + * @name: a string specifying the name of the node > + * > + * Return: a struct dtpm pointer, NULL in case of error > + */ > +struct dtpm *dtpm_alloc(void) > +{ > + struct dtpm *dtpm; > + > + dtpm = kzalloc(sizeof(*dtpm), GFP_KERNEL); > + if (dtpm) { > + INIT_LIST_HEAD(&dtpm->children); > + INIT_LIST_HEAD(&dtpm->sibling); > + spin_lock_init(&dtpm->lock); Why do we use spinlock and not mutex? > + dtpm->weight = 1024; > + } > + > + return dtpm; > +} > + > +/** > + * dtpm_unregister - Unregister a dtpm node from the hierarchy tree > + * @dtpm: a pointer to a dtpm structure corresponding to the node to be removed > + * > + * Call the underlying powercap unregister function. That will call > + * the release callback of the powercap zone. > + */ > +void dtpm_unregister(struct dtpm *dtpm) > +{ > + powercap_unregister_zone(pct, &dtpm->zone); > +} > + > +/** > + * dtpm_register - Register a dtpm node in the hierarchy tree > + * @name: a string specifying the name of the node > + * @dtpm: a pointer to a dtpm structure corresponding to the new node > + * @parent: a pointer to a dtpm structure corresponding to the parent node > + * @ops: a pointer to a powercap_zone_ops structure > + * @nr_constraints: a integer giving the number of constraints supported > + * @const_ops: a pointer to a powercap_zone_constraint_ops structure > + * > + * Create a dtpm node in the tree. If no parent is specified, the node > + * is the root node of the hierarchy. If the root node already exists, > + * then the registration will fail. 
The powercap controller must be > + * initialized before calling this function. > + * > + * The dtpm structure must be initialized with the power numbers > + * before calling this function. > + * > + * Return: zero on success, a negative value in case of error: > + * -EAGAIN: the function is called before the framework is initialized. > + * -EBUSY: the root node is already inserted > + * -EINVAL: there is no root node yet and @parent is specified > + * Other negative values are reported back from the powercap framework > + */ > +int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, > + struct powercap_zone_ops *ops, int nr_constraints, > + struct powercap_zone_constraint_ops *const_ops) > +{ > + struct powercap_zone *pcz; > + > + if (!pct) > + return -EAGAIN; > + > + if (root && !parent) > + return -EBUSY; > + > + if (!root && parent) > + return -EINVAL; > + > + const_ops->get_name = get_constraint_name; > + const_ops->get_max_power_uw = get_max_power_uw; > + const_ops->set_time_window_us = set_time_window_us; > + const_ops->get_time_window_us = get_time_window_us; > + > + ops->get_max_power_range_uw = get_max_power_range_uw; > + > + pcz = powercap_register_zone(&dtpm->zone, pct, name, > + parent ? &parent->zone : NULL, > + ops, nr_constraints, const_ops); > + if (IS_ERR(pcz)) > + return PTR_ERR(pcz); > + > + if (parent) { > + spin_lock(&parent->lock); > + list_add_tail(&dtpm->sibling, &parent->children); > + spin_unlock(&parent->lock); > + dtpm->parent = parent; > + } else { > + root = dtpm; > + } > + > + dtpm_add_power(dtpm); > + > + return 0; > +} > + > +/** > + * dtpm_register_parent - Register a intermediate node in the tree > + * @name: a string specifying the name of the node > + * @dtpm: a pointer to a dtpm structure corresponding to the new node > + * @parent: a pointer to a dtpm structure corresponding parent's new node > + * > + * The function will add an intermediate node corresponding to a > + * parent to more nodes. 
Its purpose is to aggregate the children > + * characteristics and dispatch the constraints. If the @parent > + * parameter is NULL, then this node becomes the root node of the tree > + * if there is no root node yet. > + * > + * Return: zero on success, a negative value in case of error: > + * -EAGAIN: the function is called before the framework is initialized. > + * -EBUSY: the root node is already inserted > + * -EINVAL: there is not root node yet and @parent is specified > + * Other negative values are reported back from the powercap framework > + */ > +int dtpm_register_parent(const char *name, struct dtpm *dtpm, > + struct dtpm *parent) > +{ > + return dtpm_register(name, dtpm, parent, &zone_ops, > + MAX_DTPM_CONSTRAINTS, &constraint_ops); > +} > + > +static int __init dtpm_init(void) > +{ > + struct dtpm_descr **dtpm_descr; > + > + pct = powercap_register_control_type(NULL, "dtpm", NULL); > + if (!pct) { > + pr_err("Failed to register control type\n"); > + return -EINVAL; > + } > + > + for_each_dtpm_table(dtpm_descr) > + (*dtpm_descr)->init(*dtpm_descr); We don't check the returned value here. It is required that the devices should already be up and running (like cpufreq). But if for some reason the init() failed, maybe it's worth to add a field inside the dtpm_desc or dtpm struct like 'bool ready' ? It could be retried to init later. > + > + return 0; > +} > +late_initcall(dtpm_init); The framework would start operating at late boot. We don't control the thermal/power issues in earier stages. Although, at this late stage all other things like cpufreq should be ready, so the ->init() on them is likely to success. Regards, Lukasz > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h > index 5430febd34be..29b30976ea02 100644 > --- a/include/asm-generic/vmlinux.lds.h > +++ b/include/asm-generic/vmlinux.lds.h > @@ -315,6 +315,16 @@ > #define THERMAL_TABLE(name) > #endif > > +#ifdef CONFIG_DTPM > +#define DTPM_TABLE() \ > + . 
= ALIGN(8); \ > + __dtpm_table = .; \ > + KEEP(*(__dtpm_table)) \ > + __dtpm_table_end = .; > +#else > +#define DTPM_TABLE() > +#endif > + > #define KERNEL_DTB() \ > STRUCT_ALIGN(); \ > __dtb_start = .; \ > @@ -715,6 +725,7 @@ > ACPI_PROBE_TABLE(irqchip) \ > ACPI_PROBE_TABLE(timer) \ > THERMAL_TABLE(governor) \ > + DTPM_TABLE() \ > EARLYCON_TABLE() \ > LSM_TABLE() \ > EARLY_LSM_TABLE() > diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h > new file mode 100644 > index 000000000000..6696bdcfdb87 > --- /dev/null > +++ b/include/linux/dtpm.h > @@ -0,0 +1,73 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* > + * Copyright (C) 2020 Linaro Ltd > + * > + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> > + */ > +#ifndef ___DTPM_H__ > +#define ___DTPM_H__ > + > +#include <linux/of.h> > +#include <linux/powercap.h> > + > +#define MAX_DTPM_DESCR 8 > +#define MAX_DTPM_CONSTRAINTS 1 > + > +struct dtpm { > + struct powercap_zone zone; > + struct dtpm *parent; > + struct list_head sibling; > + struct list_head children; > + spinlock_t lock; > + u64 power_limit; > + u64 power_max; > + u64 power_min; > + int weight; > + void *private; > +}; > + > +struct dtpm_descr; > + > +typedef int (*dtpm_init_t)(struct dtpm_descr *); > + > +struct dtpm_descr { > + struct dtpm *parent; > + const char *name; > + dtpm_init_t init; > +}; > + > +/* Init section thermal table */ > +extern struct dtpm_descr *__dtpm_table[]; > +extern struct dtpm_descr *__dtpm_table_end[]; > + > +#define DTPM_TABLE_ENTRY(name) \ > + static typeof(name) *__dtpm_table_entry_##name \ > + __used __section(__dtpm_table) = &name > + > +#define DTPM_DECLARE(name) DTPM_TABLE_ENTRY(name) > + > +#define for_each_dtpm_table(__dtpm) \ > + for (__dtpm = __dtpm_table; \ > + __dtpm < __dtpm_table_end; \ > + __dtpm++) > + > +static inline struct dtpm *to_dtpm(struct powercap_zone *zone) > +{ > + return container_of(zone, struct dtpm, zone); > +} > + > +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 
power_max); > + > +int dtpm_release_zone(struct powercap_zone *pcz); > + > +struct dtpm *dtpm_alloc(void); > + > +void dtpm_unregister(struct dtpm *dtpm); > + > +int dtpm_register_parent(const char *name, struct dtpm *dtpm, > + struct dtpm *parent); > + > +int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, > + struct powercap_zone_ops *ops, int nr_constraints, > + struct powercap_zone_constraint_ops *const_ops); > +#endif
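The DTPM_DECLARE()/for_each_dtpm_table() mechanism quoted above relies on the linker gathering pointers into the `__dtpm_table` section, which dtpm_init() then walks. The same walk can be mimicked in plain userspace C; a rough sketch, where the linker-built array is replaced by a hand-built one and every name is illustrative, not from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace analogue of struct dtpm_descr; the 'initialized' field is
 * added here purely so the walk can be observed. */
struct dtpm_descr {
	const char *name;
	int (*init)(struct dtpm_descr *descr);
	int initialized;
};

static int demo_init(struct dtpm_descr *descr)
{
	descr->initialized = 1;
	return 0;
}

static struct dtpm_descr cpu_descr = { "cpu", demo_init, 0 };
static struct dtpm_descr pkg_descr = { "pkg", demo_init, 0 };

/* In the kernel, the linker builds this array from the DTPM_DECLARE()
 * entries placed in the __dtpm_table section. */
static struct dtpm_descr *dtpm_table[] = { &cpu_descr, &pkg_descr };
#define dtpm_table_end (dtpm_table + 2)

/* Mirrors the for_each_dtpm_table() loop in dtpm_init(); returns how
 * many descriptors initialized successfully. */
static int dtpm_init_walk(void)
{
	struct dtpm_descr **descr;
	int ready = 0;

	for (descr = dtpm_table; descr < dtpm_table_end; descr++)
		if (!(*descr)->init(*descr))
			ready++;

	return ready;
}
```

As discussed in the thread, the kernel version ignores the init() return values; counting successes here only illustrates where a 'bool ready' style check could hook in.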
* Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management 2020-10-23 10:29 ` Lukasz Luba @ 2020-11-03 18:42 ` Daniel Lezcano 0 siblings, 0 replies; 42+ messages in thread From: Daniel Lezcano @ 2020-11-03 18:42 UTC (permalink / raw) To: Lukasz Luba Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Arnd Bergmann, open list:GENERIC INCLUDE/ASM HEADER FILES Hi Lukasz, thanks for the review and the comments. On 23/10/2020 12:29, Lukasz Luba wrote: > Hi Daniel, [ ... ] >> + >> +config DTPM >> + bool "Power capping for dynamic thermal power management" > > Maybe starting with capital letters: Dynamic Thermal Power Management? Ok, noted. [ ... ] >> +++ b/drivers/powercap/dtpm.c >> @@ -0,0 +1,430 @@ >> +// SPDX-License-Identifier: GPL-2.0-only >> +/* >> + * Copyright 2020 Linaro Limited >> + * >> + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> >> + * >> + * The powercap based Dynamic Thermal Power Management framework >> + * provides to the userspace a consistent API to set the power limit >> + * on some devices. >> + * >> + * DTPM defines the functions to create a tree of constraints. Each >> + * parent node is a virtual description of the aggregation of the >> + * children. It propagates the constraints set at its level to its >> + * children and collect the children power infomation. The leaves of > > s/infomation/information/ Ok, thanks [ ... ] >> +static struct powercap_control_type *pct; >> +static struct dtpm *root; > > I wonder if it is safe to have the tree without a global lock for it, like > mutex tree_lock ? > I have put some comments below when the code traverses the tree. The mutex is a heavy lock and its purpose is to allow the current process to be preempted, while the spinlock is very fast without preemption. Putting in place a single lock would simplify the code, but I'm not sure it is worth it, as it could become a source of contention. It would be simpler to switch to a big lock later than to do the opposite.
[ ... ] >> +static void dtpm_rebalance_weight(void) >> +{ >> + __dtpm_rebalance_weight(root); >> +} >> + >> +static void dtpm_sub_power(struct dtpm *dtpm) >> +{ >> + struct dtpm *parent = dtpm->parent; >> + >> + while (parent) { > > I am not sure if it is safe for a corner case when the > nodes are being removed from bottom to top. We don't hold a tree > lock, so these two (above line and below) operations might > be split/preempted and 'parent' freed before taking the lock. > Is it possible? (Note: I might have missed something like double > locking using this local node spinlock). The parent cannot be freed as long as it has children, the check is done in the release node function. >> + spin_lock(&parent->lock); >> + parent->power_min -= dtpm->power_min; >> + parent->power_max -= dtpm->power_max; >> + spin_unlock(&parent->lock); >> + parent = parent->parent; >> + } >> + >> + dtpm_rebalance_weight(); >> +} >> + >> +static void dtpm_add_power(struct dtpm *dtpm) >> +{ >> + struct dtpm *parent = dtpm->parent; >> + >> + while (parent) { > > Similar here? > >> + spin_lock(&parent->lock); >> + parent->power_min += dtpm->power_min; >> + parent->power_max += dtpm->power_max; >> + spin_unlock(&parent->lock); >> + parent = parent->parent; >> + } >> + >> + dtpm_rebalance_weight(); >> +} >> + >> +/** >> + * dtpm_update_power - Update the power on the dtpm >> + * @dtpm: a pointer to a dtpm structure to update >> + * @power_min: a u64 representing the new power_min value >> + * @power_max: a u64 representing the new power_max value >> + * >> + * Function to update the power values of the dtpm node specified in >> + * parameter. These new values will be propagated to the tree.
>> + * >> + * Return: zero on success, -EINVAL if the values are inconsistent >> + */ >> +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max) >> +{ >> + if (power_min == dtpm->power_min && power_max == dtpm->power_max) >> + return 0; >> + >> + if (power_max < power_min) >> + return -EINVAL; >> + >> + dtpm_sub_power(dtpm); >> + spin_lock(&dtpm->lock); >> + dtpm->power_min = power_min; >> + dtpm->power_max = power_max; >> + spin_unlock(&dtpm->lock); >> + dtpm_add_power(dtpm); >> + >> + return 0; >> +} >> + >> +/** >> + * dtpm_release_zone - Cleanup when the node is released >> + * @pcz: a pointer to a powercap_zone structure >> + * >> + * Do some housecleaning and update the weight on the tree. The >> + * release will be denied if the node has children. This function must >> + * be called by the specific release callback of the different >> + * backends. >> + * >> + * Return: 0 on success, -EBUSY if there are children >> + */ >> +int dtpm_release_zone(struct powercap_zone *pcz) >> +{ >> + struct dtpm *dtpm = to_dtpm(pcz); >> + struct dtpm *parent = dtpm->parent; >> + > > I would lock the whole tree, just to play safe. > What do you think? I would like to keep the fine grain locking to prevent a potential contention. If it appears we hit a locking incorrectness or a race putting in question the fine grain locking scheme, then we can consider switching to a tree lock. >> + if (!list_empty(&dtpm->children)) >> + return -EBUSY; >> + >> + if (parent) { >> + spin_lock(&parent->lock); >> + list_del(&dtpm->sibling); >> + spin_unlock(&parent->lock); >> + } >> + >> + dtpm_sub_power(dtpm); >> + >> + kfree(dtpm); >> + >> + return 0; >> +} [ ... ] >> +struct dtpm *dtpm_alloc(void) >> +{ >> + struct dtpm *dtpm; >> + >> + dtpm = kzalloc(sizeof(*dtpm), GFP_KERNEL); >> + if (dtpm) { >> + INIT_LIST_HEAD(&dtpm->children); >> + INIT_LIST_HEAD(&dtpm->sibling); >> + spin_lock_init(&dtpm->lock); > > Why do we use spinlock and not mutex? 
The mutex may force the calling process to sleep, which is useful when the critical section contains blocking calls. Here we are just changing values without blocking calls, so using a spinlock is more adequate, as it is faster. [ ... ] >> +static int __init dtpm_init(void) >> +{ >> + struct dtpm_descr **dtpm_descr; >> + >> + pct = powercap_register_control_type(NULL, "dtpm", NULL); >> + if (!pct) { >> + pr_err("Failed to register control type\n"); >> + return -EINVAL; >> + } >> + >> + for_each_dtpm_table(dtpm_descr) >> + (*dtpm_descr)->init(*dtpm_descr); > > We don't check the returned value here. It is required that the > devices should already be up and running (like cpufreq). > But if for some reason the init() failed, maybe it's worth adding a > field inside the dtpm_descr or dtpm struct like 'bool ready' ? > It could then be retried later. It would make sense to check the init return value if we wanted to roll back what we have done. Here we don't want to do that. If one subsystem fails to insert itself in the tree, it will log an error, but the tree should continue to give access to what has been successfully initialized. The rollback is important in the init() ops, not in dtpm_init(). >> + >> + return 0; >> +} >> +late_initcall(dtpm_init); > > The framework would start operating at late boot. We don't control > the thermal/power issues in earlier stages. > Although, at this late stage all other things like cpufreq should be > ready, so the ->init() on them is likely to succeed. Right, the dtpm is accessible through sysfs for a userspace thermal daemon doing the smart mitigation, so the initcall can really be late. [ ... ] Thanks for the review. -- Daniel
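The fine-grained locking discussed above protects the parent walk in dtpm_add_power()/dtpm_sub_power(): each ancestor is locked only while its own counters are updated, never the whole tree. A userspace sketch of that walk — all names are illustrative, and a simple assert-checked flag stands in for the kernel spinlock:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct dtpm; 'locked' emulates the per-node spinlock. */
struct node {
	struct node *parent;
	int locked;
	uint64_t power_min;
	uint64_t power_max;
};

static void node_lock(struct node *node)
{
	assert(!node->locked);	/* a spinlock is never taken recursively */
	node->locked = 1;
}

static void node_unlock(struct node *node)
{
	assert(node->locked);
	node->locked = 0;
}

/* Mirrors dtpm_add_power(): propagate this node's power range to every
 * ancestor, holding only one ancestor's lock at a time. */
static void node_add_power(struct node *node)
{
	struct node *parent = node->parent;

	while (parent) {
		node_lock(parent);
		parent->power_min += node->power_min;
		parent->power_max += node->power_max;
		node_unlock(parent);
		parent = parent->parent;
	}
}

/* Build root <- pkg <- {cpu, gpu} and return the aggregated root max. */
static uint64_t demo_root_power_max(void)
{
	struct node root = { NULL,  0, 0, 0 };
	struct node pkg  = { &root, 0, 0, 0 };
	struct node cpu  = { &pkg,  0, 100, 500 };
	struct node gpu  = { &pkg,  0, 50, 300 };

	node_add_power(&cpu);
	node_add_power(&gpu);

	return root.power_max;
}
```

The sketch also shows why the release path matters: a parent reached in this walk must not be freed concurrently, which the kernel code guarantees by refusing to release a node that still has children.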
* Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management 2020-10-06 12:20 ` [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management Daniel Lezcano ` (2 preceding siblings ...) 2020-10-23 10:29 ` Lukasz Luba @ 2020-11-10 9:59 ` Lukasz Luba 2020-11-10 11:05 ` Lukasz Luba 2020-11-10 12:55 ` Daniel Lezcano 3 siblings, 2 replies; 42+ messages in thread From: Lukasz Luba @ 2020-11-10 9:59 UTC (permalink / raw) To: Daniel Lezcano Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Arnd Bergmann, open list:GENERIC INCLUDE/ASM HEADER FILES Hi Daniel, I've experimented with the patch set and went through the code again. It looks good, only a few minor comments. On 10/6/20 1:20 PM, Daniel Lezcano wrote: > On the embedded world, the complexity of the SoC leads to an > increasing number of hotspots which need to be monitored and mitigated > as a whole in order to prevent the temperature to go above the > normative and legally stated 'skin temperature'. > > Another aspect is to sustain the performance for a given power budget, > for example virtual reality where the user can feel dizziness if the > GPU performance is capped while a big CPU is processing something > else. Or reduce the battery charging because the dissipated power is > too high compared with the power consumed by other devices. > > The userspace is the most adequate place to dynamically act on the > different devices by limiting their power given an application > profile: it has the knowledge of the platform. > > These userspace daemons are in charge of the Dynamic Thermal Power > Management (DTPM). > > Nowadays, the dtpm daemons are abusing the thermal framework as they > act on the cooling device state to force a specific and arbitraty s/arbitraty/arbitrary/ > state without taking care of the governor decisions. Given the closed > loop of some governors that can confuse the logic or directly enter in > a decision conflict. 
> > As the number of cooling device support is limited today to the CPU > and the GPU, the dtpm daemons have little control on the power > dissipation of the system. The out of tree solutions are hacking > around here and there in the drivers, in the frameworks to have > control on the devices. The common solution is to declare them as > cooling devices. > > There is no unification of the power limitation unit, opaque states > are used. > > This patch provides a way to create a hierarchy of constraints using > the powercap framework. The devices which are registered as power > limit-able devices are represented in this hierarchy as a tree. They > are linked together with intermediate nodes which are just there to > propagate the constraint to the children. > > The leaves of the tree are the real devices, the intermediate nodes > are virtual, aggregating the children constraints and power > characteristics. > > Each node have a weight on a 2^10 basis, in order to reflect the > percentage of power distribution of the children's node. This > percentage is used to dispatch the power limit to the children. > > The weight is computed against the max power of the siblings. > > This simple approach allows to do a fair distribution of the power > limit. > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > --- > drivers/powercap/Kconfig | 6 + > drivers/powercap/Makefile | 1 + > drivers/powercap/dtpm.c | 430 ++++++++++++++++++++++++++++++ > include/asm-generic/vmlinux.lds.h | 11 + > include/linux/dtpm.h | 73 +++++ > 5 files changed, 521 insertions(+) > create mode 100644 drivers/powercap/dtpm.c > create mode 100644 include/linux/dtpm.h > > diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig > index ebc4d4578339..777cf60300b8 100644 > --- a/drivers/powercap/Kconfig > +++ b/drivers/powercap/Kconfig > @@ -43,4 +43,10 @@ config IDLE_INJECT > CPUs for power capping. 
Idle period can be injected > synchronously on a set of specified CPUs or alternatively > on a per CPU basis. > + > +config DTPM > + bool "Power capping for dynamic thermal power management" > + help > + This enables support for the power capping for the dynamic > + thermal management userspace engine. > endif > diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile > index 7255c94ec61c..6482ac52054d 100644 > --- a/drivers/powercap/Makefile > +++ b/drivers/powercap/Makefile > @@ -1,4 +1,5 @@ > # SPDX-License-Identifier: GPL-2.0-only > +obj-$(CONFIG_DTPM) += dtpm.o > obj-$(CONFIG_POWERCAP) += powercap_sys.o > obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o > obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o > diff --git a/drivers/powercap/dtpm.c b/drivers/powercap/dtpm.c > new file mode 100644 > index 000000000000..6df1e51a2c1c > --- /dev/null > +++ b/drivers/powercap/dtpm.c > @@ -0,0 +1,430 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright 2020 Linaro Limited > + * > + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> > + * > + * The powercap based Dynamic Thermal Power Management framework > + * provides to the userspace a consistent API to set the power limit > + * on some devices. > + * > + * DTPM defines the functions to create a tree of constraints. Each > + * parent node is a virtual description of the aggregation of the > + * children. It propagates the constraints set at its level to its > + * children and collect the children power infomation. The leaves of > + * the tree are the real devices which have the ability to get their > + * current power consumption and set their power limit. 
> + */ > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > +#include <linux/dtpm.h> > +#include <linux/init.h> > +#include <linux/kernel.h> > +#include <linux/powercap.h> > +#include <linux/slab.h> > +#include <linux/spinlock.h> > + > +static const char *constraint_name[] = { > + "Instantaneous power constraint", Quite long name, max is 30, we would lose new line [1]. > +}; > + > +static struct powercap_control_type *pct; > +static struct dtpm *root; > + > +static int get_time_window_us(struct powercap_zone *pcz, int cid, u64 *window) > +{ > + return -ENOSYS; > +} > + > +static int set_time_window_us(struct powercap_zone *pcz, int cid, u64 window) > +{ > + return -ENOSYS; > +} > + > +static int get_max_power_range_uw(struct powercap_zone *pcz, u64 *max_power_uw) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + > + spin_lock(&dtpm->lock); > + *max_power_uw = dtpm->power_max - dtpm->power_min; > + spin_unlock(&dtpm->lock); > + > + return 0; > +} > + > +static int get_children_power_uw(struct powercap_zone *pcz, u64 *power_uw) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm *child; > + u64 power; > + int ret = 0; > + > + *power_uw = 0; > + > + spin_lock(&dtpm->lock); > + list_for_each_entry(child, &dtpm->children, sibling) { > + ret = child->zone.ops->get_power_uw(&child->zone, &power); > + if (ret) > + break; > + *power_uw += power; > + } > + spin_unlock(&dtpm->lock); > + > + return ret; > +} > + > +static void __dtpm_rebalance_weight(struct dtpm *dtpm) > +{ > + struct dtpm *child; > + > + spin_lock(&dtpm->lock); > + list_for_each_entry(child, &dtpm->children, sibling) { > + > + pr_debug("Setting weight '%d' for '%s'\n", > + child->weight, child->zone.name); > + > + child->weight = DIV_ROUND_CLOSEST(child->power_max * 1024, > + dtpm->power_max); > + > + __dtpm_rebalance_weight(child); > + } > + spin_unlock(&dtpm->lock); > +} > + > +static void dtpm_rebalance_weight(void) > +{ > + __dtpm_rebalance_weight(root); > +} > + > +static void dtpm_sub_power(struct 
dtpm *dtpm) > +{ > + struct dtpm *parent = dtpm->parent; > + > + while (parent) { > + spin_lock(&parent->lock); > + parent->power_min -= dtpm->power_min; > + parent->power_max -= dtpm->power_max; > + spin_unlock(&parent->lock); > + parent = parent->parent; > + } > + > + dtpm_rebalance_weight(); > +} > + > +static void dtpm_add_power(struct dtpm *dtpm) > +{ > + struct dtpm *parent = dtpm->parent; > + > + while (parent) { > + spin_lock(&parent->lock); > + parent->power_min += dtpm->power_min; > + parent->power_max += dtpm->power_max; > + spin_unlock(&parent->lock); > + parent = parent->parent; > + } > + > + dtpm_rebalance_weight(); > +} > + > +/** > + * dtpm_update_power - Update the power on the dtpm > + * @dtpm: a pointer to a dtpm structure to update > + * @power_min: a u64 representing the new power_min value > + * @power_max: a u64 representing the new power_max value > + * > + * Function to update the power values of the dtpm node specified in > + * parameter. These new values will be propagated to the tree. > + * > + * Return: zero on success, -EINVAL if the values are inconsistent > + */ > +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max) > +{ > + if (power_min == dtpm->power_min && power_max == dtpm->power_max) > + return 0; > + > + if (power_max < power_min) > + return -EINVAL; > + > + dtpm_sub_power(dtpm); > + spin_lock(&dtpm->lock); > + dtpm->power_min = power_min; > + dtpm->power_max = power_max; > + spin_unlock(&dtpm->lock); > + dtpm_add_power(dtpm); > + > + return 0; > +} > + > +/** > + * dtpm_release_zone - Cleanup when the node is released > + * @pcz: a pointer to a powercap_zone structure > + * > + * Do some housecleaning and update the weight on the tree. The > + * release will be denied if the node has children. This function must > + * be called by the specific release callback of the different > + * backends. 
> + * > + * Return: 0 on success, -EBUSY if there are children > + */ > +int dtpm_release_zone(struct powercap_zone *pcz) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm *parent = dtpm->parent; > + > + if (!list_empty(&dtpm->children)) > + return -EBUSY; > + > + if (parent) { > + spin_lock(&parent->lock); > + list_del(&dtpm->sibling); > + spin_unlock(&parent->lock); > + } > + > + dtpm_sub_power(dtpm); > + > + kfree(dtpm); > + > + return 0; > +} > + > +/* > + * Set the power limit on the nodes, the power limit is distributed > + * given the weight of the children. > + */ > +static int set_children_power_limit(struct powercap_zone *pcz, int cid, > + u64 power_limit) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm *child; > + u64 power; > + int ret = 0; > + > + /* > + * Don't allow values outside of the power range previously > + * set when initiliazing the power numbers. s/initiliazing/initializing > + */ > + power_limit = clamp_val(power_limit, dtpm->power_min, dtpm->power_max); > + > + spin_lock(&dtpm->lock); > + list_for_each_entry(child, &dtpm->children, sibling) { > + > + /* > + * Integer division rounding will inevitably lead to a > + * different max value when set several times. In > + * order to restore the initial value, we force the > + * child's max power every time if the constraint is > + * removed by setting a value greater or equal to the > + * max power. 
> + */ > + if (power_limit == dtpm->power_max) > + power = child->power_max; > + else > + power = DIV_ROUND_CLOSEST( > + power_limit * child->weight, 1024); > + > + pr_debug("Setting power limit for '%s': %llu uW\n", > + child->zone.name, power); > + > + ret = child->zone.constraints->ops->set_power_limit_uw( > + &child->zone, cid, power); > + if (ret) > + break; > + } > + spin_unlock(&dtpm->lock); > + > + return ret; > +} > + > +static int get_children_power_limit(struct powercap_zone *pcz, int cid, > + u64 *power_limit) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm *child; > + u64 power; > + int ret = 0; > + > + *power_limit = 0; > + > + spin_lock(&dtpm->lock); > + list_for_each_entry(child, &dtpm->children, sibling) { > + ret = child->zone.constraints->ops->get_power_limit_uw( > + &child->zone, cid, &power); > + if (ret) > + break; > + *power_limit += power; > + } > + spin_unlock(&dtpm->lock); > + > + return ret; > +} > + > +static const char *get_constraint_name(struct powercap_zone *pcz, int cid) > +{ > + return constraint_name[cid]; > +} > + > +static int get_max_power_uw(struct powercap_zone *pcz, int id, u64 *max_power) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + > + spin_lock(&dtpm->lock); > + *max_power = dtpm->power_max; > + spin_unlock(&dtpm->lock); > + > + return 0; > +} > + > +static struct powercap_zone_constraint_ops constraint_ops = { > + .set_power_limit_uw = set_children_power_limit, > + .get_power_limit_uw = get_children_power_limit, > + .set_time_window_us = set_time_window_us, > + .get_time_window_us = get_time_window_us, > + .get_max_power_uw = get_max_power_uw, > + .get_name = get_constraint_name, > +}; > + > +static struct powercap_zone_ops zone_ops = { > + .get_max_power_range_uw = get_max_power_range_uw, > + .get_power_uw = get_children_power_uw, > + .release = dtpm_release_zone, > +}; > + > +/** > + * dtpm_alloc - Allocate and initialize a dtpm struct > + * @name: a string specifying the name of the node > + * > + * 
Return: a struct dtpm pointer, NULL in case of error > + */ > +struct dtpm *dtpm_alloc(void) > +{ > + struct dtpm *dtpm; > + > + dtpm = kzalloc(sizeof(*dtpm), GFP_KERNEL); > + if (dtpm) { > + INIT_LIST_HEAD(&dtpm->children); > + INIT_LIST_HEAD(&dtpm->sibling); > + spin_lock_init(&dtpm->lock); > + dtpm->weight = 1024; > + } > + > + return dtpm; > +} > + > +/** > + * dtpm_unregister - Unregister a dtpm node from the hierarchy tree > + * @dtpm: a pointer to a dtpm structure corresponding to the node to be removed > + * > + * Call the underlying powercap unregister function. That will call > + * the release callback of the powercap zone. > + */ > +void dtpm_unregister(struct dtpm *dtpm) > +{ > + powercap_unregister_zone(pct, &dtpm->zone); > +} > + > +/** > + * dtpm_register - Register a dtpm node in the hierarchy tree > + * @name: a string specifying the name of the node > + * @dtpm: a pointer to a dtpm structure corresponding to the new node > + * @parent: a pointer to a dtpm structure corresponding to the parent node > + * @ops: a pointer to a powercap_zone_ops structure > + * @nr_constraints: a integer giving the number of constraints supported > + * @const_ops: a pointer to a powercap_zone_constraint_ops structure > + * > + * Create a dtpm node in the tree. If no parent is specified, the node > + * is the root node of the hierarchy. If the root node already exists, > + * then the registration will fail. The powercap controller must be > + * initialized before calling this function. > + * > + * The dtpm structure must be initialized with the power numbers > + * before calling this function. > + * > + * Return: zero on success, a negative value in case of error: > + * -EAGAIN: the function is called before the framework is initialized. 
> + * -EBUSY: the root node is already inserted > + * -EINVAL: there is no root node yet and @parent is specified > + * Other negative values are reported back from the powercap framework > + */ > +int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, > + struct powercap_zone_ops *ops, int nr_constraints, > + struct powercap_zone_constraint_ops *const_ops) > +{ > + struct powercap_zone *pcz; > + > + if (!pct) > + return -EAGAIN; > + > + if (root && !parent) > + return -EBUSY; > + > + if (!root && parent) > + return -EINVAL; > + > + const_ops->get_name = get_constraint_name; > + const_ops->get_max_power_uw = get_max_power_uw; > + const_ops->set_time_window_us = set_time_window_us; > + const_ops->get_time_window_us = get_time_window_us; > + > + ops->get_max_power_range_uw = get_max_power_range_uw; > + > + pcz = powercap_register_zone(&dtpm->zone, pct, name, > + parent ? &parent->zone : NULL, > + ops, nr_constraints, const_ops); > + if (IS_ERR(pcz)) > + return PTR_ERR(pcz); > + > + if (parent) { > + spin_lock(&parent->lock); > + list_add_tail(&dtpm->sibling, &parent->children); > + spin_unlock(&parent->lock); > + dtpm->parent = parent; > + } else { > + root = dtpm; > + } > + > + dtpm_add_power(dtpm); > + > + return 0; > +} > + > +/** > + * dtpm_register_parent - Register a intermediate node in the tree > + * @name: a string specifying the name of the node > + * @dtpm: a pointer to a dtpm structure corresponding to the new node > + * @parent: a pointer to a dtpm structure corresponding parent's new node > + * > + * The function will add an intermediate node corresponding to a > + * parent to more nodes. Its purpose is to aggregate the children > + * characteristics and dispatch the constraints. If the @parent > + * parameter is NULL, then this node becomes the root node of the tree > + * if there is no root node yet. 
> + * > + * Return: zero on success, a negative value in case of error: > + * -EAGAIN: the function is called before the framework is initialized. > + * -EBUSY: the root node is already inserted > + * -EINVAL: there is not root node yet and @parent is specified > + * Other negative values are reported back from the powercap framework > + */ > +int dtpm_register_parent(const char *name, struct dtpm *dtpm, > + struct dtpm *parent) > +{ > + return dtpm_register(name, dtpm, parent, &zone_ops, > + MAX_DTPM_CONSTRAINTS, &constraint_ops); > +} > + > +static int __init dtpm_init(void) > +{ > + struct dtpm_descr **dtpm_descr; > + > + pct = powercap_register_control_type(NULL, "dtpm", NULL); > + if (!pct) { > + pr_err("Failed to register control type\n"); > + return -EINVAL; > + } > + > + for_each_dtpm_table(dtpm_descr) > + (*dtpm_descr)->init(*dtpm_descr); > + > + return 0; > +} > +late_initcall(dtpm_init); > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h > index 5430febd34be..29b30976ea02 100644 > --- a/include/asm-generic/vmlinux.lds.h > +++ b/include/asm-generic/vmlinux.lds.h > @@ -315,6 +315,16 @@ > #define THERMAL_TABLE(name) > #endif > > +#ifdef CONFIG_DTPM > +#define DTPM_TABLE() \ > + . 
= ALIGN(8); \ > + __dtpm_table = .; \ > + KEEP(*(__dtpm_table)) \ > + __dtpm_table_end = .; > +#else > +#define DTPM_TABLE() > +#endif > + > #define KERNEL_DTB() \ > STRUCT_ALIGN(); \ > __dtb_start = .; \ > @@ -715,6 +725,7 @@ > ACPI_PROBE_TABLE(irqchip) \ > ACPI_PROBE_TABLE(timer) \ > THERMAL_TABLE(governor) \ > + DTPM_TABLE() \ > EARLYCON_TABLE() \ > LSM_TABLE() \ > EARLY_LSM_TABLE() > diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h > new file mode 100644 > index 000000000000..6696bdcfdb87 > --- /dev/null > +++ b/include/linux/dtpm.h > @@ -0,0 +1,73 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* > + * Copyright (C) 2020 Linaro Ltd > + * > + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> > + */ > +#ifndef ___DTPM_H__ > +#define ___DTPM_H__ > + > +#include <linux/of.h> > +#include <linux/powercap.h> > + > +#define MAX_DTPM_DESCR 8 > +#define MAX_DTPM_CONSTRAINTS 1 > + > +struct dtpm { > + struct powercap_zone zone; > + struct dtpm *parent; > + struct list_head sibling; > + struct list_head children; > + spinlock_t lock; > + u64 power_limit; > + u64 power_max; > + u64 power_min; > + int weight; > + void *private; > +}; > + > +struct dtpm_descr; > + > +typedef int (*dtpm_init_t)(struct dtpm_descr *); > + > +struct dtpm_descr { > + struct dtpm *parent; > + const char *name; > + dtpm_init_t init; > +}; > + > +/* Init section thermal table */ > +extern struct dtpm_descr *__dtpm_table[]; > +extern struct dtpm_descr *__dtpm_table_end[]; > + > +#define DTPM_TABLE_ENTRY(name) \ > + static typeof(name) *__dtpm_table_entry_##name \ > + __used __section(__dtpm_table) = &name I had to change the section name to a string to pass compilation: __used __section("__dtpm_table") = &name I don't know if it's my compiler or configuration. I've tried to register this DTPM in scmi-cpufreq.c with the macro proposed in the patch 4/4 commit message, but I might have missed some important includes there...
> + > +#define DTPM_DECLARE(name) DTPM_TABLE_ENTRY(name) > + > +#define for_each_dtpm_table(__dtpm) \ > + for (__dtpm = __dtpm_table; \ > + __dtpm < __dtpm_table_end; \ > + __dtpm++) > + > +static inline struct dtpm *to_dtpm(struct powercap_zone *zone) > +{ > + return container_of(zone, struct dtpm, zone); > +} > + > +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max); > + > +int dtpm_release_zone(struct powercap_zone *pcz); > + > +struct dtpm *dtpm_alloc(void); > + > +void dtpm_unregister(struct dtpm *dtpm); > + > +int dtpm_register_parent(const char *name, struct dtpm *dtpm, > + struct dtpm *parent); > + > +int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, > + struct powercap_zone_ops *ops, int nr_constraints, > + struct powercap_zone_constraint_ops *const_ops); > +#endif > Minor comment. This new framework deserves more debug prints, especially in the registration/unregistration paths. I had to add some to test it. But it can be done later as well, after it gets into mainline. I have also run different hotplug stress tests to check the tree locking: a userspace process was constantly reading these values while the last CPU in the cluster was going on/off and its node was detaching. I haven't seen any problems, but the tree wasn't so deep. Everything was calculated properly, no errors, null pointers, etc. Apart from the minor spelling issues and the long constraint name, LGTM Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Regards, Lukasz [1] https://lore.kernel.org/linux-pm/20201109172452.6923-1-lukasz.luba@arm.com/
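The fair distribution exercised by these stress tests boils down to two integer computations in the patch: the child weight on a 2^10 basis from __dtpm_rebalance_weight(), and the per-child share of a limit from set_children_power_limit(). A standalone restatement (the power values below are chosen only for the example):

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding as the kernel's DIV_ROUND_CLOSEST() for unsigned values. */
#define DIV_ROUND_CLOSEST(x, d) (((x) + ((d) / 2)) / (d))

/* Weight of a child on a 2^10 basis, relative to the parent's max power,
 * as computed in __dtpm_rebalance_weight(). */
static uint64_t dtpm_weight(uint64_t child_power_max, uint64_t parent_power_max)
{
	return DIV_ROUND_CLOSEST(child_power_max * 1024, parent_power_max);
}

/* Share of a power limit dispatched to one child given its weight, as in
 * set_children_power_limit(). */
static uint64_t dtpm_share(uint64_t power_limit, uint64_t weight)
{
	return DIV_ROUND_CLOSEST(power_limit * weight, 1024);
}
```

For a parent whose two children have max powers 3000 and 1000 (parent max 4000), the weights come out as 768 and 256, and a limit of 2000 is split 1500/500. The rounding here is also why the patch forces a child back to its own power_max when the constraint is removed: repeated integer division would otherwise drift away from the initial value.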
* Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management 2020-11-10 9:59 ` Lukasz Luba @ 2020-11-10 11:05 ` Lukasz Luba 2020-11-10 14:59 ` Daniel Lezcano 2020-11-10 12:55 ` Daniel Lezcano 1 sibling, 1 reply; 42+ messages in thread From: Lukasz Luba @ 2020-11-10 11:05 UTC (permalink / raw) To: Daniel Lezcano Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Arnd Bergmann, open list:GENERIC INCLUDE/ASM HEADER FILES Actually, I've found one issue while trying to clean up my testing branch with the modified scmi-cpufreq.c. On 11/10/20 9:59 AM, Lukasz Luba wrote: > Hi Daniel, > > I've experimented with the patch set and went through the code again. > It looks good, only a few minor comments. > > On 10/6/20 1:20 PM, Daniel Lezcano wrote: >> On the embedded world, the complexity of the SoC leads to an >> increasing number of hotspots which need to be monitored and mitigated >> as a whole in order to prevent the temperature to go above the >> normative and legally stated 'skin temperature'.
[snip] >> diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h >> new file mode 100644 >> index 000000000000..6696bdcfdb87 >> --- /dev/null >> +++ b/include/linux/dtpm.h >> @@ -0,0 +1,73 @@ >> +/* SPDX-License-Identifier: GPL-2.0-only */ >> +/* >> + * Copyright (C) 2020 Linaro Ltd >> + * >> + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> >> + */ >> +#ifndef ___DTPM_H__ >> +#define ___DTPM_H__ >> + >> +#include <linux/of.h> >> +#include <linux/powercap.h> >> + >> +#define MAX_DTPM_DESCR 8 >> +#define MAX_DTPM_CONSTRAINTS 1 >> + >> +struct dtpm { >> + struct powercap_zone zone; >> + struct dtpm *parent; >> + struct list_head sibling; >> + struct list_head children; >> + spinlock_t lock; >> + u64 power_limit; >> + u64 power_max; >> + u64 power_min; >> + int weight; >> + void *private; >> +}; >> + >> +struct dtpm_descr; >> + >> +typedef int (*dtpm_init_t)(struct dtpm_descr *); >> + >> +struct dtpm_descr { >> + struct dtpm *parent; >> + const char *name; >> + dtpm_init_t init; >> +}; >> + >> +/* Init section thermal table */ >> +extern struct dtpm_descr *__dtpm_table[]; >> +extern struct dtpm_descr *__dtpm_table_end[]; >> + >> +#define DTPM_TABLE_ENTRY(name) \ >> + static typeof(name) *__dtpm_table_entry_##name \ >> + __used __section(__dtpm_table) = &name > > I had to change the section name to string, to pass compilation: > __used __section("__dtpm_table") = &name > I don't know if it's my compiler or configuration. > > I've tried to register this DTPM in scmi-cpufreq.c with macro > proposed in patch 4/4 commit message, but I might missed some > important includes there... 
> >> + >> +#define DTPM_DECLARE(name) DTPM_TABLE_ENTRY(name) >> + >> +#define for_each_dtpm_table(__dtpm) \ >> + for (__dtpm = __dtpm_table; \ >> + __dtpm < __dtpm_table_end; \ >> + __dtpm++) >> + >> +static inline struct dtpm *to_dtpm(struct powercap_zone *zone) >> +{ >> + return container_of(zone, struct dtpm, zone); >> +} >> + >> +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max); >> + >> +int dtpm_release_zone(struct powercap_zone *pcz); >> + >> +struct dtpm *dtpm_alloc(void); >> + >> +void dtpm_unregister(struct dtpm *dtpm); >> + >> +int dtpm_register_parent(const char *name, struct dtpm *dtpm, >> + struct dtpm *parent); >> + >> +int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm >> *parent, >> + struct powercap_zone_ops *ops, int nr_constraints, >> + struct powercap_zone_constraint_ops *const_ops); This header is missing an #ifdef CONFIG_DTPM guard with static inline stubs and an empty DTPM_DECLARE() macro. I got these errors when my testing code in scmi-cpufreq.c was compiled w/o CONFIG_DTPM and DTPM_CPU: /usr/bin/aarch64-linux-gnu-ld: warning: orphan section `__dtpm_table' from `drivers/cpufreq/scmi-cpufreq.o' being placed in section `__dtpm_table'. /usr/bin/aarch64-linux-gnu-ld: Unexpected GOT/PLT entries detected! /usr/bin/aarch64-linux-gnu-ld: Unexpected run-time procedure linkages detected! drivers/cpufreq/scmi-cpufreq.o: In function `dtpm_register_pkg': /data/linux/drivers/cpufreq/scmi-cpufreq.c:272: undefined reference to `dtpm_alloc' /data/linux/drivers/cpufreq/scmi-cpufreq.c:276: undefined reference to `dtpm_register_parent' /data/linux/drivers/cpufreq/scmi-cpufreq.c:280: undefined reference to `dtpm_register_cpu' Makefile:1164: recipe for target 'vmlinux' failed The diff below fixed my issues. Then I needed one more stub for patch 4/4, a static inline dtpm_register_cpu() function. I've followed the thermal.h scheme with -ENODEV, but you can choose a different approach.
--------------------------8<--------------------------------------------- diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h index 6696bdcfdb87..0ef784ca5d0b 100644 --- a/include/linux/dtpm.h +++ b/include/linux/dtpm.h @@ -40,6 +40,7 @@ struct dtpm_descr { extern struct dtpm_descr *__dtpm_table[]; extern struct dtpm_descr *__dtpm_table_end[]; +#ifdef CONFIG_DTPM #define DTPM_TABLE_ENTRY(name) \ static typeof(name) *__dtpm_table_entry_##name \ __used __section(__dtpm_table) = &name @@ -70,4 +71,36 @@ int dtpm_register_parent(const char *name, struct dtpm *dtpm, int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, struct powercap_zone_ops *ops, int nr_constraints, struct powercap_zone_constraint_ops *const_ops); -#endif +#else +#define DTPM_DECLARE(name) +static inline +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max) +{ + return -ENODEV; +} +static inline int dtpm_release_zone(struct powercap_zone *pcz) +{ + return -ENODEV; +} +static inline struct dtpm *dtpm_alloc(void) +{ + return ERR_PTR(-ENODEV); +} +static inline void dtpm_unregister(struct dtpm *dtpm) +{ } +static inline +int dtpm_register_parent(const char *name, struct dtpm *dtpm, + struct dtpm *parent) +{ + return -ENODEV; +} +static inline +int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, + struct powercap_zone_ops *ops, int nr_constraints, + struct powercap_zone_constraint_ops *const_ops) +{ + return -ENODEV; +} +#endif /* CONFIG_DTPM */ + +#endif /* __DTPM_H__ */ ----------------------------->8------------------------------------------- >> +#endif >> > > Minor comment. This new framework deserves more debug prints, especially > in registration/unregistration paths. I had to put some, to test it. > But it can be done later as well, after it gets into mainline. > > I have also run different hotplug stress tests to check this tree > locking. 
The userspace process constantly reading these values, while > the last CPU in the cluster was going on/off and node was detaching. > I haven't seen any problems, but the tree wasn't so deep. > Everything was calculated properly, no error, null pointers, etc. > > Apart from the spelling minor issues and the long constraint name, LGTM > > Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> > Tested-by: Lukasz Luba <lukasz.luba@arm.com> Please ignore these two for now. But if you decide to take the diff above, then you can add these two tags in v2. This is the only issue that I see. Regards, Lukasz ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management 2020-11-10 11:05 ` Lukasz Luba @ 2020-11-10 14:59 ` Daniel Lezcano 2020-11-10 15:04 ` Lukasz Luba 0 siblings, 1 reply; 42+ messages in thread From: Daniel Lezcano @ 2020-11-10 14:59 UTC (permalink / raw) To: Lukasz Luba Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Arnd Bergmann, open list:GENERIC INCLUDE/ASM HEADER FILES On 10/11/2020 12:05, Lukasz Luba wrote: > > Actually I've found one issue when I have been trying to clean > my testing branch with modified scmi-cpufreq.c. IMO, those errors are not the dtpm framework fault but the scmi-cpufreq. You should add a component in the drivers/powercap which does the glue between the scmi-cpufreq and the dtpm. No stub will be needed in this case as the component will depend on CONFIG_DTPM. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 42+ messages in thread
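The glue component Daniel suggests would amount to something like the following Kconfig entry (the symbol name `DTPM_SCMI_CPU` and its dependencies are invented here for illustration; they are not part of the patch set). Because the glue itself depends on the DTPM symbols, no stubs are needed anywhere:

```
config DTPM_SCMI_CPU
	bool "SCMI CPU glue for DTPM"
	depends on DTPM_CPU && ARM_SCMI_CPUFREQ
	help
	  Hypothetical glue driver that registers the SCMI CPU
	  performance domains into the DTPM tree, keeping
	  scmi-cpufreq.c free of any DTPM calls.
```

With this layout the scmi-cpufreq driver never references dtpm_alloc() or dtpm_register_parent() directly, so the undefined-reference errors seen above cannot occur.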
* Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management 2020-11-10 14:59 ` Daniel Lezcano @ 2020-11-10 15:04 ` Lukasz Luba 0 siblings, 0 replies; 42+ messages in thread From: Lukasz Luba @ 2020-11-10 15:04 UTC (permalink / raw) To: Daniel Lezcano Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Arnd Bergmann, open list:GENERIC INCLUDE/ASM HEADER FILES On 11/10/20 2:59 PM, Daniel Lezcano wrote: > On 10/11/2020 12:05, Lukasz Luba wrote: >> >> Actually I've found one issue when I have been trying to clean >> my testing branch with modified scmi-cpufreq.c. > > IMO, those errors are not the dtpm framework fault but the scmi-cpufreq. True, I added the proposed macro directly into the driver, so strictly speaking it's not a framework issue. > > You should add a component in the drivers/powercap which does the glue > between the scmi-cpufreq and the dtpm. No stub will be needed in this > case as the component will depend on CONFIG_DTPM. > > > > Makes sense, the glue component should take care of this scenario. In this case, please keep the Reviewed-by and Tested-by and ignore the previous email. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management 2020-11-10 9:59 ` Lukasz Luba 2020-11-10 11:05 ` Lukasz Luba @ 2020-11-10 12:55 ` Daniel Lezcano 1 sibling, 0 replies; 42+ messages in thread From: Daniel Lezcano @ 2020-11-10 12:55 UTC (permalink / raw) To: Lukasz Luba Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Arnd Bergmann, open list:GENERIC INCLUDE/ASM HEADER FILES Hi Lukasz, thanks for the review On 10/11/2020 10:59, Lukasz Luba wrote: [ ... ] >> +/* Init section thermal table */ >> +extern struct dtpm_descr *__dtpm_table[]; >> +extern struct dtpm_descr *__dtpm_table_end[]; >> + >> +#define DTPM_TABLE_ENTRY(name) \ >> + static typeof(name) *__dtpm_table_entry_##name \ >> + __used __section(__dtpm_table) = &name > > I had to change the section name to string, to pass compilation: > __used __section("__dtpm_table") = &name > I don't know if it's my compiler or configuration. Actually, it is: commit 33def8498fdde180023444b08e12b72a9efed41d Author: Joe Perches <joe@perches.com> Date: Wed Oct 21 19:36:07 2020 -0700 treewide: Convert macro and uses of __section(foo) to __section("foo") Your change is correct; I noticed it a few days ago when rebasing the series. > I've tried to register this DTPM in scmi-cpufreq.c with macro > proposed in patch 4/4 commit message, but I might missed some > important includes there...
> >> + >> +#define DTPM_DECLARE(name) DTPM_TABLE_ENTRY(name) >> + >> +#define for_each_dtpm_table(__dtpm) \ >> + for (__dtpm = __dtpm_table; \ >> + __dtpm < __dtpm_table_end; \ >> + __dtpm++) >> + >> +static inline struct dtpm *to_dtpm(struct powercap_zone *zone) >> +{ >> + return container_of(zone, struct dtpm, zone); >> +} >> + >> +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max); >> + >> +int dtpm_release_zone(struct powercap_zone *pcz); >> + >> +struct dtpm *dtpm_alloc(void); >> + >> +void dtpm_unregister(struct dtpm *dtpm); >> + >> +int dtpm_register_parent(const char *name, struct dtpm *dtpm, >> + struct dtpm *parent); >> + >> +int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm >> *parent, >> + struct powercap_zone_ops *ops, int nr_constraints, >> + struct powercap_zone_constraint_ops *const_ops); >> +#endif >> > > Minor comment. This new framework deserves more debug prints, especially > in registration/unregistration paths. I had to put some, to test it. > But it can be done later as well, after it gets into mainline. Ok, I will add some debug traces. > I have also run different hotplug stress tests to check this tree > locking. The userspace process constantly reading these values, while > the last CPU in the cluster was going on/off and node was detaching. > I haven't seen any problems, but the tree wasn't so deep. > Everything was calculated properly, no error, null pointers, etc. Great! thank you very much for this test > Apart from the spelling minor issues and the long constraint name, LGTM > > Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> > Tested-by: Lukasz Luba <lukasz.luba@arm.com> Thanks for the review -- Daniel -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support 2020-10-06 12:20 [PATCH 0/4] powercap/dtpm: Add the DTPM framework Daniel Lezcano ` (2 preceding siblings ...) 2020-10-06 12:20 ` [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management Daniel Lezcano @ 2020-10-06 12:20 ` Daniel Lezcano 2020-10-23 13:27 ` Lukasz Luba 2020-11-10 12:50 ` Lukasz Luba 2020-10-07 10:43 ` [PATCH 0/4] powercap/dtpm: Add the DTPM framework Hans de Goede 4 siblings, 2 replies; 42+ messages in thread From: Daniel Lezcano @ 2020-10-06 12:20 UTC (permalink / raw) To: rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Palmer Dabbelt, Anup Patel, Atish Patra, Marc Zyngier, Andrew Jones, Michael Kelley, Mike Leach, Kajol Jain, Daniel Jordan, Steven Price With the powercap dtpm controller, we are able to plug devices with power limitation features into the tree. The following patch introduces CPU power limitation based on the energy model and the performance states. The power limitation is done at the performance domain level. If some CPUs are unplugged, the corresponding power will be subtracted from the performance domain total power. It is up to the platform to initialize the dtpm tree and add the CPUs. Here is an example to create a simple tree with one root node called "pkg" and the CPUs' performance domains.
int dtpm_register_pkg(struct dtpm_descr *descr) { struct dtpm *pkg; int ret; pkg = dtpm_alloc(); if (!pkg) return -ENOMEM; ret = dtpm_register_parent(descr->name, pkg, descr->parent); if (ret) return ret; return dtpm_register_cpu(pkg); } struct dtpm_descr descr = { .name = "pkg", .init = dtpm_register_pkg, }; DTPM_DECLARE(descr); Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> --- drivers/powercap/Kconfig | 8 ++ drivers/powercap/Makefile | 1 + drivers/powercap/dtpm_cpu.c | 242 ++++++++++++++++++++++++++++++++++++ include/linux/cpuhotplug.h | 1 + include/linux/dtpm.h | 3 + 5 files changed, 255 insertions(+) create mode 100644 drivers/powercap/dtpm_cpu.c diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig index 777cf60300b8..240dc09e8dc2 100644 --- a/drivers/powercap/Kconfig +++ b/drivers/powercap/Kconfig @@ -49,4 +49,12 @@ config DTPM help This enables support for the power capping for the dynamic thermal management userspace engine. + +config DTPM_CPU + bool "Add CPU power capping based on the energy model" + depends on DTPM && ENERGY_MODEL + help + This enables support for CPU power limitation based on + energy model. 
+ endif diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile index 6482ac52054d..fabcf388a8d3 100644 --- a/drivers/powercap/Makefile +++ b/drivers/powercap/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only obj-$(CONFIG_DTPM) += dtpm.o +obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o obj-$(CONFIG_POWERCAP) += powercap_sys.o obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c new file mode 100644 index 000000000000..23ebf704c599 --- /dev/null +++ b/drivers/powercap/dtpm_cpu.c @@ -0,0 +1,242 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright 2020 Linaro Limited + * + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> + * + */ +#include <linux/cpumask.h> +#include <linux/cpufreq.h> +#include <linux/cpuhotplug.h> +#include <linux/dtpm.h> +#include <linux/energy_model.h> +#include <linux/pm_qos.h> +#include <linux/slab.h> +#include <linux/units.h> + +static struct dtpm *__parent; + +static DEFINE_PER_CPU(struct dtpm *, dtpm_per_cpu); + +struct dtpm_cpu { + struct freq_qos_request qos_req; + int cpu; +}; + +static int power_add(struct dtpm *dtpm, struct em_perf_domain *em) +{ + u64 power_min, power_max; + + power_min = em->table[0].power; + power_min *= MICROWATT_PER_MILLIWATT; + power_min += dtpm->power_min; + + power_max = em->table[em->nr_perf_states - 1].power; + power_max *= MICROWATT_PER_MILLIWATT; + power_max += dtpm->power_max; + + return dtpm_update_power(dtpm, power_min, power_max); +} + +static int power_sub(struct dtpm *dtpm, struct em_perf_domain *em) +{ + u64 power_min, power_max; + + power_min = em->table[0].power; + power_min *= MICROWATT_PER_MILLIWATT; + power_min = dtpm->power_min - power_min; + + power_max = em->table[em->nr_perf_states - 1].power; + power_max *= MICROWATT_PER_MILLIWATT; + power_max = dtpm->power_max - power_max; + + return dtpm_update_power(dtpm, power_min, power_max); +} + +static int 
set_pd_power_limit(struct powercap_zone *pcz, int cid, + u64 power_limit) +{ + struct dtpm *dtpm = to_dtpm(pcz); + struct dtpm_cpu *dtpm_cpu = dtpm->private; + struct em_perf_domain *pd; + unsigned long freq; + int i, nr_cpus; + + spin_lock(&dtpm->lock); + + power_limit = clamp_val(power_limit, dtpm->power_min, dtpm->power_max); + + pd = em_cpu_get(dtpm_cpu->cpu); + + nr_cpus = cpumask_weight(to_cpumask(pd->cpus)); + + for (i = 0; i < pd->nr_perf_states; i++) { + + u64 power = pd->table[i].power * MICROWATT_PER_MILLIWATT; + + if ((power * nr_cpus) > power_limit) + break; + } + + freq = pd->table[i - 1].frequency; + + freq_qos_update_request(&dtpm_cpu->qos_req, freq); + + dtpm->power_limit = power_limit; + + spin_unlock(&dtpm->lock); + + return 0; +} + +static int get_pd_power_limit(struct powercap_zone *pcz, int cid, u64 *data) +{ + struct dtpm *dtpm = to_dtpm(pcz); + + spin_lock(&dtpm->lock); + *data = dtpm->power_max; + spin_unlock(&dtpm->lock); + + return 0; +} + +static int get_pd_power_uw(struct powercap_zone *pcz, u64 *power_uw) +{ + struct dtpm *dtpm = to_dtpm(pcz); + struct dtpm_cpu *dtpm_cpu = dtpm->private; + struct em_perf_domain *pd; + unsigned long freq; + int i, nr_cpus; + + freq = cpufreq_quick_get(dtpm_cpu->cpu); + pd = em_cpu_get(dtpm_cpu->cpu); + nr_cpus = cpumask_weight(to_cpumask(pd->cpus)); + + for (i = 0; i < pd->nr_perf_states; i++) { + + if (pd->table[i].frequency < freq) + continue; + + *power_uw = pd->table[i].power * + MICROWATT_PER_MILLIWATT * nr_cpus; + + return 0; + } + + return -EINVAL; +} + +static int cpu_release_zone(struct powercap_zone *pcz) +{ + struct dtpm *dtpm = to_dtpm(pcz); + struct dtpm_cpu *dtpm_cpu = dtpm->private; + + freq_qos_remove_request(&dtpm_cpu->qos_req); + + return dtpm_release_zone(pcz); +} + +static struct powercap_zone_constraint_ops pd_constraint_ops = { + .set_power_limit_uw = set_pd_power_limit, + .get_power_limit_uw = get_pd_power_limit, +}; + +static struct powercap_zone_ops pd_zone_ops = { + 
.get_power_uw = get_pd_power_uw, + .release = cpu_release_zone, +}; + +static int cpuhp_dtpm_cpu_offline(unsigned int cpu) +{ + struct cpufreq_policy *policy; + struct em_perf_domain *pd; + struct dtpm *dtpm; + + policy = cpufreq_cpu_get(cpu); + + if (!policy) + return 0; + + pd = em_cpu_get(cpu); + if (!pd) + return -EINVAL; + + dtpm = per_cpu(dtpm_per_cpu, cpu); + + power_sub(dtpm, pd); + + if (cpumask_weight(policy->cpus) != 1) + return 0; + + for_each_cpu(cpu, policy->related_cpus) + per_cpu(dtpm_per_cpu, cpu) = NULL; + + dtpm_unregister(dtpm); + + return 0; +} + +static int cpuhp_dtpm_cpu_online(unsigned int cpu) +{ + struct dtpm *dtpm; + struct dtpm_cpu *dtpm_cpu; + struct cpufreq_policy *policy; + struct em_perf_domain *pd; + char name[CPUFREQ_NAME_LEN]; + int ret; + + policy = cpufreq_cpu_get(cpu); + + if (!policy) + return 0; + + pd = em_cpu_get(cpu); + if (!pd) + return -EINVAL; + + dtpm = per_cpu(dtpm_per_cpu, cpu); + if (dtpm) + return power_add(dtpm, pd); + + dtpm = dtpm_alloc(); + if (!dtpm) + return -EINVAL; + + dtpm_cpu = kzalloc(sizeof(dtpm_cpu), GFP_KERNEL); + if (!dtpm_cpu) + return -ENOMEM; + + dtpm->private = dtpm_cpu; + dtpm_cpu->cpu = cpu; + + for_each_cpu(cpu, policy->related_cpus) + per_cpu(dtpm_per_cpu, cpu) = dtpm; + + ret = power_add(dtpm, pd); + if (ret) + return ret; + + dtpm->power_limit = dtpm->power_max; + + sprintf(name, "cpu%d", dtpm_cpu->cpu); + + ret = dtpm_register(name, dtpm, __parent, &pd_zone_ops, + 1, &pd_constraint_ops); + if (ret) + return ret; + + ret = freq_qos_add_request(&policy->constraints, + &dtpm_cpu->qos_req, FREQ_QOS_MAX, + pd->table[pd->nr_perf_states - 1].frequency); + return ret; +} + +int dtpm_register_cpu(struct dtpm *parent) +{ + __parent = parent; + + return cpuhp_setup_state(CPUHP_AP_DTPM_CPU_ONLINE, + "dtpm_cpu:online", + cpuhp_dtpm_cpu_online, + cpuhp_dtpm_cpu_offline); +} diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index bf9181cef444..6792bad4a435 100644 --- 
a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -190,6 +190,7 @@ enum cpuhp_state { CPUHP_AP_ONLINE_DYN_END = CPUHP_AP_ONLINE_DYN + 30, CPUHP_AP_X86_HPET_ONLINE, CPUHP_AP_X86_KVM_CLK_ONLINE, + CPUHP_AP_DTPM_CPU_ONLINE, CPUHP_AP_ACTIVE, CPUHP_ONLINE, }; diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h index 6696bdcfdb87..b62215a13baa 100644 --- a/include/linux/dtpm.h +++ b/include/linux/dtpm.h @@ -70,4 +70,7 @@ int dtpm_register_parent(const char *name, struct dtpm *dtpm, int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, struct powercap_zone_ops *ops, int nr_constraints, struct powercap_zone_constraint_ops *const_ops); + +int dtpm_register_cpu(struct dtpm *parent); + #endif -- 2.17.1 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support 2020-10-06 12:20 ` [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support Daniel Lezcano @ 2020-10-23 13:27 ` Lukasz Luba 2020-11-04 10:47 ` Daniel Lezcano 2020-11-10 12:50 ` Lukasz Luba 1 sibling, 1 reply; 42+ messages in thread From: Lukasz Luba @ 2020-10-23 13:27 UTC (permalink / raw) To: Daniel Lezcano Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Palmer Dabbelt, Anup Patel, Atish Patra, Marc Zyngier, Andrew Jones, Michael Kelley, Mike Leach, Kajol Jain, Daniel Jordan, Steven Price Hi Daniel, On 10/6/20 1:20 PM, Daniel Lezcano wrote: > With the powercap dtpm controller, we are able to plug devices with > power limitation features in the tree. > > The following patch introduces the CPU power limitation based on the > energy model and the performance states. > > The power limitation is done at the performance domain level. If some > CPUs are unplugged, the corresponding power will be substracted from > the performance domain total power. > > It is up to the platform to initialize the dtpm tree and add the CPU. > > Here is an example to create a simple tree with one root node called > "pkg" and the cpu's performance domains. 
> > int dtpm_register_pkg(struct dtpm_descr *descr) > { > struct dtpm *pkg; > int ret; > > pkg = dtpm_alloc(); > if (!pkg) > return -ENOMEM; > > ret = dtpm_register_parent(descr->name, pkg, descr->parent); > if (ret) > return ret; > > return dtpm_register_cpu(pkg); > } > > struct dtpm_descr descr = { > .name = "pkg", > .init = dtpm_register_pkg, > }; > DTPM_DECLARE(descr); > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > --- > drivers/powercap/Kconfig | 8 ++ > drivers/powercap/Makefile | 1 + > drivers/powercap/dtpm_cpu.c | 242 ++++++++++++++++++++++++++++++++++++ > include/linux/cpuhotplug.h | 1 + > include/linux/dtpm.h | 3 + > 5 files changed, 255 insertions(+) > create mode 100644 drivers/powercap/dtpm_cpu.c > > diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig > index 777cf60300b8..240dc09e8dc2 100644 > --- a/drivers/powercap/Kconfig > +++ b/drivers/powercap/Kconfig > @@ -49,4 +49,12 @@ config DTPM > help > This enables support for the power capping for the dynamic > thermal management userspace engine. > + > +config DTPM_CPU > + bool "Add CPU power capping based on the energy model" > + depends on DTPM && ENERGY_MODEL > + help > + This enables support for CPU power limitation based on > + energy model. 
> + > endif > diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile > index 6482ac52054d..fabcf388a8d3 100644 > --- a/drivers/powercap/Makefile > +++ b/drivers/powercap/Makefile > @@ -1,5 +1,6 @@ > # SPDX-License-Identifier: GPL-2.0-only > obj-$(CONFIG_DTPM) += dtpm.o > +obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o > obj-$(CONFIG_POWERCAP) += powercap_sys.o > obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o > obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o > diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c > new file mode 100644 > index 000000000000..23ebf704c599 > --- /dev/null > +++ b/drivers/powercap/dtpm_cpu.c > @@ -0,0 +1,242 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright 2020 Linaro Limited > + * > + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> > + * > + */ > +#include <linux/cpumask.h> > +#include <linux/cpufreq.h> > +#include <linux/cpuhotplug.h> > +#include <linux/dtpm.h> > +#include <linux/energy_model.h> > +#include <linux/pm_qos.h> > +#include <linux/slab.h> > +#include <linux/units.h> > + > +static struct dtpm *__parent; > + > +static DEFINE_PER_CPU(struct dtpm *, dtpm_per_cpu); > + > +struct dtpm_cpu { > + struct freq_qos_request qos_req; > + int cpu; > +}; > + > +static int power_add(struct dtpm *dtpm, struct em_perf_domain *em) > +{ > + u64 power_min, power_max; > + > + power_min = em->table[0].power; > + power_min *= MICROWATT_PER_MILLIWATT; > + power_min += dtpm->power_min; > + > + power_max = em->table[em->nr_perf_states - 1].power; > + power_max *= MICROWATT_PER_MILLIWATT; > + power_max += dtpm->power_max; > + > + return dtpm_update_power(dtpm, power_min, power_max); > +} > + > +static int power_sub(struct dtpm *dtpm, struct em_perf_domain *em) > +{ > + u64 power_min, power_max; > + > + power_min = em->table[0].power; > + power_min *= MICROWATT_PER_MILLIWATT; > + power_min = dtpm->power_min - power_min; > + > + power_max = em->table[em->nr_perf_states - 1].power; > + power_max *= 
MICROWATT_PER_MILLIWATT; > + power_max = dtpm->power_max - power_max; > + > + return dtpm_update_power(dtpm, power_min, power_max); > +} > + > +static int set_pd_power_limit(struct powercap_zone *pcz, int cid, > + u64 power_limit) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm_cpu *dtpm_cpu = dtpm->private; > + struct em_perf_domain *pd; > + unsigned long freq; > + int i, nr_cpus; > + > + spin_lock(&dtpm->lock); > + > + power_limit = clamp_val(power_limit, dtpm->power_min, dtpm->power_max); > + > + pd = em_cpu_get(dtpm_cpu->cpu); > + > + nr_cpus = cpumask_weight(to_cpumask(pd->cpus)); > + > + for (i = 0; i < pd->nr_perf_states; i++) { > + > + u64 power = pd->table[i].power * MICROWATT_PER_MILLIWATT; > + > + if ((power * nr_cpus) > power_limit) We have one node in that DTPM hierarchy tree, which represents all CPUs which are in 'related_cpus' mask. I saw below that we just remove the node in hotplug. I have put a comment below asking if we could change the registration, which will affect power calculation. 
> + break; > + } > + > + freq = pd->table[i - 1].frequency; > + > + freq_qos_update_request(&dtpm_cpu->qos_req, freq); > + > + dtpm->power_limit = power_limit; > + > + spin_unlock(&dtpm->lock); > + > + return 0; > +} > + > +static int get_pd_power_limit(struct powercap_zone *pcz, int cid, u64 *data) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + > + spin_lock(&dtpm->lock); > + *data = dtpm->power_max; > + spin_unlock(&dtpm->lock); > + > + return 0; > +} > + > +static int get_pd_power_uw(struct powercap_zone *pcz, u64 *power_uw) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm_cpu *dtpm_cpu = dtpm->private; > + struct em_perf_domain *pd; > + unsigned long freq; > + int i, nr_cpus; > + > + freq = cpufreq_quick_get(dtpm_cpu->cpu); > + pd = em_cpu_get(dtpm_cpu->cpu); > + nr_cpus = cpumask_weight(to_cpumask(pd->cpus)); > + > + for (i = 0; i < pd->nr_perf_states; i++) { > + > + if (pd->table[i].frequency < freq) > + continue; > + > + *power_uw = pd->table[i].power * > + MICROWATT_PER_MILLIWATT * nr_cpus; Same here, we have 'nr_cpus'. 
> + > + return 0; > + } > + > + return -EINVAL; > +} > + > +static int cpu_release_zone(struct powercap_zone *pcz) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm_cpu *dtpm_cpu = dtpm->private; > + > + freq_qos_remove_request(&dtpm_cpu->qos_req); > + > + return dtpm_release_zone(pcz); > +} > + > +static struct powercap_zone_constraint_ops pd_constraint_ops = { > + .set_power_limit_uw = set_pd_power_limit, > + .get_power_limit_uw = get_pd_power_limit, > +}; > + > +static struct powercap_zone_ops pd_zone_ops = { > + .get_power_uw = get_pd_power_uw, > + .release = cpu_release_zone, > +}; > + > +static int cpuhp_dtpm_cpu_offline(unsigned int cpu) > +{ > + struct cpufreq_policy *policy; > + struct em_perf_domain *pd; > + struct dtpm *dtpm; > + > + policy = cpufreq_cpu_get(cpu); > + > + if (!policy) > + return 0; > + > + pd = em_cpu_get(cpu); > + if (!pd) > + return -EINVAL; > + > + dtpm = per_cpu(dtpm_per_cpu, cpu); > + > + power_sub(dtpm, pd); > + > + if (cpumask_weight(policy->cpus) != 1) > + return 0; > + > + for_each_cpu(cpu, policy->related_cpus) > + per_cpu(dtpm_per_cpu, cpu) = NULL; Hotplugging one CPU would affect others. I would keep them all but marked somehow that CPU is offline. > + > + dtpm_unregister(dtpm); Could we keep the node in the hierarchy on CPU hotplug? 
> + > + return 0; > +} > + > +static int cpuhp_dtpm_cpu_online(unsigned int cpu) > +{ > + struct dtpm *dtpm; > + struct dtpm_cpu *dtpm_cpu; > + struct cpufreq_policy *policy; > + struct em_perf_domain *pd; > + char name[CPUFREQ_NAME_LEN]; > + int ret; > + > + policy = cpufreq_cpu_get(cpu); > + > + if (!policy) > + return 0; > + > + pd = em_cpu_get(cpu); > + if (!pd) > + return -EINVAL; > + > + dtpm = per_cpu(dtpm_per_cpu, cpu); > + if (dtpm) > + return power_add(dtpm, pd); > + > + dtpm = dtpm_alloc(); > + if (!dtpm) > + return -EINVAL; > + > + dtpm_cpu = kzalloc(sizeof(dtpm_cpu), GFP_KERNEL); > + if (!dtpm_cpu) > + return -ENOMEM; > + > + dtpm->private = dtpm_cpu; > + dtpm_cpu->cpu = cpu; > + > + for_each_cpu(cpu, policy->related_cpus) > + per_cpu(dtpm_per_cpu, cpu) = dtpm; > + > + ret = power_add(dtpm, pd); > + if (ret) > + return ret; > + > + dtpm->power_limit = dtpm->power_max; > + > + sprintf(name, "cpu%d", dtpm_cpu->cpu); > + > + ret = dtpm_register(name, dtpm, __parent, &pd_zone_ops, > + 1, &pd_constraint_ops); > + if (ret) > + return ret; > + > + ret = freq_qos_add_request(&policy->constraints, > + &dtpm_cpu->qos_req, FREQ_QOS_MAX, > + pd->table[pd->nr_perf_states - 1].frequency); > + return ret; > +} > + > +int dtpm_register_cpu(struct dtpm *parent) > +{ > + __parent = parent; > + > + return cpuhp_setup_state(CPUHP_AP_DTPM_CPU_ONLINE, > + "dtpm_cpu:online", > + cpuhp_dtpm_cpu_online, > + cpuhp_dtpm_cpu_offline); > +} > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h > index bf9181cef444..6792bad4a435 100644 > --- a/include/linux/cpuhotplug.h > +++ b/include/linux/cpuhotplug.h > @@ -190,6 +190,7 @@ enum cpuhp_state { > CPUHP_AP_ONLINE_DYN_END = CPUHP_AP_ONLINE_DYN + 30, > CPUHP_AP_X86_HPET_ONLINE, > CPUHP_AP_X86_KVM_CLK_ONLINE, > + CPUHP_AP_DTPM_CPU_ONLINE, > CPUHP_AP_ACTIVE, > CPUHP_ONLINE, > }; > diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h > index 6696bdcfdb87..b62215a13baa 100644 > --- a/include/linux/dtpm.h > +++ 
b/include/linux/dtpm.h > @@ -70,4 +70,7 @@ int dtpm_register_parent(const char *name, struct dtpm *dtpm, > int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, > struct powercap_zone_ops *ops, int nr_constraints, > struct powercap_zone_constraint_ops *const_ops); > + > +int dtpm_register_cpu(struct dtpm *parent); > + > #endif > I have a few comments for this DTPM CPU.

1. Maybe we can register these CPUs differently. First register the parent node as a separate dtpm based on 'policy->related_cpus'. Then register new child nodes, one for each CPU. When the CPU is up, mark it as 'active'.

2. We don't remove the node when the CPU is hotplugged out, but we mark it '!active' or 'offline'. The power calculation could be done in the upper node, which takes that flag into account for the children.

3. We would only remove the nodes when their module is unloaded (e.g. a GPU driver).

That would make the tree more stable and also more detailed. We would also account the power properly when one CPU goes offline but the others are still there. What do you think? Regards, Lukasz ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support 2020-10-23 13:27 ` Lukasz Luba @ 2020-11-04 10:47 ` Daniel Lezcano 2020-11-04 10:57 ` Lukasz Luba 0 siblings, 1 reply; 42+ messages in thread From: Daniel Lezcano @ 2020-11-04 10:47 UTC (permalink / raw) To: Lukasz Luba Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Palmer Dabbelt, Anup Patel, Atish Patra, Marc Zyngier, Andrew Jones, Michael Kelley, Mike Leach, Kajol Jain, Daniel Jordan, Steven Price Hi Lukasz, On 23/10/2020 15:27, Lukasz Luba wrote: > Hi Daniel, > > > On 10/6/20 1:20 PM, Daniel Lezcano wrote: >> With the powercap dtpm controller, we are able to plug devices with >> power limitation features in the tree. >> >> The following patch introduces the CPU power limitation based on the >> energy model and the performance states. >> >> The power limitation is done at the performance domain level. If some >> CPUs are unplugged, the corresponding power will be substracted from >> the performance domain total power. >> >> It is up to the platform to initialize the dtpm tree and add the CPU. >> >> Here is an example to create a simple tree with one root node called >> "pkg" and the cpu's performance domains. [ ... 
] >> +static int set_pd_power_limit(struct powercap_zone *pcz, int cid, >> + u64 power_limit) >> +{ >> + struct dtpm *dtpm = to_dtpm(pcz); >> + struct dtpm_cpu *dtpm_cpu = dtpm->private; >> + struct em_perf_domain *pd; >> + unsigned long freq; >> + int i, nr_cpus; >> + >> + spin_lock(&dtpm->lock); >> + >> + power_limit = clamp_val(power_limit, dtpm->power_min, >> dtpm->power_max); >> + >> + pd = em_cpu_get(dtpm_cpu->cpu); >> + >> + nr_cpus = cpumask_weight(to_cpumask(pd->cpus)); >> + >> + for (i = 0; i < pd->nr_perf_states; i++) { >> + >> + u64 power = pd->table[i].power * MICROWATT_PER_MILLIWATT; >> + >> + if ((power * nr_cpus) > power_limit) > > We have one node in that DTPM hierarchy tree, which represents all CPUs > which are in 'related_cpus' mask. I saw below that we just remove the > node in hotplug. The last CPU hotplugged will remove the node. > I have put a comment below asking if we could change the registration, > which will affect power calculation. > > >> + break; >> + } >> + >> + freq = pd->table[i - 1].frequency; >> + >> + freq_qos_update_request(&dtpm_cpu->qos_req, freq); >> + >> + dtpm->power_limit = power_limit; >> + >> + spin_unlock(&dtpm->lock); >> + >> + return 0; >> +} >> + >> +static int get_pd_power_limit(struct powercap_zone *pcz, int cid, u64 >> *data) >> +{ >> + struct dtpm *dtpm = to_dtpm(pcz); >> + >> + spin_lock(&dtpm->lock); >> + *data = dtpm->power_max; >> + spin_unlock(&dtpm->lock); >> + >> + return 0; >> +} >> + >> +static int get_pd_power_uw(struct powercap_zone *pcz, u64 *power_uw) >> +{ >> + struct dtpm *dtpm = to_dtpm(pcz); >> + struct dtpm_cpu *dtpm_cpu = dtpm->private; >> + struct em_perf_domain *pd; >> + unsigned long freq; >> + int i, nr_cpus; >> + >> + freq = cpufreq_quick_get(dtpm_cpu->cpu); >> + pd = em_cpu_get(dtpm_cpu->cpu); >> + nr_cpus = cpumask_weight(to_cpumask(pd->cpus)); >> + >> + for (i = 0; i < pd->nr_perf_states; i++) { >> + >> + if (pd->table[i].frequency < freq) >> + continue; >> + >> + *power_uw = 
pd->table[i].power * >> + MICROWATT_PER_MILLIWATT * nr_cpus; > > Same here, we have 'nr_cpus'. > >> + >> + return 0; >> + } >> + >> + return -EINVAL; >> +} >> + >> +static int cpu_release_zone(struct powercap_zone *pcz) >> +{ >> + struct dtpm *dtpm = to_dtpm(pcz); >> + struct dtpm_cpu *dtpm_cpu = dtpm->private; >> + >> + freq_qos_remove_request(&dtpm_cpu->qos_req); >> + >> + return dtpm_release_zone(pcz); >> +} >> + >> +static struct powercap_zone_constraint_ops pd_constraint_ops = { >> + .set_power_limit_uw = set_pd_power_limit, >> + .get_power_limit_uw = get_pd_power_limit, >> +}; >> + >> +static struct powercap_zone_ops pd_zone_ops = { >> + .get_power_uw = get_pd_power_uw, >> + .release = cpu_release_zone, >> +}; >> + >> +static int cpuhp_dtpm_cpu_offline(unsigned int cpu) >> +{ >> + struct cpufreq_policy *policy; >> + struct em_perf_domain *pd; >> + struct dtpm *dtpm; >> + >> + policy = cpufreq_cpu_get(cpu); >> + >> + if (!policy) >> + return 0; >> + >> + pd = em_cpu_get(cpu); >> + if (!pd) >> + return -EINVAL; >> + >> + dtpm = per_cpu(dtpm_per_cpu, cpu); >> + >> + power_sub(dtpm, pd); >> + >> + if (cpumask_weight(policy->cpus) != 1) >> + return 0; >> + >> + for_each_cpu(cpu, policy->related_cpus) >> + per_cpu(dtpm_per_cpu, cpu) = NULL; > > Hotplugging one CPU would affect others. I would keep them > all but marked somehow that CPU is offline. No, the last one will remove the node. This is checked in the test above (policy->cpus) != 1 ... >> + >> + dtpm_unregister(dtpm); > > Could we keep the node in the hierarchy on CPU hotplug? [ ... 
] >> diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h >> index 6696bdcfdb87..b62215a13baa 100644 >> --- a/include/linux/dtpm.h >> +++ b/include/linux/dtpm.h >> @@ -70,4 +70,7 @@ int dtpm_register_parent(const char *name, struct >> dtpm *dtpm, >> int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm >> *parent, >> struct powercap_zone_ops *ops, int nr_constraints, >> struct powercap_zone_constraint_ops *const_ops); >> + >> +int dtpm_register_cpu(struct dtpm *parent); >> + >> #endif >> > > I have a few comments for this DTPM CPU. > > 1. Maybe we can register these CPUs differently. First register > the parent node as a separate dtpm based on 'policy->related_cpus. Then > register new children nodes, one for each CPU. When the CPU is up, mark > it as 'active'. > > 2. We don't remove the node when the CPU is hotplugged, but we mark it > '!active' Or 'offline'. The power calculation could be done in upper > node, which takes into account that flag for children. > > 3. We would only remove the node when it's module is unloaded (e.g. GPU) > > That would make the tree more stable and also more detailed. > We would also account the power properly when one CPU went offline, but > the other are still there. > > What do you think? In the DTPM paradigm, the intermediate nodes (those with children) aggregate the power of their children and do not represent the real devices. The leaves are the real devices, which are power manageable. In our case, the CPU DTPM is based on the performance domain, which is a group of CPUs, hence it is a leaf of the tree. I think you misunderstood: the power is recomputed when a CPU is switched on/off, and the node is removed when the last CPU is unplugged. e.g. 1000mW max per CPU, a performance domain with 4 CPUs.
If the hardware evolves with a performance domain per CPU, we will end up with a leaf per CPU and a "cluster" on top of them. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support 2020-11-04 10:47 ` Daniel Lezcano @ 2020-11-04 10:57 ` Lukasz Luba 2020-11-04 11:15 ` Daniel Lezcano 0 siblings, 1 reply; 42+ messages in thread From: Lukasz Luba @ 2020-11-04 10:57 UTC (permalink / raw) To: Daniel Lezcano Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Palmer Dabbelt, Anup Patel, Atish Patra, Marc Zyngier, Andrew Jones, Michael Kelley, Mike Leach, Kajol Jain, Daniel Jordan, Steven Price On 11/4/20 10:47 AM, Daniel Lezcano wrote: > > Hi Lukasz, > > > On 23/10/2020 15:27, Lukasz Luba wrote: >> Hi Daniel, >> >> >> On 10/6/20 1:20 PM, Daniel Lezcano wrote: >>> With the powercap dtpm controller, we are able to plug devices with >>> power limitation features in the tree. >>> >>> The following patch introduces the CPU power limitation based on the >>> energy model and the performance states. >>> >>> The power limitation is done at the performance domain level. If some >>> CPUs are unplugged, the corresponding power will be substracted from >>> the performance domain total power. >>> >>> It is up to the platform to initialize the dtpm tree and add the CPU. >>> >>> Here is an example to create a simple tree with one root node called >>> "pkg" and the cpu's performance domains. > > [ ... 
] > >>> +static int set_pd_power_limit(struct powercap_zone *pcz, int cid, >>> + u64 power_limit) >>> +{ >>> + struct dtpm *dtpm = to_dtpm(pcz); >>> + struct dtpm_cpu *dtpm_cpu = dtpm->private; >>> + struct em_perf_domain *pd; >>> + unsigned long freq; >>> + int i, nr_cpus; >>> + >>> + spin_lock(&dtpm->lock); >>> + >>> + power_limit = clamp_val(power_limit, dtpm->power_min, >>> dtpm->power_max); >>> + >>> + pd = em_cpu_get(dtpm_cpu->cpu); >>> + >>> + nr_cpus = cpumask_weight(to_cpumask(pd->cpus)); >>> + >>> + for (i = 0; i < pd->nr_perf_states; i++) { >>> + >>> + u64 power = pd->table[i].power * MICROWATT_PER_MILLIWATT; >>> + >>> + if ((power * nr_cpus) > power_limit) >> >> We have one node in that DTPM hierarchy tree, which represents all CPUs >> which are in 'related_cpus' mask. I saw below that we just remove the >> node in hotplug. > > The last CPU hotplugged will remove the node. > >> I have put a comment below asking if we could change the registration, >> which will affect power calculation. 
>> >> >>> + break; >>> + } >>> + >>> + freq = pd->table[i - 1].frequency; >>> + >>> + freq_qos_update_request(&dtpm_cpu->qos_req, freq); >>> + >>> + dtpm->power_limit = power_limit; >>> + >>> + spin_unlock(&dtpm->lock); >>> + >>> + return 0; >>> +} >>> + >>> +static int get_pd_power_limit(struct powercap_zone *pcz, int cid, u64 >>> *data) >>> +{ >>> + struct dtpm *dtpm = to_dtpm(pcz); >>> + >>> + spin_lock(&dtpm->lock); >>> + *data = dtpm->power_max; >>> + spin_unlock(&dtpm->lock); >>> + >>> + return 0; >>> +} >>> + >>> +static int get_pd_power_uw(struct powercap_zone *pcz, u64 *power_uw) >>> +{ >>> + struct dtpm *dtpm = to_dtpm(pcz); >>> + struct dtpm_cpu *dtpm_cpu = dtpm->private; >>> + struct em_perf_domain *pd; >>> + unsigned long freq; >>> + int i, nr_cpus; >>> + >>> + freq = cpufreq_quick_get(dtpm_cpu->cpu); >>> + pd = em_cpu_get(dtpm_cpu->cpu); >>> + nr_cpus = cpumask_weight(to_cpumask(pd->cpus)); >>> + >>> + for (i = 0; i < pd->nr_perf_states; i++) { >>> + >>> + if (pd->table[i].frequency < freq) >>> + continue; >>> + >>> + *power_uw = pd->table[i].power * >>> + MICROWATT_PER_MILLIWATT * nr_cpus; >> >> Same here, we have 'nr_cpus'. 
>> >>> + >>> + return 0; >>> + } >>> + >>> + return -EINVAL; >>> +} >>> + >>> +static int cpu_release_zone(struct powercap_zone *pcz) >>> +{ >>> + struct dtpm *dtpm = to_dtpm(pcz); >>> + struct dtpm_cpu *dtpm_cpu = dtpm->private; >>> + >>> + freq_qos_remove_request(&dtpm_cpu->qos_req); >>> + >>> + return dtpm_release_zone(pcz); >>> +} >>> + >>> +static struct powercap_zone_constraint_ops pd_constraint_ops = { >>> + .set_power_limit_uw = set_pd_power_limit, >>> + .get_power_limit_uw = get_pd_power_limit, >>> +}; >>> + >>> +static struct powercap_zone_ops pd_zone_ops = { >>> + .get_power_uw = get_pd_power_uw, >>> + .release = cpu_release_zone, >>> +}; >>> + >>> +static int cpuhp_dtpm_cpu_offline(unsigned int cpu) >>> +{ >>> + struct cpufreq_policy *policy; >>> + struct em_perf_domain *pd; >>> + struct dtpm *dtpm; >>> + >>> + policy = cpufreq_cpu_get(cpu); >>> + >>> + if (!policy) >>> + return 0; >>> + >>> + pd = em_cpu_get(cpu); >>> + if (!pd) >>> + return -EINVAL; >>> + >>> + dtpm = per_cpu(dtpm_per_cpu, cpu); >>> + >>> + power_sub(dtpm, pd); >>> + >>> + if (cpumask_weight(policy->cpus) != 1) >>> + return 0; >>> + >>> + for_each_cpu(cpu, policy->related_cpus) >>> + per_cpu(dtpm_per_cpu, cpu) = NULL; >> >> Hotplugging one CPU would affect others. I would keep them >> all but marked somehow that CPU is offline. > > No, the last one will remove the node. This is checked in the test above > (policy->cpus) != 1 ... > >>> + >>> + dtpm_unregister(dtpm); >> >> Could we keep the node in the hierarchy on CPU hotplug? > > [ ... 
] > >>> diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h >>> index 6696bdcfdb87..b62215a13baa 100644 >>> --- a/include/linux/dtpm.h >>> +++ b/include/linux/dtpm.h >>> @@ -70,4 +70,7 @@ int dtpm_register_parent(const char *name, struct >>> dtpm *dtpm, >>> int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm >>> *parent, >>> struct powercap_zone_ops *ops, int nr_constraints, >>> struct powercap_zone_constraint_ops *const_ops); >>> + >>> +int dtpm_register_cpu(struct dtpm *parent); >>> + >>> #endif >>> >> >> I have a few comments for this DTPM CPU. >> >> 1. Maybe we can register these CPUs differently. First register >> the parent node as a separate dtpm based on 'policy->related_cpus. Then >> register new children nodes, one for each CPU. When the CPU is up, mark >> it as 'active'. >> >> 2. We don't remove the node when the CPU is hotplugged, but we mark it >> '!active' Or 'offline'. The power calculation could be done in upper >> node, which takes into account that flag for children. >> >> 3. We would only remove the node when it's module is unloaded (e.g. GPU) >> >> That would make the tree more stable and also more detailed. >> We would also account the power properly when one CPU went offline, but >> the other are still there. >> >> What do you think? > > The paradigm of the DTPM is the intermediate nodes (have children), are > aggregating the power of their children and do not represent the real > devices. The leaves are the real devices which are power manageable. OK, I see, it makes sense. Thanks for the explanation. > > In our case, the CPU DTPM is based on the performance state which is a > group of CPUs, hence it is a leaf of the tree. > > I think you misunderstood the power is recomputed when the CPU is > switched on/off and the node is removed when the last CPU is hotplugged. Yes, you are right. I misunderstood the hotplug and then power calc. > > eg. 1000mW max per CPU, a performance domain with 4 CPUs. 
> > With all CPUs on, max power is 4000mW > With 3 CPUs on, and 1 CPU off, max power is 3000mW > > etc... > > With 4 CPUs off, the node is removed. > > If the hardware evolves with a performance domain per CPU, we will end > up with a leaf per CPU and a "cluster" on top of them. > > Let me go through the patches again and then I will add my Reviewed-by. I will also run LTP hotplug or LISA hotplug torture tests on this tree, just to check it's fine. Regards, Lukasz ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support 2020-11-04 10:57 ` Lukasz Luba @ 2020-11-04 11:15 ` Daniel Lezcano 0 siblings, 0 replies; 42+ messages in thread From: Daniel Lezcano @ 2020-11-04 11:15 UTC (permalink / raw) To: Lukasz Luba Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Palmer Dabbelt, Anup Patel, Atish Patra, Marc Zyngier, Andrew Jones, Michael Kelley, Mike Leach, Kajol Jain, Daniel Jordan, Steven Price On 04/11/2020 11:57, Lukasz Luba wrote: [ ... ] > Let me go again through the patches and then I will add my reviewed by. > > I will also run LTP hotplug or LISA hotplug torture on this tree, > just to check it's fine. Ah yes, good idea. Thanks for doing that. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support 2020-10-06 12:20 ` [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support Daniel Lezcano 2020-10-23 13:27 ` Lukasz Luba @ 2020-11-10 12:50 ` Lukasz Luba 1 sibling, 0 replies; 42+ messages in thread From: Lukasz Luba @ 2020-11-10 12:50 UTC (permalink / raw) To: Daniel Lezcano Cc: rafael, srinivas.pandruvada, linux-kernel, linux-pm, rui.zhang, Rafael J. Wysocki, Palmer Dabbelt, Anup Patel, Atish Patra, Marc Zyngier, Andrew Jones, Michael Kelley, Mike Leach, Kajol Jain, Daniel Jordan, Steven Price On 10/6/20 1:20 PM, Daniel Lezcano wrote: > With the powercap dtpm controller, we are able to plug devices with > power limitation features in the tree. > > The following patch introduces the CPU power limitation based on the > energy model and the performance states. > > The power limitation is done at the performance domain level. If some > CPUs are unplugged, the corresponding power will be substracted from s/substracted/subtracted > the performance domain total power. > > It is up to the platform to initialize the dtpm tree and add the CPU. > > Here is an example to create a simple tree with one root node called > "pkg" and the cpu's performance domains. 
s/cpu/CPU to be aligned with previous 'CPU' > > int dtpm_register_pkg(struct dtpm_descr *descr) > { > struct dtpm *pkg; > int ret; > > pkg = dtpm_alloc(); > if (!pkg) > return -ENOMEM; > > ret = dtpm_register_parent(descr->name, pkg, descr->parent); > if (ret) > return ret; > > return dtpm_register_cpu(pkg); > } > > struct dtpm_descr descr = { > .name = "pkg", > .init = dtpm_register_pkg, > }; > DTPM_DECLARE(descr); > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > --- > drivers/powercap/Kconfig | 8 ++ > drivers/powercap/Makefile | 1 + > drivers/powercap/dtpm_cpu.c | 242 ++++++++++++++++++++++++++++++++++++ > include/linux/cpuhotplug.h | 1 + > include/linux/dtpm.h | 3 + > 5 files changed, 255 insertions(+) > create mode 100644 drivers/powercap/dtpm_cpu.c > > diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig > index 777cf60300b8..240dc09e8dc2 100644 > --- a/drivers/powercap/Kconfig > +++ b/drivers/powercap/Kconfig > @@ -49,4 +49,12 @@ config DTPM > help > This enables support for the power capping for the dynamic > thermal management userspace engine. > + > +config DTPM_CPU > + bool "Add CPU power capping based on the energy model" > + depends on DTPM && ENERGY_MODEL > + help > + This enables support for CPU power limitation based on > + energy model. 
> + > endif > diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile > index 6482ac52054d..fabcf388a8d3 100644 > --- a/drivers/powercap/Makefile > +++ b/drivers/powercap/Makefile > @@ -1,5 +1,6 @@ > # SPDX-License-Identifier: GPL-2.0-only > obj-$(CONFIG_DTPM) += dtpm.o > +obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o > obj-$(CONFIG_POWERCAP) += powercap_sys.o > obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o > obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o > diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c > new file mode 100644 > index 000000000000..23ebf704c599 > --- /dev/null > +++ b/drivers/powercap/dtpm_cpu.c > @@ -0,0 +1,242 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright 2020 Linaro Limited > + * > + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> > + * > + */ > +#include <linux/cpumask.h> > +#include <linux/cpufreq.h> > +#include <linux/cpuhotplug.h> > +#include <linux/dtpm.h> > +#include <linux/energy_model.h> > +#include <linux/pm_qos.h> > +#include <linux/slab.h> > +#include <linux/units.h> > + > +static struct dtpm *__parent; > + > +static DEFINE_PER_CPU(struct dtpm *, dtpm_per_cpu); > + > +struct dtpm_cpu { > + struct freq_qos_request qos_req; > + int cpu; > +}; > + > +static int power_add(struct dtpm *dtpm, struct em_perf_domain *em) > +{ > + u64 power_min, power_max; > + > + power_min = em->table[0].power; > + power_min *= MICROWATT_PER_MILLIWATT; > + power_min += dtpm->power_min; > + > + power_max = em->table[em->nr_perf_states - 1].power; > + power_max *= MICROWATT_PER_MILLIWATT; > + power_max += dtpm->power_max; > + > + return dtpm_update_power(dtpm, power_min, power_max); > +} > + > +static int power_sub(struct dtpm *dtpm, struct em_perf_domain *em) > +{ > + u64 power_min, power_max; > + > + power_min = em->table[0].power; > + power_min *= MICROWATT_PER_MILLIWATT; > + power_min = dtpm->power_min - power_min; > + > + power_max = em->table[em->nr_perf_states - 1].power; > + power_max *= 
MICROWATT_PER_MILLIWATT; > + power_max = dtpm->power_max - power_max; > + > + return dtpm_update_power(dtpm, power_min, power_max); > +} > + > +static int set_pd_power_limit(struct powercap_zone *pcz, int cid, > + u64 power_limit) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm_cpu *dtpm_cpu = dtpm->private; > + struct em_perf_domain *pd; > + unsigned long freq; > + int i, nr_cpus; > + > + spin_lock(&dtpm->lock); > + > + power_limit = clamp_val(power_limit, dtpm->power_min, dtpm->power_max); > + > + pd = em_cpu_get(dtpm_cpu->cpu); > + > + nr_cpus = cpumask_weight(to_cpumask(pd->cpus)); > + > + for (i = 0; i < pd->nr_perf_states; i++) { > + > + u64 power = pd->table[i].power * MICROWATT_PER_MILLIWATT; > + > + if ((power * nr_cpus) > power_limit) > + break; > + } > + > + freq = pd->table[i - 1].frequency; > + > + freq_qos_update_request(&dtpm_cpu->qos_req, freq); > + > + dtpm->power_limit = power_limit; > + > + spin_unlock(&dtpm->lock); > + > + return 0; > +} > + > +static int get_pd_power_limit(struct powercap_zone *pcz, int cid, u64 *data) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + > + spin_lock(&dtpm->lock); > + *data = dtpm->power_max; > + spin_unlock(&dtpm->lock); > + > + return 0; > +} > + > +static int get_pd_power_uw(struct powercap_zone *pcz, u64 *power_uw) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm_cpu *dtpm_cpu = dtpm->private; > + struct em_perf_domain *pd; > + unsigned long freq; > + int i, nr_cpus; > + > + freq = cpufreq_quick_get(dtpm_cpu->cpu); > + pd = em_cpu_get(dtpm_cpu->cpu); > + nr_cpus = cpumask_weight(to_cpumask(pd->cpus)); > + > + for (i = 0; i < pd->nr_perf_states; i++) { > + > + if (pd->table[i].frequency < freq) > + continue; > + > + *power_uw = pd->table[i].power * > + MICROWATT_PER_MILLIWATT * nr_cpus; > + > + return 0; > + } > + > + return -EINVAL; > +} > + > +static int cpu_release_zone(struct powercap_zone *pcz) > +{ > + struct dtpm *dtpm = to_dtpm(pcz); > + struct dtpm_cpu *dtpm_cpu = dtpm->private; > + 
> + freq_qos_remove_request(&dtpm_cpu->qos_req); > + > + return dtpm_release_zone(pcz); The dtpm_cpu should be freed somewhere, maybe here or below. > +} > + > +static struct powercap_zone_constraint_ops pd_constraint_ops = { > + .set_power_limit_uw = set_pd_power_limit, > + .get_power_limit_uw = get_pd_power_limit, > +}; > + > +static struct powercap_zone_ops pd_zone_ops = { > + .get_power_uw = get_pd_power_uw, > + .release = cpu_release_zone, > +}; > + > +static int cpuhp_dtpm_cpu_offline(unsigned int cpu) > +{ > + struct cpufreq_policy *policy; > + struct em_perf_domain *pd; > + struct dtpm *dtpm; > + > + policy = cpufreq_cpu_get(cpu); > + > + if (!policy) > + return 0; > + > + pd = em_cpu_get(cpu); > + if (!pd) > + return -EINVAL; > + > + dtpm = per_cpu(dtpm_per_cpu, cpu); > + > + power_sub(dtpm, pd); > + > + if (cpumask_weight(policy->cpus) != 1) > + return 0; > + > + for_each_cpu(cpu, policy->related_cpus) > + per_cpu(dtpm_per_cpu, cpu) = NULL; > + > + dtpm_unregister(dtpm); Is it the right place to call kfree(dtpm_cpu)? > + > + return 0; > +} > + > +static int cpuhp_dtpm_cpu_online(unsigned int cpu) > +{ > + struct dtpm *dtpm; > + struct dtpm_cpu *dtpm_cpu; > + struct cpufreq_policy *policy; > + struct em_perf_domain *pd; > + char name[CPUFREQ_NAME_LEN]; > + int ret; > + > + policy = cpufreq_cpu_get(cpu); > + > + if (!policy) > + return 0; > + > + pd = em_cpu_get(cpu); > + if (!pd) > + return -EINVAL; > + > + dtpm = per_cpu(dtpm_per_cpu, cpu); > + if (dtpm) > + return power_add(dtpm, pd); > + > + dtpm = dtpm_alloc(); > + if (!dtpm) > + return -EINVAL; > + > + dtpm_cpu = kzalloc(sizeof(dtpm_cpu), GFP_KERNEL); We have to free this dtpm_cpu somewhere. 
> + if (!dtpm_cpu) > + return -ENOMEM; > + > + dtpm->private = dtpm_cpu; > + dtpm_cpu->cpu = cpu; > + > + for_each_cpu(cpu, policy->related_cpus) > + per_cpu(dtpm_per_cpu, cpu) = dtpm; > + > + ret = power_add(dtpm, pd); > + if (ret) > + return ret; > + > + dtpm->power_limit = dtpm->power_max; > + > + sprintf(name, "cpu%d", dtpm_cpu->cpu); > + > + ret = dtpm_register(name, dtpm, __parent, &pd_zone_ops, > + 1, &pd_constraint_ops); > + if (ret) > + return ret; > + > + ret = freq_qos_add_request(&policy->constraints, > + &dtpm_cpu->qos_req, FREQ_QOS_MAX, > + pd->table[pd->nr_perf_states - 1].frequency); > + return ret; > +} > + > +int dtpm_register_cpu(struct dtpm *parent) > +{ > + __parent = parent; > + > + return cpuhp_setup_state(CPUHP_AP_DTPM_CPU_ONLINE, > + "dtpm_cpu:online", > + cpuhp_dtpm_cpu_online, > + cpuhp_dtpm_cpu_offline); Shouldn't DTPM_CPU depend on HOTPLUG_CPU in Kconfig? Most platforms enable it by default, though. > +} > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h > index bf9181cef444..6792bad4a435 100644 > --- a/include/linux/cpuhotplug.h > +++ b/include/linux/cpuhotplug.h > @@ -190,6 +190,7 @@ enum cpuhp_state { > CPUHP_AP_ONLINE_DYN_END = CPUHP_AP_ONLINE_DYN + 30, > CPUHP_AP_X86_HPET_ONLINE, > CPUHP_AP_X86_KVM_CLK_ONLINE, > + CPUHP_AP_DTPM_CPU_ONLINE, > CPUHP_AP_ACTIVE, > CPUHP_ONLINE, > }; > diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h > index 6696bdcfdb87..b62215a13baa 100644 > --- a/include/linux/dtpm.h > +++ b/include/linux/dtpm.h > @@ -70,4 +70,7 @@ int dtpm_register_parent(const char *name, struct dtpm *dtpm, > int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent, > struct powercap_zone_ops *ops, int nr_constraints, > struct powercap_zone_constraint_ops *const_ops); > + > +int dtpm_register_cpu(struct dtpm *parent); > + This function needs a stub sibling under #ifdef CONFIG_DTPM_CPU ... #else. > #endif > The code might use a few debug prints, which would help with experimenting.
In general, it looks good. Regards, Lukasz ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework 2020-10-06 12:20 [PATCH 0/4] powercap/dtpm: Add the DTPM framework Daniel Lezcano ` (3 preceding siblings ...) 2020-10-06 12:20 ` [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support Daniel Lezcano @ 2020-10-07 10:43 ` Hans de Goede 2020-10-12 10:30 ` Daniel Lezcano 4 siblings, 1 reply; 42+ messages in thread From: Hans de Goede @ 2020-10-07 10:43 UTC (permalink / raw) To: Daniel Lezcano, rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang Hi, On 10/6/20 2:20 PM, Daniel Lezcano wrote: > The density of components greatly increased the last decade bringing a > numerous number of heating sources which are monitored by more than 20 > sensors on recent SoC. The skin temperature, which is the case > temperature of the device, must stay below approximately 45°C in order > to comply with the legal requirements. > > The skin temperature is managed as a whole by an user space daemon, > which is catching the current application profile, to allocate a power > budget to the different components where the resulting heating effect > will comply with the skin temperature constraint. > > This technique is called the Dynamic Thermal Power Management. > > The Linux kernel does not provide any unified interface to act on the > power of the different devices. Currently, the thermal framework is > changed to export artificially the performance states of different > devices via the cooling device software component with opaque values. > This change is done regardless of the in-kernel logic to mitigate the > temperature. The user space daemon uses all the available knobs to act > on the power limit and those differ from one platform to another. > > This series provides a Dynamic Thermal Power Management framework to > provide an unified way to act on the power of the devices. 
Interesting, we have a related (while at the same time almost orthogonal) discussion going on about setting policies for whether the code managing the constraints (which on x86 is often hidden in firmware or ACPI DPTF tables) should have a bias towards as long a battery life as possible vs. maximum performance. I know those 2 aren't always opposite ends of a spectrum with race-to-idle, yet most modern x86 hardware has some notion of what I call performance-profiles, where we can tell the firmware managing this to go for a bias towards low-power / balanced / performance. I've sent an RFC / sysfs API proposal for this here: https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@redhat.com/ I've read the patches in this thread and, as said already, I think the 2 APIs are mostly orthogonal. The API in this thread gives userspace direct access to detailed power limits, allowing userspace to configure things directly (and for things to work optimally, userspace must do this). Whereas in the x86 case with which I'm dealing, everything is mostly handled in a black box and userspace can merely configure the low-power / balanced / performance bias (*) of that black box. Still I think it is good if we are aware of each other's efforts here. So Daniel, if you can take a quick look at my proposal: https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@redhat.com/ That would be great. I think we definitely want to avoid having 2 APIs for the same thing here. Again, I don't think that is actually the case, but maybe you see this differently? Regards, Hans *) bias might not always give the correct impression; at least on some Thinkpads, switching from balanced (which they call medium) to performance boosts the long-term power limit from approx. 15W to 25W, which as expected results in a significant performance boost.
Note that usually we have no idea what the black-box knob we are tweaking actually does; all we know is that it is there, Windows 10 often has a slider to configure it, and users want the same functionality under Linux. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework 2020-10-07 10:43 ` [PATCH 0/4] powercap/dtpm: Add the DTPM framework Hans de Goede @ 2020-10-12 10:30 ` Daniel Lezcano 2020-10-12 11:46 ` Hans de Goede 0 siblings, 1 reply; 42+ messages in thread From: Daniel Lezcano @ 2020-10-12 10:30 UTC (permalink / raw) To: Hans de Goede, rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang Hi Hans, On 07/10/2020 12:43, Hans de Goede wrote: > Hi, > > On 10/6/20 2:20 PM, Daniel Lezcano wrote: >> The density of components greatly increased the last decade bringing a >> numerous number of heating sources which are monitored by more than 20 >> sensors on recent SoC. The skin temperature, which is the case >> temperature of the device, must stay below approximately 45°C in order >> to comply with the legal requirements. >> >> The skin temperature is managed as a whole by an user space daemon, >> which is catching the current application profile, to allocate a power >> budget to the different components where the resulting heating effect >> will comply with the skin temperature constraint. >> >> This technique is called the Dynamic Thermal Power Management. >> >> The Linux kernel does not provide any unified interface to act on the >> power of the different devices. Currently, the thermal framework is >> changed to export artificially the performance states of different >> devices via the cooling device software component with opaque values. >> This change is done regardless of the in-kernel logic to mitigate the >> temperature. The user space daemon uses all the available knobs to act >> on the power limit and those differ from one platform to another. >> >> This series provides a Dynamic Thermal Power Management framework to >> provide an unified way to act on the power of the devices. 
> Interesting, we have a discussion going on about a related > (while at the same time almost orthogonal) discussion for > setting policies for if the code managing the restraints > (which on x86 is often hidden in firmware or ACPI DPTF tables) > should have a bias towards trying to have as long a battery life > as possible, vs maximum performance. I know those 2 aren't > always opposite ends of a spectrum with race-to-idle, yet most > modern x86 hardware has some notion of what I call performance-profiles > where we can tell the firmware managing this to go for a bias towards > low-power / balanced / performance. > > I've send a RFC / sysfs API proposal for this here: > https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@redhat.com/ > > I've read the patches in this thread and as said already I think > the 2 APIs are mostly orthogonal. The API in this thread is giving > userspace direct access to detailed power-limits allowing userspace > to configure things directly (and for things to work optimal userspace > must do this). Where as in the x86 case with which I'm dealing everything > is mostly handled in a black-box and userspace can merely configure > the low-power / balanced / performance bias (*) of that black-box. > > Still I think it is good if we are aware of each-others efforts here. > > So Daniel, if you can take a quick look at my proposal: > https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@redhat.com/ > > That would be great. I think we definitely want to avoid having 2 > APIs for the same thing here. Again I don't think that is actually > the case, but maybe you see this differently ? Thanks for pointing this out. Actually, it is a different feature, as you mentioned. The profile is the same knob we have in the BIOS, where we can choose power / balanced power / balanced / balanced performance / performance, AFAICT.
Here, the proposed interface is already exported to userspace via the
powercap framework, which today supports the backend driver for the RAPL
registers. Userspace will be in charge of the logic that keeps the
power/performance profile tuned to the application currently running in
the foreground. The DTPM framework gives unified access to the power
limits of the individual devices that the userspace logic can act on.

A side note, related to your proposal, not this patch. IMO it would be
better to have /sys/power/profile:

cat /sys/power/profile

power
balanced_power *
balanced
balanced_performance
performance

The (*) marks the active profile.

> *) bias might not always give the correct impression at least > on some Thinkpads switching from balanced (which they call medium) > to performance boosts the long time power-limit from aprox. > 15W to 25W which as expected results in a significant performance > boost. > > Note usually we have no idea what the black-box knob which we are > tweaking actually does, all we know is that it is there and > Windows 10 often has a slider to configure it and users want > the same functionality under Linux. > -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 42+ messages in thread
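[Editor's note: the file layout suggested above is only a proposal, not an existing kernel ABI, but a userspace consumer of such a file would be trivial. A minimal Python sketch of parsing the proposed format (the layout is assumed, not real):]

```python
def parse_profile(text):
    """Parse the proposed /sys/power/profile layout: one profile per
    line, the active one marked with a trailing '*'."""
    profiles, active = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith("*"):
            line = line[:-1].strip()
            active = line
        profiles.append(line)
    return profiles, active

# Sample contents matching the proposal quoted above.
sample = """power
balanced_power *
balanced
balanced_performance
performance"""

profiles, active = parse_profile(sample)
print(active)  # -> balanced_power
```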
* Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework 2020-10-12 10:30 ` Daniel Lezcano @ 2020-10-12 11:46 ` Hans de Goede 2020-10-12 16:02 ` Daniel Lezcano 2020-10-12 16:37 ` Rafael J. Wysocki 0 siblings, 2 replies; 42+ messages in thread From: Hans de Goede @ 2020-10-12 11:46 UTC (permalink / raw) To: Daniel Lezcano, rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang Hi Daniel, On 10/12/20 12:30 PM, Daniel Lezcano wrote: > > Hi Hans, > > On 07/10/2020 12:43, Hans de Goede wrote: >> Hi, >> >> On 10/6/20 2:20 PM, Daniel Lezcano wrote: >>> The density of components greatly increased the last decade bringing a >>> numerous number of heating sources which are monitored by more than 20 >>> sensors on recent SoC. The skin temperature, which is the case >>> temperature of the device, must stay below approximately 45°C in order >>> to comply with the legal requirements. >>> >>> The skin temperature is managed as a whole by an user space daemon, >>> which is catching the current application profile, to allocate a power >>> budget to the different components where the resulting heating effect >>> will comply with the skin temperature constraint. >>> >>> This technique is called the Dynamic Thermal Power Management. >>> >>> The Linux kernel does not provide any unified interface to act on the >>> power of the different devices. Currently, the thermal framework is >>> changed to export artificially the performance states of different >>> devices via the cooling device software component with opaque values. >>> This change is done regardless of the in-kernel logic to mitigate the >>> temperature. The user space daemon uses all the available knobs to act >>> on the power limit and those differ from one platform to another. >>> >>> This series provides a Dynamic Thermal Power Management framework to >>> provide an unified way to act on the power of the devices. 
>> >> Interesting, we have a discussion going on about a related >> (while at the same time almost orthogonal) discussion for >> setting policies for if the code managing the restraints >> (which on x86 is often hidden in firmware or ACPI DPTF tables) >> should have a bias towards trying to have as long a battery life >> as possible, vs maximum performance. I know those 2 aren't >> always opposite ends of a spectrum with race-to-idle, yet most >> modern x86 hardware has some notion of what I call performance-profiles >> where we can tell the firmware managing this to go for a bias towards >> low-power / balanced / performance. >> >> I've send a RFC / sysfs API proposal for this here: >> https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@redhat.com/ >> >> I've read the patches in this thread and as said already I think >> the 2 APIs are mostly orthogonal. The API in this thread is giving >> userspace direct access to detailed power-limits allowing userspace >> to configure things directly (and for things to work optimal userspace >> must do this). Where as in the x86 case with which I'm dealing everything >> is mostly handled in a black-box and userspace can merely configure >> the low-power / balanced / performance bias (*) of that black-box. >> >> Still I think it is good if we are aware of each-others efforts here. >> >> So Daniel, if you can take a quick look at my proposal: >> https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@redhat.com/ >> >> That would be great. I think we definitely want to avoid having 2 >> APIs for the same thing here. Again I don't think that is actually >> the case, but maybe you see this differently ? > > Thanks for pointing this out. Actually, it is a different feature as you > mentioned. The profile is the same knob we have with the BIOS where we > can choose power/ balanced power / balanced/balanced > performance / performance, AFAICT. Right. 
> Here the proposed interface is already exported in userspace via the > powercap framework which supports today the backend driver for the RAPL > register.

You say that some sort of power / balanced power / balanced / balanced
performance / performance setting is already exported through the
powercap interface today (if I understand you correctly)?

But I'm not seeing any such setting in:
Documentation/ABI/testing/sysfs-class-powercap

Nor can I find it under /sys/class/powercap/intel-rapl* on a ThinkPad
X1 carbon 8th gen.

Note, if there indeed is an existing userspace API for this I would
greatly prefer for the thinkpad_acpi and hp-wmi (and possibly other)
drivers to use this, so if you can point me to this interface then
that would be great.

> The userspace will be in charge of handling the logic to have the > correct power/performance profile tuned against the current application > running foreground. The DTPM framework gives the unified access to the > power limitation to the individual devices the userspace logic can act > on. > > A side note, related to your proposal, not this patch. IMO it suits > better to have /sys/power/profile. > > cat /sys/power/profile > > power > balanced_power * > balanced > balanced_performance > performance > > The (*) being the active profile.

Interesting, the same thing was brought up in the discussion surrounding
the RFC which I posted.

The downside of this approach is that it assumes that there is only a
single system-wide setting. AFAIK that is not always the case, e.g.
(AFAIK):

1. The intel pstate driver has something like this
(might this be the rapl setting you mean?)

2. The X1C8 has such a setting for the embedded-controller, controlled
through the ACPI interfaces which thinkpad-acpi used

3.
The hp-wmi interface allows selecting a profile which in turn (through AML code) sets a bunch of variables which influence how the (dynamic, through mjg59's patches) DPTF code controls various things At least the pstate setting and the vendor specific settings can co-exist. Also the powercap API has a notion of zones, I can see the same thing here, with a desktop e.g. having separate performance-profile selection for the CPU and a discrete GPU. So limiting the API to a single /sys/power/profile setting seems a bit limited and I have the feeling we will regret making this choice in the future. With that said your proposal would work well for the current thinkpad_acpi / hp-wmi cases, so I'm not 100% against it. This would require adding some internal API to the code which owns the /sys/power root-dir to allow registering a profile provider I guess. But that would also immediately bring the question, what if multiple drivers try to register themselves as /sys/power/profile provider ? Regards, Hans ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework 2020-10-12 11:46 ` Hans de Goede @ 2020-10-12 16:02 ` Daniel Lezcano 2020-10-13 12:47 ` Hans de Goede 2020-10-12 16:37 ` Rafael J. Wysocki 1 sibling, 1 reply; 42+ messages in thread From: Daniel Lezcano @ 2020-10-12 16:02 UTC (permalink / raw) To: Hans de Goede, rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang On 12/10/2020 13:46, Hans de Goede wrote: > Hi Daniel, > > On 10/12/20 12:30 PM, Daniel Lezcano wrote: >> >> Hi Hans, >> >> On 07/10/2020 12:43, Hans de Goede wrote: >>> Hi, >>> >>> On 10/6/20 2:20 PM, Daniel Lezcano wrote: >>>> The density of components greatly increased the last decade bringing a >>>> numerous number of heating sources which are monitored by more than 20 >>>> sensors on recent SoC. The skin temperature, which is the case >>>> temperature of the device, must stay below approximately 45°C in order >>>> to comply with the legal requirements. >>>> >>>> The skin temperature is managed as a whole by an user space daemon, >>>> which is catching the current application profile, to allocate a power >>>> budget to the different components where the resulting heating effect >>>> will comply with the skin temperature constraint. >>>> >>>> This technique is called the Dynamic Thermal Power Management. >>>> >>>> The Linux kernel does not provide any unified interface to act on the >>>> power of the different devices. Currently, the thermal framework is >>>> changed to export artificially the performance states of different >>>> devices via the cooling device software component with opaque values. >>>> This change is done regardless of the in-kernel logic to mitigate the >>>> temperature. The user space daemon uses all the available knobs to act >>>> on the power limit and those differ from one platform to another. >>>> >>>> This series provides a Dynamic Thermal Power Management framework to >>>> provide an unified way to act on the power of the devices. 
>>> >>> Interesting, we have a discussion going on about a related >>> (while at the same time almost orthogonal) discussion for >>> setting policies for if the code managing the restraints >>> (which on x86 is often hidden in firmware or ACPI DPTF tables) >>> should have a bias towards trying to have as long a battery life >>> as possible, vs maximum performance. I know those 2 aren't >>> always opposite ends of a spectrum with race-to-idle, yet most >>> modern x86 hardware has some notion of what I call performance-profiles >>> where we can tell the firmware managing this to go for a bias towards >>> low-power / balanced / performance. >>> >>> I've send a RFC / sysfs API proposal for this here: >>> https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@redhat.com/ >>> >>> >>> I've read the patches in this thread and as said already I think >>> the 2 APIs are mostly orthogonal. The API in this thread is giving >>> userspace direct access to detailed power-limits allowing userspace >>> to configure things directly (and for things to work optimal userspace >>> must do this). Where as in the x86 case with which I'm dealing >>> everything >>> is mostly handled in a black-box and userspace can merely configure >>> the low-power / balanced / performance bias (*) of that black-box. >>> >>> Still I think it is good if we are aware of each-others efforts here. >>> >>> So Daniel, if you can take a quick look at my proposal: >>> https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@redhat.com/ >>> >>> >>> That would be great. I think we definitely want to avoid having 2 >>> APIs for the same thing here. Again I don't think that is actually >>> the case, but maybe you see this differently ? >> >> Thanks for pointing this out. Actually, it is a different feature as you >> mentioned. The profile is the same knob we have with the BIOS where we >> can choose power/ balanced power / balanced/balanced >> performance / performance, AFAICT. > > Right. 
> >> Here the proposed interface is already exported in userspace via the >> powercap framework which supports today the backend driver for the RAPL >> register. > > You say that some sort of power/ balanced power / balanced / > balanced performance / performance setting in is already exported > through the powercap interface today (if I understand you correctly)? Sorry, I was unclear. I meant 'Here the proposed interface' referring to the powercap/dtpm. There is no profile interface in the powercap framework. > But I'm not seeing any such setting in: > Documentation/ABI/testing/sysfs-class-powercap > > Nor can I find it under /sys/class/powercap/intel-rapl* on a ThinkPad > X1 carbon 8th gen. > > Note, if there indeed is an existing userspace API for this I would > greatly prefer for the thinkpad_acpi and hp-wmi (and possibly other) > drivers to use this, so if you can point me to this interface then > that would be great. > >> The userspace will be in charge of handling the logic to have the >> correct power/performance profile tuned against the current application >> running foreground. The DTPM framework gives the unified access to the >> power limitation to the individual devices the userspace logic can act >> on. >> >> A side note, related to your proposal, not this patch. IMO it suits >> better to have /sys/power/profile. >> >> cat /sys/power/profile >> >> power >> balanced_power * >> balanced >> balanced_performance >> performance >> >> The (*) being the active profile. > > Interesting the same thing was brought up in the discussion surrounding > RFC which I posted. > > The downside against this approach is that it assumes that there > only is a single system-wide settings. AFAIK that is not always > the case, e.g. (AFAIK): > > 1. The intel pstate driver has something like this > (might this be the rapl setting you mean? ) > > 2. The X1C8 has such a setting for the embedded-controller, controlled > through the ACPI interfaces which thinkpad-acpi used > > 3. 
The hp-wmi interface allows selecting a profile which in turn > (through AML code) sets a bunch of variables which influence how > the (dynamic, through mjg59's patches) DPTF code controls various > things > > At least the pstate setting and the vendor specific settings can > co-exist. Also the powercap API has a notion of zones, I can see the > same thing here, with a desktop e.g. having separate performance-profile > selection for the CPU and a discrete GPU. > > So limiting the API to a single /sys/power/profile setting seems a > bit limited and I have the feeling we will regret making this > choice in the future. > > With that said your proposal would work well for the current > thinkpad_acpi / hp-wmi cases, so I'm not 100% against it. > > This would require adding some internal API to the code which > owns the /sys/power root-dir to allow registering a profile > provider I guess. But that would also immediately bring the > question, what if multiple drivers try to register themselves > as /sys/power/profile provider ?

Did you consider putting the profile on a per-device basis?

eg.

/sys/devices/system/cpu/cpu[0-9]/power/profile

Maybe make 'energy_performance_preference' obsolete in
/sys/devices/system/cpu/cpufreq?

When one device sets the profile, all its children will have the same
profile.

eg.

A change in /sys/devices/system/cpu/power/profile will impact all the
underlying cpu[0-9]/power/profile, or a change in
/sys/devices/power/profile will change all profiles below /sys/devices.

Well, that is a high-level suggestion; I don't know how that can fit
with the cyclic sysfs hierarchy.

-- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 42+ messages in thread
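[Editor's note: the propagation semantics suggested above — writing a profile to a parent device cascading to all of its children — can be modeled in a few lines. This is a plain-Python illustration of the idea only, not kernel code; the node names mirror the sysfs paths in the message.]

```python
class PowerNode:
    """Toy model of the suggested per-device profile semantics:
    setting a node's profile propagates it to all child nodes."""
    def __init__(self, name, parent=None):
        self.name = name
        self.profile = "balanced"
        self.children = []
        if parent:
            parent.children.append(self)

    def set_profile(self, profile):
        self.profile = profile
        for child in self.children:
            child.set_profile(profile)

# /sys/devices/system/cpu/power/profile -> cpu[0-1]/power/profile
cpu = PowerNode("cpu")
cpu0 = PowerNode("cpu0", parent=cpu)
cpu1 = PowerNode("cpu1", parent=cpu)

# One write at the parent level reaches every CPU below it.
cpu.set_profile("performance")
print(cpu0.profile, cpu1.profile)  # performance performance
```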
* Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework 2020-10-12 16:02 ` Daniel Lezcano @ 2020-10-13 12:47 ` Hans de Goede 0 siblings, 0 replies; 42+ messages in thread From: Hans de Goede @ 2020-10-13 12:47 UTC (permalink / raw) To: Daniel Lezcano, rafael, srinivas.pandruvada Cc: lukasz.luba, linux-kernel, linux-pm, rui.zhang, Mark Pearson Hi, On 10/12/20 6:02 PM, Daniel Lezcano wrote: > On 12/10/2020 13:46, Hans de Goede wrote: >> Hi Daniel, >> >> On 10/12/20 12:30 PM, Daniel Lezcano wrote: <snip> >>> Here the proposed interface is already exported in userspace via the >>> powercap framework which supports today the backend driver for the RAPL >>> register. >> >> You say that some sort of power/ balanced power / balanced / >> balanced performance / performance setting in is already exported >> through the powercap interface today (if I understand you correctly)? > > Sorry, I was unclear. I meant 'Here the proposed interface' referring to > the powercap/dtpm. There is no profile interface in the powercap framework. Ah, I see. <snip> >>> A side note, related to your proposal, not this patch. IMO it suits >>> better to have /sys/power/profile. >>> >>> cat /sys/power/profile >>> >>> power >>> balanced_power * >>> balanced >>> balanced_performance >>> performance >>> >>> The (*) being the active profile. >> >> Interesting the same thing was brought up in the discussion surrounding >> RFC which I posted. >> >> The downside against this approach is that it assumes that there >> only is a single system-wide settings. AFAIK that is not always >> the case, e.g. (AFAIK): >> >> 1. The intel pstate driver has something like this >> (might this be the rapl setting you mean? ) >> >> 2. The X1C8 has such a setting for the embedded-controller, controlled >> through the ACPI interfaces which thinkpad-acpi used >> >> 3. 
The hp-wmi interface allows selecting a profile which in turn >> (through AML code) sets a bunch of variables which influence how >> the (dynamic, through mjg59's patches) DPTF code controls various >> things >> >> At least the pstate setting and the vendor specific settings can >> co-exist. Also the powercap API has a notion of zones, I can see the >> same thing here, with a desktop e.g. having separate performance-profile >> selection for the CPU and a discrete GPU. >> >> So limiting the API to a single /sys/power/profile setting seems a >> bit limited and I have the feeling we will regret making this >> choice in the future. >> >> With that said your proposal would work well for the current >> thinkpad_acpi / hp-wmi cases, so I'm not 100% against it. >> >> This would require adding some internal API to the code which >> owns the /sys/power root-dir to allow registering a profile >> provider I guess. But that would also immediately bring the >> question, what if multiple drivers try to register themselves >> as /sys/power/profile provider ? > Did you consider putting the profile on a per device basis ? > > eg. > > /sys/devices/system/cpu/cpu[0-9]/power/profile > > May be make 'energy_performance_preference' obsolete in > /sys/devices/system/cpu/cpufreq ? > > When one device sets the profile, all children will have the same profile. > > eg. > > A change in /sys/devices/system/cpu/power/profile will impact all the > underlying cpu[0-9]/power/profile > > Or a change in /sys/devices/power/profile will change all profiles below > /sys/devices. > > Well that is a high level suggestion, I don't know how that can fit with > the cyclic sysfs hierarchy.

A problem I see with making this a per-device power setting is that only
a few devices will actually have this; and then the question becomes how
does userspace discover / find these devices?
Typically for these kinds of discovery problems we use a sysfs class as
a solution to group devices offering the same functionality (through
the same standardized sysfs API) together. Which circles back to my
original RFC proposal for this. Regards, Hans ^ permalink raw reply [flat|nested] 42+ messages in thread
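[Editor's note: the discovery argument above is the key point — a class gives userspace one well-known place to enumerate instead of per-driver probing. A toy model of what a sysfs class buys; the class name follows the `performance_profile` RFC mentioned later in the thread, and the driver entries are invented for illustration.]

```python
# Toy registry mimicking a sysfs class: every provider registers under
# one well-known class name, so discovery becomes a single enumeration
# of that class rather than probing driver-specific paths.
class_registry = {}

def register(class_name, dev_name, attrs):
    class_registry.setdefault(class_name, {})[dev_name] = attrs

# Hypothetical providers registering under one class:
register("performance_profile", "thinkpad_acpi", {"profile": "balanced"})
register("performance_profile", "hp-wmi", {"profile": "quiet"})

# Userspace-equivalent discovery: one readdir of /sys/class/<name>/.
print(sorted(class_registry["performance_profile"]))  # ['hp-wmi', 'thinkpad_acpi']
```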
* Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework 2020-10-12 11:46 ` Hans de Goede 2020-10-12 16:02 ` Daniel Lezcano @ 2020-10-12 16:37 ` Rafael J. Wysocki 2020-10-13 13:04 ` Hans de Goede 1 sibling, 1 reply; 42+ messages in thread From: Rafael J. Wysocki @ 2020-10-12 16:37 UTC (permalink / raw) To: Hans de Goede Cc: Daniel Lezcano, Rafael J. Wysocki, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui On Mon, Oct 12, 2020 at 1:46 PM Hans de Goede <hdegoede@redhat.com> wrote: > > Hi Daniel, > > On 10/12/20 12:30 PM, Daniel Lezcano wrote: > > > > Hi Hans, > > > > On 07/10/2020 12:43, Hans de Goede wrote: > >> Hi, > >> > >> On 10/6/20 2:20 PM, Daniel Lezcano wrote: > >>> The density of components greatly increased the last decade bringing a > >>> numerous number of heating sources which are monitored by more than 20 > >>> sensors on recent SoC. The skin temperature, which is the case > >>> temperature of the device, must stay below approximately 45°C in order > >>> to comply with the legal requirements. > >>> > >>> The skin temperature is managed as a whole by an user space daemon, > >>> which is catching the current application profile, to allocate a power > >>> budget to the different components where the resulting heating effect > >>> will comply with the skin temperature constraint. > >>> > >>> This technique is called the Dynamic Thermal Power Management. > >>> > >>> The Linux kernel does not provide any unified interface to act on the > >>> power of the different devices. Currently, the thermal framework is > >>> changed to export artificially the performance states of different > >>> devices via the cooling device software component with opaque values. > >>> This change is done regardless of the in-kernel logic to mitigate the > >>> temperature. The user space daemon uses all the available knobs to act > >>> on the power limit and those differ from one platform to another. 
> >>> > >>> This series provides a Dynamic Thermal Power Management framework to > >>> provide an unified way to act on the power of the devices. > >> > >> Interesting, we have a discussion going on about a related > >> (while at the same time almost orthogonal) discussion for > >> setting policies for if the code managing the restraints > >> (which on x86 is often hidden in firmware or ACPI DPTF tables) > >> should have a bias towards trying to have as long a battery life > >> as possible, vs maximum performance. I know those 2 aren't > >> always opposite ends of a spectrum with race-to-idle, yet most > >> modern x86 hardware has some notion of what I call performance-profiles > >> where we can tell the firmware managing this to go for a bias towards > >> low-power / balanced / performance. > >> > >> I've send a RFC / sysfs API proposal for this here: > >> https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@redhat.com/ > >> > >> I've read the patches in this thread and as said already I think > >> the 2 APIs are mostly orthogonal. The API in this thread is giving > >> userspace direct access to detailed power-limits allowing userspace > >> to configure things directly (and for things to work optimal userspace > >> must do this). Where as in the x86 case with which I'm dealing everything > >> is mostly handled in a black-box and userspace can merely configure > >> the low-power / balanced / performance bias (*) of that black-box. > >> > >> Still I think it is good if we are aware of each-others efforts here. > >> > >> So Daniel, if you can take a quick look at my proposal: > >> https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@redhat.com/ > >> > >> That would be great. I think we definitely want to avoid having 2 > >> APIs for the same thing here. Again I don't think that is actually > >> the case, but maybe you see this differently ? > > > > Thanks for pointing this out. Actually, it is a different feature as you > > mentioned. 
The profile is the same knob we have with the BIOS where we > > can choose power/ balanced power / balanced/balanced > > performance / performance, AFAICT. > > Right. > > > Here the proposed interface is already exported in userspace via the > > powercap framework which supports today the backend driver for the RAPL > > register. > > You say that some sort of power/ balanced power / balanced / > balanced performance / performance setting in is already exported > through the powercap interface today (if I understand you correctly)? > > But I'm not seeing any such setting in: > Documentation/ABI/testing/sysfs-class-powercap > > Nor can I find it under /sys/class/powercap/intel-rapl* on a ThinkPad > X1 carbon 8th gen. > > Note, if there indeed is an existing userspace API for this I would > greatly prefer for the thinkpad_acpi and hp-wmi (and possibly other) > drivers to use this, so if you can point me to this interface then > that would be great. > > > The userspace will be in charge of handling the logic to have the > > correct power/performance profile tuned against the current application > > running foreground. The DTPM framework gives the unified access to the > > power limitation to the individual devices the userspace logic can act on. > > > > A side note, related to your proposal, not this patch. IMO it suits > > better to have /sys/power/profile. > > > > cat /sys/power/profile > > > > power > > balanced_power * > > balanced > > balanced_performance > > performance > > > > The (*) being the active profile. > > Interesting the same thing was brought up in the discussion surrounding > RFC which I posted. > > The downside against this approach is that it assumes that there > only is a single system-wide settings. AFAIK that is not always > the case, e.g. (AFAIK): > > 1. The intel pstate driver has something like this > (might this be the rapl setting you mean? ) > > 2. 
The X1C8 has such a setting for the embedded-controller, controlled > through the ACPI interfaces which thinkpad-acpi used > > 3. The hp-wmi interface allows selecting a profile which in turn > (through AML code) sets a bunch of variables which influence how > the (dynamic, through mjg59's patches) DPTF code controls various > things > > At least the pstate setting and the vendor specific settings can > co-exist. Also the powercap API has a notion of zones, I can see the > same thing here, with a desktop e.g. having separate performance-profile > selection for the CPU and a discrete GPU. > > So limiting the API to a single /sys/power/profile setting seems a > bit limited and I have the feeling we will regret making this > choice in the future. > > With that said your proposal would work well for the current > thinkpad_acpi / hp-wmi cases, so I'm not 100% against it. > > This would require adding some internal API to the code which > owns the /sys/power root-dir to allow registering a profile > provider I guess. But that would also immediately bring the > question, what if multiple drivers try to register themselves > as /sys/power/profile provider ? It doesn't need to work this way IMV. It may also work by allowing drivers (or whatever kernel entities are interested in that) to subscribe to it, so that they get notified whenever a new value is written to it by user space (eg. each driver may be able to register a callback to be invoked when that happens). The information coming from user space will just be passed to the subscribers of that interface and they will do about it what they want (eg. it may be translated into a value to be written to a performance-vs-power interface provided by the platform or similar). This really is similar to having a class interface with one file per "subscribed" device except that the aggregation is done in the kernel and not in user space and the subscribers need not be related to specific devices. 
It still makes it possible to avoid exposing the low-level interfaces to
user space verbatim; it just passes the "policy" choice from user space
down to the entities that can take it into account. ^ permalink raw reply [flat|nested] 42+ messages in thread
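[Editor's note: the subscription scheme sketched above reduces to a single store hook fanning the new value out to registered handlers. This is a plain-Python illustration of the control flow only; an in-kernel version would presumably use something like a notifier chain, and the driver names are just examples.]

```python
subscribers = []

def register_profile_handler(callback):
    """A driver (or any interested kernel entity) subscribes to
    profile values written by user space."""
    subscribers.append(callback)

def sysfs_store_profile(value):
    """Models the sysfs 'store' hook: pass the new policy choice to
    every subscriber, which maps it onto its own mechanism."""
    for callback in subscribers:
        callback(value)

applied = []
register_profile_handler(lambda v: applied.append(("thinkpad_acpi", v)))
register_profile_handler(lambda v: applied.append(("hp_wmi", v)))

# One write from user space reaches all subscribed drivers.
sysfs_store_profile("performance")
print(applied)
```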
* Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework 2020-10-12 16:37 ` Rafael J. Wysocki @ 2020-10-13 13:04 ` Hans de Goede 2020-10-14 13:33 ` Rafael J. Wysocki 0 siblings, 1 reply; 42+ messages in thread From: Hans de Goede @ 2020-10-13 13:04 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson Hi Rafael, On 10/12/20 6:37 PM, Rafael J. Wysocki wrote: > On Mon, Oct 12, 2020 at 1:46 PM Hans de Goede <hdegoede@redhat.com> wrote: <snip> >>> A side note, related to your proposal, not this patch. IMO it suits >>> better to have /sys/power/profile. >>> >>> cat /sys/power/profile >>> >>> power >>> balanced_power * >>> balanced >>> balanced_performance >>> performance >>> >>> The (*) being the active profile. >> >> Interesting the same thing was brought up in the discussion surrounding >> RFC which I posted. >> >> The downside against this approach is that it assumes that there >> only is a single system-wide settings. AFAIK that is not always >> the case, e.g. (AFAIK): >> >> 1. The intel pstate driver has something like this >> (might this be the rapl setting you mean? ) >> >> 2. The X1C8 has such a setting for the embedded-controller, controlled >> through the ACPI interfaces which thinkpad-acpi used >> >> 3. The hp-wmi interface allows selecting a profile which in turn >> (through AML code) sets a bunch of variables which influence how >> the (dynamic, through mjg59's patches) DPTF code controls various >> things >> >> At least the pstate setting and the vendor specific settings can >> co-exist. Also the powercap API has a notion of zones, I can see the >> same thing here, with a desktop e.g. having separate performance-profile >> selection for the CPU and a discrete GPU. >> >> So limiting the API to a single /sys/power/profile setting seems a >> bit limited and I have the feeling we will regret making this >> choice in the future. 
>> >> With that said your proposal would work well for the current >> thinkpad_acpi / hp-wmi cases, so I'm not 100% against it. >> >> This would require adding some internal API to the code which >> owns the /sys/power root-dir to allow registering a profile >> provider I guess. But that would also immediately bring the >> question, what if multiple drivers try to register themselves >> as /sys/power/profile provider ? > > It doesn't need to work this way IMV. > > It may also work by allowing drivers (or whatever kernel entities are > interested in that) to subscribe to it, so that they get notified > whenever a new value is written to it by user space (eg. each driver > may be able to register a callback to be invoked when that happens). > The information coming from user space will just be passed to the > subscribers of that interface and they will do about it what they want > (eg. it may be translated into a value to be written to a > performance-vs-power interface provided by the platform or similar). > > This really is similar to having a class interface with one file per > "subscribed" device except that the aggregation is done in the kernel > and not in user space and the subscribers need not be related to > specific devices. It still allows to avoid exposing the low-level > interfaces to user space verbatim and it just passes the "policy" > choice from user space down to the entities that can take it into > account.

First of all, thank you for your input; with your expertise in this
area it is very much appreciated. After all, we only get one chance to
get the userspace API for this right.

Your proposal to have a single sysfs file for userspace to talk to and
then use an in-kernel subscription mechanism for drivers to get
notified of writes to this file is interesting. But I see 2 issues
with it:

1. How will userspace know which profiles are actually available?
An obvious solution is to pick a set of standard names and let
subscribers map those as closely to their own settings as possible.
The most often mentioned set of profile names in this case seems to be:

low_power
balanced_power
balanced
balanced_performance
performance

Which works fine for the thinkpad_acpi case, but not so much for the
hp-wmi case. In the HP case what happens is that a WMI call is made
which sets a bunch of ACPI variables which influence the DPTF code
(this assumes we have some sort of DPTF support such as mjg59's
reverse-engineered support), but the profile names under Windows are:
"Performance", "HP recommended", "Cool" and "Quiet".

If you read the discussion from the "[RFC] Documentation: Add
documentation for new performance_profile sysfs class" thread you will
see this was brought up as an issue there.

The problem here is that both "cool" and "quiet" could be interpreted
as low-power. But it seems that they actually mean what they say: cool
focuses on keeping temperatures low, which can also be done by making
the fan profile more aggressive, and quiet is mostly about keeping fan
speeds down, at the cost of possibly higher temperatures. IOW we don't
really have a one-dimensional axis.

My class proposal fixes this by having a notion of both standardized
names (because anything else would suck) combined with a way for
drivers to advertise which standardized names they support. So in my
proposal I simply add quiet and cool to the list of standard profile
names, and then the hp-wmi driver can list those as supported, while
not listing low_power as a supported profile. This way we export the
hardware interface to userspace as-is (as much as possible) while
still offering a standardized interface for userspace to consume.
Granted, if userspace now actually wants to set a low_power profile,
we have just punted the problem to userspace, but I really do not see
a better solution.

2. This only works assuming that all performance-profiles are system
wide.
But given a big desktop case there might very well be separate cooling zones for e.g. the CPU and the GPU, and I can imagine both having separate performance-profile settings; some users will doubtless want to be able to control these separately ... Regards, Hans ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework 2020-10-13 13:04 ` Hans de Goede @ 2020-10-14 13:33 ` Rafael J. Wysocki 2020-10-14 14:06 ` Hans de Goede 0 siblings, 1 reply; 42+ messages in thread From: Rafael J. Wysocki @ 2020-10-14 13:33 UTC (permalink / raw) To: Hans de Goede Cc: Rafael J. Wysocki, Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson Hi Hans, On Tue, Oct 13, 2020 at 3:04 PM Hans de Goede <hdegoede@redhat.com> wrote: > > Hi Rafael, > > On 10/12/20 6:37 PM, Rafael J. Wysocki wrote: > > On Mon, Oct 12, 2020 at 1:46 PM Hans de Goede <hdegoede@redhat.com> wrote: > > <snip> > > >>> A side note, related to your proposal, not this patch. IMO it suits > >>> better to have /sys/power/profile. > >>> > >>> cat /sys/power/profile > >>> > >>> power > >>> balanced_power * > >>> balanced > >>> balanced_performance > >>> performance > >>> > >>> The (*) being the active profile. > >> > >> Interesting the same thing was brought up in the discussion surrounding > >> RFC which I posted. > >> > >> The downside against this approach is that it assumes that there > >> only is a single system-wide settings. AFAIK that is not always > >> the case, e.g. (AFAIK): > >> > >> 1. The intel pstate driver has something like this > >> (might this be the rapl setting you mean? ) > >> > >> 2. The X1C8 has such a setting for the embedded-controller, controlled > >> through the ACPI interfaces which thinkpad-acpi used > >> > >> 3. The hp-wmi interface allows selecting a profile which in turn > >> (through AML code) sets a bunch of variables which influence how > >> the (dynamic, through mjg59's patches) DPTF code controls various > >> things > >> > >> At least the pstate setting and the vendor specific settings can > >> co-exist. Also the powercap API has a notion of zones, I can see the > >> same thing here, with a desktop e.g. 
having separate performance-profile > >> selection for the CPU and a discrete GPU. > >> > >> So limiting the API to a single /sys/power/profile setting seems a > >> bit limited and I have the feeling we will regret making this > >> choice in the future. > >> > >> With that said your proposal would work well for the current > >> thinkpad_acpi / hp-wmi cases, so I'm not 100% against it. > >> > >> This would require adding some internal API to the code which > >> owns the /sys/power root-dir to allow registering a profile > >> provider I guess. But that would also immediately bring the > >> question, what if multiple drivers try to register themselves > >> as /sys/power/profile provider ? > > > > It doesn't need to work this way IMV. > > > > It may also work by allowing drivers (or whatever kernel entities are > > interested in that) to subscribe to it, so that they get notified > > whenever a new value is written to it by user space (eg. each driver > > may be able to register a callback to be invoked when that happens). > > The information coming from user space will just be passed to the > > subscribers of that interface and they will do about it what they want > > (eg. it may be translated into a value to be written to a > > performance-vs-power interface provided by the platform or similar). > > > > This really is similar to having a class interface with one file per > > "subscribed" device except that the aggregation is done in the kernel > > and not in user space and the subscribers need not be related to > > specific devices. It still allows to avoid exposing the low-level > > interfaces to user space verbatim and it just passes the "policy" > > choice from user space down to the entities that can take it into > > account. > > First of all thank you for your input, with your expertise in this > area your input is very much appreciated, after all we only get > one chance to get the userspace API for this right. 
> > Your proposal to have a single sysfs file for userspace to talk > to and then use an in kernel subscription mechanism for drivers > to get notified of writes to this file is interesting. > > But I see 2 issues with it: > > 1. How will userspace know which profiles are actually available ? > > An obvious solution is to pick a set of standard names and let > subscribers map those as close to their own settings as possible, > the most often mentioned set of profile names in this case seems to be: > > low_power > balanced_power > balanced > balanced_performance > performance > > Which works fine for the thinkpad_acpi case, but not so much for > the hp-wmi case. In the HP case what happens is that a WMI call > is made which sets a bunch of ACPI variables which influence > the DPTF code (this assumes we have some sort of DPTF support > such as mjg59's reverse engineered support) but the profile-names > under Windows are: "Performance", "HP recommended", "Cool" and > "Quiet". If you read the discussion from the > "[RFC] Documentation: Add documentation for new performance_profile sysfs class" > thread you will see this was brought up as an issue there. Two different things seem to be conflated here. One is how to pass a possible performance-vs-power preference coming from user space down to device drivers or generally pieces of kernel code that can adjust the behavior and/or hardware settings depending on what that preference is and the other is how to expose OEM-provided DPTF system profile interfaces to user space. The former assumes that there is a common set of values that can be understood and acted on in a consistent way by all of the interested entities within the kernel and the latter is about passing information from user space down to a side-band power control mechanism working in its own way behind the kernel's back (and possibly poking at multiple hardware components in the platform in its own way). 
IMO there is no way to provide a common interface covering these two cases at the same time. > The problem here is that both "cool" and "quiet" could be > interpreted as low-power. But it seems that they actually mean > what they say, cool focuses on keeping temps low, which can > also be done by making the fan-profile more aggressive. And quiet > is mostly about keeping fan speeds down, at the cost of possible > higher temperatures. IOW we don't really have a 1 dimensional > axis. Well, AFAICS, DPTF system profile interfaces coming from different OEMs will be different, but they are about side-band power control and there can be only one thing like that in a platform at the same time. > My class proposal fixes this by having a notion of both > standardized names (because anything else would suck) combined > with a way for drivers to advertise which standardized names > the support. So in my proposal I simply add quiet and cool > to the list of standard profile names, and then the HP-wmi > driver can list those as supported, while not listing > low_power as a supported profile. This way we export the > hardware interface to userspace as is (as much as possible) > while still offering a standardized interface for userspace > to consume. Granted if userspace now actually want to set > a low_power profile, we have just punted the problem to userspace > but I really do not see a better solution. First, a common place to register a DPTF system profile seems to be needed and, as I said above, I wouldn't expect more than one such thing to be present in the system at any given time, so it may be registered along with the list of supported profiles and user space will have to understand what they mean. 
Second, irrespective of the above, it may be useful to have a consistent way to pass performance-vs-power preference information from user space to different parts of the kernel so as to allow them to adjust their operation and this could be done with a system-wide power profile attribute IMO. > 2. This only works assuming that all performance-profiles > are system wide. But given a big desktop case there might > be very well be separate cooling zones for e.g. the CPU > and the GPU and I can imagine both having separate > performance-profile settings and some users will doubtlessly > want to be able to control these separately ... Let's say that I'm not convinced. :-) They cannot be totally separate, because they will affect each other and making possibly conflicting adjustments needs to be avoided. Cheers! ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework 2020-10-14 13:33 ` Rafael J. Wysocki @ 2020-10-14 14:06 ` Hans de Goede 2020-10-14 15:42 ` Rafael J. Wysocki 0 siblings, 1 reply; 42+ messages in thread From: Hans de Goede @ 2020-10-14 14:06 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson Hi, On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: > Hi Hans, > > On Tue, Oct 13, 2020 at 3:04 PM Hans de Goede <hdegoede@redhat.com> wrote: >> >> Hi Rafael, >> >> On 10/12/20 6:37 PM, Rafael J. Wysocki wrote: >>> On Mon, Oct 12, 2020 at 1:46 PM Hans de Goede <hdegoede@redhat.com> wrote: >> >> <snip> >> >>>>> A side note, related to your proposal, not this patch. IMO it suits >>>>> better to have /sys/power/profile. >>>>> >>>>> cat /sys/power/profile >>>>> >>>>> power >>>>> balanced_power * >>>>> balanced >>>>> balanced_performance >>>>> performance >>>>> >>>>> The (*) being the active profile. >>>> >>>> Interesting the same thing was brought up in the discussion surrounding >>>> RFC which I posted. >>>> >>>> The downside against this approach is that it assumes that there >>>> only is a single system-wide settings. AFAIK that is not always >>>> the case, e.g. (AFAIK): >>>> >>>> 1. The intel pstate driver has something like this >>>> (might this be the rapl setting you mean? ) >>>> >>>> 2. The X1C8 has such a setting for the embedded-controller, controlled >>>> through the ACPI interfaces which thinkpad-acpi used >>>> >>>> 3. The hp-wmi interface allows selecting a profile which in turn >>>> (through AML code) sets a bunch of variables which influence how >>>> the (dynamic, through mjg59's patches) DPTF code controls various >>>> things >>>> >>>> At least the pstate setting and the vendor specific settings can >>>> co-exist. Also the powercap API has a notion of zones, I can see the >>>> same thing here, with a desktop e.g. 
having separate performance-profile >>>> selection for the CPU and a discrete GPU. >>>> >>>> So limiting the API to a single /sys/power/profile setting seems a >>>> bit limited and I have the feeling we will regret making this >>>> choice in the future. >>>> >>>> With that said your proposal would work well for the current >>>> thinkpad_acpi / hp-wmi cases, so I'm not 100% against it. >>>> >>>> This would require adding some internal API to the code which >>>> owns the /sys/power root-dir to allow registering a profile >>>> provider I guess. But that would also immediately bring the >>>> question, what if multiple drivers try to register themselves >>>> as /sys/power/profile provider ? >>> >>> It doesn't need to work this way IMV. >>> >>> It may also work by allowing drivers (or whatever kernel entities are >>> interested in that) to subscribe to it, so that they get notified >>> whenever a new value is written to it by user space (eg. each driver >>> may be able to register a callback to be invoked when that happens). >>> The information coming from user space will just be passed to the >>> subscribers of that interface and they will do about it what they want >>> (eg. it may be translated into a value to be written to a >>> performance-vs-power interface provided by the platform or similar). >>> >>> This really is similar to having a class interface with one file per >>> "subscribed" device except that the aggregation is done in the kernel >>> and not in user space and the subscribers need not be related to >>> specific devices. It still allows to avoid exposing the low-level >>> interfaces to user space verbatim and it just passes the "policy" >>> choice from user space down to the entities that can take it into >>> account. >> >> First of all thank you for your input, with your expertise in this >> area your input is very much appreciated, after all we only get >> one chance to get the userspace API for this right. 
>> >> Your proposal to have a single sysfs file for userspace to talk >> to and then use an in kernel subscription mechanism for drivers >> to get notified of writes to this file is interesting. >> >> But I see 2 issues with it: >> >> 1. How will userspace know which profiles are actually available ? >> >> An obvious solution is to pick a set of standard names and let >> subscribers map those as close to their own settings as possible, >> the most often mentioned set of profile names in this case seems to be: >> >> low_power >> balanced_power >> balanced >> balanced_performance >> performance >> >> Which works fine for the thinkpad_acpi case, but not so much for >> the hp-wmi case. In the HP case what happens is that a WMI call >> is made which sets a bunch of ACPI variables which influence >> the DPTF code (this assumes we have some sort of DPTF support >> such as mjg59's reverse engineered support) but the profile-names >> under Windows are: "Performance", "HP recommended", "Cool" and >> "Quiet". If you read the discussion from the >> "[RFC] Documentation: Add documentation for new performance_profile sysfs class" >> thread you will see this was brought up as an issue there. > > Two different things seem to be conflated here. One is how to pass a > possible performance-vs-power preference coming from user space down > to device drivers or generally pieces of kernel code that can adjust > the behavior and/or hardware settings depending on what that > preference is and the other is how to expose OEM-provided DPTF system > profile interfaces to user space. I was hoping / thinking that we could use a single API for both of these. But I guess that it makes sense to see them as 2 separate things, esp. since DPTF profiles seem to be somewhat free-form, whereas a way to pass a performance preference to a device could use a fixed set of values.
So let's say that we indeed want to treat these 2 separately; then I guess that the issue at hand / my reason to start a discussion surrounding this is allowing userspace to select the DPTF system profile. The thinkpad_acpi case at hand is not using DPTF, but that is because Lenovo decided to implement dynamic DPTF-like behavior inside their embedded controller (for when running Linux), since DPTF is atm not really supported all that well under Linux and Lenovo was getting a lot of complaints about sub-optimal performance because of this. So the thinkpad_acpi solution is in essence a replacement for DPTF and it should thus use the same userspace API as other mechanisms to select DPTF system profiles. And if we limit this new userspace API solely to setting DPTF system profiles, then there will indeed be only 1 provider for this for the entire system. > The former assumes that there is a common set of values that can be > understood and acted on in a consistent way by all of the interested > entities within the kernel and the latter is about passing information > from user space down to a side-band power control mechanism working in > its own way behind the kernel's back (and possibly poking at multiple > hardware components in the platform in its own way). Ack.
> > Well, AFAICS, DPTF system profile interfaces coming from different > OEMs will be different, but they are about side-band power control and > there can be only one thing like that in a platform at the same time. Ack. >> My class proposal fixes this by having a notion of both >> standardized names (because anything else would suck) combined >> with a way for drivers to advertise which standardized names >> the support. So in my proposal I simply add quiet and cool >> to the list of standard profile names, and then the HP-wmi >> driver can list those as supported, while not listing >> low_power as a supported profile. This way we export the >> hardware interface to userspace as is (as much as possible) >> while still offering a standardized interface for userspace >> to consume. Granted if userspace now actually want to set >> a low_power profile, we have just punted the problem to userspace >> but I really do not see a better solution. > > First, a common place to register a DPTF system profile seems to be > needed and, as I said above, I wouldn't expect more than one such > thing to be present in the system at any given time, so it may be > registered along with the list of supported profiles and user space > will have to understand what they mean. Mostly Ack, I would still like to have an enum for DPTF system profiles in the kernel and have a single piece of code map that enum to profile names. This enum can then be extended as necessary, but I want to avoid having one driver use "Performance" and the other "performance" or one using "performance-balanced" and the other "balanced-performance", etc. With the goal being that new drivers use existing values from the enum as much as possible, but we extend it where necessary. 
> Second, irrespective of the above, it may be useful to have a > consistent way to pass performance-vs-power preference information > from user space to different parts of the kernel so as to allow them > to adjust their operation and this could be done with a system-wide > power profile attribute IMO. I agree, which is why I tried to tackle both things in one go, but as you said doing both in 1 API is probably not the best idea. So I believe we should park this second issue for now and revisit it when we find a need for it. Do you have any specific userspace API in mind for the DPTF system profile selection? And to get one thing out of the way, in the other thread we had some discussion about using a single attribute where a cat would result in: low-power [balanced] performance Where the [] indicate the active profile, vs having 2 sysfs attributes one ro with space-separated available (foo_available) values and one wr with the actual/current value. FWIW userspace folks have indicated they prefer the solution with 2 separate sysfs-attributes and that is also what e.g. cpufreq is currently using for governor selection. I don't really have a strong opinion either way. Regards, Hans ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework 2020-10-14 14:06 ` Hans de Goede @ 2020-10-14 15:42 ` Rafael J. Wysocki 2020-10-16 11:10 ` [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) Hans de Goede 0 siblings, 1 reply; 42+ messages in thread From: Rafael J. Wysocki @ 2020-10-14 15:42 UTC (permalink / raw) To: Hans de Goede Cc: Rafael J. Wysocki, Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: > > Hi, > > On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: > > Hi Hans, > > > > On Tue, Oct 13, 2020 at 3:04 PM Hans de Goede <hdegoede@redhat.com> wrote: > >> > >> Hi Rafael, > >> > >> On 10/12/20 6:37 PM, Rafael J. Wysocki wrote: > >>> On Mon, Oct 12, 2020 at 1:46 PM Hans de Goede <hdegoede@redhat.com> wrote: > >> > >> <snip> > >> > >>>>> A side note, related to your proposal, not this patch. IMO it suits > >>>>> better to have /sys/power/profile. > >>>>> > >>>>> cat /sys/power/profile > >>>>> > >>>>> power > >>>>> balanced_power * > >>>>> balanced > >>>>> balanced_performance > >>>>> performance > >>>>> > >>>>> The (*) being the active profile. > >>>> > >>>> Interesting the same thing was brought up in the discussion surrounding > >>>> RFC which I posted. > >>>> > >>>> The downside against this approach is that it assumes that there > >>>> only is a single system-wide settings. AFAIK that is not always > >>>> the case, e.g. (AFAIK): > >>>> > >>>> 1. The intel pstate driver has something like this > >>>> (might this be the rapl setting you mean? ) > >>>> > >>>> 2. The X1C8 has such a setting for the embedded-controller, controlled > >>>> through the ACPI interfaces which thinkpad-acpi used > >>>> > >>>> 3. 
The hp-wmi interface allows selecting a profile which in turn > >>>> (through AML code) sets a bunch of variables which influence how > >>>> the (dynamic, through mjg59's patches) DPTF code controls various > >>>> things > >>>> > >>>> At least the pstate setting and the vendor specific settings can > >>>> co-exist. Also the powercap API has a notion of zones, I can see the > >>>> same thing here, with a desktop e.g. having separate performance-profile > >>>> selection for the CPU and a discrete GPU. > >>>> > >>>> So limiting the API to a single /sys/power/profile setting seems a > >>>> bit limited and I have the feeling we will regret making this > >>>> choice in the future. > >>>> > >>>> With that said your proposal would work well for the current > >>>> thinkpad_acpi / hp-wmi cases, so I'm not 100% against it. > >>>> > >>>> This would require adding some internal API to the code which > >>>> owns the /sys/power root-dir to allow registering a profile > >>>> provider I guess. But that would also immediately bring the > >>>> question, what if multiple drivers try to register themselves > >>>> as /sys/power/profile provider ? > >>> > >>> It doesn't need to work this way IMV. > >>> > >>> It may also work by allowing drivers (or whatever kernel entities are > >>> interested in that) to subscribe to it, so that they get notified > >>> whenever a new value is written to it by user space (eg. each driver > >>> may be able to register a callback to be invoked when that happens). > >>> The information coming from user space will just be passed to the > >>> subscribers of that interface and they will do about it what they want > >>> (eg. it may be translated into a value to be written to a > >>> performance-vs-power interface provided by the platform or similar). 
> >>> > >>> This really is similar to having a class interface with one file per > >>> "subscribed" device except that the aggregation is done in the kernel > >>> and not in user space and the subscribers need not be related to > >>> specific devices. It still allows to avoid exposing the low-level > >>> interfaces to user space verbatim and it just passes the "policy" > >>> choice from user space down to the entities that can take it into > >>> account. > >> > >> First of all thank you for your input, with your expertise in this > >> area your input is very much appreciated, after all we only get > >> one chance to get the userspace API for this right. > >> > >> Your proposal to have a single sysfs file for userspace to talk > >> to and then use an in kernel subscription mechanism for drivers > >> to get notified of writes to this file is interesting. > >> > >> But I see 2 issues with it: > >> > >> 1. How will userspace know which profiles are actually available ? > >> > >> An obvious solution is to pick a set of standard names and let > >> subscribers map those as close to their own settings as possible, > >> the most often mentioned set of profile names in this case seems to be: > >> > >> low_power > >> balanced_power > >> balanced > >> balanced_performance > >> performance > >> > >> Which works fine for the thinkpad_acpi case, but not so much for > >> the hp-wmi case. In the HP case what happens is that a WMI call > >> is made which sets a bunch of ACPI variables which influence > >> the DPTF code (this assumes we have some sort of DPTF support > >> such as mjg59's reverse engineered support) but the profile-names > >> under Windows are: "Performance", "HP recommended", "Cool" and > >> "Quiet". If you read the discussion from the > >> "[RFC] Documentation: Add documentation for new performance_profile sysfs class" > >> thread you will see this was brought up as an issue there. > > > > Two different things seem to be conflated here. 
One is how to pass a > > possible performance-vs-power preference coming from user space down > > to device drivers or generally pieces of kernel code that can adjust > > the behavior and/or hardware settings depending on what that > > preference is and the other is how to expose OEM-provided DPTF system > > profile interfaces to user space. > > I was hoping / thinking that we could use a single API for both of > these. But I guess that it makes sense to see them as 2 separate > things, esp. since DPTF profiles seem to be somewhat free-form > where as a way to pass a performance-pref to a device could use > a fixes set of values. > > So lets say that we indeed want to treat these 2 separately, > then I guess that the issue at hand / my reason to start a > discussion surrounding this is allowing userspace to selecting > the DPTF system profile. > > The thinkpad_acpi case at hand is not using DPTF, but that is > because Lenovo decided to implement dynamic DPTF like behavior > inside their embedded controller (for when running Linux) since > DPTF is atm not really supported all that well under Linux and > Lenovo was getting a lot of complaints about sub-optimal > performance because of this. > > So the thinkpad_acpi solution is in essence a replacement > for DPTF and it should thus use the same userspace API as > other mechanisms to select DPTF system profiles. > > And if we limit this new userspace API solely to setting DPTF > system profiles, then their will indeed be only 1 provider for > this for the entire system. > > > The former assumes that there is a common set of values that can be > > understood and acted on in a consistent way by all of the interested > > entities within the kernel and the latter is about passing information > > from user space down to a side-band power control mechanism working in > > its own way behind the kernel's back (and possibly poking at multiple > > hardware components in the platform in its own way). > > Ack. 
> > > IMO there is no way to provide a common interface covering these two > > cases at the same time. > > I see your point, esp. the free form vs common set of values > argument seems to be exactly what we have been going in circles > about during the discussion about this so far. > > >> The problem here is that both "cool" and "quiet" could be > >> interpreted as low-power. But it seems that they actually mean > >> what they say, cool focuses on keeping temps low, which can > >> also be done by making the fan-profile more aggressive. And quiet > >> is mostly about keeping fan speeds down, at the cost of possible > >> higher temperatures. IOW we don't really have a 1 dimensional > >> axis. > > > > Well, AFAICS, DPTF system profile interfaces coming from different > > OEMs will be different, but they are about side-band power control and > > there can be only one thing like that in a platform at the same time. > > Ack. > > >> My class proposal fixes this by having a notion of both > >> standardized names (because anything else would suck) combined > >> with a way for drivers to advertise which standardized names > >> the support. So in my proposal I simply add quiet and cool > >> to the list of standard profile names, and then the HP-wmi > >> driver can list those as supported, while not listing > >> low_power as a supported profile. This way we export the > >> hardware interface to userspace as is (as much as possible) > >> while still offering a standardized interface for userspace > >> to consume. Granted if userspace now actually want to set > >> a low_power profile, we have just punted the problem to userspace > >> but I really do not see a better solution. 
> > > > First, a common place to register a DPTF system profile seems to be > > needed and, as I said above, I wouldn't expect more than one such > > thing to be present in the system at any given time, so it may be > > registered along with the list of supported profiles and user space > > will have to understand what they mean. > > Mostly Ack, I would still like to have an enum for DPTF system > profiles in the kernel and have a single piece of code map that > enum to profile names. This enum can then be extended as > necessary, but I want to avoid having one driver use > "Performance" and the other "performance" or one using > "performance-balanced" and the other "balanced-performance", etc. > > With the goal being that new drivers use existing values from > the enum as much as possible, but we extend it where necessary. IOW, just a table of known profile names with specific indices assigned to them. This sounds reasonable. > > Second, irrespective of the above, it may be useful to have a > > consistent way to pass performance-vs-power preference information > > from user space to different parts of the kernel so as to allow them > > to adjust their operation and this could be done with a system-wide > > power profile attribute IMO. > > I agree, which is why I tried to tackle both things in one go, > but as you said doing both in 1 API is probably not the best idea. > So I believe we should park this second issue for now and revisit it > when we find a need for it. Agreed. > Do you have any specific userspace API in mind for the > DPTF system profile selection? Not really. > And to get one thing out of the way, in the other thread we had some > discussion about using a single attribute where a cat would result in: > > low-power [balanced] performance > > Where the [] indicate the active profile, vs having 2 sysfs attributes > one ro with space-separated available (foo_available) values and one > wr with the actual/current value. 
FWIW userspace folks have indicated > they prefer the solution with 2 separate sysfs-attributes and that is > also what e.g. cpufreq is currently using for governor selection. Right. It also uses that for the EPP "profiles" selection which kind of belongs to the same broad category of settings. > I don't really have a strong opinion either way. Me neither. :-) I guess from the perspective of a script accessing the interface it is somewhat more straightforward to read what is available as a white-space-separated list without the need for extra parsing. Cheers! ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) 2020-10-14 15:42 ` Rafael J. Wysocki @ 2020-10-16 11:10 ` Hans de Goede 2020-10-16 14:26 ` Elia Devito 2020-10-16 14:51 ` Rafael J. Wysocki 0 siblings, 2 replies; 42+ messages in thread From: Hans de Goede @ 2020-10-16 11:10 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Elia Devito, Benjamin Berg, linux-acpi, platform-driver-x86 <note folding the 2 threads we are having on this into one, adding every one from both threads to the Cc> Hi, On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: > On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: >> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: <snip> >>> First, a common place to register a DPTF system profile seems to be >>> needed and, as I said above, I wouldn't expect more than one such >>> thing to be present in the system at any given time, so it may be >>> registered along with the list of supported profiles and user space >>> will have to understand what they mean. >> >> Mostly Ack, I would still like to have an enum for DPTF system >> profiles in the kernel and have a single piece of code map that >> enum to profile names. This enum can then be extended as >> necessary, but I want to avoid having one driver use >> "Performance" and the other "performance" or one using >> "performance-balanced" and the other "balanced-performance", etc. >> >> With the goal being that new drivers use existing values from >> the enum as much as possible, but we extend it where necessary. > > IOW, just a table of known profile names with specific indices assigned to them. Yes. > This sounds reasonable. 
> >>> Second, irrespective of the above, it may be useful to have a >>> consistent way to pass performance-vs-power preference information >>> from user space to different parts of the kernel so as to allow them >>> to adjust their operation and this could be done with a system-wide >>> power profile attribute IMO. >> >> I agree, which is why I tried to tackle both things in one go, >> but as you said doing both in 1 API is probably not the best idea. >> So I believe we should park this second issue for now and revisit it >> when we find a need for it. > > Agreed. > >> Do you have any specific userspace API in mind for the >> DPTF system profile selection? > > Not really. So before /sys/power/profile was mentioned, but that seems more like a thing which should have a set of fixed possible values, iow that is out of scope for this discussion. Since we all seem to agree that this is something which we need specifically for DPTF profiles maybe just add: /sys/power/dptf_current_profile (rw) /sys/power/dptf_available_profiles (ro) (which will only be visible if a dptf-profile handler has been registered) ? Or more generic and thus better (in case other platforms later need something similar) I think, mirror the: /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits for a system-wide energy-performance setting, so we get: /sys/power/energy_performance_preference /sys/power/energy_performance_available_preferences (again only visible when applicable) ? I personally like the second option best. Regards, Hans ^ permalink raw reply [flat|nested] 42+ messages in thread
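The enum-plus-name-table idea discussed above — one canonical spelling per profile, shared by all drivers — could look roughly like the following sketch. The identifiers and strings are illustrative only, not taken from any patch in this thread:

```c
#include <string.h>

/* One canonical identifier per known profile; drivers reuse these and
 * the list is extended only when no existing entry fits. */
enum platform_profile {
	PROFILE_LOW_POWER,
	PROFILE_BALANCED,
	PROFILE_BALANCED_PERFORMANCE,
	PROFILE_PERFORMANCE,
	PROFILE_MAX,
};

/* A single table maps indices to names, so no driver can emit
 * "Performance" next to another driver's "performance". */
static const char *const profile_names[PROFILE_MAX] = {
	[PROFILE_LOW_POWER]            = "low-power",
	[PROFILE_BALANCED]             = "balanced",
	[PROFILE_BALANCED_PERFORMANCE] = "balanced-performance",
	[PROFILE_PERFORMANCE]          = "performance",
};

/* Map a name written by user space back to its index; -1 if unknown. */
static int profile_index(const char *name)
{
	int i;

	for (i = 0; i < PROFILE_MAX; i++)
		if (!strcmp(profile_names[i], name))
			return i;
	return -1;
}
```

Because the lookup is strict string comparison against one table, a mis-cased or reordered spelling simply fails to match, which is exactly the consistency being asked for.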
* Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) 2020-10-16 11:10 ` [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) Hans de Goede @ 2020-10-16 14:26 ` Elia Devito [not found] ` <HK2PR0302MB2449214B28438ADC1790D468BD030@HK2PR0302MB2449.apcprd03.prod.outlook.com> 2020-10-16 14:51 ` Rafael J. Wysocki 1 sibling, 1 reply; 42+ messages in thread From: Elia Devito @ 2020-10-16 14:26 UTC (permalink / raw) To: Rafael J. Wysocki, Hans de Goede Cc: Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Benjamin Berg, linux-acpi, platform-driver-x86 Hi, In data venerdì 16 ottobre 2020 13:10:54 CEST, Hans de Goede ha scritto: > <note folding the 2 threads we are having on this into one, adding every one > from both threads to the Cc> > > Hi, > > On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: > > On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: > >> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: > <snip> > > >>> First, a common place to register a DPTF system profile seems to be > >>> needed and, as I said above, I wouldn't expect more than one such > >>> thing to be present in the system at any given time, so it may be > >>> registered along with the list of supported profiles and user space > >>> will have to understand what they mean. > >> > >> Mostly Ack, I would still like to have an enum for DPTF system > >> profiles in the kernel and have a single piece of code map that > >> enum to profile names. This enum can then be extended as > >> necessary, but I want to avoid having one driver use > >> "Performance" and the other "performance" or one using > >> "performance-balanced" and the other "balanced-performance", etc. 
> >> > >> With the goal being that new drivers use existing values from > >> the enum as much as possible, but we extend it where necessary. > > > > IOW, just a table of known profile names with specific indices assigned to > > them. > Yes. > > > This sounds reasonable. > > > >>> Second, irrespective of the above, it may be useful to have a > >>> consistent way to pass performance-vs-power preference information > >>> from user space to different parts of the kernel so as to allow them > >>> to adjust their operation and this could be done with a system-wide > >>> power profile attribute IMO. > >> > >> I agree, which is why I tried to tackle both things in one go, > >> but as you said doing both in 1 API is probably not the best idea. > >> So I believe we should park this second issue for now and revisit it > >> when we find a need for it. > > > > Agreed. > > > >> Do you have any specific userspace API in mind for the > >> DPTF system profile selection? > > > > Not really. > > So before /sys/power/profile was mentioned, but that seems more like > a thing which should have a set of fixed possible values, iow that is > out of scope for this discussion. > > Since we all seem to agree that this is something which we need > specifically for DPTF profiles maybe just add: > > /sys/power/dptf_current_profile (rw) > /sys/power/dptf_available_profiles (ro) > > (which will only be visible if a dptf-profile handler > has been registered) ? > > Or more generic and thus better (in case other platforms > later need something similar) I think, mirror the: > > /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits > for a system-wide energy-performance setting, so we get: > > /sys/power/energy_performance_preference > /sys/power/energy_performance_available_preferences > > (again only visible when applicable) ? > > I personally like the second option best. > > Regards, > > Hans between the two, the second seems to me more appropriate. 
Considering that the various profiles interact with thermal behaviors what do you think of something like: /sys/power/thermal_profile_available_profiles /sys/power/thermal_profile_profile Regards, Elia ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <HK2PR0302MB2449214B28438ADC1790D468BD030@HK2PR0302MB2449.apcprd03.prod.outlook.com>]
* Re: Fw: [External] Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) [not found] ` <HK2PR0302MB2449214B28438ADC1790D468BD030@HK2PR0302MB2449.apcprd03.prod.outlook.com> @ 2020-10-16 14:43 ` Mark Pearson 2020-10-16 15:16 ` Elia Devito 0 siblings, 1 reply; 42+ messages in thread From: Mark Pearson @ 2020-10-16 14:43 UTC (permalink / raw) To: Rafael J. Wysocki, Hans de Goede, Elia Devito Cc: Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Benjamin Berg, linux-acpi, platform-driver-x86 <Note - switched my email address to my more open source non-outlook based address> On 2020-10-16 10:32 a.m., Mark Pearson wrote: > > > ------------------------------------------------------------------------ > *From:* Elia Devito <eliadevito@gmail.com> > *Sent:* October 16, 2020 10:26 > *To:* Rafael J. 
Wysocki <rafael@kernel.org>; Hans de Goede > <hdegoede@redhat.com> > *Cc:* Daniel Lezcano <daniel.lezcano@linaro.org>; Srinivas Pandruvada > <srinivas.pandruvada@linux.intel.com>; Lukasz Luba > <lukasz.luba@arm.com>; Linux Kernel Mailing List > <linux-kernel@vger.kernel.org>; Linux PM <linux-pm@vger.kernel.org>; > Zhang, Rui <rui.zhang@intel.com>; Bastien Nocera <hadess@hadess.net>; > Mark Pearson <mpearson@lenovo.com>; Limonciello, Mario > <Mario.Limonciello@dell.com>; Darren Hart <dvhart@infradead.org>; Andy > Shevchenko <andy@infradead.org>; Mark Gross <mgross@linux.intel.com>; > Benjamin Berg <bberg@redhat.com>; linux-acpi@vger.kernel.org > <linux-acpi@vger.kernel.org>; platform-driver-x86@vger.kernel.org > <platform-driver-x86@vger.kernel.org> > *Subject:* [External] Re: [RFC] Documentation: Add documentation for new > performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add > the DTPM framework) > Hi, > > In data venerdì 16 ottobre 2020 13:10:54 CEST, Hans de Goede ha scritto: >> <note folding the 2 threads we are having on this into one, adding every one >> from both threads to the Cc> >> >> Hi, >> >> On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: >> > On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: >> >> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: >> <snip> >> >> >>> First, a common place to register a DPTF system profile seems to be >> >>> needed and, as I said above, I wouldn't expect more than one such >> >>> thing to be present in the system at any given time, so it may be >> >>> registered along with the list of supported profiles and user space >> >>> will have to understand what they mean. >> >> >> >> Mostly Ack, I would still like to have an enum for DPTF system >> >> profiles in the kernel and have a single piece of code map that >> >> enum to profile names. 
This enum can then be extended as >> >> necessary, but I want to avoid having one driver use >> >> "Performance" and the other "performance" or one using >> >> "performance-balanced" and the other "balanced-performance", etc. >> >> >> >> With the goal being that new drivers use existing values from >> >> the enum as much as possible, but we extend it where necessary. >> > >> > IOW, just a table of known profile names with specific indices assigned to >> > them. >> Yes. >> >> > This sounds reasonable. >> > >> >>> Second, irrespective of the above, it may be useful to have a >> >>> consistent way to pass performance-vs-power preference information >> >>> from user space to different parts of the kernel so as to allow them >> >>> to adjust their operation and this could be done with a system-wide >> >>> power profile attribute IMO. >> >> >> >> I agree, which is why I tried to tackle both things in one go, >> >> but as you said doing both in 1 API is probably not the best idea. >> >> So I believe we should park this second issue for now and revisit it >> >> when we find a need for it. >> > >> > Agreed. >> > >> >> Do you have any specific userspace API in mind for the >> >> DPTF system profile selection? >> > >> > Not really. >> >> So before /sys/power/profile was mentioned, but that seems more like >> a thing which should have a set of fixed possible values, iow that is >> out of scope for this discussion. >> >> Since we all seem to agree that this is something which we need >> specifically for DPTF profiles maybe just add: >> >> /sys/power/dptf_current_profile (rw) >> /sys/power/dptf_available_profiles (ro) >> >> (which will only be visible if a dptf-profile handler >> has been registered) ? 
>> >> Or more generic and thus better (in case other platforms >> later need something similar) I think, mirror the: >> >> /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits >> for a system-wide energy-performance setting, so we get: >> >> /sys/power/energy_performance_preference >> /sys/power/energy_performance_available_preferences >> >> (again only visible when applicable) ? >> >> I personally like the second option best. >> >> Regards, >> >> Hans > > between the two, the second seems to me more appropriate. > Considering that the various profiles interact with thermal behaviors > what do > you think of something like: > > /sys/power/thermal_profile_available_profiles > /sys/power/thermal_profile_profile > > Regards, > Elia > I'm good with either but I do find 'profile_profile' slightly awkward to say out loud (even though it's logically correct :)) How about just: /sys/power/platform_profile /sys/power/platform_profile_available As it covers the platform as a whole - fans, temperature, power, and anything else that ends up getting thrown in? Mark ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Fw: [External] Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) 2020-10-16 14:43 ` Fw: [External] " Mark Pearson @ 2020-10-16 15:16 ` Elia Devito 0 siblings, 0 replies; 42+ messages in thread From: Elia Devito @ 2020-10-16 15:16 UTC (permalink / raw) To: Rafael J. Wysocki, Hans de Goede, Mark Pearson Cc: Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Benjamin Berg, linux-acpi, platform-driver-x86 Hi, In data venerdì 16 ottobre 2020 16:43:09 CEST, Mark Pearson ha scritto: > <Note - switched my email address to my more open source non-outlook > based address> > > On 2020-10-16 10:32 a.m., Mark Pearson wrote: > > ------------------------------------------------------------------------ > > *From:* Elia Devito <eliadevito@gmail.com> > > *Sent:* October 16, 2020 10:26 > > *To:* Rafael J. 
Wysocki <rafael@kernel.org>; Hans de Goede > > <hdegoede@redhat.com> > > *Cc:* Daniel Lezcano <daniel.lezcano@linaro.org>; Srinivas Pandruvada > > <srinivas.pandruvada@linux.intel.com>; Lukasz Luba > > <lukasz.luba@arm.com>; Linux Kernel Mailing List > > <linux-kernel@vger.kernel.org>; Linux PM <linux-pm@vger.kernel.org>; > > Zhang, Rui <rui.zhang@intel.com>; Bastien Nocera <hadess@hadess.net>; > > Mark Pearson <mpearson@lenovo.com>; Limonciello, Mario > > <Mario.Limonciello@dell.com>; Darren Hart <dvhart@infradead.org>; Andy > > Shevchenko <andy@infradead.org>; Mark Gross <mgross@linux.intel.com>; > > Benjamin Berg <bberg@redhat.com>; linux-acpi@vger.kernel.org > > <linux-acpi@vger.kernel.org>; platform-driver-x86@vger.kernel.org > > <platform-driver-x86@vger.kernel.org> > > *Subject:* [External] Re: [RFC] Documentation: Add documentation for new > > performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add > > the DTPM framework) > > Hi, > > > > In data venerdì 16 ottobre 2020 13:10:54 CEST, Hans de Goede ha scritto: > >> <note folding the 2 threads we are having on this into one, adding every > >> one from both threads to the Cc> > >> > >> Hi, > >> > >> On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: > >> > On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: > >> >> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: > >> <snip> > >> > >> >>> First, a common place to register a DPTF system profile seems to be > >> >>> needed and, as I said above, I wouldn't expect more than one such > >> >>> thing to be present in the system at any given time, so it may be > >> >>> registered along with the list of supported profiles and user space > >> >>> will have to understand what they mean. > >> >> > >> >> Mostly Ack, I would still like to have an enum for DPTF system > >> >> profiles in the kernel and have a single piece of code map that > >> >> enum to profile names. 
This enum can then be extended as > >> >> necessary, but I want to avoid having one driver use > >> >> "Performance" and the other "performance" or one using > >> >> "performance-balanced" and the other "balanced-performance", etc. > >> >> > >> >> With the goal being that new drivers use existing values from > >> >> the enum as much as possible, but we extend it where necessary. > >> > > >> > IOW, just a table of known profile names with specific indices assigned > >> > to > >> > them. > >> > >> Yes. > >> > >> > This sounds reasonable. > >> > > >> >>> Second, irrespective of the above, it may be useful to have a > >> >>> consistent way to pass performance-vs-power preference information > >> >>> from user space to different parts of the kernel so as to allow them > >> >>> to adjust their operation and this could be done with a system-wide > >> >>> power profile attribute IMO. > >> >> > >> >> I agree, which is why I tried to tackle both things in one go, > >> >> but as you said doing both in 1 API is probably not the best idea. > >> >> So I believe we should park this second issue for now and revisit it > >> >> when we find a need for it. > >> > > >> > Agreed. > >> > > >> >> Do you have any specific userspace API in mind for the > >> >> DPTF system profile selection? > >> > > >> > Not really. > >> > >> So before /sys/power/profile was mentioned, but that seems more like > >> a thing which should have a set of fixed possible values, iow that is > >> out of scope for this discussion. > >> > >> Since we all seem to agree that this is something which we need > >> specifically for DPTF profiles maybe just add: > >> > >> /sys/power/dptf_current_profile (rw) > >> /sys/power/dptf_available_profiles (ro) > >> > >> (which will only be visible if a dptf-profile handler > >> > >> has been registered) ? 
> >> > >> Or more generic and thus better (in case other platforms > >> later need something similar) I think, mirror the: > >> > >> /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits > >> for a system-wide energy-performance setting, so we get: > >> > >> /sys/power/energy_performance_preference > >> /sys/power/energy_performance_available_preferences > >> > >> (again only visible when applicable) ? > >> > >> I personally like the second option best. > >> > >> Regards, > >> > >> Hans > > > > between the two, the second seems to me more appropriate. > > Considering that the various profiles interact with thermal behaviors > > what do > > you think of something like: > > > > /sys/power/thermal_profile_available_profiles > > /sys/power/thermal_profile_profile > > > > Regards, > > Elia > > I'm good with either but I do find 'profile_profile' slightly awkward to > say out loud (even though it's logically correct :)) > > How about just: > /sys/power/platform_profile > /sys/power/platform_profile_available > > As it covers the platform as a whole - fans, temperature, power, and > anything else that ends up getting thrown in? > > Mark Completely agree, I made a typo xD Elia ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) 2020-10-16 11:10 ` [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) Hans de Goede 2020-10-16 14:26 ` Elia Devito @ 2020-10-16 14:51 ` Rafael J. Wysocki 2020-10-18 9:41 ` Hans de Goede 1 sibling, 1 reply; 42+ messages in thread From: Rafael J. Wysocki @ 2020-10-16 14:51 UTC (permalink / raw) To: Hans de Goede Cc: Rafael J. Wysocki, Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Elia Devito, Benjamin Berg, linux-acpi, platform-driver-x86 On Fri, Oct 16, 2020 at 1:11 PM Hans de Goede <hdegoede@redhat.com> wrote: > > <note folding the 2 threads we are having on this into one, adding every one from both threads to the Cc> > > Hi, > > On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: > > On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: > >> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: > > <snip> > > >>> First, a common place to register a DPTF system profile seems to be > >>> needed and, as I said above, I wouldn't expect more than one such > >>> thing to be present in the system at any given time, so it may be > >>> registered along with the list of supported profiles and user space > >>> will have to understand what they mean. > >> > >> Mostly Ack, I would still like to have an enum for DPTF system > >> profiles in the kernel and have a single piece of code map that > >> enum to profile names. This enum can then be extended as > >> necessary, but I want to avoid having one driver use > >> "Performance" and the other "performance" or one using > >> "performance-balanced" and the other "balanced-performance", etc. 
> >> > >> With the goal being that new drivers use existing values from > >> the enum as much as possible, but we extend it where necessary. > > > > IOW, just a table of known profile names with specific indices assigned to them. > > Yes. > > > This sounds reasonable. > > > >>> Second, irrespective of the above, it may be useful to have a > >>> consistent way to pass performance-vs-power preference information > >>> from user space to different parts of the kernel so as to allow them > >>> to adjust their operation and this could be done with a system-wide > >>> power profile attribute IMO. > >> > >> I agree, which is why I tried to tackle both things in one go, > >> but as you said doing both in 1 API is probably not the best idea. > >> So I believe we should park this second issue for now and revisit it > >> when we find a need for it. > > > > Agreed. > > > >> Do you have any specific userspace API in mind for the > >> DPTF system profile selection? > > > > Not really. > > So before /sys/power/profile was mentioned, but that seems more like > a thing which should have a set of fixed possible values, iow that is > out of scope for this discussion. Yes. > Since we all seem to agree that this is something which we need > specifically for DPTF profiles maybe just add: > > /sys/power/dptf_current_profile (rw) > /sys/power/dptf_available_profiles (ro) > > (which will only be visible if a dptf-profile handler > has been registered) ? > > Or more generic and thus better (in case other platforms > later need something similar) I think, mirror the: > > /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits > for a system-wide energy-performance setting, so we get: > > /sys/power/energy_performance_preference > /sys/power/energy_performance_available_preferences But this is not about energy vs performance only in general, is it? > (again only visible when applicable) ? > > I personally like the second option best. 
But I would put it under /sys/firmware/ instead of /sys/power/ and I would call it something like platform_profile (and platform_profile_choices or similar). Cheers! ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) 2020-10-16 14:51 ` Rafael J. Wysocki @ 2020-10-18 9:41 ` Hans de Goede 2020-10-18 12:31 ` Rafael J. Wysocki 0 siblings, 1 reply; 42+ messages in thread From: Hans de Goede @ 2020-10-18 9:41 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Elia Devito, Benjamin Berg, linux-acpi, platform-driver-x86 Hi, On 10/16/20 4:51 PM, Rafael J. Wysocki wrote: > On Fri, Oct 16, 2020 at 1:11 PM Hans de Goede <hdegoede@redhat.com> wrote: >> >> <note folding the 2 threads we are having on this into one, adding every one from both threads to the Cc> >> >> Hi, >> >> On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: >>> On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: >>>> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: >> >> <snip> >> >>>>> First, a common place to register a DPTF system profile seems to be >>>>> needed and, as I said above, I wouldn't expect more than one such >>>>> thing to be present in the system at any given time, so it may be >>>>> registered along with the list of supported profiles and user space >>>>> will have to understand what they mean. >>>> >>>> Mostly Ack, I would still like to have an enum for DPTF system >>>> profiles in the kernel and have a single piece of code map that >>>> enum to profile names. This enum can then be extended as >>>> necessary, but I want to avoid having one driver use >>>> "Performance" and the other "performance" or one using >>>> "performance-balanced" and the other "balanced-performance", etc. >>>> >>>> With the goal being that new drivers use existing values from >>>> the enum as much as possible, but we extend it where necessary. 
>>> >>> IOW, just a table of known profile names with specific indices assigned to them. >> >> Yes. >> >>> This sounds reasonable. >>> >>>>> Second, irrespective of the above, it may be useful to have a >>>>> consistent way to pass performance-vs-power preference information >>>>> from user space to different parts of the kernel so as to allow them >>>>> to adjust their operation and this could be done with a system-wide >>>>> power profile attribute IMO. >>>> >>>> I agree, which is why I tried to tackle both things in one go, >>>> but as you said doing both in 1 API is probably not the best idea. >>>> So I believe we should park this second issue for now and revisit it >>>> when we find a need for it. >>> >>> Agreed. >>> >>>> Do you have any specific userspace API in mind for the >>>> DPTF system profile selection? >>> >>> Not really. >> >> So before /sys/power/profile was mentioned, but that seems more like >> a thing which should have a set of fixed possible values, iow that is >> out of scope for this discussion. > > Yes. > >> Since we all seem to agree that this is something which we need >> specifically for DPTF profiles maybe just add: >> >> /sys/power/dptf_current_profile (rw) >> /sys/power/dptf_available_profiles (ro) >> >> (which will only be visible if a dptf-profile handler >> has been registered) ? >> >> Or more generic and thus better (in case other platforms >> later need something similar) I think, mirror the: >> >> /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits >> for a system-wide energy-performance setting, so we get: >> >> /sys/power/energy_performance_preference >> /sys/power/energy_performance_available_preferences > > But this is not about energy vs performance only in general, is it? > >> (again only visible when applicable) ? >> >> I personally like the second option best. 
> > But I would put it under /sys/firmware/ instead of /sys/power/ and I > would call it something like platform_profile (and > platform_profile_choices or similar). Currently we only have dirs under /sys/firmware: [hans@x1 ~]$ ls /sys/firmware acpi dmi efi memmap But we do have /sys/firmware/acpi/pm_profile: Documentation/ABI/stable/sysfs-acpi-pmprofile What: /sys/firmware/acpi/pm_profile Date: 03-Nov-2011 KernelVersion: v3.2 Contact: linux-acpi@vger.kernel.org Description: The ACPI pm_profile sysfs interface exports the platform power management (and performance) requirement expectations as provided by BIOS. The integer value is directly passed as retrieved from the FADT ACPI table. Values: For possible values see ACPI specification: 5.2.9 Fixed ACPI Description Table (FADT) Field: Preferred_PM_Profile Currently these values are defined by spec: 0 Unspecified 1 Desktop 2 Mobile 3 Workstation 4 Enterprise Server ... Since all platforms which we need this for are ACPI based (and the involved interfaces are also all ACPI interfaces) how about: /sys/firmware/acpi/platform_profile /sys/firmware/acpi/platform_profile_choices ? I think this goes nice together with /sys/firmware/acpi/pm_profile although that is read-only and this is a read/write setting. Rafael, would: /sys/firmware/acpi/platform_profile /sys/firmware/acpi/platform_profile_choices work for you ? Regards, Hans ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) 2020-10-18 9:41 ` Hans de Goede @ 2020-10-18 12:31 ` Rafael J. Wysocki 2020-10-19 18:43 ` Hans de Goede 0 siblings, 1 reply; 42+ messages in thread From: Rafael J. Wysocki @ 2020-10-18 12:31 UTC (permalink / raw) To: Hans de Goede Cc: Rafael J. Wysocki, Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Elia Devito, Benjamin Berg, linux-acpi, platform-driver-x86 On Sun, Oct 18, 2020 at 11:41 AM Hans de Goede <hdegoede@redhat.com> wrote: > > Hi, > > On 10/16/20 4:51 PM, Rafael J. Wysocki wrote: > > On Fri, Oct 16, 2020 at 1:11 PM Hans de Goede <hdegoede@redhat.com> wrote: > >> > >> <note folding the 2 threads we are having on this into one, adding every one from both threads to the Cc> > >> > >> Hi, > >> > >> On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: > >>> On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: > >>>> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: > >> > >> <snip> > >> > >>>>> First, a common place to register a DPTF system profile seems to be > >>>>> needed and, as I said above, I wouldn't expect more than one such > >>>>> thing to be present in the system at any given time, so it may be > >>>>> registered along with the list of supported profiles and user space > >>>>> will have to understand what they mean. > >>>> > >>>> Mostly Ack, I would still like to have an enum for DPTF system > >>>> profiles in the kernel and have a single piece of code map that > >>>> enum to profile names. This enum can then be extended as > >>>> necessary, but I want to avoid having one driver use > >>>> "Performance" and the other "performance" or one using > >>>> "performance-balanced" and the other "balanced-performance", etc. 
> >>>> > >>>> With the goal being that new drivers use existing values from > >>>> the enum as much as possible, but we extend it where necessary. > >>> > >>> IOW, just a table of known profile names with specific indices assigned to them. > >> > >> Yes. > >> > >>> This sounds reasonable. > >>> > >>>>> Second, irrespective of the above, it may be useful to have a > >>>>> consistent way to pass performance-vs-power preference information > >>>>> from user space to different parts of the kernel so as to allow them > >>>>> to adjust their operation and this could be done with a system-wide > >>>>> power profile attribute IMO. > >>>> > >>>> I agree, which is why I tried to tackle both things in one go, > >>>> but as you said doing both in 1 API is probably not the best idea. > >>>> So I believe we should park this second issue for now and revisit it > >>>> when we find a need for it. > >>> > >>> Agreed. > >>> > >>>> Do you have any specific userspace API in mind for the > >>>> DPTF system profile selection? > >>> > >>> Not really. > >> > >> So before /sys/power/profile was mentioned, but that seems more like > >> a thing which should have a set of fixed possible values, iow that is > >> out of scope for this discussion. > > > > Yes. > > > >> Since we all seem to agree that this is something which we need > >> specifically for DPTF profiles maybe just add: > >> > >> /sys/power/dptf_current_profile (rw) > >> /sys/power/dptf_available_profiles (ro) > >> > >> (which will only be visible if a dptf-profile handler > >> has been registered) ? > >> > >> Or more generic and thus better (in case other platforms > >> later need something similar) I think, mirror the: > >> > >> /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits > >> for a system-wide energy-performance setting, so we get: > >> > >> /sys/power/energy_performance_preference > >> /sys/power/energy_performance_available_preferences > > > > But this is not about energy vs performance only in general, is it? 
> > > >> (again only visible when applicable) ? > >> > >> I personally like the second option best. > > > > But I would put it under /sys/firmware/ instead of /sys/power/ and I > > would call it something like platform_profile (and > > platform_profile_choices or similar). > > Currently we only have dirs under /sys/firmware: > > [hans@x1 ~]$ ls /sys/firmware > acpi dmi efi memmap > > But we do have /sys/firmware/acpi/pm_profile: > > Documentation/ABI/stable/sysfs-acpi-pmprofile > > What: /sys/firmware/acpi/pm_profile > Date: 03-Nov-2011 > KernelVersion: v3.2 > Contact: linux-acpi@vger.kernel.org > Description: The ACPI pm_profile sysfs interface exports the platform > power management (and performance) requirement expectations > as provided by BIOS. The integer value is directly passed as > retrieved from the FADT ACPI table. > Values: For possible values see ACPI specification: > 5.2.9 Fixed ACPI Description Table (FADT) > Field: Preferred_PM_Profile > > Currently these values are defined by spec: > 0 Unspecified > 1 Desktop > 2 Mobile > 3 Workstation > 4 Enterprise Server > ... > > Since all platforms which we need this for are ACPI based > (and the involved interfaces are also all ACPI interfaces) > how about: > > /sys/firmware/acpi/platform_profile > /sys/firmware/acpi/platform_profile_choices > > ? > > I think this goes nice together with /sys/firmware/acpi/pm_profile > although that is read-only and this is a read/write setting. > > Rafael, would: > > /sys/firmware/acpi/platform_profile > > /sys/firmware/acpi/platform_profile_choices > > work for you ? Yes, it would. Cheers! ^ permalink raw reply [flat|nested] 42+ messages in thread
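With a current/choices attribute pair like the one agreed on above, a user-space writer would typically verify the requested name against the choices list before echoing it. A hedged sketch of that whole-word check (attribute names and contents are illustrative; this thread predates any committed ABI):

```c
#include <string.h>

/*
 * Return 1 if 'want' appears as a whole word in the space-separated
 * 'choices' string (e.g. the contents of a platform_profile_choices
 * attribute), 0 otherwise.  Substring hits such as "performance"
 * inside "balanced-performance" do not count.
 */
static int profile_available(const char *choices, const char *want)
{
	size_t len = strlen(want);
	const char *p = choices;

	if (!len)
		return 0;

	while ((p = strstr(p, want)) != NULL) {
		int starts = (p == choices || p[-1] == ' ');
		int ends = (p[len] == '\0' || p[len] == ' ' ||
			    p[len] == '\n');

		if (starts && ends)
			return 1;
		p += len;
	}
	return 0;
}
```

The word-boundary test matters precisely because of names like "balanced-performance", where a naive strstr match would wrongly accept "performance".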
* Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) 2020-10-18 12:31 ` Rafael J. Wysocki @ 2020-10-19 18:43 ` Hans de Goede [not found] ` <HK2PR0302MB24494037019FBC7720976735BD1E0@HK2PR0302MB2449.apcprd03.prod.outlook.com> 2020-10-20 12:34 ` Rafael J. Wysocki 0 siblings, 2 replies; 42+ messages in thread From: Hans de Goede @ 2020-10-19 18:43 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Elia Devito, Benjamin Berg, linux-acpi, platform-driver-x86 Hi, On 10/18/20 2:31 PM, Rafael J. Wysocki wrote: > On Sun, Oct 18, 2020 at 11:41 AM Hans de Goede <hdegoede@redhat.com> wrote: >> >> Hi, >> >> On 10/16/20 4:51 PM, Rafael J. Wysocki wrote: >>> On Fri, Oct 16, 2020 at 1:11 PM Hans de Goede <hdegoede@redhat.com> wrote: >>>> >>>> <note folding the 2 threads we are having on this into one, adding every one from both threads to the Cc> >>>> >>>> Hi, >>>> >>>> On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: >>>>> On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: >>>>>> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: >>>> >>>> <snip> >>>> >>>>>>> First, a common place to register a DPTF system profile seems to be >>>>>>> needed and, as I said above, I wouldn't expect more than one such >>>>>>> thing to be present in the system at any given time, so it may be >>>>>>> registered along with the list of supported profiles and user space >>>>>>> will have to understand what they mean. >>>>>> >>>>>> Mostly Ack, I would still like to have an enum for DPTF system >>>>>> profiles in the kernel and have a single piece of code map that >>>>>> enum to profile names. 
This enum can then be extended as >>>>>> necessary, but I want to avoid having one driver use >>>>>> "Performance" and the other "performance" or one using >>>>>> "performance-balanced" and the other "balanced-performance", etc. >>>>>> >>>>>> With the goal being that new drivers use existing values from >>>>>> the enum as much as possible, but we extend it where necessary. >>>>> >>>>> IOW, just a table of known profile names with specific indices assigned to them. >>>> >>>> Yes. >>>> >>>>> This sounds reasonable. >>>>> >>>>>>> Second, irrespective of the above, it may be useful to have a >>>>>>> consistent way to pass performance-vs-power preference information >>>>>>> from user space to different parts of the kernel so as to allow them >>>>>>> to adjust their operation and this could be done with a system-wide >>>>>>> power profile attribute IMO. >>>>>> >>>>>> I agree, which is why I tried to tackle both things in one go, >>>>>> but as you said doing both in 1 API is probably not the best idea. >>>>>> So I believe we should park this second issue for now and revisit it >>>>>> when we find a need for it. >>>>> >>>>> Agreed. >>>>> >>>>>> Do you have any specific userspace API in mind for the >>>>>> DPTF system profile selection? >>>>> >>>>> Not really. >>>> >>>> So before /sys/power/profile was mentioned, but that seems more like >>>> a thing which should have a set of fixed possible values, iow that is >>>> out of scope for this discussion. >>> >>> Yes. >>> >>>> Since we all seem to agree that this is something which we need >>>> specifically for DPTF profiles maybe just add: >>>> >>>> /sys/power/dptf_current_profile (rw) >>>> /sys/power/dptf_available_profiles (ro) >>>> >>>> (which will only be visible if a dptf-profile handler >>>> has been registered) ? 
>>>> >>>> Or more generic and thus better (in case other platforms >>>> later need something similar) I think, mirror the: >>>> >>>> /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits >>>> for a system-wide energy-performance setting, so we get: >>>> >>>> /sys/power/energy_performance_preference >>>> /sys/power/energy_performance_available_preferences >>> >>> But this is not about energy vs performance only in general, is it? >>> >>>> (again only visible when applicable) ? >>>> >>>> I personally like the second option best. >>> >>> But I would put it under /sys/firmware/ instead of /sys/power/ and I >>> would call it something like platform_profile (and >>> platform_profile_choices or similar). >> >> Currently we only have dirs under /sys/firmware: >> >> [hans@x1 ~]$ ls /sys/firmware >> acpi dmi efi memmap >> >> But we do have /sys/firmware/apci/pm_profile: >> >> Documentation/ABI/stable/sysfs-acpi-pmprofile >> >> What: /sys/firmware/acpi/pm_profile >> Date: 03-Nov-2011 >> KernelVersion: v3.2 >> Contact: linux-acpi@vger.kernel.org >> Description: The ACPI pm_profile sysfs interface exports the platform >> power management (and performance) requirement expectations >> as provided by BIOS. The integer value is directly passed as >> retrieved from the FADT ACPI table. >> Values: For possible values see ACPI specification: >> 5.2.9 Fixed ACPI Description Table (FADT) >> Field: Preferred_PM_Profile >> >> Currently these values are defined by spec: >> 0 Unspecified >> 1 Desktop >> 2 Mobile >> 3 Workstation >> 4 Enterprise Server >> ... >> >> Since all platforms which we need this for are ACPI based >> (and the involved interfaces are also all ACPI interfaces) >> how about: >> >> /sys/firmware/acpi/platform_profile >> /sys/firmware/acpi/platform_profile_choices >> >> ? >> >> I think this goes nice together with /sys/firmware/acpi/pm_profile >> although that is read-only and this is a read/write setting. 
>> >> Rafel, would: >> >> /sys/firmware/acpi/platform_profile >> /sys/firmware/acpi/platform_profile_choices >> >> work for you ? > > Yes, it would. Great. So I think that means we have the most important part for moving forward with this. So I guess the plan for this now looks something like this. 1. Rewrite my API docs RFC to update it for the new /sys/firmware/acpi/platform_profile[_choices] plan (should be easy and a bunch of stuff like the "type" bit can just be dropped) 2. Add code somewhere under drivers/acpi which allows code from elsewhere to register itself as platform_profile handler/provider. Rafael, any suggestions / preference for where this should be added under drivers/acpi ? In a new .c file perhaps ? 3.1 Use the code from 2 to add support for platform-profile selection in thinkpad_acpi (something for me or Mark Pearson to do) 3.2 Use the code from 2 to add support for platform-profile selection to hp-wmi 3.3 (and to other drivers in the future). An open question is who will take care of 1. and 2. Mark (Pearson) do you feel up to this? or do you want me to take care of this? Regards, Hans ^ permalink raw reply [flat|nested] 42+ messages in thread
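Plan item 1 above covers the userspace-facing ABI. A rough sketch of how a daemon would drive the two proposed files is below; since the real sysfs nodes would only appear once a driver has registered a handler, this sketch mimics the layout under a temporary directory, and the profile names and the space-separated format of the choices file are assumptions, not settled ABI.

```shell
# Mimic the proposed sysfs layout under a temp dir (the real nodes
# only exist once a platform_profile provider has registered).
sysfs=$(mktemp -d)
mkdir -p "$sysfs/firmware/acpi"
echo "low-power balanced performance" > "$sysfs/firmware/acpi/platform_profile_choices"
echo "balanced" > "$sysfs/firmware/acpi/platform_profile"

# Read the advertised profiles (read-only file).
cat "$sysfs/firmware/acpi/platform_profile_choices"

# Select a profile by writing one of the advertised names (read/write file).
echo "performance" > "$sysfs/firmware/acpi/platform_profile"
cat "$sysfs/firmware/acpi/platform_profile"
```

A real daemon would additionally handle a write of an unadvertised name being rejected by the kernel with an error.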
[parent not found: <HK2PR0302MB24494037019FBC7720976735BD1E0@HK2PR0302MB2449.apcprd03.prod.outlook.com>]
* Re: Fw: [External] Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) [not found] ` <HK2PR0302MB24494037019FBC7720976735BD1E0@HK2PR0302MB2449.apcprd03.prod.outlook.com> @ 2020-10-19 18:49 ` Mark Pearson 2020-10-25 10:13 ` Hans de Goede 0 siblings, 1 reply; 42+ messages in thread From: Mark Pearson @ 2020-10-19 18:49 UTC (permalink / raw) To: Hans de Goede, Rafael J. Wysocki Cc: Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Elia Devito, Benjamin Berg, linux-acpi@vger.kernel.org, platform-driver-x86@vger.kernel.org Hi > On 19/10/2020 14:43, Hans de Goede <hdegoede@redhat.com> wrote: > Hi, > > On 10/18/20 2:31 PM, Rafael J. Wysocki wrote: >> On Sun, Oct 18, 2020 at 11:41 AM Hans de Goede <hdegoede@redhat.com> wrote: >>> >>> Hi, >>> >>> On 10/16/20 4:51 PM, Rafael J. Wysocki wrote: >>>> On Fri, Oct 16, 2020 at 1:11 PM Hans de Goede <hdegoede@redhat.com> wrote: >>>>> >>>>> <note folding the 2 threads we are having on this into one, adding every one from both threads to the Cc> >>>>> >>>>> Hi, >>>>> >>>>> On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: >>>>>> On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: >>>>>>> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: >>>>> >>>>> <snip> >>>>> >>>>>>>> First, a common place to register a DPTF system profile seems to be >>>>>>>> needed and, as I said above, I wouldn't expect more than one such >>>>>>>> thing to be present in the system at any given time, so it may be >>>>>>>> registered along with the list of supported profiles and user space >>>>>>>> will have to understand what they mean. 
>>>>>>> >>>>>>> Mostly Ack, I would still like to have an enum for DPTF system >>>>>>> profiles in the kernel and have a single piece of code map that >>>>>>> enum to profile names. This enum can then be extended as >>>>>>> necessary, but I want to avoid having one driver use >>>>>>> "Performance" and the other "performance" or one using >>>>>>> "performance-balanced" and the other "balanced-performance", etc. >>>>>>> >>>>>>> With the goal being that new drivers use existing values from >>>>>>> the enum as much as possible, but we extend it where necessary. >>>>>> >>>>>> IOW, just a table of known profile names with specific indices assigned to them. >>>>> >>>>> Yes. >>>>> >>>>>> This sounds reasonable. >>>>>> >>>>>>>> Second, irrespective of the above, it may be useful to have a >>>>>>>> consistent way to pass performance-vs-power preference information >>>>>>>> from user space to different parts of the kernel so as to allow them >>>>>>>> to adjust their operation and this could be done with a system-wide >>>>>>>> power profile attribute IMO. >>>>>>> >>>>>>> I agree, which is why I tried to tackle both things in one go, >>>>>>> but as you said doing both in 1 API is probably not the best idea. >>>>>>> So I believe we should park this second issue for now and revisit it >>>>>>> when we find a need for it. >>>>>> >>>>>> Agreed. >>>>>> >>>>>>> Do you have any specific userspace API in mind for the >>>>>>> DPTF system profile selection? >>>>>> >>>>>> Not really. >>>>> >>>>> So before /sys/power/profile was mentioned, but that seems more like >>>>> a thing which should have a set of fixed possible values, iow that is >>>>> out of scope for this discussion. >>>> >>>> Yes. 
>>>> >>>>> Since we all seem to agree that this is something which we need >>>>> specifically for DPTF profiles maybe just add: >>>>> >>>>> /sys/power/dptf_current_profile (rw) >>>>> /sys/power/dptf_available_profiles (ro) >>>>> >>>>> (which will only be visible if a dptf-profile handler >>>>> has been registered) ? >>>>> >>>>> Or more generic and thus better (in case other platforms >>>>> later need something similar) I think, mirror the: >>>>> >>>>> /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits >>>>> for a system-wide energy-performance setting, so we get: >>>>> >>>>> /sys/power/energy_performance_preference >>>>> /sys/power/energy_performance_available_preferences >>>> >>>> But this is not about energy vs performance only in general, is it? >>>> >>>>> (again only visible when applicable) ? >>>>> >>>>> I personally like the second option best. >>>> >>>> But I would put it under /sys/firmware/ instead of /sys/power/ and I >>>> would call it something like platform_profile (and >>>> platform_profile_choices or similar). >>> >>> Currently we only have dirs under /sys/firmware: >>> >>> [hans@x1 ~]$ ls /sys/firmware >>> acpi dmi efi memmap >>> >>> But we do have /sys/firmware/apci/pm_profile: >>> >>> Documentation/ABI/stable/sysfs-acpi-pmprofile >>> >>> What: /sys/firmware/acpi/pm_profile >>> Date: 03-Nov-2011 >>> KernelVersion: v3.2 >>> Contact: linux-acpi@vger.kernel.org >>> Description: The ACPI pm_profile sysfs interface exports the platform >>> power management (and performance) requirement expectations >>> as provided by BIOS. The integer value is directly passed as >>> retrieved from the FADT ACPI table. >>> Values: For possible values see ACPI specification: >>> 5.2.9 Fixed ACPI Description Table (FADT) >>> Field: Preferred_PM_Profile >>> >>> Currently these values are defined by spec: >>> 0 Unspecified >>> 1 Desktop >>> 2 Mobile >>> 3 Workstation >>> 4 Enterprise Server >>> ... 
>>> >>> Since all platforms which we need this for are ACPI based >>> (and the involved interfaces are also all ACPI interfaces) >>> how about: >>> >>> /sys/firmware/acpi/platform_profile >>> /sys/firmware/acpi/platform_profile_choices >>> >>> ? >>> >>> I think this goes nice together with /sys/firmware/acpi/pm_profile >>> although that is read-only and this is a read/write setting. >>> >>> Rafel, would: >>> >>> /sys/firmware/acpi/platform_profile >>> /sys/firmware/acpi/platform_profile_choices >>> >>> work for you ? >> >> Yes, it would. > > Great. So I think hat means that we have the most important part > for moving forward with this. > > So I guess the plan for this now looks something like this. > > 1. Rewrite my API docs RFC to update it for the new > /sys/firmware/acpi/platform_profile[_choices] > plan (should be easy and a bunch of stuff like the "type" bit can > just be dropped) > > 2. Add code somewhere under drivers/acpi which allows code from else where > to register itself as platform_profile handler/provider. > > Rafael, any suggestions / preference for where this should be added under > drivers/acpi ? In a new .c file perhaps ? > > 3.1 Use the code from 2 to add support for platform-profile selection in > thinkpad_acpi (something for me or Mark Pearson) to do > 3.2 Use the code from 2 to add support for platform-profile selection > to hp-wmi > 3.3 (and to other drivers in the future). > > > An open question is who will take care of 1. and 2. Mark (Pearson) > do you feel up to this? or do you want me to take care of this? > > Regards, > > Hans > Definitely up for (2) and will happily have a go at number (1). If there's an example of something similar I can look at for reference that would be helpful :) Mark ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Fw: [External] Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) 2020-10-19 18:49 ` Fw: [External] " Mark Pearson @ 2020-10-25 10:13 ` Hans de Goede 0 siblings, 0 replies; 42+ messages in thread From: Hans de Goede @ 2020-10-25 10:13 UTC (permalink / raw) To: Mark Pearson, Rafael J. Wysocki Cc: Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Elia Devito, Benjamin Berg, linux-acpi@vger.kernel.org, platform-driver-x86@vger.kernel.org Hi, On 10/19/20 8:49 PM, Mark Pearson wrote: > Hi > >> On 19/10/2020 14:43, Hans de Goede <hdegoede@redhat.com> wrote: >> Hi, >> >> On 10/18/20 2:31 PM, Rafael J. Wysocki wrote: >>> On Sun, Oct 18, 2020 at 11:41 AM Hans de Goede <hdegoede@redhat.com> wrote: >>>> >>>> Hi, >>>> >>>> On 10/16/20 4:51 PM, Rafael J. Wysocki wrote: >>>>> On Fri, Oct 16, 2020 at 1:11 PM Hans de Goede <hdegoede@redhat.com> wrote: >>>>>> >>>>>> <note folding the 2 threads we are having on this into one, adding every one from both threads to the Cc> >>>>>> >>>>>> Hi, >>>>>> >>>>>> On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: >>>>>>> On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: >>>>>>>> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: >>>>>> >>>>>> <snip> >>>>>> >>>>>>>>> First, a common place to register a DPTF system profile seems to be >>>>>>>>> needed and, as I said above, I wouldn't expect more than one such >>>>>>>>> thing to be present in the system at any given time, so it may be >>>>>>>>> registered along with the list of supported profiles and user space >>>>>>>>> will have to understand what they mean. 
>>>>>>>> >>>>>>>> Mostly Ack, I would still like to have an enum for DPTF system >>>>>>>> profiles in the kernel and have a single piece of code map that >>>>>>>> enum to profile names. This enum can then be extended as >>>>>>>> necessary, but I want to avoid having one driver use >>>>>>>> "Performance" and the other "performance" or one using >>>>>>>> "performance-balanced" and the other "balanced-performance", etc. >>>>>>>> >>>>>>>> With the goal being that new drivers use existing values from >>>>>>>> the enum as much as possible, but we extend it where necessary. >>>>>>> >>>>>>> IOW, just a table of known profile names with specific indices assigned to them. >>>>>> >>>>>> Yes. >>>>>> >>>>>>> This sounds reasonable. >>>>>>> >>>>>>>>> Second, irrespective of the above, it may be useful to have a >>>>>>>>> consistent way to pass performance-vs-power preference information >>>>>>>>> from user space to different parts of the kernel so as to allow them >>>>>>>>> to adjust their operation and this could be done with a system-wide >>>>>>>>> power profile attribute IMO. >>>>>>>> >>>>>>>> I agree, which is why I tried to tackle both things in one go, >>>>>>>> but as you said doing both in 1 API is probably not the best idea. >>>>>>>> So I believe we should park this second issue for now and revisit it >>>>>>>> when we find a need for it. >>>>>>> >>>>>>> Agreed. >>>>>>> >>>>>>>> Do you have any specific userspace API in mind for the >>>>>>>> DPTF system profile selection? >>>>>>> >>>>>>> Not really. >>>>>> >>>>>> So before /sys/power/profile was mentioned, but that seems more like >>>>>> a thing which should have a set of fixed possible values, iow that is >>>>>> out of scope for this discussion. >>>>> >>>>> Yes. 
>>>>> >>>>>> Since we all seem to agree that this is something which we need >>>>>> specifically for DPTF profiles maybe just add: >>>>>> >>>>>> /sys/power/dptf_current_profile (rw) >>>>>> /sys/power/dptf_available_profiles (ro) >>>>>> >>>>>> (which will only be visible if a dptf-profile handler >>>>>> has been registered) ? >>>>>> >>>>>> Or more generic and thus better (in case other platforms >>>>>> later need something similar) I think, mirror the: >>>>>> >>>>>> /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits >>>>>> for a system-wide energy-performance setting, so we get: >>>>>> >>>>>> /sys/power/energy_performance_preference >>>>>> /sys/power/energy_performance_available_preferences >>>>> >>>>> But this is not about energy vs performance only in general, is it? >>>>> >>>>>> (again only visible when applicable) ? >>>>>> >>>>>> I personally like the second option best. >>>>> >>>>> But I would put it under /sys/firmware/ instead of /sys/power/ and I >>>>> would call it something like platform_profile (and >>>>> platform_profile_choices or similar). >>>> >>>> Currently we only have dirs under /sys/firmware: >>>> >>>> [hans@x1 ~]$ ls /sys/firmware >>>> acpi dmi efi memmap >>>> >>>> But we do have /sys/firmware/apci/pm_profile: >>>> >>>> Documentation/ABI/stable/sysfs-acpi-pmprofile >>>> >>>> What: /sys/firmware/acpi/pm_profile >>>> Date: 03-Nov-2011 >>>> KernelVersion: v3.2 >>>> Contact: linux-acpi@vger.kernel.org >>>> Description: The ACPI pm_profile sysfs interface exports the platform >>>> power management (and performance) requirement expectations >>>> as provided by BIOS. The integer value is directly passed as >>>> retrieved from the FADT ACPI table. >>>> Values: For possible values see ACPI specification: >>>> 5.2.9 Fixed ACPI Description Table (FADT) >>>> Field: Preferred_PM_Profile >>>> >>>> Currently these values are defined by spec: >>>> 0 Unspecified >>>> 1 Desktop >>>> 2 Mobile >>>> 3 Workstation >>>> 4 Enterprise Server >>>> ... 
>>>> >>>> Since all platforms which we need this for are ACPI based >>>> (and the involved interfaces are also all ACPI interfaces) >>>> how about: >>>> >>>> /sys/firmware/acpi/platform_profile >>>> /sys/firmware/acpi/platform_profile_choices >>>> >>>> ? >>>> >>>> I think this goes nice together with /sys/firmware/acpi/pm_profile >>>> although that is read-only and this is a read/write setting. >>>> >>>> Rafel, would: >>>> >>>> /sys/firmware/acpi/platform_profile >>>> /sys/firmware/acpi/platform_profile_choices >>>> >>>> work for you ? >>> >>> Yes, it would. >> >> Great. So I think hat means that we have the most important part >> for moving forward with this. >> >> So I guess the plan for this now looks something like this. >> >> 1. Rewrite my API docs RFC to update it for the new /sys/firmware/acpi/platform_profile[_choices] >> plan (should be easy and a bunch of stuff like the "type" bit can just be dropped) >> >> 2. Add code somewhere under drivers/acpi which allows code from else where >> to register itself as platform_profile handler/provider. >> >> Rafael, any suggestions / preference for where this should be added under >> drivers/acpi ? In a new .c file perhaps ? >> >> 3.1 Use the code from 2 to add support for platform-profile selection in >> thinkpad_acpi (something for me or Mark Pearson) to do >> 3.2 Use the code from 2 to add support for platform-profile selection >> to hp-wmi >> 3.3 (and to other drivers in the future). >> >> >> An open question is who will take care of 1. and 2. Mark (Pearson) >> do you feel up to this? or do you want me to take care of this? >> >> Regards, >> >> Hans >> > > Definitely up for (2) and will happily have a go at number (1). > > If there's an example of something similar I can look at for reference that would be helpful :) So what I would do is something like this: 1. 
Create a new include/acpi/platform_profile_provider.h file and in that file define: 1.1 An enum with possible profile values (as discussed we want the driver API to use an enum (which may be extended) and then use an array with strings inside the shared code to avoid differences like "Performance" vs "performance", etc.). The enum should end with something like: ACPI_PLATFORM_PROFILE_COUNT 1.2 An acpi_platform_profile_provider struct which contains a number of function-pointers for set/get callbacks (these callbacks should get/set the enum type, not strings), a "void *user_data" (to be passed back to the callbacks) and an "unsigned long profile_choices[BITS_TO_LONGS(ACPI_PLATFORM_PROFILE_COUNT)]" member. 1.3 A function to register / unregister a platform_profile_provider 2. Add a new .c file for this under drivers/acpi which should contain the actual implementation of the API (left to you) and at least 2 global variables: a "struct acpi_platform_profile_provider *profile_provider;" and a mutex protecting it. The register/unregister functions should lock the mutex to protect the pointer, and the register function should check that this is the first provider being registered; if there already is a provider registered then -EBUSY should be returned. The register/unregister functions should also add / remove the 2 /sys/firmware/acpi/platform_profile[_choices] files. Hint: for the read function of the platform_profile_choices file you should use for_each_set_bit(bit, profile_choices, ACPI_PLATFORM_PROFILE_COUNT) {} Note this is more or less what I would do (minus any changes I would come up with when implementing this), feel free to use your own judgement here. Rafael, do you have any comments on this approach ? Regards, Hans ^ permalink raw reply [flat|nested] 42+ messages in thread
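The registration design described above (a single provider pointer behind a mutex, -EBUSY on a second registration, a supported-choices bitmap walked when showing platform_profile_choices) can be modeled in user-space C. This is only a sketch of the proposed logic, not kernel code: a pthread mutex stands in for the kernel mutex, a plain unsigned long for DECLARE_BITMAP()/for_each_set_bit(), and all struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Placeholder profile enum and canonical name table (illustrative only). */
enum platform_profile_option {
	PROFILE_LOW_POWER,
	PROFILE_BALANCED,
	PROFILE_PERFORMANCE,
	PROFILE_COUNT,
};

static const char *const profile_names[PROFILE_COUNT] = {
	"low-power", "balanced", "performance",
};

/* Mirrors the proposed provider struct: callbacks + supported-choices bitmap. */
struct platform_profile_provider {
	int (*get)(void *data, enum platform_profile_option *p);
	int (*set)(void *data, enum platform_profile_option p);
	void *data;
	unsigned long choices;	/* bit n set => profile n supported */
};

/* Single registered provider, guarded by a mutex, as proposed. */
static struct platform_profile_provider *provider;
static pthread_mutex_t provider_lock = PTHREAD_MUTEX_INITIALIZER;

int platform_profile_register(struct platform_profile_provider *p)
{
	int ret = 0;

	pthread_mutex_lock(&provider_lock);
	if (provider)
		ret = -EBUSY;	/* only one provider at a time */
	else
		provider = p;
	pthread_mutex_unlock(&provider_lock);
	return ret;
}

void platform_profile_unregister(void)
{
	pthread_mutex_lock(&provider_lock);
	provider = NULL;
	pthread_mutex_unlock(&provider_lock);
}

/* Models the platform_profile_choices show(): walk the set bits. */
int profile_choices_show(char *buf, size_t len)
{
	int n = 0;

	pthread_mutex_lock(&provider_lock);
	if (!provider) {
		pthread_mutex_unlock(&provider_lock);
		return -ENODEV;
	}
	for (int bit = 0; bit < PROFILE_COUNT; bit++)
		if (provider->choices & (1UL << bit))
			n += snprintf(buf + n, len - n, "%s%s",
				      n ? " " : "", profile_names[bit]);
	pthread_mutex_unlock(&provider_lock);
	return n;
}
```

A second registration attempt fails with -EBUSY until the first provider unregisters, matching the "first provider wins" rule above; the show helper emits only the names whose bits the provider advertised.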
* Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) 2020-10-19 18:43 ` Hans de Goede [not found] ` <HK2PR0302MB24494037019FBC7720976735BD1E0@HK2PR0302MB2449.apcprd03.prod.outlook.com> @ 2020-10-20 12:34 ` Rafael J. Wysocki 1 sibling, 0 replies; 42+ messages in thread From: Rafael J. Wysocki @ 2020-10-20 12:34 UTC (permalink / raw) To: Hans de Goede Cc: Rafael J. Wysocki, Daniel Lezcano, Srinivas Pandruvada, Lukasz Luba, Linux Kernel Mailing List, Linux PM, Zhang, Rui, Bastien Nocera, Mark Pearson, Limonciello, Mario, Darren Hart, Andy Shevchenko, Mark Gross, Elia Devito, Benjamin Berg, linux-acpi, platform-driver-x86 On Mon, Oct 19, 2020 at 8:43 PM Hans de Goede <hdegoede@redhat.com> wrote: > > Hi, > > On 10/18/20 2:31 PM, Rafael J. Wysocki wrote: > > On Sun, Oct 18, 2020 at 11:41 AM Hans de Goede <hdegoede@redhat.com> wrote: > >> > >> Hi, > >> > >> On 10/16/20 4:51 PM, Rafael J. Wysocki wrote: > >>> On Fri, Oct 16, 2020 at 1:11 PM Hans de Goede <hdegoede@redhat.com> wrote: > >>>> > >>>> <note folding the 2 threads we are having on this into one, adding every one from both threads to the Cc> > >>>> > >>>> Hi, > >>>> > >>>> On 10/14/20 5:42 PM, Rafael J. Wysocki wrote: > >>>>> On Wed, Oct 14, 2020 at 4:06 PM Hans de Goede <hdegoede@redhat.com> wrote: > >>>>>> On 10/14/20 3:33 PM, Rafael J. Wysocki wrote: > >>>> > >>>> <snip> > >>>> > >>>>>>> First, a common place to register a DPTF system profile seems to be > >>>>>>> needed and, as I said above, I wouldn't expect more than one such > >>>>>>> thing to be present in the system at any given time, so it may be > >>>>>>> registered along with the list of supported profiles and user space > >>>>>>> will have to understand what they mean. > >>>>>> > >>>>>> Mostly Ack, I would still like to have an enum for DPTF system > >>>>>> profiles in the kernel and have a single piece of code map that > >>>>>> enum to profile names. 
This enum can then be extended as > >>>>>> necessary, but I want to avoid having one driver use > >>>>>> "Performance" and the other "performance" or one using > >>>>>> "performance-balanced" and the other "balanced-performance", etc. > >>>>>> > >>>>>> With the goal being that new drivers use existing values from > >>>>>> the enum as much as possible, but we extend it where necessary. > >>>>> > >>>>> IOW, just a table of known profile names with specific indices assigned to them. > >>>> > >>>> Yes. > >>>> > >>>>> This sounds reasonable. > >>>>> > >>>>>>> Second, irrespective of the above, it may be useful to have a > >>>>>>> consistent way to pass performance-vs-power preference information > >>>>>>> from user space to different parts of the kernel so as to allow them > >>>>>>> to adjust their operation and this could be done with a system-wide > >>>>>>> power profile attribute IMO. > >>>>>> > >>>>>> I agree, which is why I tried to tackle both things in one go, > >>>>>> but as you said doing both in 1 API is probably not the best idea. > >>>>>> So I believe we should park this second issue for now and revisit it > >>>>>> when we find a need for it. > >>>>> > >>>>> Agreed. > >>>>> > >>>>>> Do you have any specific userspace API in mind for the > >>>>>> DPTF system profile selection? > >>>>> > >>>>> Not really. > >>>> > >>>> So before /sys/power/profile was mentioned, but that seems more like > >>>> a thing which should have a set of fixed possible values, iow that is > >>>> out of scope for this discussion. > >>> > >>> Yes. > >>> > >>>> Since we all seem to agree that this is something which we need > >>>> specifically for DPTF profiles maybe just add: > >>>> > >>>> /sys/power/dptf_current_profile (rw) > >>>> /sys/power/dptf_available_profiles (ro) > >>>> > >>>> (which will only be visible if a dptf-profile handler > >>>> has been registered) ? 
> >>>> > >>>> Or more generic and thus better (in case other platforms > >>>> later need something similar) I think, mirror the: > >>>> > >>>> /sys/bus/cpu/devices/cpu#/cpufreq/energy_performance_* bits > >>>> for a system-wide energy-performance setting, so we get: > >>>> > >>>> /sys/power/energy_performance_preference > >>>> /sys/power/energy_performance_available_preferences > >>> > >>> But this is not about energy vs performance only in general, is it? > >>> > >>>> (again only visible when applicable) ? > >>>> > >>>> I personally like the second option best. > >>> > >>> But I would put it under /sys/firmware/ instead of /sys/power/ and I > >>> would call it something like platform_profile (and > >>> platform_profile_choices or similar). > >> > >> Currently we only have dirs under /sys/firmware: > >> > >> [hans@x1 ~]$ ls /sys/firmware > >> acpi dmi efi memmap > >> > >> But we do have /sys/firmware/apci/pm_profile: > >> > >> Documentation/ABI/stable/sysfs-acpi-pmprofile > >> > >> What: /sys/firmware/acpi/pm_profile > >> Date: 03-Nov-2011 > >> KernelVersion: v3.2 > >> Contact: linux-acpi@vger.kernel.org > >> Description: The ACPI pm_profile sysfs interface exports the platform > >> power management (and performance) requirement expectations > >> as provided by BIOS. The integer value is directly passed as > >> retrieved from the FADT ACPI table. > >> Values: For possible values see ACPI specification: > >> 5.2.9 Fixed ACPI Description Table (FADT) > >> Field: Preferred_PM_Profile > >> > >> Currently these values are defined by spec: > >> 0 Unspecified > >> 1 Desktop > >> 2 Mobile > >> 3 Workstation > >> 4 Enterprise Server > >> ... > >> > >> Since all platforms which we need this for are ACPI based > >> (and the involved interfaces are also all ACPI interfaces) > >> how about: > >> > >> /sys/firmware/acpi/platform_profile > >> /sys/firmware/acpi/platform_profile_choices > >> > >> ? 
> >> > >> I think this goes nice together with /sys/firmware/acpi/pm_profile > >> although that is read-only and this is a read/write setting. > >> > >> Rafel, would: > >> > >> /sys/firmware/acpi/platform_profile > >> /sys/firmware/acpi/platform_profile_choices > >> > >> work for you ? > > > > Yes, it would. > > Great. So I think hat means that we have the most important part > for moving forward with this. > > So I guess the plan for this now looks something like this. > > 1. Rewrite my API docs RFC to update it for the new /sys/firmware/acpi/platform_profile[_choices] > plan (should be easy and a bunch of stuff like the "type" bit can just be dropped) > > 2. Add code somewhere under drivers/acpi which allows code from else where > to register itself as platform_profile handler/provider. Sounds good to me. > Rafael, any suggestions / preference for where this should be added under > drivers/acpi ? In a new .c file perhaps ? Yes, that would be most suitable IMV. > 3.1 Use the code from 2 to add support for platform-profile selection in > thinkpad_acpi (something for me or Mark Pearson) to do > 3.2 Use the code from 2 to add support for platform-profile selection > to hp-wmi > 3.3 (and to other drivers in the future). Right. ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2020-11-10 15:04 UTC | newest] Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-10-06 12:20 [PATCH 0/4] powercap/dtpm: Add the DTPM framework Daniel Lezcano 2020-10-06 12:20 ` [PATCH 1/4] units: Add Watt units Daniel Lezcano 2020-11-10 10:02 ` Lukasz Luba 2020-10-06 12:20 ` [PATCH 2/4] Documentation/powercap/dtpm: Add documentation for dtpm Daniel Lezcano 2020-10-13 22:01 ` Ram Chandrasekar 2020-10-06 12:20 ` [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management Daniel Lezcano 2020-10-06 16:42 ` kernel test robot 2020-10-06 18:05 ` kernel test robot 2020-10-23 10:29 ` Lukasz Luba 2020-11-03 18:42 ` Daniel Lezcano 2020-11-10 9:59 ` Lukasz Luba 2020-11-10 11:05 ` Lukasz Luba 2020-11-10 14:59 ` Daniel Lezcano 2020-11-10 15:04 ` Lukasz Luba 2020-11-10 12:55 ` Daniel Lezcano 2020-10-06 12:20 ` [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support Daniel Lezcano 2020-10-23 13:27 ` Lukasz Luba 2020-11-04 10:47 ` Daniel Lezcano 2020-11-04 10:57 ` Lukasz Luba 2020-11-04 11:15 ` Daniel Lezcano 2020-11-10 12:50 ` Lukasz Luba 2020-10-07 10:43 ` [PATCH 0/4] powercap/dtpm: Add the DTPM framework Hans de Goede 2020-10-12 10:30 ` Daniel Lezcano 2020-10-12 11:46 ` Hans de Goede 2020-10-12 16:02 ` Daniel Lezcano 2020-10-13 12:47 ` Hans de Goede 2020-10-12 16:37 ` Rafael J. Wysocki 2020-10-13 13:04 ` Hans de Goede 2020-10-14 13:33 ` Rafael J. Wysocki 2020-10-14 14:06 ` Hans de Goede 2020-10-14 15:42 ` Rafael J. Wysocki 2020-10-16 11:10 ` [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) Hans de Goede 2020-10-16 14:26 ` Elia Devito [not found] ` <HK2PR0302MB2449214B28438ADC1790D468BD030@HK2PR0302MB2449.apcprd03.prod.outlook.com> 2020-10-16 14:43 ` Fw: [External] " Mark Pearson 2020-10-16 15:16 ` Elia Devito 2020-10-16 14:51 ` Rafael J. 
Wysocki 2020-10-18 9:41 ` Hans de Goede 2020-10-18 12:31 ` Rafael J. Wysocki 2020-10-19 18:43 ` Hans de Goede [not found] ` <HK2PR0302MB24494037019FBC7720976735BD1E0@HK2PR0302MB2449.apcprd03.prod.outlook.com> 2020-10-19 18:49 ` Fw: [External] " Mark Pearson 2020-10-25 10:13 ` Hans de Goede 2020-10-20 12:34 ` Rafael J. Wysocki