* [PATCH v2 0/2] intel_powerclamp: New module parameter @ 2023-02-05 2:59 Srinivas Pandruvada 2023-02-05 2:59 ` [PATCH v2 1/2] Documentation:admin-guide: Move intel_powerclamp documentation Srinivas Pandruvada ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Srinivas Pandruvada @ 2023-02-05 2:59 UTC (permalink / raw) To: rafael Cc: linux-pm, linux-kernel, daniel.lezcano, rui.zhang, Srinivas Pandruvada Split from the series for powerclamp user of powercap idle-inject. v2 - Build warnings reported by Rui - Moved the powerclamp documentation to admin guide folder - Commit log updated as suggested by Rafael and other code suggestion Srinivas Pandruvada (2): Documentation:admin-guide: Move intel_powerclamp documentation thermal/drivers/intel_powerclamp: Add two module parameters Documentation/admin-guide/index.rst | 1 + .../thermal/intel_powerclamp.rst | 22 +++ Documentation/driver-api/thermal/index.rst | 1 - MAINTAINERS | 1 + drivers/thermal/intel/intel_powerclamp.c | 177 +++++++++++++++--- 5 files changed, 180 insertions(+), 22 deletions(-) rename Documentation/{driver-api => admin-guide}/thermal/intel_powerclamp.rst (93%) -- 2.39.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/2] Documentation:admin-guide: Move intel_powerclamp documentation 2023-02-05 2:59 [PATCH v2 0/2] intel_powerclamp: New module parameter Srinivas Pandruvada @ 2023-02-05 2:59 ` Srinivas Pandruvada 2023-02-05 2:59 ` [PATCH v2 2/2] thermal/drivers/intel_powerclamp: Add two module parameters Srinivas Pandruvada 2023-02-05 15:57 ` [PATCH v2 0/2] intel_powerclamp: New module parameter Zhang, Rui 2 siblings, 0 replies; 10+ messages in thread From: Srinivas Pandruvada @ 2023-02-05 2:59 UTC (permalink / raw) To: rafael Cc: linux-pm, linux-kernel, daniel.lezcano, rui.zhang, Srinivas Pandruvada Create a folder "thermal" under Documentation/admin-guide and move intel_powerclamp documentation to this folder. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> --- Documentation/admin-guide/index.rst | 1 + .../{driver-api => admin-guide}/thermal/intel_powerclamp.rst | 0 Documentation/driver-api/thermal/index.rst | 1 - MAINTAINERS | 1 + 4 files changed, 2 insertions(+), 1 deletion(-) rename Documentation/{driver-api => admin-guide}/thermal/intel_powerclamp.rst (100%) diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 5bfafcbb9562..c872a8a1ddfa 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -116,6 +116,7 @@ configure specific aspects of kernel behavior to your liking. svga syscall-user-dispatch sysrq + thermal thunderbolt ufs unicode diff --git a/Documentation/driver-api/thermal/intel_powerclamp.rst b/Documentation/admin-guide/thermal/intel_powerclamp.rst similarity index 100% rename from Documentation/driver-api/thermal/intel_powerclamp.rst rename to Documentation/admin-guide/thermal/intel_powerclamp.rst diff --git a/Documentation/driver-api/thermal/index.rst b/Documentation/driver-api/thermal/index.rst index 030306ffa408..a886028014ab 100644 --- a/Documentation/driver-api/thermal/index.rst +++ b/Documentation/driver-api/thermal/index.rst @@ -14,7 +14,6 @@ Thermal exynos_thermal exynos_thermal_emulation - intel_powerclamp nouveau_thermal x86_pkg_temperature_thermal intel_dptf diff --git a/MAINTAINERS b/MAINTAINERS index b4043f72dfac..28fd62eacaf3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -20714,6 +20714,7 @@ S: Supported Q: https://patchwork.kernel.org/project/linux-pm/list/ T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git thermal F: Documentation/ABI/testing/sysfs-class-thermal +F: Documentation/admin-guide/thermal/ F: Documentation/devicetree/bindings/thermal/ F: Documentation/driver-api/thermal/ F: drivers/thermal/ -- 2.39.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/2] thermal/drivers/intel_powerclamp: Add two module parameters 2023-02-05 2:59 [PATCH v2 0/2] intel_powerclamp: New module parameter Srinivas Pandruvada 2023-02-05 2:59 ` [PATCH v2 1/2] Documentation:admin-guide: Move intel_powerclamp documentation Srinivas Pandruvada @ 2023-02-05 2:59 ` Srinivas Pandruvada 2023-02-05 15:57 ` [PATCH v2 0/2] intel_powerclamp: New module parameter Zhang, Rui 2 siblings, 0 replies; 10+ messages in thread From: Srinivas Pandruvada @ 2023-02-05 2:59 UTC (permalink / raw) To: rafael Cc: linux-pm, linux-kernel, daniel.lezcano, rui.zhang, Srinivas Pandruvada In some use cases, it is desirable to only inject idle on certain set of CPUs. For example on Alder Lake systems, it is possible that we force idle only on P-Cores for thermal reasons. Also the idle percent can be more than 50% if we only choose partial set of CPUs in the system. Introduce 2 new module parameters for this purpose. They can be only changed when the cooling device is inactive. cpumask (Read/Write): A bit mask of CPUs to inject idle. The format of this bitmask is same as used in other subsystems like in /proc/irq/*/smp_affinity. The mask is comma separated 32 bit groups. Each CPU is one bit. For example for 256 CPU system the full mask is: ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff The leftmost mask is for CPU 0-32. max_idle (Read/Write): Maximum injected idle time to the total CPU time ratio in percent range from 1 to 100. Even if the cooling device max_state is always 100 (100%), this parameter allows to add a max idle percent limit. The default is 50, to match the current implementation of powerclamp driver. Also doesn't allow value more than 75, if the cpumask includes every CPU present in the system. Also when the cpumask doesn't include every CPU, there is no use of compensation using package C-state idle counters. Hence don't start package C-state polling thread even for a single package or a single die system in this case. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> --- .../admin-guide/thermal/intel_powerclamp.rst | 22 +++ drivers/thermal/intel/intel_powerclamp.c | 177 +++++++++++++++--- 2 files changed, 178 insertions(+), 21 deletions(-) diff --git a/Documentation/admin-guide/thermal/intel_powerclamp.rst b/Documentation/admin-guide/thermal/intel_powerclamp.rst index 3f6dfb0b3ea6..da83b5eefbff 100644 --- a/Documentation/admin-guide/thermal/intel_powerclamp.rst +++ b/Documentation/admin-guide/thermal/intel_powerclamp.rst @@ -26,6 +26,8 @@ By: - Generic Thermal Layer (sysfs) - Kernel APIs (TBD) + (*) Module Parameters + INTRODUCTION ============ @@ -318,3 +320,23 @@ device, a PID based userspace thermal controller can manage to control CPU temperature effectively, when no other thermal influence is added. For example, a UltraBook user can compile the kernel under certain temperature (below most active trip points). + +Module Parameters +================= + +``cpumask`` (RW) + A bit mask of CPUs to inject idle. The format of the bitmask is same as + used in other subsystems like in /proc/irq/*/smp_affinity. The mask is + comma separated 32 bit groups. Each CPU is one bit. For example for a 256 + CPU system the full mask is: + ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff + + The leftmost mask is for CPU 0-32. + +``max_idle`` (RW) + Maximum injected idle time to the total CPU time ratio in percent range + from 1 to 100. Even if the cooling device max_state is always 100 (100%), + this parameter allows to add a max idle percent limit. The default is 50, + to match the current implementation of powerclamp driver. Also doesn't + allow value more than 75, if the cpumask includes every CPU present in + the system. diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c index 1390748706a6..6d00ac597b8a 100644 --- a/drivers/thermal/intel/intel_powerclamp.c +++ b/drivers/thermal/intel/intel_powerclamp.c @@ -37,7 +37,7 @@ #include <asm/mwait.h> #include <asm/cpu_device_id.h> -#define MAX_TARGET_RATIO (50U) +#define MAX_TARGET_RATIO (100U) /* For each undisturbed clamping period (no extra wake ups during idle time), * we increment the confidence counter for the given target ratio. * CONFIDENCE_OK defines the level where runtime calibration results are @@ -105,10 +105,144 @@ static const struct kernel_param_ops duration_ops = { .get = param_get_int, }; - module_param_cb(duration, &duration_ops, &duration, 0644); MODULE_PARM_DESC(duration, "forced idle time for each attempt in msec."); +#define DEFAULT_MAX_IDLE 50 +#define MAX_ALL_CPU_IDLE 75 + +static u8 max_idle = DEFAULT_MAX_IDLE; + +static cpumask_var_t idle_injection_cpu_mask; + +static int allocate_copy_idle_injection_mask(const struct cpumask *copy_mask) +{ + if (cpumask_available(idle_injection_cpu_mask)) + goto copy_mask; + + /* This mask is allocated only one time and freed during module exit */ + if (!alloc_cpumask_var(&idle_injection_cpu_mask, GFP_KERNEL)) + return -ENOMEM; + +copy_mask: + cpumask_copy(idle_injection_cpu_mask, copy_mask); + + return 0; +} + +/* Return true if the cpumask and idle percent combination is invalid */ +static bool check_invalid(cpumask_var_t mask, u8 idle) +{ + if (cpumask_equal(cpu_present_mask, mask) && idle > MAX_ALL_CPU_IDLE) + return true; + + return false; +} + +static int cpumask_set(const char *arg, const struct kernel_param *kp) +{ + cpumask_var_t new_mask; + int ret; + + mutex_lock(&powerclamp_lock); + + /* Can't set mask when cooling device is in use */ + if (powerclamp_data.clamping) { + ret = -EAGAIN; + goto skip_cpumask_set; + } + + ret = alloc_cpumask_var(&new_mask, GFP_KERNEL); + if (!ret) + goto skip_cpumask_set; + + ret = bitmap_parse(arg, strlen(arg), cpumask_bits(new_mask), + nr_cpumask_bits); + if (ret) + goto free_cpumask_set; + + if (cpumask_empty(new_mask) || check_invalid(new_mask, max_idle)) { + ret = -EINVAL; + goto free_cpumask_set; + } + + /* + * When module parameters are passed from kernel command line + * during insmod, the module parameter callback is called + * before powerclamp_init(), so we can't assume that some + * cpumask can be allocated and copied before here. Also + * in this case this cpumask is used as the default mask. + */ + ret = allocate_copy_idle_injection_mask(new_mask); + +free_cpumask_set: + free_cpumask_var(new_mask); +skip_cpumask_set: + mutex_unlock(&powerclamp_lock); + + return ret; +} + +static int cpumask_get(char *buf, const struct kernel_param *kp) +{ + if (!cpumask_available(idle_injection_cpu_mask)) + return -ENODEV; + + return bitmap_print_to_pagebuf(false, buf, cpumask_bits(idle_injection_cpu_mask), + nr_cpumask_bits); +} + +static const struct kernel_param_ops cpumask_ops = { + .set = cpumask_set, + .get = cpumask_get, +}; + +module_param_cb(cpumask, &cpumask_ops, NULL, 0644); +MODULE_PARM_DESC(cpumask, "Mask of CPUs to use for idle injection."); + +static int max_idle_set(const char *arg, const struct kernel_param *kp) +{ + u8 new_max_idle; + int ret = 0; + + mutex_lock(&powerclamp_lock); + + /* Can't set mask when cooling device is in use */ + if (powerclamp_data.clamping) { + ret = -EAGAIN; + goto skip_limit_set; + } + + ret = kstrtou8(arg, 10, &new_max_idle); + if (ret) + goto skip_limit_set; + + if (new_max_idle > MAX_TARGET_RATIO) { + ret = -EINVAL; + goto skip_limit_set; + } + + if (check_invalid(idle_injection_cpu_mask, new_max_idle)) { + ret = -EINVAL; + goto skip_limit_set; + } + + max_idle = new_max_idle; + +skip_limit_set: + mutex_unlock(&powerclamp_lock); + + return ret; +} + +static const struct kernel_param_ops max_idle_ops = { + .set = max_idle_set, + .get = param_get_int, +}; + +module_param_cb(max_idle, &max_idle_ops, &max_idle, 0644); +MODULE_PARM_DESC(max_idle, "maximum injected idle time to the total CPU time ratio in percent range:1-100"); + struct powerclamp_calibration_data { unsigned long confidence; /* used for calibration, basically a counter * gets incremented each time a clamping @@ -460,21 +594,15 @@ static void trigger_idle_injection(void) */ static int powerclamp_idle_injection_register(void) { - /* - * The idle inject core will only inject for online CPUs, - * So we can register for all present CPUs. In this way - * if some CPU goes online/offline while idle inject - * is registered, nothing additional calls are required. - * The same runtime and idle time is applicable for - * newly onlined CPUs if any. - * - * Here cpu_present_mask can be used as is. - * cast to (struct cpumask *) is required as the - * cpu_present_mask is const struct cpumask *, otherwise - * there will be compiler warnings. - */ - ii_dev = idle_inject_register_full((struct cpumask *)cpu_present_mask, - idle_inject_update); + poll_pkg_cstate_enable = false; + if (cpumask_equal(cpu_present_mask, idle_injection_cpu_mask)) { + ii_dev = idle_inject_register_full(idle_injection_cpu_mask, idle_inject_update); + if (topology_max_packages() == 1 && topology_max_die_per_package() == 1) + poll_pkg_cstate_enable = true; + } else { + ii_dev = idle_inject_register(idle_injection_cpu_mask); + } + if (!ii_dev) { pr_err("powerclamp: idle_inject_register failed\n"); return -EAGAIN; @@ -555,7 +683,7 @@ static int powerclamp_set_cur_state(struct thermal_cooling_device *cdev, mutex_lock(&powerclamp_lock); new_target_ratio = clamp(new_target_ratio, 0UL, - (unsigned long) (MAX_TARGET_RATIO - 1)); + (unsigned long) (max_idle - 1)); if (!powerclamp_data.target_ratio && new_target_ratio > 0) { pr_info("Start idle injection to reduce power\n"); powerclamp_data.target_ratio = new_target_ratio; @@ -646,15 +774,19 @@ static int __init powerclamp_init(void) /* probe cpu features and ids here */ retval = powerclamp_probe(); + if (retval) + return retval; + + mutex_lock(&powerclamp_lock); + retval = allocate_copy_idle_injection_mask(cpu_present_mask); + mutex_unlock(&powerclamp_lock); + if (retval) return retval; /* set default limit, maybe adjusted during runtime based on feedback */ window_size = 2; - if (topology_max_packages() == 1 && topology_max_die_per_package() == 1) - poll_pkg_cstate_enable = true; - cooling_dev = thermal_cooling_device_register("intel_powerclamp", NULL, &powerclamp_cooling_ops); if (IS_ERR(cooling_dev)) @@ -679,6 +811,9 @@ static void __exit powerclamp_exit(void) cancel_delayed_work_sync(&poll_pkg_cstate_work); debugfs_remove_recursive(debug_dir); + + if (cpumask_available(idle_injection_cpu_mask)) + free_cpumask_var(idle_injection_cpu_mask); } module_exit(powerclamp_exit); -- 2.39.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] intel_powerclamp: New module parameter 2023-02-05 2:59 [PATCH v2 0/2] intel_powerclamp: New module parameter Srinivas Pandruvada 2023-02-05 2:59 ` [PATCH v2 1/2] Documentation:admin-guide: Move intel_powerclamp documentation Srinivas Pandruvada 2023-02-05 2:59 ` [PATCH v2 2/2] thermal/drivers/intel_powerclamp: Add two module parameters Srinivas Pandruvada @ 2023-02-05 15:57 ` Zhang, Rui 2023-02-06 2:45 ` srinivas pandruvada 2 siblings, 1 reply; 10+ messages in thread From: Zhang, Rui @ 2023-02-05 15:57 UTC (permalink / raw) To: srinivas.pandruvada, rafael, Neri, Ricardo Cc: linux-pm, linux-kernel, daniel.lezcano Hi, Srinivas, First of all, the previous build error is gone. Second, I found something strange, which may be related with the scheduler asym-packing, so CC Ricardo. The test is done with pm linux-intel branch + this patch series on an ADL system. cpu0~cpu7 are Pcore cpus, cpu8-cpu15 are Ecore cpus, and intel_powerclamp is register as cooling_device21. 1. run stress -c 16 2. update /sys/module/intel_powerclamp/parameters/cpumask echo 90 > /sys/module/intel_powerclamp/parameters/max_idle 3. echo 90 > /sys/class/thermal/cooling_device21/cur_state 4. echo 0 > /sys/class/thermal/cooling_device21/cur_state I use turbostat to monitor the CPU Busy% in all 4 steps. If 'cpumask' does not include all the Ecore CPUs, all CPUs becomes 100% busy after idle injection removed in step 4. If 'cpumask' includes all the Ecore CPUs, i.e. cpumask = FFxy, in some cases, the Ecore CPUs will drop to an Busy% much lower than 10%, and then they don't come back to busy after idle injection removed in step 4, although we have 16 stress threads. And this also relates with how long we stay in idle injection. Say, when cpumask=fff3, the problem can be triggered occasionally if there is a 10 second timeout between step 3 and step4, but it is much easier to reproducible if I increase the timeout to 20 seconds. It seems that Pcore can always pull tasks from Ecores, but Ecore can not pull tasks from Pcore HT siblings. thanks, rui On Sat, 2023-02-04 at 18:59 -0800, Srinivas Pandruvada wrote: > Split from the series for powerclamp user of powercap idle-inject. > > v2 > - Build warnings reported by Rui > - Moved the powerclamp documentation to admin guide folder > - Commit log updated as suggested by Rafael and other code suggestion > > Srinivas Pandruvada (2): > Documentation:admin-guide: Move intel_powerclamp documentation > thermal/drivers/intel_powerclamp: Add two module parameters > > Documentation/admin-guide/index.rst | 1 + > .../thermal/intel_powerclamp.rst | 22 +++ > Documentation/driver-api/thermal/index.rst | 1 - > MAINTAINERS | 1 + > drivers/thermal/intel/intel_powerclamp.c | 177 +++++++++++++++- > -- > 5 files changed, 180 insertions(+), 22 deletions(-) > rename Documentation/{driver-api => admin- > guide}/thermal/intel_powerclamp.rst (93%) > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] intel_powerclamp: New module parameter 2023-02-05 15:57 ` [PATCH v2 0/2] intel_powerclamp: New module parameter Zhang, Rui @ 2023-02-06 2:45 ` srinivas pandruvada 2023-02-06 8:05 ` Zhang, Rui 0 siblings, 1 reply; 10+ messages in thread From: srinivas pandruvada @ 2023-02-06 2:45 UTC (permalink / raw) To: Zhang, Rui, rafael, Neri, Ricardo; +Cc: linux-pm, linux-kernel, daniel.lezcano Hi Rui, On Sun, 2023-02-05 at 15:57 +0000, Zhang, Rui wrote: > Hi, Srinivas, > > First of all, the previous build error is gone. > > Second, I found something strange, which may be related with the > scheduler asym-packing, so CC Ricardo. > I thought you disable ITMT before idle injection and reenebale after removal. > The test is done with pm linux-intel branch + this patch series on an > ADL system. Can you do test on bleeding edge branch of Linux-pm? > cpu0~cpu7 are Pcore cpus, cpu8-cpu15 are Ecore cpus, and > intel_powerclamp is register as cooling_device21. > > 1. run stress -c 16 > 2. update /sys/module/intel_powerclamp/parameters/cpumask > echo 90 > /sys/module/intel_powerclamp/parameters/max_idle > 3. echo 90 > /sys/class/thermal/cooling_device21/cur_state > 4. echo 0 > /sys/class/thermal/cooling_device21/cur_state > I use turbostat to monitor the CPU Busy% in all 4 steps. > > If 'cpumask' does not include all the Ecore CPUs, all CPUs becomes > 100% > busy after idle injection removed in step 4. > that should be the case. > If 'cpumask' includes all the Ecore CPUs, i.e. cpumask = FFxy, in > some > cases, the Ecore CPUs will drop to an Busy% much lower than 10%, and > then they don't come back to busy after idle injection removed in > step Do you see that idle injection is removed message in dmesg? We can also check powercap idle-inejct, if some CPUs still not wake from play_idle. > 4, although we have 16 stress threads. And this also relates with how > long we stay in idle injection. > > Say, when cpumask=fff3, the problem can be triggered occasionally if > there is a 10 second timeout between step 3 and step4, but it is much > easier to reproducible if I increase the timeout to 20 seconds. > > It seems that Pcore can always pull tasks from Ecores, but Ecore can > not pull tasks from Pcore HT siblings. > That will be regular load balance threads should do. Better to try upsteam kernel first. Thanks, Srinivas > thanks, > rui > > On Sat, 2023-02-04 at 18:59 -0800, Srinivas Pandruvada wrote: > > Split from the series for powerclamp user of powercap idle-inject. > > > > v2 > > - Build warnings reported by Rui > > - Moved the powerclamp documentation to admin guide folder > > - Commit log updated as suggested by Rafael and other code > > suggestion > > > > Srinivas Pandruvada (2): > > Documentation:admin-guide: Move intel_powerclamp documentation > > thermal/drivers/intel_powerclamp: Add two module parameters > > > > Documentation/admin-guide/index.rst | 1 + > > .../thermal/intel_powerclamp.rst | 22 +++ > > Documentation/driver-api/thermal/index.rst | 1 - > > MAINTAINERS | 1 + > > drivers/thermal/intel/intel_powerclamp.c | 177 > > +++++++++++++++- > > -- > > 5 files changed, 180 insertions(+), 22 deletions(-) > > rename Documentation/{driver-api => admin- > > guide}/thermal/intel_powerclamp.rst (93%) > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] intel_powerclamp: New module parameter 2023-02-06 2:45 ` srinivas pandruvada @ 2023-02-06 8:05 ` Zhang, Rui 2023-02-06 10:02 ` srinivas pandruvada 0 siblings, 1 reply; 10+ messages in thread From: Zhang, Rui @ 2023-02-06 8:05 UTC (permalink / raw) To: srinivas.pandruvada, rafael, Neri, Ricardo Cc: linux-pm, linux-kernel, daniel.lezcano On Sun, 2023-02-05 at 18:45 -0800, srinivas pandruvada wrote: > Hi Rui, > > On Sun, 2023-02-05 at 15:57 +0000, Zhang, Rui wrote: > > Hi, Srinivas, > > > > First of all, the previous build error is gone. > > > > Second, I found something strange, which may be related with the > > scheduler asym-packing, so CC Ricardo. > > > I thought you disable ITMT before idle injection and reenebale after > removal. No. I can reproduce this by playing with raw intel_powerclamp sysfs knobs and ITMT enabled. > > > > > The test is done with pm linux-intel branch sorry, I mean linux-next branch. > > + this patch series on an > > ADL system. > Can you do test on bleeding edge branch of Linux-pm? > > > cpu0~cpu7 are Pcore cpus, cpu8-cpu15 are Ecore cpus, and > > intel_powerclamp is register as cooling_device21. > > > > 1. run stress -c 16 > > 2. update /sys/module/intel_powerclamp/parameters/cpumask > > echo 90 > /sys/module/intel_powerclamp/parameters/max_idle > > 3. echo 90 > /sys/class/thermal/cooling_device21/cur_state > > 4. echo 0 > /sys/class/thermal/cooling_device21/cur_state > > I use turbostat to monitor the CPU Busy% in all 4 steps. > > > > If 'cpumask' does not include all the Ecore CPUs, all CPUs becomes > > 100% > > busy after idle injection removed in step 4. > > > that should be the case. > > > If 'cpumask' includes all the Ecore CPUs, i.e. cpumask = FFxy, in > > some > > cases, the Ecore CPUs will drop to an Busy% much lower than 10%, > > and > > then they don't come back to busy after idle injection removed in > > step > Do you see that idle injection is removed message in dmesg? yes. > We can also check powercap idle-inejct, if some CPUs still not wake > from play_idle. "ps" command shows the the idle_injection threads time is not increasing any more. > > > > 4, although we have 16 stress threads. And this also relates with > > how > > long we stay in idle injection. > > > > Say, when cpumask=fff3, the problem can be triggered occasionally > > if > > there is a 10 second timeout between step 3 and step4, but it is > > much > > easier to reproducible if I increase the timeout to 20 seconds. > > > > It seems that Pcore can always pull tasks from Ecores, but Ecore > > can > > not pull tasks from Pcore HT siblings. > > > That will be regular load balance threads should do. > Better to try upsteam kernel first. I'm already running with linux-pm tree linux-next branch + this patch series. thanks, rui > > Thanks, > Srinivas > > > > thanks, > > rui > > > > On Sat, 2023-02-04 at 18:59 -0800, Srinivas Pandruvada wrote: > > > Split from the series for powerclamp user of powercap idle- > > > inject. > > > > > > v2 > > > - Build warnings reported by Rui > > > - Moved the powerclamp documentation to admin guide folder > > > - Commit log updated as suggested by Rafael and other code > > > suggestion > > > > > > Srinivas Pandruvada (2): > > > Documentation:admin-guide: Move intel_powerclamp documentation > > > thermal/drivers/intel_powerclamp: Add two module parameters > > > > > > Documentation/admin-guide/index.rst | 1 + > > > .../thermal/intel_powerclamp.rst | 22 +++ > > > Documentation/driver-api/thermal/index.rst | 1 - > > > MAINTAINERS | 1 + > > > drivers/thermal/intel/intel_powerclamp.c | 177 > > > +++++++++++++++- > > > -- > > > 5 files changed, 180 insertions(+), 22 deletions(-) > > > rename Documentation/{driver-api => admin- > > > guide}/thermal/intel_powerclamp.rst (93%) > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] intel_powerclamp: New module parameter 2023-02-06 8:05 ` Zhang, Rui @ 2023-02-06 10:02 ` srinivas pandruvada 2023-02-07 13:42 ` Ricardo Neri 0 siblings, 1 reply; 10+ messages in thread From: srinivas pandruvada @ 2023-02-06 10:02 UTC (permalink / raw) To: Zhang, Rui, rafael, Neri, Ricardo; +Cc: linux-pm, linux-kernel, daniel.lezcano On Mon, 2023-02-06 at 08:05 +0000, Zhang, Rui wrote: > On Sun, 2023-02-05 at 18:45 -0800, srinivas pandruvada wrote: > > Hi Rui, > > > > On Sun, 2023-02-05 at 15:57 +0000, Zhang, Rui wrote: > > > Hi, Srinivas, > > > > > > First of all, the previous build error is gone. > > > > > > Second, I found something strange, which may be related with the > > > scheduler asym-packing, so CC Ricardo. > > > > > I thought you disable ITMT before idle injection and reenebale > > after > > removal. > > No. > > I can reproduce this by playing with raw intel_powerclamp sysfs knobs > and ITMT enabled. > This issue is happening even if ITMT disabled. If the module mask is composed of P-cores it works or even on servers as expected. Also if you offline all P-cores then select mask among E-cores, it is working. Somehow P-core influences E-cores. Since this patch is module mask related, that is functioning correctly. We have to debug this interaction with P and E cores separately. Thanks, Srinivas > > > > > > > > > The test is done with pm linux-intel branch > > sorry, I mean linux-next branch. > > > > + this patch series on an > > > ADL system. > > Can you do test on bleeding edge branch of Linux-pm? > > > > > cpu0~cpu7 are Pcore cpus, cpu8-cpu15 are Ecore cpus, and > > > intel_powerclamp is register as cooling_device21. > > > > > > 1. run stress -c 16 > > > 2. update /sys/module/intel_powerclamp/parameters/cpumask > > > echo 90 > /sys/module/intel_powerclamp/parameters/max_idle > > > 3. echo 90 > /sys/class/thermal/cooling_device21/cur_state > > > 4. echo 0 > /sys/class/thermal/cooling_device21/cur_state > > > I use turbostat to monitor the CPU Busy% in all 4 steps. > > > > > > If 'cpumask' does not include all the Ecore CPUs, all CPUs > > > becomes > > > 100% > > > busy after idle injection removed in step 4. > > > > > that should be the case. > > > > > If 'cpumask' includes all the Ecore CPUs, i.e. cpumask = FFxy, in > > > some > > > cases, the Ecore CPUs will drop to an Busy% much lower than 10%, > > > and > > > then they don't come back to busy after idle injection removed in > > > step > > Do you see that idle injection is removed message in dmesg? > > yes. > > > We can also check powercap idle-inejct, if some CPUs still not wake > > from play_idle. > > "ps" command shows the the idle_injection threads time is not > increasing any more. > > > > > > > > 4, although we have 16 stress threads. And this also relates with > > > how > > > long we stay in idle injection. > > > > > > Say, when cpumask=fff3, the problem can be triggered occasionally > > > if > > > there is a 10 second timeout between step 3 and step4, but it is > > > much > > > easier to reproducible if I increase the timeout to 20 seconds. > > > > > > It seems that Pcore can always pull tasks from Ecores, but Ecore > > > can > > > not pull tasks from Pcore HT siblings. > > > > > That will be regular load balance threads should do. > > Better to try upsteam kernel first. > > I'm already running with linux-pm tree linux-next branch + this patch > series. > > thanks, > rui > > > > > Thanks, > > Srinivas > > > > > > > thanks, > > > rui > > > > > > On Sat, 2023-02-04 at 18:59 -0800, Srinivas Pandruvada wrote: > > > > Split from the series for powerclamp user of powercap idle- > > > > inject. > > > > > > > > v2 > > > > - Build warnings reported by Rui > > > > - Moved the powerclamp documentation to admin guide folder > > > > - Commit log updated as suggested by Rafael and other code > > > > suggestion > > > > > > > > Srinivas Pandruvada (2): > > > > Documentation:admin-guide: Move intel_powerclamp > > > > documentation > > > > thermal/drivers/intel_powerclamp: Add two module parameters > > > > > > > > Documentation/admin-guide/index.rst | 1 + > > > > .../thermal/intel_powerclamp.rst | 22 +++ > > > > Documentation/driver-api/thermal/index.rst | 1 - > > > > MAINTAINERS | 1 + > > > > drivers/thermal/intel/intel_powerclamp.c | 177 > > > > +++++++++++++++- > > > > -- > > > > 5 files changed, 180 insertions(+), 22 deletions(-) > > > > rename Documentation/{driver-api => admin- > > > > guide}/thermal/intel_powerclamp.rst (93%) > > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] intel_powerclamp: New module parameter 2023-02-06 10:02 ` srinivas pandruvada @ 2023-02-07 13:42 ` Ricardo Neri 2023-02-07 13:51 ` srinivas pandruvada 0 siblings, 1 reply; 10+ messages in thread From: Ricardo Neri @ 2023-02-07 13:42 UTC (permalink / raw) To: srinivas pandruvada Cc: Zhang, Rui, rafael, Neri, Ricardo, linux-pm, linux-kernel, daniel.lezcano On Mon, Feb 06, 2023 at 02:02:28AM -0800, srinivas pandruvada wrote: > On Mon, 2023-02-06 at 08:05 +0000, Zhang, Rui wrote: > > On Sun, 2023-02-05 at 18:45 -0800, srinivas pandruvada wrote: > > > Hi Rui, > > > > > > On Sun, 2023-02-05 at 15:57 +0000, Zhang, Rui wrote: > > > > Hi, Srinivas, > > > > > > > > First of all, the previous build error is gone. > > > > > > > > Second, I found something strange, which may be related with the > > > > scheduler asym-packing, so CC Ricardo. > > > > > > > I thought you disable ITMT before idle injection and reenebale > > > after > > > removal. > > > > No. > > > > I can reproduce this by playing with raw intel_powerclamp sysfs knobs > > and ITMT enabled. > > > > This issue is happening even if ITMT disabled. If the module mask is > composed of P-cores it works or even on servers as expected. > Also if you offline all P-cores then select mask among E-cores, it is > working. Somehow P-core influences E-cores. > > Since this patch is module mask related, that is functioning correctly. > We have to debug this interaction with P and E cores separately. Currently, when doing asym_packing, ECores will only pull tasks from a PCore only if both SMT siblings are busy. It will only pull from the lower-priority sibling. These patches [1] let ECores pull from either sibling, if both are busy. I presume that by injecting idle, the scheduler thinks that the CPU is idle (i.e., idle_cpu() returns true) and it will not do asym_packing from lower-priority CPUs. However, in your experiment you have 16 threads. If a Pcore is overloaded, an ECore should be able to help. [1]. https://lore.kernel.org/lkml/20230207045838.11243-1-ricardo.neri-calderon@linux.intel.com/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] intel_powerclamp: New module parameter 2023-02-07 13:42 ` Ricardo Neri @ 2023-02-07 13:51 ` srinivas pandruvada 2023-02-07 19:08 ` Ricardo Neri 0 siblings, 1 reply; 10+ messages in thread From: srinivas pandruvada @ 2023-02-07 13:51 UTC (permalink / raw) To: Ricardo Neri Cc: Zhang, Rui, rafael, Neri, Ricardo, linux-pm, linux-kernel, daniel.lezcano On Tue, 2023-02-07 at 05:42 -0800, Ricardo Neri wrote: > On Mon, Feb 06, 2023 at 02:02:28AM -0800, srinivas pandruvada wrote: > > On Mon, 2023-02-06 at 08:05 +0000, Zhang, Rui wrote: > > > On Sun, 2023-02-05 at 18:45 -0800, srinivas pandruvada wrote: > > > > Hi Rui, > > > > > > > > On Sun, 2023-02-05 at 15:57 +0000, Zhang, Rui wrote: > > > > > Hi, Srinivas, > > > > > > > > > > First of all, the previous build error is gone. > > > > > > > > > > Second, I found something strange, which may be related with > > > > > the > > > > > scheduler asym-packing, so CC Ricardo. > > > > > > > > > I thought you disable ITMT before idle injection and reenebale > > > > after > > > > removal. > > > > > > No. > > > > > > I can reproduce this by playing with raw intel_powerclamp sysfs > > > knobs > > > and ITMT enabled. > > > > > > > This issue is happening even if ITMT disabled. If the module mask > > is > > composed of P-cores it works or even on servers as expected. > > Also if you offline all P-cores then select mask among E-cores, it > > is > > working. Somehow P-core influences E-cores. > > > > Since this patch is module mask related, that is functioning > > correctly. > > We have to debug this interaction with P and E cores separately. > > Currently, when doing asym_packing, ECores will only pull tasks from > a > PCore only if both SMT siblings are busy. It will only pull from the > lower-priority sibling. These patches [1] let ECores pull from either > sibling, if both are busy. > > I presume that by injecting idle, the scheduler thinks that the CPU > is > idle (i.e., idle_cpu() returns true) and it will not do asym_packing > from > lower-priority CPUs. > > However, in your experiment you have 16 threads. If a Pcore is > overloaded, > an ECore should be able to help. This issue happens with or without ITMT and also without any idle injection active. Thanks, Srinivas > > [1]. > https://lore.kernel.org/lkml/20230207045838.11243-1-ricardo.neri-calderon@linux.intel.com/ > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] intel_powerclamp: New module parameter 2023-02-07 13:51 ` srinivas pandruvada @ 2023-02-07 19:08 ` Ricardo Neri 0 siblings, 0 replies; 10+ messages in thread From: Ricardo Neri @ 2023-02-07 19:08 UTC (permalink / raw) To: srinivas pandruvada Cc: Zhang, Rui, rafael, Neri, Ricardo, linux-pm, linux-kernel, daniel.lezcano On Tue, Feb 07, 2023 at 05:51:59AM -0800, srinivas pandruvada wrote: > On Tue, 2023-02-07 at 05:42 -0800, Ricardo Neri wrote: > > On Mon, Feb 06, 2023 at 02:02:28AM -0800, srinivas pandruvada wrote: > > > On Mon, 2023-02-06 at 08:05 +0000, Zhang, Rui wrote: > > > > On Sun, 2023-02-05 at 18:45 -0800, srinivas pandruvada wrote: > > > > > Hi Rui, > > > > > > > > > > On Sun, 2023-02-05 at 15:57 +0000, Zhang, Rui wrote: > > > > > > Hi, Srinivas, > > > > > > > > > > > > First of all, the previous build error is gone. > > > > > > > > > > > > Second, I found something strange, which may be related with > > > > > > the > > > > > > scheduler asym-packing, so CC Ricardo. > > > > > > > > > > > I thought you disable ITMT before idle injection and reenebale > > > > > after > > > > > removal. > > > > > > > > No. > > > > > > > > I can reproduce this by playing with raw intel_powerclamp sysfs > > > > knobs > > > > and ITMT enabled. > > > > > > > > > > This issue is happening even if ITMT disabled. If the module mask > > > is > > > composed of P-cores it works or even on servers as expected. > > > Also if you offline all P-cores then select mask among E-cores, it > > > is > > > working. Somehow P-core influences E-cores. > > > > > > Since this patch is module mask related, that is functioning > > > correctly. > > > We have to debug this interaction with P and E cores separately. > > > > Currently, when doing asym_packing, ECores will only pull tasks from > > a > > PCore only if both SMT siblings are busy. It will only pull from the > > lower-priority sibling. These patches [1] let ECores pull from either > > sibling, if both are busy. > > > > I presume that by injecting idle, the scheduler thinks that the CPU > > is > > idle (i.e., idle_cpu() returns true) and it will not do asym_packing > > from > > lower-priority CPUs. > > > > However, in your experiment you have 16 threads. If a Pcore is > > overloaded, > > an ECore should be able to help. > This issue happens with or without ITMT and also without any idle > injection active. I was not able to reproduce this issue on my ADL-S system with ITMT. The described bug is exactly what and old patchset of mine was supposed to fix [2]. Maybe the CPU priorities in the failing system are such that it prevents asym_packing from kicking in. I was able to reproduce the issue without ITMT. I had found that the scheduler cannot handle load balance between SMT and non-SMT cores correctly. My patchset [1] includes fixes for this case. I applied it on top of Rafael's linux-next branch and it fixed the issue for me in the non-ITMT case. Perhaps patches 5 and 6 are sufficient, but I applied the whole series. Thanks and BR, Ricardo [1]. https://lore.kernel.org/lkml/20230207045838.11243-1-ricardo.neri-calderon@linux.intel.com/ [2]. https://lore.kernel.org/all/20210911011819.12184-7-ricardo.neri-calderon@linux.intel.com/ ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-02-07 18:58 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-02-05 2:59 [PATCH v2 0/2] intel_powerclamp: New module parameter Srinivas Pandruvada 2023-02-05 2:59 ` [PATCH v2 1/2] Documentation:admin-guide: Move intel_powerclamp documentation Srinivas Pandruvada 2023-02-05 2:59 ` [PATCH v2 2/2] thermal/drivers/intel_powerclamp: Add two module parameters Srinivas Pandruvada 2023-02-05 15:57 ` [PATCH v2 0/2] intel_powerclamp: New module parameter Zhang, Rui 2023-02-06 2:45 ` srinivas pandruvada 2023-02-06 8:05 ` Zhang, Rui 2023-02-06 10:02 ` srinivas pandruvada 2023-02-07 13:42 ` Ricardo Neri 2023-02-07 13:51 ` srinivas pandruvada 2023-02-07 19:08 ` Ricardo Neri
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).