* [bug/6.3-rc4/bisected] WARNING at cooling_device_stats_setup+0xac caused by commit 790930f44289c8209c57461b2db499fcc702e0b3
@ 2023-03-30  7:52 Mikhail Gavrilov
  2023-03-30 10:07 ` Rafael J. Wysocki
  0 siblings, 1 reply; 3+ messages in thread
From: Mikhail Gavrilov @ 2023-03-30  7:52 UTC
  To: rafael.j.wysocki, rui.zhang, Linux List Kernel Mailing, rafael,
	daniel.lezcano, linux-pm

Hi,
The 6.3-rc4 release brings a new warning message to the log:
[    4.590775] ------------[ cut here ]------------
[    4.590783] WARNING: CPU: 2 PID: 1 at drivers/thermal/thermal_sysfs.c:879 cooling_device_stats_setup+0xac/0xc0
[    4.590799] Modules linked in:
[    4.590806] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc3-08-790930f44289c8209c57461b2db499fcc702e0b3+ #87
[    4.590819] Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY, BIOS G513QY.320 09/07/2022
[    4.590832] RIP: 0010:cooling_device_stats_setup+0xac/0xc0
[    4.590841] Code: ff 48 89 1d 9e 27 9f 01 5b 5d 41 5c c3 cc cc cc cc 48 8d bf 60 05 00 00 be ff ff ff ff e8 5c 16 3b 00 85 c0 0f 85 72 ff ff ff <0f> 0b e9 6b ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90
[    4.590863] RSP: 0018:ffffa5a080107c60 EFLAGS: 00010246
[    4.590871] RAX: 0000000000000000 RBX: ffff96fc51f6d800 RCX: 0000000000000001
[    4.590880] RDX: 0000000000000000 RSI: ffffffffb9a7f591 RDI: ffffffffb9b325ce
[    4.590889] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
[    4.590898] R10: 0000000000000001 R11: 0000000000000001 R12: ffff96fc51f6d800
[    4.590907] R13: ffff96fc51f6d818 R14: ffff96fc4b450000 R15: 0000000000000000
[    4.590916] FS:  0000000000000000(0000) GS:ffff970b16a00000(0000) knlGS:0000000000000000
[    4.590927] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.590934] CR2: 0000000000000000 CR3: 000000034643c000 CR4: 0000000000750ee0
[    4.590944] PKRU: 55555554
[    4.590948] Call Trace:
[    4.590953]  <TASK>
[    4.590958]  thermal_cooling_device_setup_sysfs+0xe/0x20
[    4.590967]  __thermal_cooling_device_register.part.0+0x13c/0x3d0
[    4.590977]  acpi_processor_thermal_init+0x22/0x100
[    4.590987]  __acpi_processor_start+0x7f/0xf0
[    4.590995]  acpi_processor_start+0x2c/0x50
[    4.591002]  really_probe+0x19e/0x3e0
[    4.591010]  ? __pfx___driver_attach+0x10/0x10
[    4.591017]  __driver_probe_device+0x78/0x160
[    4.591025]  driver_probe_device+0x1f/0x90
[    4.591032]  __driver_attach+0xd2/0x1c0
[    4.591039]  bus_for_each_dev+0x8b/0xe0
[    4.591047]  bus_add_driver+0x115/0x210
[    4.591055]  driver_register+0x55/0x100
[    4.591062]  ? __pfx_acpi_processor_driver_init+0x10/0x10
[    4.591072]  acpi_processor_driver_init+0x3b/0xc0
[    4.591080]  ? __pfx_acpi_processor_driver_init+0x10/0x10
[    4.591088]  do_one_initcall+0x70/0x290
[    4.591101]  kernel_init_freeable+0x3c5/0x580
[    4.591112]  ? __pfx_kernel_init+0x10/0x10
[    4.591122]  kernel_init+0x16/0x1c0
[    4.591128]  ret_from_fork+0x2c/0x50
[    4.591139]  </TASK>

This message appears after each boot.

Bisection blames this commit:

commit 790930f44289c8209c57461b2db499fcc702e0b3
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Fri Mar 17 18:01:26 2023 +0100

    thermal: core: Introduce thermal_cooling_device_update()

    Introduce a core thermal API function, thermal_cooling_device_update(),
    for updating the max_state value for a cooling device and rearranging
    its statistics in sysfs after a possible change of its ->get_max_state()
    callback return value.

    That callback is now invoked only once, during cooling device
    registration, to populate the max_state field in the cooling device
    object, so if its return value changes, it needs to be invoked again
    and the new return value needs to be stored as max_state.  Moreover,
    the statistics presented in sysfs need to be rearranged in general,
    because there may not be enough room in them to store data for all
    of the possible states (in the case when max_state grows).

    The new function takes care of that (and some other minor things
    related to it), but some extra locking and lockdep annotations are
    added in several places too to protect against crashes in the cases
    when the statistics are not present or when a stale max_state value
    might be used by sysfs attributes.

    Note that the actual user of the new function will be added separately.

    Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Zhang Rui <rui.zhang@intel.com>
    Reviewed-by: Zhang Rui <rui.zhang@intel.com>

 drivers/thermal/thermal_core.c  | 83 ++++++++++++++++++++++++++++++++++++++++-
 drivers/thermal/thermal_core.h  |  2 +
 drivers/thermal/thermal_sysfs.c | 74 +++++++++++++++++++++++++++++++-----
 include/linux/thermal.h         |  1 +
 4 files changed, 150 insertions(+), 10 deletions(-)
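
To illustrate the mechanism the commit message describes, here is a minimal userspace sketch in C. It is not the kernel implementation: the struct and function names (cooling_dev, cooling_dev_update, demo_get_max_state and so on) are hypothetical, the callback signature is simplified, and the locking and lockdep annotations mentioned above are left out entirely. It only shows why a max_state value cached at registration goes stale when the callback's return value changes, and why the per-state statistics have to be reallocated when the number of states grows.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of per-state statistics: one counter per state. */
struct cdev_stats {
	unsigned long max_states;      /* number of states, i.e. max_state + 1 */
	unsigned long *time_in_state;  /* max_states entries, zero-initialized */
};

/* Hypothetical model of a cooling device with a cached max_state. */
struct cooling_dev {
	unsigned long max_state;               /* cached copy of the callback result */
	unsigned long (*get_max_state)(void);  /* simplified driver callback */
	struct cdev_stats *stats;              /* NULL if allocation failed */
};

/* Stand-in for a driver whose maximum state can change at run time. */
unsigned long demo_max_state = 3;

unsigned long demo_get_max_state(void)
{
	return demo_max_state;
}

/* Allocate statistics large enough for states 0..max_state. */
struct cdev_stats *stats_alloc(unsigned long max_state)
{
	struct cdev_stats *s = malloc(sizeof(*s));

	if (!s)
		return NULL;

	s->max_states = max_state + 1;
	s->time_in_state = calloc(s->max_states, sizeof(*s->time_in_state));
	if (!s->time_in_state) {
		free(s);
		return NULL;
	}
	return s;
}

void stats_free(struct cdev_stats *s)
{
	if (!s)
		return;
	free(s->time_in_state);
	free(s);
}

/* Registration: the callback is invoked once and its result is cached. */
void cooling_dev_register(struct cooling_dev *cdev,
			  unsigned long (*cb)(void))
{
	cdev->get_max_state = cb;
	cdev->max_state = cb();
	cdev->stats = stats_alloc(cdev->max_state);
}

/*
 * Update: invoke the callback again, store the new value and rebuild the
 * statistics so that there is room for every possible state. In this
 * sketch the old counters are simply discarded.
 */
void cooling_dev_update(struct cooling_dev *cdev)
{
	assert(cdev->get_max_state != NULL);

	stats_free(cdev->stats);
	cdev->max_state = cdev->get_max_state();
	cdev->stats = stats_alloc(cdev->max_state);
}

void cooling_dev_unregister(struct cooling_dev *cdev)
{
	stats_free(cdev->stats);
	cdev->stats = NULL;
}
```

In this model, changing demo_max_state after cooling_dev_register() has no effect on the cached max_state or on the size of the statistics until cooling_dev_update() is called, which is the gap the new kernel function is described as closing.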

All of my PCs turned out to be affected by this issue:
- CPU: Ryzen 3950X / MB: ROG Strix X570-I
- CPU: Ryzen 7950X / MB: MPG B650I EDGE WIFI
- Laptop: ASUS ROG Strix G15 G513QY-HF001 (CPU: 5900HX)

Unfortunately, I couldn't test a revert of this commit, because the
kernel does not build after reverting it.

drivers/acpi/processor_thermal.c: In function ‘acpi_thermal_cpufreq_init’:
drivers/acpi/processor_thermal.c:149:17: error: implicit declaration of function ‘thermal_cooling_device_update’; did you mean ‘thermal_zone_device_update’? [-Werror=implicit-function-declaration]
  149 |                 thermal_cooling_device_update(pr->cdev);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                 thermal_zone_device_update


Anyone who wants to see the full kernel log can find it in the attached archive (from the laptop).

-- 
Best Regards,
Mike Gavrilov.

[-- Attachment #2: dmesg.tar.xz --]
[-- Type: application/x-xz, Size: 32792 bytes --]


* Re: [bug/6.3-rc4/bisected] WARNING at cooling_device_stats_setup+0xac caused by commit 790930f44289c8209c57461b2db499fcc702e0b3
  2023-03-30  7:52 [bug/6.3-rc4/bisected] WARNING at cooling_device_stats_setup+0xac caused by commit 790930f44289c8209c57461b2db499fcc702e0b3 Mikhail Gavrilov
@ 2023-03-30 10:07 ` Rafael J. Wysocki
  2023-03-30 13:25   ` Mikhail Gavrilov
  0 siblings, 1 reply; 3+ messages in thread
From: Rafael J. Wysocki @ 2023-03-30 10:07 UTC
  To: Mikhail Gavrilov
  Cc: rafael.j.wysocki, rui.zhang, Linux List Kernel Mailing, rafael,
	daniel.lezcano, linux-pm

On Thu, Mar 30, 2023 at 9:52 AM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> Hi,
> The 6.3-rc4 release brings a new warning message to the log:

Thanks for the report, please see this patch:

https://patchwork.kernel.org/project/linux-pm/patch/2681615.mvXUDI8C0e@kreacher/



* Re: [bug/6.3-rc4/bisected] WARNING at cooling_device_stats_setup+0xac caused by commit 790930f44289c8209c57461b2db499fcc702e0b3
  2023-03-30 10:07 ` Rafael J. Wysocki
@ 2023-03-30 13:25   ` Mikhail Gavrilov
  0 siblings, 0 replies; 3+ messages in thread
From: Mikhail Gavrilov @ 2023-03-30 13:25 UTC
  To: Rafael J. Wysocki
  Cc: rafael.j.wysocki, rui.zhang, Linux List Kernel Mailing,
	daniel.lezcano, linux-pm

On Thu, Mar 30, 2023 at 3:07 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Mar 30, 2023 at 9:52 AM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> >
> > Hi,
> > The 6.3-rc4 release brings a new warning message to the log:
>
> Thanks for the report, please see this patch:
>
> https://patchwork.kernel.org/project/linux-pm/patch/2681615.mvXUDI8C0e@kreacher/

Thanks, after applying this patch the issue is gone on all my PCs.

Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>

-- 
Best Regards,
Mike Gavrilov.
