All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] thermal/sysfs: Clear the slot left in cooling_device_attr_groups
@ 2022-06-20 13:19 Di Shen
  2022-06-22 13:55 ` Rafael J. Wysocki
  0 siblings, 1 reply; 3+ messages in thread
From: Di Shen @ 2022-06-20 13:19 UTC (permalink / raw)
  To: rafael, rui.zhang
  Cc: xuewen.yan, cindygm567, daniel.lezcano, amitk, linux-pm,
	linux-kernel, lukasz.luba

There's a space allocated for cooling_device_stats_attr_group within co-
oling_device_attr_groups. This space is shared by all cooling devices.

If not clear this space before cooling_device_stats_attr_group is initi-
alized, the next cooling device will still create stats sysfs file node.
At this time, read or write related nodes will cause kernel crash.

[exception_reboot_reason]: kernel_crash
[exception_panic_reason]: Fatal exception
[exception_time]: 2022-06-07_06-32-25
[exception_file_info]: not-bugon
[exception_task_id]: 3154
[exception_task_family]: [cat, 3154][sh, 2597][sh, 2362][adbd, 1804]
[exception_pc_symbol]: [<747516ae>] _raw_spin_lock+0x20/0x60
[exception_stack_info]: [<4cbe9ac1>] prepare_exception_info+0x19c/0x1a4
[<03041be7>] sysdump_panic_event+0x57c/0x6f4
[<b16f865e>] atomic_notifier_call_chain+0x48/0x7c
[<5baac8d4>] panic+0x1b4/0x3c8
[<9d287b0f>] arm_notify_die+0x0/0x78
[<094fc22c>] __do_kernel_fault+0x94/0xa4
[<3b4b69a4>] do_page_fault+0xd4/0x364
[<23793e7a>] do_translation_fault+0x38/0xc0
[<6e5cc52a>] do_DataAbort+0x4c/0xd0
[<a28c16b8>] __dabt_svc+0x5c/0xa0
[<747516ae>] _raw_spin_lock+0x20/0x60
[<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
[<cb78325e>] dev_attr_show+0x38/0x64
[<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
[<c0a843ab>] seq_read+0x244/0x620
[<b316b374>] vfs_read+0xd8/0x218
[<3aebf5fa>] sys_read+0x80/0xe4
[<7cf100f5>] ret_fast_syscall+0x0/0x28
[<08cbe22f>] 0xbe8c1198

So clear the slot left in cooling_device_attr_groups before cooling_dev-
ice_stats_attr_group is initialized to avoid kernel crash.

Signed-off-by: Di Shen <di.shen@unisoc.com>
---
 drivers/thermal/thermal_sysfs.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 1c4aac8464a7..fbc3dbc85841 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -817,6 +817,11 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
 	unsigned long states;
 	int var;
 
+	/* Clear the slot left in cooling_device_attr_groups */
+	var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
+	if (cooling_device_attr_groups[var])
+		cooling_device_attr_groups[var] = NULL;
+
 	if (cdev->ops->get_max_state(cdev, &states))
 		return;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/1] thermal/sysfs: Clear the slot left in cooling_device_attr_groups
  2022-06-20 13:19 [PATCH 1/1] thermal/sysfs: Clear the slot left in cooling_device_attr_groups Di Shen
@ 2022-06-22 13:55 ` Rafael J. Wysocki
  2022-06-23  7:34   ` Di Shen
  0 siblings, 1 reply; 3+ messages in thread
From: Rafael J. Wysocki @ 2022-06-22 13:55 UTC (permalink / raw)
  To: Di Shen
  Cc: Rafael J. Wysocki, Zhang, Rui, xuewen.yan, cindygm567,
	Daniel Lezcano, Amit Kucheria, Linux PM,
	Linux Kernel Mailing List, Lukasz Luba

On Mon, Jun 20, 2022 at 3:20 PM Di Shen <di.shen@unisoc.com> wrote:
>
> There's a space allocated for cooling_device_stats_attr_group within co-

Please don't break words in the changelog (and variable names in
particular) like this.

> oling_device_attr_groups. This space is shared by all cooling devices.
>
> If not clear this space before cooling_device_stats_attr_group is initi-
> alized, the next cooling device will still create stats sysfs file node.
> At this time, read or write related nodes will cause kernel crash.

A bit more of an explanation here wouldn't hurt IMV.  In particular,
what does "the next cooling device" mean and what are "related nodes"?

> [exception_reboot_reason]: kernel_crash
> [exception_panic_reason]: Fatal exception
> [exception_time]: 2022-06-07_06-32-25
> [exception_file_info]: not-bugon
> [exception_task_id]: 3154
> [exception_task_family]: [cat, 3154][sh, 2597][sh, 2362][adbd, 1804]
> [exception_pc_symbol]: [<747516ae>] _raw_spin_lock+0x20/0x60
> [exception_stack_info]: [<4cbe9ac1>] prepare_exception_info+0x19c/0x1a4
> [<03041be7>] sysdump_panic_event+0x57c/0x6f4
> [<b16f865e>] atomic_notifier_call_chain+0x48/0x7c
> [<5baac8d4>] panic+0x1b4/0x3c8
> [<9d287b0f>] arm_notify_die+0x0/0x78
> [<094fc22c>] __do_kernel_fault+0x94/0xa4
> [<3b4b69a4>] do_page_fault+0xd4/0x364
> [<23793e7a>] do_translation_fault+0x38/0xc0
> [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> [<a28c16b8>] __dabt_svc+0x5c/0xa0
> [<747516ae>] _raw_spin_lock+0x20/0x60
> [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> [<cb78325e>] dev_attr_show+0x38/0x64
> [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> [<c0a843ab>] seq_read+0x244/0x620
> [<b316b374>] vfs_read+0xd8/0x218
> [<3aebf5fa>] sys_read+0x80/0xe4
> [<7cf100f5>] ret_fast_syscall+0x0/0x28
> [<08cbe22f>] 0xbe8c1198
>
> So clear the slot left in cooling_device_attr_groups before cooling_dev-
> ice_stats_attr_group is initialized to avoid kernel crash.
>
> Signed-off-by: Di Shen <di.shen@unisoc.com>
> ---
>  drivers/thermal/thermal_sysfs.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index 1c4aac8464a7..fbc3dbc85841 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -817,6 +817,11 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
>         unsigned long states;
>         int var;
>
> +       /* Clear the slot left in cooling_device_attr_groups */

The comment is a bit too vague IMO.  In particular, what slot do you mean?

> +       var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> +       if (cooling_device_attr_groups[var])
> +               cooling_device_attr_groups[var] = NULL;

The NULL check above is redundant (it can be cleared even if it is
NULL already) and it all can be done in one code line.

> +
>         if (cdev->ops->get_max_state(cdev, &states))
>                 return;
>
> --

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/1] thermal/sysfs: Clear the slot left in cooling_device_attr_groups
  2022-06-22 13:55 ` Rafael J. Wysocki
@ 2022-06-23  7:34   ` Di Shen
  0 siblings, 0 replies; 3+ messages in thread
From: Di Shen @ 2022-06-23  7:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Di Shen, Zhang, Rui, xuewen.yan, Daniel Lezcano, Amit Kucheria,
	Linux PM, Linux Kernel Mailing List, Lukasz Luba, jeson.gao,
	zhang.lyra

On Wed, Jun 22, 2022 at 9:55 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Jun 20, 2022 at 3:20 PM Di Shen <di.shen@unisoc.com> wrote:
> >
> > There's a space allocated for cooling_device_stats_attr_group within co-
>
> Please don't break words in the changelog (and variable names in
> particular) like this.
Ok, thank you. I will pay attention to it.
>
> > oling_device_attr_groups. This space is shared by all cooling devices.
> >
> > If not clear this space before cooling_device_stats_attr_group is initi-
> > alized, the next cooling device will still create stats sysfs file nodes.
> > At this time, read or write related nodes will cause kernel crash.
>
> A bit more of an explanation here wouldn't hurt IMV.  In particular,
> what does "the next cooling device" mean and what are "related nodes"?

There are some cooling devices in thermal management. During the
registration of these
cooling devices, they will create "stats" sysfs.

s9863a:/sys/class/thermal/cooling_device2 # cd stats/
s9863a:/sys/class/thermal/cooling_device2/stats # ls
reset  time_in_state_ms  total_trans  trans_table

Reading or writing these property nodes may cause kernel_crash.
For example, cat time_in_state_ms caused a crash because stats is NULL.

The reason why stats is NULL is "failed to get_max_state in
cooling_device_stats_setup()", and
then cooling_device_stats_setup() returned directly without
initializing stats structure.

static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
{
        //.....
        if (cdev->ops->get_max_state(cdev, &states))
                return;
        //.....
        stats = kzalloc(var, GFP_KERNEL);
        //...
        /* Fill the empty slot left in cooling_device_attr_groups */
        var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
        cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
}

static ssize_t
time_in_state_ms_show(struct device *dev, struct device_attribute *attr,
     char *buf)
{
        //......
        spin_lock(&stats->lock); // stats is NULL. Failed to get lock here.
        //....
        spin_unlock(&stats->lock);

        return len;
}
The same as reset_store(), trans_table_show(), total_trans_show().
>
> > [exception_reboot_reason]: kernel_crash
> > [exception_panic_reason]: Fatal exception
> > [exception_time]: 2022-06-07_06-32-25
> > [exception_file_info]: not-bugon
> > [exception_task_id]: 3154
> > [exception_task_family]: [cat, 3154][sh, 2597][sh, 2362][adbd, 1804]
> > [exception_pc_symbol]: [<747516ae>] _raw_spin_lock+0x20/0x60
> > [exception_stack_info]: [<4cbe9ac1>] prepare_exception_info+0x19c/0x1a4
> > [<03041be7>] sysdump_panic_event+0x57c/0x6f4
> > [<b16f865e>] atomic_notifier_call_chain+0x48/0x7c
> > [<5baac8d4>] panic+0x1b4/0x3c8
> > [<9d287b0f>] arm_notify_die+0x0/0x78
> > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > [<23793e7a>] do_translation_fault+0x38/0xc0
> > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > [<747516ae>] _raw_spin_lock+0x20/0x60
> > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > [<cb78325e>] dev_attr_show+0x38/0x64
> > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > [<c0a843ab>] seq_read+0x244/0x620
> > [<b316b374>] vfs_read+0xd8/0x218
> > [<3aebf5fa>] sys_read+0x80/0xe4
> > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > [<08cbe22f>] 0xbe8c1198
> >

Actually, if succeed to get max state, it will not crash when cat
time_in_state_ms. But it is worth
noting that even if one of the cooling devices failed to get max
state(which means stats structure
will not be initialized), the stats sysfs was still created.

The property nodes of stats sysfs are created according to the
attribute group, like:

static const struct attribute_group *cooling_device_attr_groups[] = {
        &cooling_device_attr_group,
        NULL, /* Space allocated for cooling_device_stats_attr_group */
        NULL,                                   |
};                                                   |
                                                     v
There's a space allocated for cooling_device_stats_attr_group within
cooling_device_attr_groups. This space is shared by all cooling
devices.

static struct attribute *cooling_device_stats_attrs[] = {
        &dev_attr_total_trans.attr,
        &dev_attr_time_in_state_ms.attr,
        &dev_attr_reset.attr,
        &dev_attr_trans_table.attr,
        NULL
};

static const struct attribute_group cooling_device_stats_attr_group = {
        .attrs = cooling_device_stats_attrs,
        .name = "stats"
};

Suppose there are two cooling devices: cooling_device0 and cooling_device1.
The stats structure of cooling_device0 is firstly initialized
successfully, the space allocated for cooling_device_stats_attr_group
will be filled.
And then cooling_device1 fails to initialize stats structure, the
space now still remains the cooling_device_stats_attr_group of
cooling_device0.
The initialization has failed, so why create sysfs? It is
inappropriate to create the sysfs at this time.

> > So clear the slot left in cooling_device_attr_groups before cooling_dev-
> > ice_stats_attr_group is initialized to avoid kernel crash.
> >
> > Signed-off-by: Di Shen <di.shen@unisoc.com>
> > ---
> >  drivers/thermal/thermal_sysfs.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> > index 1c4aac8464a7..fbc3dbc85841 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -817,6 +817,11 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> >         unsigned long states;
> >         int var;
> >
> > +       /* Clear the slot left in cooling_device_attr_groups */
>
> The comment is a bit too vague IMO.  In particular, what slot do you mean?

Well, I will remove it. (Slot means a space allocated for
cooling_device_stats_attr_group in cooling_device_attr_groups[])

>
> > +       var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> > +       if (cooling_device_attr_groups[var])
> > +               cooling_device_attr_groups[var] = NULL;
>
> The NULL check above is redundant (it can be cleared even if it is
> NULL already) and it all can be done in one code line.

OK. I will fix it in the next patchset.

>
> > +
> >         if (cdev->ops->get_max_state(cdev, &states))
> >                 return;
> >
> > --

Thanks!
Di

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-06-23  7:34 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-20 13:19 [PATCH 1/1] thermal/sysfs: Clear the slot left in cooling_device_attr_groups Di Shen
2022-06-22 13:55 ` Rafael J. Wysocki
2022-06-23  7:34   ` Di Shen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.