linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized
@ 2022-07-22  8:43 Di Shen
  2022-07-22 17:18 ` Rafael J. Wysocki
  0 siblings, 1 reply; 8+ messages in thread
From: Di Shen @ 2022-07-22  8:43 UTC (permalink / raw)
  To: rafael, daniel.lezcano, rui.zhang
  Cc: viresh.kumar, amitk, linux-pm, linux-kernel, jeson.gao,
	xuewen.yan, cindygm567

There's a space allocated for cooling_device_stats_attr_group
within cooling_device_attr_groups. This space is shared by all
cooling devices.

If the stats structure of one cooling device successfully
creates stats sysfs. After that, another cooling device fails
to get max_states in cooling_device_stats_setup(). It can
return directly without initializing the stats structure, but
the cooling_device_stats_attr_group is still the attribute
group of the last cooling device.

At this time, read or write stats sysfs nodes can cause kernel
crash. Like the following, kernel crashed when
'cat time_in_state_ms'.

[<5baac8d4>] panic+0x1b4/0x3c8
[<9d287b0f>] arm_notify_die+0x0/0x78
[<094fc22c>] __do_kernel_fault+0x94/0xa4
[<3b4b69a4>] do_page_fault+0xd4/0x364
[<23793e7a>] do_translation_fault+0x38/0xc0
[<6e5cc52a>] do_DataAbort+0x4c/0xd0
[<a28c16b8>] __dabt_svc+0x5c/0xa0
[<747516ae>] _raw_spin_lock+0x20/0x60
[<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
[<cb78325e>] dev_attr_show+0x38/0x64
[<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
[<c0a843ab>] seq_read+0x244/0x620
[<b316b374>] vfs_read+0xd8/0x218
[<3aebf5fa>] sys_read+0x80/0xe4
[<7cf100f5>] ret_fast_syscall+0x0/0x28
[<08cbe22f>] 0xbe8c1198

stats sysfs:
phone:/sys/class/thermal/cooling_device2/stats # ls
reset  time_in_state_ms  total_trans  trans_table

The same as cat total_trans, trans_table, and echo reset.

To avoid kernel crash, this patch set clears the
cooling_device_attr_groups before stats structure is initialized.

Signed-off-by: Di Shen <di.shen@unisoc.com>
---
 drivers/thermal/thermal_sysfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 1c4aac8464a7..e3fae63fa0f7 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -817,6 +817,9 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
 	unsigned long states;
 	int var;
 
+	var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
+	cooling_device_attr_groups[var] = NULL;
+
 	if (cdev->ops->get_max_state(cdev, &states))
 		return;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized
  2022-07-22  8:43 [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized Di Shen
@ 2022-07-22 17:18 ` Rafael J. Wysocki
  2022-07-22 18:42   ` Rafael J. Wysocki
  2022-07-26  7:31   ` Di Shen
  0 siblings, 2 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2022-07-22 17:18 UTC (permalink / raw)
  To: Di Shen
  Cc: Rafael J. Wysocki, Daniel Lezcano, Zhang, Rui, Viresh Kumar,
	Amit Kucheria, Linux PM, Linux Kernel Mailing List, jeson.gao,
	xuewen.yan, cindygm567

On Fri, Jul 22, 2022 at 10:44 AM Di Shen <di.shen@unisoc.com> wrote:
>
> There's a space allocated for cooling_device_stats_attr_group
> within cooling_device_attr_groups. This space is shared by all
> cooling devices.

That's correct.

> If the stats structure of one cooling device successfully
> creates stats sysfs. After that, another cooling device fails
> to get max_states in cooling_device_stats_setup(). It can
> return directly without initializing the stats structure, but
> the cooling_device_stats_attr_group is still the attribute
> group of the last cooling device.

I cannot parse the above, sorry.

For example, how can a "stats structure of one cooling device" create
anything?   As a data structure, it is a passive entity, so it doesn't
carry out any actions.

I think (but I am not sure) that you are referring to the error code
path in which the ->get_max_state() callback fails for a cooling
device after cooling_device_stats_setup() has completed successfully
for another one.

> At this time, read or write stats sysfs nodes can cause kernel
> crash. Like the following, kernel crashed when
> 'cat time_in_state_ms'.
>
> [<5baac8d4>] panic+0x1b4/0x3c8
> [<9d287b0f>] arm_notify_die+0x0/0x78
> [<094fc22c>] __do_kernel_fault+0x94/0xa4
> [<3b4b69a4>] do_page_fault+0xd4/0x364
> [<23793e7a>] do_translation_fault+0x38/0xc0
> [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> [<a28c16b8>] __dabt_svc+0x5c/0xa0
> [<747516ae>] _raw_spin_lock+0x20/0x60
> [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> [<cb78325e>] dev_attr_show+0x38/0x64
> [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> [<c0a843ab>] seq_read+0x244/0x620
> [<b316b374>] vfs_read+0xd8/0x218
> [<3aebf5fa>] sys_read+0x80/0xe4
> [<7cf100f5>] ret_fast_syscall+0x0/0x28
> [<08cbe22f>] 0xbe8c1198
>
> stats sysfs:
> phone:/sys/class/thermal/cooling_device2/stats # ls
> reset  time_in_state_ms  total_trans  trans_table
>
> The same as cat total_trans, trans_table, and echo reset.
>
> To avoid kernel crash, this patch set clears the
> cooling_device_attr_groups before stats structure is initialized.
>
> Signed-off-by: Di Shen <di.shen@unisoc.com>
> ---
>  drivers/thermal/thermal_sysfs.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index 1c4aac8464a7..e3fae63fa0f7 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -817,6 +817,9 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
>         unsigned long states;
>         int var;
>
> +       var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> +       cooling_device_attr_groups[var] = NULL;
> +
>         if (cdev->ops->get_max_state(cdev, &states))
>                 return;
>
> --

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized
  2022-07-22 17:18 ` Rafael J. Wysocki
@ 2022-07-22 18:42   ` Rafael J. Wysocki
  2022-07-26  7:39     ` Di Shen
  2022-07-26  7:31   ` Di Shen
  1 sibling, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2022-07-22 18:42 UTC (permalink / raw)
  To: Di Shen
  Cc: Rafael J. Wysocki, Daniel Lezcano, Zhang, Rui, Viresh Kumar,
	Amit Kucheria, Linux PM, Linux Kernel Mailing List, jeson.gao,
	xuewen.yan, cindygm567

On Friday, July 22, 2022 7:18:42 PM CEST Rafael J. Wysocki wrote:
> On Fri, Jul 22, 2022 at 10:44 AM Di Shen <di.shen@unisoc.com> wrote:
> >
> > There's a space allocated for cooling_device_stats_attr_group
> > within cooling_device_attr_groups. This space is shared by all
> > cooling devices.
> 
> That's correct.
> 
> > If the stats structure of one cooling device successfully
> > creates stats sysfs. After that, another cooling device fails
> > to get max_states in cooling_device_stats_setup(). It can
> > return directly without initializing the stats structure, but
> > the cooling_device_stats_attr_group is still the attribute
> > group of the last cooling device.
> 
> I cannot parse the above, sorry.
> 
> For example, how can a "stats structure of one cooling device" create
> anything?   As a data structure, it is a passive entity, so it doesn't
> carry out any actions.
> 
> I think (but I am not sure) that you are referring to the error code
> path in which the ->get_max_state() callback fails for a cooling
> device after cooling_device_stats_setup() has completed successfully
> for another one.
> 
> > At this time, read or write stats sysfs nodes can cause kernel
> > crash. Like the following, kernel crashed when
> > 'cat time_in_state_ms'.
> >
> > [<5baac8d4>] panic+0x1b4/0x3c8
> > [<9d287b0f>] arm_notify_die+0x0/0x78
> > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > [<23793e7a>] do_translation_fault+0x38/0xc0
> > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > [<747516ae>] _raw_spin_lock+0x20/0x60
> > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > [<cb78325e>] dev_attr_show+0x38/0x64
> > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > [<c0a843ab>] seq_read+0x244/0x620
> > [<b316b374>] vfs_read+0xd8/0x218
> > [<3aebf5fa>] sys_read+0x80/0xe4
> > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > [<08cbe22f>] 0xbe8c1198
> >
> > stats sysfs:
> > phone:/sys/class/thermal/cooling_device2/stats # ls
> > reset  time_in_state_ms  total_trans  trans_table
> >
> > The same as cat total_trans, trans_table, and echo reset.

So does the (untested) patch below work too?

---
 drivers/thermal/thermal_sysfs.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/thermal/thermal_sysfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_sysfs.c
+++ linux-pm/drivers/thermal/thermal_sysfs.c
@@ -813,12 +813,13 @@ static const struct attribute_group cool
 
 static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
 {
+	const struct attribute_group *stats_attr_group = NULL;
 	struct cooling_dev_stats *stats;
 	unsigned long states;
 	int var;
 
 	if (cdev->ops->get_max_state(cdev, &states))
-		return;
+		goto out;
 
 	states++; /* Total number of states is highest state + 1 */
 
@@ -828,7 +829,7 @@ static void cooling_device_stats_setup(s
 
 	stats = kzalloc(var, GFP_KERNEL);
 	if (!stats)
-		return;
+		goto out;
 
 	stats->time_in_state = (ktime_t *)(stats + 1);
 	stats->trans_table = (unsigned int *)(stats->time_in_state + states);
@@ -838,9 +839,12 @@ static void cooling_device_stats_setup(s
 
 	spin_lock_init(&stats->lock);
 
+	stats_attr_group = &cooling_device_stats_attr_group;
+
+out:
 	/* Fill the empty slot left in cooling_device_attr_groups */
 	var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
-	cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
+	cooling_device_attr_groups[var] = stats_attr_group;
 }
 
 static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized
  2022-07-22 17:18 ` Rafael J. Wysocki
  2022-07-22 18:42   ` Rafael J. Wysocki
@ 2022-07-26  7:31   ` Di Shen
  1 sibling, 0 replies; 8+ messages in thread
From: Di Shen @ 2022-07-26  7:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Di Shen, Daniel Lezcano, Zhang, Rui, Viresh Kumar, Amit Kucheria,
	Linux PM, Linux Kernel Mailing List, jeson.gao, xuewen.yan,
	ke.wang

On Sat, Jul 23, 2022 at 1:18 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Jul 22, 2022 at 10:44 AM Di Shen <di.shen@unisoc.com> wrote:
> >
> > There's a space allocated for cooling_device_stats_attr_group
> > within cooling_device_attr_groups. This space is shared by all
> > cooling devices.
>
> That's correct.
>
> > If the stats structure of one cooling device successfully
> > creates stats sysfs. After that, another cooling device fails
> > to get max_states in cooling_device_stats_setup(). It can
> > return directly without initializing the stats structure, but
> > the cooling_device_stats_attr_group is still the attribute
> > group of the last cooling device.
>
> I cannot parse the above, sorry.
>
> For example, how can a "stats structure of one cooling device" create
> anything?   As a data structure, it is a passive entity, so it doesn't
> carry out any actions.
>

Sorry, I didn't describe it properly. I mean 'if it has been called back
cooling_device_stats_setup() successfully for a cooling device'.

> I think (but I am not sure) that you are referring to the error code
> path in which the ->get_max_state() callback fails for a cooling
> device after cooling_device_stats_setup() has completed successfully
> for another one.
>

That's it. As you say, ->get_max_state() callback fails for a cooling
device after cooling_device_stats_setup() has completed successfully
for another one.

> > At this time, read or write stats sysfs nodes can cause kernel
> > crash. Like the following, kernel crashed when
> > 'cat time_in_state_ms'.
> >
> > [<5baac8d4>] panic+0x1b4/0x3c8
> > [<9d287b0f>] arm_notify_die+0x0/0x78
> > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > [<23793e7a>] do_translation_fault+0x38/0xc0
> > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > [<747516ae>] _raw_spin_lock+0x20/0x60
> > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > [<cb78325e>] dev_attr_show+0x38/0x64
> > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > [<c0a843ab>] seq_read+0x244/0x620
> > [<b316b374>] vfs_read+0xd8/0x218
> > [<3aebf5fa>] sys_read+0x80/0xe4
> > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > [<08cbe22f>] 0xbe8c1198
> >
> > stats sysfs:
> > phone:/sys/class/thermal/cooling_device2/stats # ls
> > reset  time_in_state_ms  total_trans  trans_table
> >
> > The same as cat total_trans, trans_table, and echo reset.
> >
> > To avoid kernel crash, this patch set clears the
> > cooling_device_attr_groups before stats structure is initialized.
> >
> > Signed-off-by: Di Shen <di.shen@unisoc.com>
> > ---
> >  drivers/thermal/thermal_sysfs.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> > index 1c4aac8464a7..e3fae63fa0f7 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -817,6 +817,9 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> >         unsigned long states;
> >         int var;
> >
> > +       var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> > +       cooling_device_attr_groups[var] = NULL;
> > +
> >         if (cdev->ops->get_max_state(cdev, &states))
> >                 return;
> >
> > --

--
Best regards,
Di



Di

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized
  2022-07-22 18:42   ` Rafael J. Wysocki
@ 2022-07-26  7:39     ` Di Shen
  2022-07-27  8:16       ` Di Shen
  0 siblings, 1 reply; 8+ messages in thread
From: Di Shen @ 2022-07-26  7:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Di Shen, Rafael J. Wysocki, Daniel Lezcano, Zhang, Rui,
	Viresh Kumar, Amit Kucheria, Linux PM, Linux Kernel Mailing List,
	jeson.gao, xuewen.yan, ke.wang

On Sat, Jul 23, 2022 at 2:42 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Friday, July 22, 2022 7:18:42 PM CEST Rafael J. Wysocki wrote:
> > On Fri, Jul 22, 2022 at 10:44 AM Di Shen <di.shen@unisoc.com> wrote:
> > >
> > > There's a space allocated for cooling_device_stats_attr_group
> > > within cooling_device_attr_groups. This space is shared by all
> > > cooling devices.
> >
> > That's correct.
> >
> > > If the stats structure of one cooling device successfully
> > > creates stats sysfs. After that, another cooling device fails
> > > to get max_states in cooling_device_stats_setup(). It can
> > > return directly without initializing the stats structure, but
> > > the cooling_device_stats_attr_group is still the attribute
> > > group of the last cooling device.
> >
> > I cannot parse the above, sorry.
> >
> > For example, how can a "stats structure of one cooling device" create
> > anything?   As a data structure, it is a passive entity, so it doesn't
> > carry out any actions.
> >
> > I think (but I am not sure) that you are referring to the error code
> > path in which the ->get_max_state() callback fails for a cooling
> > device after cooling_device_stats_setup() has completed successfully
> > for another one.
> >
> > > At this time, read or write stats sysfs nodes can cause kernel
> > > crash. Like the following, kernel crashed when
> > > 'cat time_in_state_ms'.
> > >
> > > [<5baac8d4>] panic+0x1b4/0x3c8
> > > [<9d287b0f>] arm_notify_die+0x0/0x78
> > > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > > [<23793e7a>] do_translation_fault+0x38/0xc0
> > > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > > [<747516ae>] _raw_spin_lock+0x20/0x60
> > > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > > [<cb78325e>] dev_attr_show+0x38/0x64
> > > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > > [<c0a843ab>] seq_read+0x244/0x620
> > > [<b316b374>] vfs_read+0xd8/0x218
> > > [<3aebf5fa>] sys_read+0x80/0xe4
> > > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > > [<08cbe22f>] 0xbe8c1198
> > >
> > > stats sysfs:
> > > phone:/sys/class/thermal/cooling_device2/stats # ls
> > > reset  time_in_state_ms  total_trans  trans_table
> > >
> > > The same as cat total_trans, trans_table, and echo reset.
>
> So does the (untested) patch below work too?
>

Yes, I agree with you. I will test it on our platform and give
a reply later. Thanks.

> ---
>  drivers/thermal/thermal_sysfs.c |   10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_sysfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> +++ linux-pm/drivers/thermal/thermal_sysfs.c
> @@ -813,12 +813,13 @@ static const struct attribute_group cool
>
>  static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
>  {
> +       const struct attribute_group *stats_attr_group = NULL;
>         struct cooling_dev_stats *stats;
>         unsigned long states;
>         int var;
>
>         if (cdev->ops->get_max_state(cdev, &states))
> -               return;
> +               goto out;
>
>         states++; /* Total number of states is highest state + 1 */
>
> @@ -828,7 +829,7 @@ static void cooling_device_stats_setup(s
>
>         stats = kzalloc(var, GFP_KERNEL);
>         if (!stats)
> -               return;
> +               goto out;
>
>         stats->time_in_state = (ktime_t *)(stats + 1);
>         stats->trans_table = (unsigned int *)(stats->time_in_state + states);
> @@ -838,9 +839,12 @@ static void cooling_device_stats_setup(s
>
>         spin_lock_init(&stats->lock);
>
> +       stats_attr_group = &cooling_device_stats_attr_group;
> +
> +out:
>         /* Fill the empty slot left in cooling_device_attr_groups */
>         var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> -       cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
> +       cooling_device_attr_groups[var] = stats_attr_group;
>  }
>
>  static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
>
>
>
--
Best regards,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized
  2022-07-26  7:39     ` Di Shen
@ 2022-07-27  8:16       ` Di Shen
  2022-07-27 14:20         ` Rafael J. Wysocki
  0 siblings, 1 reply; 8+ messages in thread
From: Di Shen @ 2022-07-27  8:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Di Shen, Rafael J. Wysocki, Daniel Lezcano, Zhang, Rui,
	Viresh Kumar, Amit Kucheria, Linux PM, Linux Kernel Mailing List,
	jeson.gao, xuewen.yan, ke.wang

Hi Rafael,
I have tested this patch on our platform, and it works. Later, I will
send the patch v3.
Thanks!
--
Di

On Tue, Jul 26, 2022 at 3:39 PM Di Shen <cindygm567@gmail.com> wrote:
>
> On Sat, Jul 23, 2022 at 2:42 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Friday, July 22, 2022 7:18:42 PM CEST Rafael J. Wysocki wrote:
> > > On Fri, Jul 22, 2022 at 10:44 AM Di Shen <di.shen@unisoc.com> wrote:
> > > >
> > > > There's a space allocated for cooling_device_stats_attr_group
> > > > within cooling_device_attr_groups. This space is shared by all
> > > > cooling devices.
> > >
> > > That's correct.
> > >
> > > > If the stats structure of one cooling device successfully
> > > > creates stats sysfs. After that, another cooling device fails
> > > > to get max_states in cooling_device_stats_setup(). It can
> > > > return directly without initializing the stats structure, but
> > > > the cooling_device_stats_attr_group is still the attribute
> > > > group of the last cooling device.
> > >
> > > I cannot parse the above, sorry.
> > >
> > > For example, how can a "stats structure of one cooling device" create
> > > anything?   As a data structure, it is a passive entity, so it doesn't
> > > carry out any actions.
> > >
> > > I think (but I am not sure) that you are referring to the error code
> > > path in which the ->get_max_state() callback fails for a cooling
> > > device after cooling_device_stats_setup() has completed successfully
> > > for another one.
> > >
> > > > At this time, read or write stats sysfs nodes can cause kernel
> > > > crash. Like the following, kernel crashed when
> > > > 'cat time_in_state_ms'.
> > > >
> > > > [<5baac8d4>] panic+0x1b4/0x3c8
> > > > [<9d287b0f>] arm_notify_die+0x0/0x78
> > > > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > > > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > > > [<23793e7a>] do_translation_fault+0x38/0xc0
> > > > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > > > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > > > [<747516ae>] _raw_spin_lock+0x20/0x60
> > > > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > > > [<cb78325e>] dev_attr_show+0x38/0x64
> > > > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > > > [<c0a843ab>] seq_read+0x244/0x620
> > > > [<b316b374>] vfs_read+0xd8/0x218
> > > > [<3aebf5fa>] sys_read+0x80/0xe4
> > > > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > > > [<08cbe22f>] 0xbe8c1198
> > > >
> > > > stats sysfs:
> > > > phone:/sys/class/thermal/cooling_device2/stats # ls
> > > > reset  time_in_state_ms  total_trans  trans_table
> > > >
> > > > The same as cat total_trans, trans_table, and echo reset.
> >
> > So does the (untested) patch below work too?
> >
>
> Yes, I agree with you. I will test it on our platform and give
> a reply later. Thanks.
>
> > ---
> >  drivers/thermal/thermal_sysfs.c |   10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/drivers/thermal/thermal_sysfs.c
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> > +++ linux-pm/drivers/thermal/thermal_sysfs.c
> > @@ -813,12 +813,13 @@ static const struct attribute_group cool
> >
> >  static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> >  {
> > +       const struct attribute_group *stats_attr_group = NULL;
> >         struct cooling_dev_stats *stats;
> >         unsigned long states;
> >         int var;
> >
> >         if (cdev->ops->get_max_state(cdev, &states))
> > -               return;
> > +               goto out;
> >
> >         states++; /* Total number of states is highest state + 1 */
> >
> > @@ -828,7 +829,7 @@ static void cooling_device_stats_setup(s
> >
> >         stats = kzalloc(var, GFP_KERNEL);
> >         if (!stats)
> > -               return;
> > +               goto out;
> >
> >         stats->time_in_state = (ktime_t *)(stats + 1);
> >         stats->trans_table = (unsigned int *)(stats->time_in_state + states);
> > @@ -838,9 +839,12 @@ static void cooling_device_stats_setup(s
> >
> >         spin_lock_init(&stats->lock);
> >
> > +       stats_attr_group = &cooling_device_stats_attr_group;
> > +
> > +out:
> >         /* Fill the empty slot left in cooling_device_attr_groups */
> >         var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> > -       cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
> > +       cooling_device_attr_groups[var] = stats_attr_group;
> >  }
> >
> >  static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
> >
> >
> >
> --
> Best regards,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized
  2022-07-27  8:16       ` Di Shen
@ 2022-07-27 14:20         ` Rafael J. Wysocki
  2022-07-28  1:54           ` Di Shen
  0 siblings, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2022-07-27 14:20 UTC (permalink / raw)
  To: Di Shen
  Cc: Rafael J. Wysocki, Di Shen, Rafael J. Wysocki, Daniel Lezcano,
	Zhang, Rui, Viresh Kumar, Amit Kucheria, Linux PM,
	Linux Kernel Mailing List, jeson.gao, xuewen.yan, ke.wang

On Wed, Jul 27, 2022 at 10:17 AM Di Shen <cindygm567@gmail.com> wrote:
>
> Hi Rafael,
> I have tested this patch on our platform, and it works. Later, I will
> send the patch v3.

Well, no need.  I'll use the one that you've just tested.

Thanks!


> On Tue, Jul 26, 2022 at 3:39 PM Di Shen <cindygm567@gmail.com> wrote:
> >
> > On Sat, Jul 23, 2022 at 2:42 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > On Friday, July 22, 2022 7:18:42 PM CEST Rafael J. Wysocki wrote:
> > > > On Fri, Jul 22, 2022 at 10:44 AM Di Shen <di.shen@unisoc.com> wrote:
> > > > >
> > > > > There's a space allocated for cooling_device_stats_attr_group
> > > > > within cooling_device_attr_groups. This space is shared by all
> > > > > cooling devices.
> > > >
> > > > That's correct.
> > > >
> > > > > If the stats structure of one cooling device successfully
> > > > > creates stats sysfs. After that, another cooling device fails
> > > > > to get max_states in cooling_device_stats_setup(). It can
> > > > > return directly without initializing the stats structure, but
> > > > > the cooling_device_stats_attr_group is still the attribute
> > > > > group of the last cooling device.
> > > >
> > > > I cannot parse the above, sorry.
> > > >
> > > > For example, how can a "stats structure of one cooling device" create
> > > > anything?   As a data structure, it is a passive entity, so it doesn't
> > > > carry out any actions.
> > > >
> > > > I think (but I am not sure) that you are referring to the error code
> > > > path in which the ->get_max_state() callback fails for a cooling
> > > > device after cooling_device_stats_setup() has completed successfully
> > > > for another one.
> > > >
> > > > > At this time, read or write stats sysfs nodes can cause kernel
> > > > > crash. Like the following, kernel crashed when
> > > > > 'cat time_in_state_ms'.
> > > > >
> > > > > [<5baac8d4>] panic+0x1b4/0x3c8
> > > > > [<9d287b0f>] arm_notify_die+0x0/0x78
> > > > > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > > > > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > > > > [<23793e7a>] do_translation_fault+0x38/0xc0
> > > > > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > > > > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > > > > [<747516ae>] _raw_spin_lock+0x20/0x60
> > > > > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > > > > [<cb78325e>] dev_attr_show+0x38/0x64
> > > > > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > > > > [<c0a843ab>] seq_read+0x244/0x620
> > > > > [<b316b374>] vfs_read+0xd8/0x218
> > > > > [<3aebf5fa>] sys_read+0x80/0xe4
> > > > > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > > > > [<08cbe22f>] 0xbe8c1198
> > > > >
> > > > > stats sysfs:
> > > > > phone:/sys/class/thermal/cooling_device2/stats # ls
> > > > > reset  time_in_state_ms  total_trans  trans_table
> > > > >
> > > > > The same as cat total_trans, trans_table, and echo reset.
> > >
> > > So does the (untested) patch below work too?
> > >
> >
> > Yes, I agree with you. I will test it on our platform and give
> > a reply later. Thanks.
> >
> > > ---
> > >  drivers/thermal/thermal_sysfs.c |   10 +++++++---
> > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > >
> > > Index: linux-pm/drivers/thermal/thermal_sysfs.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> > > +++ linux-pm/drivers/thermal/thermal_sysfs.c
> > > @@ -813,12 +813,13 @@ static const struct attribute_group cool
> > >
> > >  static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> > >  {
> > > +       const struct attribute_group *stats_attr_group = NULL;
> > >         struct cooling_dev_stats *stats;
> > >         unsigned long states;
> > >         int var;
> > >
> > >         if (cdev->ops->get_max_state(cdev, &states))
> > > -               return;
> > > +               goto out;
> > >
> > >         states++; /* Total number of states is highest state + 1 */
> > >
> > > @@ -828,7 +829,7 @@ static void cooling_device_stats_setup(s
> > >
> > >         stats = kzalloc(var, GFP_KERNEL);
> > >         if (!stats)
> > > -               return;
> > > +               goto out;
> > >
> > >         stats->time_in_state = (ktime_t *)(stats + 1);
> > >         stats->trans_table = (unsigned int *)(stats->time_in_state + states);
> > > @@ -838,9 +839,12 @@ static void cooling_device_stats_setup(s
> > >
> > >         spin_lock_init(&stats->lock);
> > >
> > > +       stats_attr_group = &cooling_device_stats_attr_group;
> > > +
> > > +out:
> > >         /* Fill the empty slot left in cooling_device_attr_groups */
> > >         var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> > > -       cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
> > > +       cooling_device_attr_groups[var] = stats_attr_group;
> > >  }
> > >
> > >  static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
> > >
> > >
> > >
> > --
> > Best regards,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized
  2022-07-27 14:20         ` Rafael J. Wysocki
@ 2022-07-28  1:54           ` Di Shen
  0 siblings, 0 replies; 8+ messages in thread
From: Di Shen @ 2022-07-28  1:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Di Shen, Daniel Lezcano, Zhang, Rui,
	Viresh Kumar, Amit Kucheria, Linux PM, Linux Kernel Mailing List,
	jeson.gao, xuewen.yan, ke.wang

Ok, thanks.

On Wed, Jul 27, 2022 at 10:20 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Jul 27, 2022 at 10:17 AM Di Shen <cindygm567@gmail.com> wrote:
> >
> > Hi Rafael,
> > I have tested this patch on our platform, and it works. Later, I will
> > send the patch v3.
>
> Well, no need.  I'll use the one that you've just tested.
>
> Thanks!
>
>
> > On Tue, Jul 26, 2022 at 3:39 PM Di Shen <cindygm567@gmail.com> wrote:
> > >
> > > On Sat, Jul 23, 2022 at 2:42 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > > On Friday, July 22, 2022 7:18:42 PM CEST Rafael J. Wysocki wrote:
> > > > > On Fri, Jul 22, 2022 at 10:44 AM Di Shen <di.shen@unisoc.com> wrote:
> > > > > >
> > > > > > There's a space allocated for cooling_device_stats_attr_group
> > > > > > within cooling_device_attr_groups. This space is shared by all
> > > > > > cooling devices.
> > > > >
> > > > > That's correct.
> > > > >
> > > > > > If the stats structure of one cooling device successfully
> > > > > > creates stats sysfs. After that, another cooling device fails
> > > > > > to get max_states in cooling_device_stats_setup(). It can
> > > > > > return directly without initializing the stats structure, but
> > > > > > the cooling_device_stats_attr_group is still the attribute
> > > > > > group of the last cooling device.
> > > > >
> > > > > I cannot parse the above, sorry.
> > > > >
> > > > > For example, how can a "stats structure of one cooling device" create
> > > > > anything?   As a data structure, it is a passive entity, so it doesn't
> > > > > carry out any actions.
> > > > >
> > > > > I think (but I am not sure) that you are referring to the error code
> > > > > path in which the ->get_max_state() callback fails for a cooling
> > > > > device after cooling_device_stats_setup() has completed successfully
> > > > > for another one.
> > > > >
> > > > > > At this time, read or write stats sysfs nodes can cause kernel
> > > > > > crash. Like the following, kernel crashed when
> > > > > > 'cat time_in_state_ms'.
> > > > > >
> > > > > > [<5baac8d4>] panic+0x1b4/0x3c8
> > > > > > [<9d287b0f>] arm_notify_die+0x0/0x78
> > > > > > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > > > > > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > > > > > [<23793e7a>] do_translation_fault+0x38/0xc0
> > > > > > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > > > > > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > > > > > [<747516ae>] _raw_spin_lock+0x20/0x60
> > > > > > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > > > > > [<cb78325e>] dev_attr_show+0x38/0x64
> > > > > > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > > > > > [<c0a843ab>] seq_read+0x244/0x620
> > > > > > [<b316b374>] vfs_read+0xd8/0x218
> > > > > > [<3aebf5fa>] sys_read+0x80/0xe4
> > > > > > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > > > > > [<08cbe22f>] 0xbe8c1198
> > > > > >
> > > > > > stats sysfs:
> > > > > > phone:/sys/class/thermal/cooling_device2/stats # ls
> > > > > > reset  time_in_state_ms  total_trans  trans_table
> > > > > >
> > > > > > The same as cat total_trans, trans_table, and echo reset.
> > > >
> > > > So does the (untested) patch below work too?
> > > >
> > >
> > > Yes, I agree with you. I will test it on our platform and give
> > > a reply later. Thanks.
> > >
> > > > ---
> > > >  drivers/thermal/thermal_sysfs.c |   10 +++++++---
> > > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > >
> > > > Index: linux-pm/drivers/thermal/thermal_sysfs.c
> > > > ===================================================================
> > > > --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> > > > +++ linux-pm/drivers/thermal/thermal_sysfs.c
> > > > @@ -813,12 +813,13 @@ static const struct attribute_group cool
> > > >
> > > >  static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> > > >  {
> > > > +       const struct attribute_group *stats_attr_group = NULL;
> > > >         struct cooling_dev_stats *stats;
> > > >         unsigned long states;
> > > >         int var;
> > > >
> > > >         if (cdev->ops->get_max_state(cdev, &states))
> > > > -               return;
> > > > +               goto out;
> > > >
> > > >         states++; /* Total number of states is highest state + 1 */
> > > >
> > > > @@ -828,7 +829,7 @@ static void cooling_device_stats_setup(s
> > > >
> > > >         stats = kzalloc(var, GFP_KERNEL);
> > > >         if (!stats)
> > > > -               return;
> > > > +               goto out;
> > > >
> > > >         stats->time_in_state = (ktime_t *)(stats + 1);
> > > >         stats->trans_table = (unsigned int *)(stats->time_in_state + states);
> > > > @@ -838,9 +839,12 @@ static void cooling_device_stats_setup(s
> > > >
> > > >         spin_lock_init(&stats->lock);
> > > >
> > > > +       stats_attr_group = &cooling_device_stats_attr_group;
> > > > +
> > > > +out:
> > > >         /* Fill the empty slot left in cooling_device_attr_groups */
> > > >         var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> > > > -       cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
> > > > +       cooling_device_attr_groups[var] = stats_attr_group;
> > > >  }
> > > >
> > > >  static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
> > > >
> > > >
> > > >
> > > --
> > > Best regards,

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-07-28  1:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-22  8:43 [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized Di Shen
2022-07-22 17:18 ` Rafael J. Wysocki
2022-07-22 18:42   ` Rafael J. Wysocki
2022-07-26  7:39     ` Di Shen
2022-07-27  8:16       ` Di Shen
2022-07-27 14:20         ` Rafael J. Wysocki
2022-07-28  1:54           ` Di Shen
2022-07-26  7:31   ` Di Shen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).