linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] thermal: add thermal_zone_remove_device_groups()
@ 2016-12-15 21:47 Yasuaki Ishimatsu
  2016-12-20  4:21 ` Eduardo Valentin
  2017-01-04  4:35 ` Zhang Rui
  0 siblings, 2 replies; 6+ messages in thread
From: Yasuaki Ishimatsu @ 2016-12-15 21:47 UTC (permalink / raw)
  To: linux-kernel, linux-pm; +Cc: rui.zhang, edubezval

When offlining all cores on a CPU, the following system panic
occurs:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: strlen+0x0/0x20
<snip>
Call Trace:
  ? kernfs_name_hash+0x17/0x80
  kernfs_find_ns+0x3f/0xd0
  kernfs_remove_by_name_ns+0x36/0xa0
  remove_files.isra.1+0x36/0x70
  sysfs_remove_group+0x44/0x90
  sysfs_remove_groups+0x2e/0x50
  device_remove_attrs+0x5e/0x90
  device_del+0x1ea/0x350
  device_unregister+0x1a/0x60
  thermal_zone_device_unregister+0x1f2/0x210
  pkg_thermal_cpu_offline+0x14f/0x1a0 [x86_pkg_temp_thermal]
  ? kzalloc.constprop.2+0x10/0x10 [x86_pkg_temp_thermal]
  cpuhp_invoke_callback+0x8d/0x3f0
  cpuhp_down_callbacks+0x42/0x80
  cpuhp_thread_fun+0x8b/0xf0
  smpboot_thread_fn+0x110/0x160
  kthread+0x101/0x140
  ? sort_range+0x30/0x30
  ? kthread_park+0x90/0x90
  ret_from_fork+0x25/0x30

thermal_zone_create_device_group() sets attribute_groups in
thermal_zone_attribute_groups[] to tz->device.groups. But these
attributes_groups do not have name argument.

So when offlining all cores on CPU and executing
thermal_zone_device_unregister(), the panic occurs in strlen()
called from kernfs_name_hash() because name argument is NULL.

The patch adds thermal_zone_remove_device_groups() to free
tz->device.groups and set NULL pointer.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
---
  drivers/thermal/thermal_core.c  | 3 ++-
  drivers/thermal/thermal_core.h  | 1 +
  drivers/thermal/thermal_sysfs.c | 6 ++++++
  3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 641faab..926e385 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1251,6 +1251,7 @@ struct thermal_zone_device *

  unregister:
  	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
+	thermal_zone_remove_device_groups(tz);
  	device_unregister(&tz->device);
  	return ERR_PTR(result);
  }
@@ -1315,8 +1316,8 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
  	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
  	idr_destroy(&tz->idr);
  	mutex_destroy(&tz->lock);
+	thermal_zone_remove_device_groups(tz);
  	device_unregister(&tz->device);
-	kfree(tz->device.groups);
  }
  EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);

diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 2412b37..e3a60db 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -70,6 +70,7 @@ void thermal_zone_device_unbind_exception(struct thermal_zone_device *,
  int thermal_build_list_of_policies(char *buf);

  /* sysfs I/F */
+void thermal_zone_remove_device_groups(struct thermal_zone_device *tz);
  int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
  void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *);
  /* used only at binding time */
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index a694de9..3dfd29b 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -605,6 +605,12 @@ static int create_trip_attrs(struct thermal_zone_device *tz, int mask)
  	return 0;
  }

+void thermal_zone_remove_device_groups(struct thermal_zone_device *tz)
+{
+	kfree(tz->device.groups);
+	tz->device.groups = NULL;
+}
+
  int thermal_zone_create_device_groups(struct thermal_zone_device *tz,
  				      int mask)
  {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] thermal: add thermal_zone_remove_device_groups()
  2016-12-15 21:47 [PATCH] thermal: add thermal_zone_remove_device_groups() Yasuaki Ishimatsu
@ 2016-12-20  4:21 ` Eduardo Valentin
  2016-12-20 15:31   ` Yasuaki Ishimatsu
  2017-01-04  4:35 ` Zhang Rui
  1 sibling, 1 reply; 6+ messages in thread
From: Eduardo Valentin @ 2016-12-20  4:21 UTC (permalink / raw)
  To: Yasuaki Ishimatsu; +Cc: linux-kernel, linux-pm, rui.zhang

Hey Yasuaki San,

On Thu, Dec 15, 2016 at 04:47:08PM -0500, Yasuaki Ishimatsu wrote:
> When offlining all cores on a CPU, the following system panic
> occurs:
> 
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: strlen+0x0/0x20
> <snip>
> Call Trace:
>  ? kernfs_name_hash+0x17/0x80
>  kernfs_find_ns+0x3f/0xd0
>  kernfs_remove_by_name_ns+0x36/0xa0
>  remove_files.isra.1+0x36/0x70
>  sysfs_remove_group+0x44/0x90
>  sysfs_remove_groups+0x2e/0x50
>  device_remove_attrs+0x5e/0x90
>  device_del+0x1ea/0x350
>  device_unregister+0x1a/0x60
>  thermal_zone_device_unregister+0x1f2/0x210
>  pkg_thermal_cpu_offline+0x14f/0x1a0 [x86_pkg_temp_thermal]
>  ? kzalloc.constprop.2+0x10/0x10 [x86_pkg_temp_thermal]
>  cpuhp_invoke_callback+0x8d/0x3f0
>  cpuhp_down_callbacks+0x42/0x80
>  cpuhp_thread_fun+0x8b/0xf0
>  smpboot_thread_fn+0x110/0x160
>  kthread+0x101/0x140
>  ? sort_range+0x30/0x30
>  ? kthread_park+0x90/0x90
>  ret_from_fork+0x25/0x30
> 
> thermal_zone_create_device_group() sets attribute_groups in
> thermal_zone_attribute_groups[] to tz->device.groups. But these
> attributes_groups do not have name argument.
> 
> So when offlining all cores on CPU and executing
> thermal_zone_device_unregister(), the panic occurs in strlen()
> called from kernfs_name_hash() because name argument is NULL.
> 
> The patch adds thermal_zone_remove_device_groups() to free
> tz->device.groups and set NULL pointer.
> 

Thanks for finding this and sending a fix. Overall I am OK with the
patch, but I have some questions, as follows.

> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> CC: Zhang Rui <rui.zhang@intel.com>
> CC: Eduardo Valentin <edubezval@gmail.com>
> ---
>  drivers/thermal/thermal_core.c  | 3 ++-
>  drivers/thermal/thermal_core.h  | 1 +
>  drivers/thermal/thermal_sysfs.c | 6 ++++++

First question is, are you seeing same problem with cooling devices ?
They follow same strategy, of setting the groups property, and the
cooling device groups also have no name.

>  3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 641faab..926e385 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1251,6 +1251,7 @@ struct thermal_zone_device *
> 
>  unregister:
>  	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
> +	thermal_zone_remove_device_groups(tz);
>  	device_unregister(&tz->device);
>  	return ERR_PTR(result);
>  }
> @@ -1315,8 +1316,8 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
>  	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
>  	idr_destroy(&tz->idr);
>  	mutex_destroy(&tz->lock);
> +	thermal_zone_remove_device_groups(tz);
>  	device_unregister(&tz->device);
> -	kfree(tz->device.groups);
>  }
>  EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
> 
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index 2412b37..e3a60db 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -70,6 +70,7 @@ void thermal_zone_device_unbind_exception(struct thermal_zone_device *,
>  int thermal_build_list_of_policies(char *buf);
> 
>  /* sysfs I/F */
> +void thermal_zone_remove_device_groups(struct thermal_zone_device *tz);
>  int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
>  void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *);
>  /* used only at binding time */
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index a694de9..3dfd29b 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -605,6 +605,12 @@ static int create_trip_attrs(struct thermal_zone_device *tz, int mask)
>  	return 0;
>  }
> 
> +void thermal_zone_remove_device_groups(struct thermal_zone_device *tz)
> +{
> +	kfree(tz->device.groups);

OK. This part needs to be done regardless

> +	tz->device.groups = NULL;

This almost defeats the purpose of the effort to put all properties to
device.groups, as we need to care about releasing them. 

I guess the problem is we cannot keep the same sysfs entries if we set a
name on the thermal device groups. That would break userspace. And if we
do not set, we need to care about releasing them.

Almost feel like we might consider patching sysfs core.
Or maybe find another way to setup the device groups for thermal
devices.

> +}
> +
>  int thermal_zone_create_device_groups(struct thermal_zone_device *tz,
>  				      int mask)
>  {
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] thermal: add thermal_zone_remove_device_groups()
  2016-12-20  4:21 ` Eduardo Valentin
@ 2016-12-20 15:31   ` Yasuaki Ishimatsu
  0 siblings, 0 replies; 6+ messages in thread
From: Yasuaki Ishimatsu @ 2016-12-20 15:31 UTC (permalink / raw)
  To: Eduardo Valentin; +Cc: linux-kernel, linux-pm, rui.zhang



On 12/19/2016 11:21 PM, Eduardo Valentin wrote:
> Hey Yasuaki San,
>
> On Thu, Dec 15, 2016 at 04:47:08PM -0500, Yasuaki Ishimatsu wrote:
>> When offlining all cores on a CPU, the following system panic
>> occurs:
>>
>> BUG: unable to handle kernel NULL pointer dereference at (null)
>> IP: strlen+0x0/0x20
>> <snip>
>> Call Trace:
>>  ? kernfs_name_hash+0x17/0x80
>>  kernfs_find_ns+0x3f/0xd0
>>  kernfs_remove_by_name_ns+0x36/0xa0
>>  remove_files.isra.1+0x36/0x70
>>  sysfs_remove_group+0x44/0x90
>>  sysfs_remove_groups+0x2e/0x50
>>  device_remove_attrs+0x5e/0x90
>>  device_del+0x1ea/0x350
>>  device_unregister+0x1a/0x60
>>  thermal_zone_device_unregister+0x1f2/0x210
>>  pkg_thermal_cpu_offline+0x14f/0x1a0 [x86_pkg_temp_thermal]
>>  ? kzalloc.constprop.2+0x10/0x10 [x86_pkg_temp_thermal]
>>  cpuhp_invoke_callback+0x8d/0x3f0
>>  cpuhp_down_callbacks+0x42/0x80
>>  cpuhp_thread_fun+0x8b/0xf0
>>  smpboot_thread_fn+0x110/0x160
>>  kthread+0x101/0x140
>>  ? sort_range+0x30/0x30
>>  ? kthread_park+0x90/0x90
>>  ret_from_fork+0x25/0x30
>>
>> thermal_zone_create_device_group() sets attribute_groups in
>> thermal_zone_attribute_groups[] to tz->device.groups. But these
>> attributes_groups do not have name argument.
>>
>> So when offlining all cores on CPU and executing
>> thermal_zone_device_unregister(), the panic occurs in strlen()
>> called from kernfs_name_hash() because name argument is NULL.
>>
>> The patch adds thermal_zone_remove_device_groups() to free
>> tz->device.groups and set NULL pointer.
>>
>
> Thanks for finding this and sending a fix. Overall I am OK with the
> patch, but I have some questions, as follows.
>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> CC: Zhang Rui <rui.zhang@intel.com>
>> CC: Eduardo Valentin <edubezval@gmail.com>
>> ---
>>  drivers/thermal/thermal_core.c  | 3 ++-
>>  drivers/thermal/thermal_core.h  | 1 +
>>  drivers/thermal/thermal_sysfs.c | 6 ++++++
>

> First question is, are you seeing same problem with cooling devices ?

No, I'm not.
I tested physical cpu hotplug on my server and found only the issue.

> They follow same strategy, of setting the groups property, and the
> cooling device groups also have no name.
>
>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>> index 641faab..926e385 100644
>> --- a/drivers/thermal/thermal_core.c
>> +++ b/drivers/thermal/thermal_core.c
>> @@ -1251,6 +1251,7 @@ struct thermal_zone_device *
>>
>>  unregister:
>>  	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
>> +	thermal_zone_remove_device_groups(tz);
>>  	device_unregister(&tz->device);
>>  	return ERR_PTR(result);
>>  }
>> @@ -1315,8 +1316,8 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
>>  	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
>>  	idr_destroy(&tz->idr);
>>  	mutex_destroy(&tz->lock);
>> +	thermal_zone_remove_device_groups(tz);
>>  	device_unregister(&tz->device);
>> -	kfree(tz->device.groups);
>>  }
>>  EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
>>
>> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
>> index 2412b37..e3a60db 100644
>> --- a/drivers/thermal/thermal_core.h
>> +++ b/drivers/thermal/thermal_core.h
>> @@ -70,6 +70,7 @@ void thermal_zone_device_unbind_exception(struct thermal_zone_device *,
>>  int thermal_build_list_of_policies(char *buf);
>>
>>  /* sysfs I/F */
>> +void thermal_zone_remove_device_groups(struct thermal_zone_device *tz);
>>  int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
>>  void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *);
>>  /* used only at binding time */
>> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
>> index a694de9..3dfd29b 100644
>> --- a/drivers/thermal/thermal_sysfs.c
>> +++ b/drivers/thermal/thermal_sysfs.c
>> @@ -605,6 +605,12 @@ static int create_trip_attrs(struct thermal_zone_device *tz, int mask)
>>  	return 0;
>>  }
>>
>> +void thermal_zone_remove_device_groups(struct thermal_zone_device *tz)
>> +{
>> +	kfree(tz->device.groups);
>
> OK. This part needs to be done regardless
>
>> +	tz->device.groups = NULL;
>
> This almost defeats the purpose of the effort to put all properties to
> device.groups, as we need to care about releasing them.

tz->device.groups has been released by kfree(). So if we don't set NULL
to tz->device.groups, it is possible to touch the released memory. Therefore
tz->device.groups should be set to NULL.

Thanks,
Yasuaki Ishimatsu

>
> I guess the problem is we cannot keep the same sysfs entries if we set a
> name on the thermal device groups. That would break userspace. And if we
> do not set, we need to care about releasing them.
>
> Almost feel like we might consider patching sysfs core.
> Or maybe find another way to setup the device groups for thermal
> devices.
>
>> +}
>> +
>>  int thermal_zone_create_device_groups(struct thermal_zone_device *tz,
>>  				      int mask)
>>  {
>> --
>> 1.8.3.1
>>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] thermal: add thermal_zone_remove_device_groups()
  2016-12-15 21:47 [PATCH] thermal: add thermal_zone_remove_device_groups() Yasuaki Ishimatsu
  2016-12-20  4:21 ` Eduardo Valentin
@ 2017-01-04  4:35 ` Zhang Rui
  2017-01-04  4:45   ` Zhang Rui
  1 sibling, 1 reply; 6+ messages in thread
From: Zhang Rui @ 2017-01-04  4:35 UTC (permalink / raw)
  To: Yasuaki Ishimatsu, linux-kernel, linux-pm; +Cc: edubezval

On Thu, 2016-12-15 at 16:47 -0500, Yasuaki Ishimatsu wrote:
> When offlining all cores on a CPU, the following system panic
> occurs:
> 
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: strlen+0x0/0x20
> <snip>
> Call Trace:
>   ? kernfs_name_hash+0x17/0x80
>   kernfs_find_ns+0x3f/0xd0
>   kernfs_remove_by_name_ns+0x36/0xa0
>   remove_files.isra.1+0x36/0x70
>   sysfs_remove_group+0x44/0x90
>   sysfs_remove_groups+0x2e/0x50
>   device_remove_attrs+0x5e/0x90
>   device_del+0x1ea/0x350
>   device_unregister+0x1a/0x60
>   thermal_zone_device_unregister+0x1f2/0x210
>   pkg_thermal_cpu_offline+0x14f/0x1a0 [x86_pkg_temp_thermal]
>   ? kzalloc.constprop.2+0x10/0x10 [x86_pkg_temp_thermal]
>   cpuhp_invoke_callback+0x8d/0x3f0
>   cpuhp_down_callbacks+0x42/0x80
>   cpuhp_thread_fun+0x8b/0xf0
>   smpboot_thread_fn+0x110/0x160
>   kthread+0x101/0x140
>   ? sort_range+0x30/0x30
>   ? kthread_park+0x90/0x90
>   ret_from_fork+0x25/0x30
> 
> thermal_zone_create_device_group() sets attribute_groups in
> thermal_zone_attribute_groups[] to tz->device.groups. But these
> attributes_groups do not have name argument.
> 
I'm a little confused here, in remove_files(),
it is the (struct attribute *)->name which is passed into
kernfs_remove_by_name, instead of attributes_groups->name.

IMO, a NULL-name attribute group won't bring any problem.

thanks,
rui

> So when offlining all cores on CPU and executing
> thermal_zone_device_unregister(), the panic occurs in strlen()
> called from kernfs_name_hash() because name argument is NULL.
> 
> The patch adds thermal_zone_remove_device_groups() to free
> tz->device.groups and set NULL pointer.
> 
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> CC: Zhang Rui <rui.zhang@intel.com>
> CC: Eduardo Valentin <edubezval@gmail.com>
> ---
>   drivers/thermal/thermal_core.c  | 3 ++-
>   drivers/thermal/thermal_core.h  | 1 +
>   drivers/thermal/thermal_sysfs.c | 6 ++++++
>   3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_core.c
> b/drivers/thermal/thermal_core.c
> index 641faab..926e385 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1251,6 +1251,7 @@ struct thermal_zone_device *
> 
>   unregister:
>   	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
> +	thermal_zone_remove_device_groups(tz);
>   	device_unregister(&tz->device);
>   	return ERR_PTR(result);
>   }
> @@ -1315,8 +1316,8 @@ void thermal_zone_device_unregister(struct
> thermal_zone_device *tz)
>   	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
>   	idr_destroy(&tz->idr);
>   	mutex_destroy(&tz->lock);
> +	thermal_zone_remove_device_groups(tz);
>   	device_unregister(&tz->device);
> -	kfree(tz->device.groups);
>   }
>   EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
> 
> diff --git a/drivers/thermal/thermal_core.h
> b/drivers/thermal/thermal_core.h
> index 2412b37..e3a60db 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -70,6 +70,7 @@ void thermal_zone_device_unbind_exception(struct
> thermal_zone_device *,
>   int thermal_build_list_of_policies(char *buf);
> 
>   /* sysfs I/F */
> +void thermal_zone_remove_device_groups(struct thermal_zone_device
> *tz);
>   int thermal_zone_create_device_groups(struct thermal_zone_device *,
> int);
>   void thermal_cooling_device_setup_sysfs(struct
> thermal_cooling_device *);
>   /* used only at binding time */
> diff --git a/drivers/thermal/thermal_sysfs.c
> b/drivers/thermal/thermal_sysfs.c
> index a694de9..3dfd29b 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -605,6 +605,12 @@ static int create_trip_attrs(struct
> thermal_zone_device *tz, int mask)
>   	return 0;
>   }
> 
> +void thermal_zone_remove_device_groups(struct thermal_zone_device
> *tz)
> +{
> +	kfree(tz->device.groups);
> +	tz->device.groups = NULL;
> +}
> +
>   int thermal_zone_create_device_groups(struct thermal_zone_device
> *tz,
>   				      int mask)
>   {

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] thermal: add thermal_zone_remove_device_groups()
  2017-01-04  4:35 ` Zhang Rui
@ 2017-01-04  4:45   ` Zhang Rui
  2017-01-05 16:58     ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 6+ messages in thread
From: Zhang Rui @ 2017-01-04  4:45 UTC (permalink / raw)
  To: Yasuaki Ishimatsu, linux-kernel, linux-pm; +Cc: edubezval

On Wed, 2017-01-04 at 12:35 +0800, Zhang Rui wrote:
> On Thu, 2016-12-15 at 16:47 -0500, Yasuaki Ishimatsu wrote:
> > 
> > When offlining all cores on a CPU, the following system panic
> > occurs:
> > 
> > BUG: unable to handle kernel NULL pointer dereference at (null)
> > IP: strlen+0x0/0x20
> > <snip>
> > Call Trace:
> >   ? kernfs_name_hash+0x17/0x80
> >   kernfs_find_ns+0x3f/0xd0
> >   kernfs_remove_by_name_ns+0x36/0xa0
> >   remove_files.isra.1+0x36/0x70
> >   sysfs_remove_group+0x44/0x90
> >   sysfs_remove_groups+0x2e/0x50
> >   device_remove_attrs+0x5e/0x90
> >   device_del+0x1ea/0x350
> >   device_unregister+0x1a/0x60
> >   thermal_zone_device_unregister+0x1f2/0x210
> >   pkg_thermal_cpu_offline+0x14f/0x1a0 [x86_pkg_temp_thermal]
> >   ? kzalloc.constprop.2+0x10/0x10 [x86_pkg_temp_thermal]
> >   cpuhp_invoke_callback+0x8d/0x3f0
> >   cpuhp_down_callbacks+0x42/0x80
> >   cpuhp_thread_fun+0x8b/0xf0
> >   smpboot_thread_fn+0x110/0x160
> >   kthread+0x101/0x140
> >   ? sort_range+0x30/0x30
> >   ? kthread_park+0x90/0x90
> >   ret_from_fork+0x25/0x30
> > 
> > thermal_zone_create_device_group() sets attribute_groups in
> > thermal_zone_attribute_groups[] to tz->device.groups. But these
> > attributes_groups do not have name argument.
> > 
> I'm a little confused here, in remove_files(),
> it is the (struct attribute *)->name which is passed into
> kernfs_remove_by_name, instead of attributes_groups->name.
> 
> IMO, a NULL-name attribute group won't bring any problem.
> 
hah, I see what the problem is here.
Just like the problem illustrated in this one
https://patchwork.kernel.org/patch/9492439/
The root cause is that, the trip point attributes are cleared and freed
BEFORE device_unregister(), this results in NULL trip point attributes
when removing the thermal zone device sysfs groups.

And I believe https://patchwork.kernel.org/patch/9492439/ should fix
the problem for you, right?

thanks,
rui> > 
> > So when offlining all cores on CPU and executing
> > thermal_zone_device_unregister(), the panic occurs in strlen()
> > called from kernfs_name_hash() because name argument is NULL.
> > 
> > The patch adds thermal_zone_remove_device_groups() to free
> > tz->device.groups and set NULL pointer.
> > 
> > Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> > CC: Zhang Rui <rui.zhang@intel.com>
> > CC: Eduardo Valentin <edubezval@gmail.com>
> > ---
> >   drivers/thermal/thermal_core.c  | 3 ++-
> >   drivers/thermal/thermal_core.h  | 1 +
> >   drivers/thermal/thermal_sysfs.c | 6 ++++++
> >   3 files changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/thermal/thermal_core.c
> > b/drivers/thermal/thermal_core.c
> > index 641faab..926e385 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -1251,6 +1251,7 @@ struct thermal_zone_device *
> > 
> >   unregister:
> >   	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
> > +	thermal_zone_remove_device_groups(tz);
> >   	device_unregister(&tz->device);
> >   	return ERR_PTR(result);
> >   }
> > @@ -1315,8 +1316,8 @@ void thermal_zone_device_unregister(struct
> > thermal_zone_device *tz)
> >   	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
> >   	idr_destroy(&tz->idr);
> >   	mutex_destroy(&tz->lock);
> > +	thermal_zone_remove_device_groups(tz);
> >   	device_unregister(&tz->device);
> > -	kfree(tz->device.groups);
> >   }
> >   EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
> > 
> > diff --git a/drivers/thermal/thermal_core.h
> > b/drivers/thermal/thermal_core.h
> > index 2412b37..e3a60db 100644
> > --- a/drivers/thermal/thermal_core.h
> > +++ b/drivers/thermal/thermal_core.h
> > @@ -70,6 +70,7 @@ void thermal_zone_device_unbind_exception(struct
> > thermal_zone_device *,
> >   int thermal_build_list_of_policies(char *buf);
> > 
> >   /* sysfs I/F */
> > +void thermal_zone_remove_device_groups(struct thermal_zone_device
> > *tz);
> >   int thermal_zone_create_device_groups(struct thermal_zone_device
> > *,
> > int);
> >   void thermal_cooling_device_setup_sysfs(struct
> > thermal_cooling_device *);
> >   /* used only at binding time */
> > diff --git a/drivers/thermal/thermal_sysfs.c
> > b/drivers/thermal/thermal_sysfs.c
> > index a694de9..3dfd29b 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -605,6 +605,12 @@ static int create_trip_attrs(struct
> > thermal_zone_device *tz, int mask)
> >   	return 0;
> >   }
> > 
> > +void thermal_zone_remove_device_groups(struct thermal_zone_device
> > *tz)
> > +{
> > +	kfree(tz->device.groups);
> > +	tz->device.groups = NULL;
> > +}
> > +
> >   int thermal_zone_create_device_groups(struct thermal_zone_device
> > *tz,
> >   				      int mask)
> >   {
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] thermal: add thermal_zone_remove_device_groups()
  2017-01-04  4:45   ` Zhang Rui
@ 2017-01-05 16:58     ` Yasuaki Ishimatsu
  0 siblings, 0 replies; 6+ messages in thread
From: Yasuaki Ishimatsu @ 2017-01-05 16:58 UTC (permalink / raw)
  To: Zhang Rui, linux-kernel, linux-pm; +Cc: edubezval



On 01/03/2017 11:45 PM, Zhang Rui wrote:
> On Wed, 2017-01-04 at 12:35 +0800, Zhang Rui wrote:
>> On Thu, 2016-12-15 at 16:47 -0500, Yasuaki Ishimatsu wrote:
>>>
>>> When offlining all cores on a CPU, the following system panic
>>> occurs:
>>>
>>> BUG: unable to handle kernel NULL pointer dereference at (null)
>>> IP: strlen+0x0/0x20
>>> <snip>
>>> Call Trace:
>>>   ? kernfs_name_hash+0x17/0x80
>>>   kernfs_find_ns+0x3f/0xd0
>>>   kernfs_remove_by_name_ns+0x36/0xa0
>>>   remove_files.isra.1+0x36/0x70
>>>   sysfs_remove_group+0x44/0x90
>>>   sysfs_remove_groups+0x2e/0x50
>>>   device_remove_attrs+0x5e/0x90
>>>   device_del+0x1ea/0x350
>>>   device_unregister+0x1a/0x60
>>>   thermal_zone_device_unregister+0x1f2/0x210
>>>   pkg_thermal_cpu_offline+0x14f/0x1a0 [x86_pkg_temp_thermal]
>>>   ? kzalloc.constprop.2+0x10/0x10 [x86_pkg_temp_thermal]
>>>   cpuhp_invoke_callback+0x8d/0x3f0
>>>   cpuhp_down_callbacks+0x42/0x80
>>>   cpuhp_thread_fun+0x8b/0xf0
>>>   smpboot_thread_fn+0x110/0x160
>>>   kthread+0x101/0x140
>>>   ? sort_range+0x30/0x30
>>>   ? kthread_park+0x90/0x90
>>>   ret_from_fork+0x25/0x30
>>>
>>> thermal_zone_create_device_group() sets attribute_groups in
>>> thermal_zone_attribute_groups[] to tz->device.groups. But these
>>> attributes_groups do not have name argument.
>>>
>> I'm a little confused here, in remove_files(),
>> it is the (struct attribute *)->name which is passed into
>> kernfs_remove_by_name, instead of attributes_groups->name.
>>
>> IMO, a NULL-name attribute group won't bring any problem.
>>
> hah, I see what the problem is here.
> Just like the problem illustrated in this one
> https://patchwork.kernel.org/patch/9492439/
> The root cause is that, the trip point attributes are cleared and freed
> BEFORE device_unregister(), this results in NULL trip point attributes
> when removing the thermal zone device sysfs groups.
>
> And I believe https://patchwork.kernel.org/patch/9492439/ should fix
> the problem for you, right?

Thank you for the information.
I confirmed that the patch fixes the issue.

Thanks,
Yasuaki Ishimatsu
>
> thanks,
> rui> >
>>> So when offlining all cores on CPU and executing
>>> thermal_zone_device_unregister(), the panic occurs in strlen()
>>> called from kernfs_name_hash() because name argument is NULL.
>>>
>>> The patch adds thermal_zone_remove_device_groups() to free
>>> tz->device.groups and set NULL pointer.
>>>
>>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>> CC: Zhang Rui <rui.zhang@intel.com>
>>> CC: Eduardo Valentin <edubezval@gmail.com>
>>> ---
>>>   drivers/thermal/thermal_core.c  | 3 ++-
>>>   drivers/thermal/thermal_core.h  | 1 +
>>>   drivers/thermal/thermal_sysfs.c | 6 ++++++
>>>   3 files changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/thermal/thermal_core.c
>>> b/drivers/thermal/thermal_core.c
>>> index 641faab..926e385 100644
>>> --- a/drivers/thermal/thermal_core.c
>>> +++ b/drivers/thermal/thermal_core.c
>>> @@ -1251,6 +1251,7 @@ struct thermal_zone_device *
>>>
>>>   unregister:
>>>   	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
>>> +	thermal_zone_remove_device_groups(tz);
>>>   	device_unregister(&tz->device);
>>>   	return ERR_PTR(result);
>>>   }
>>> @@ -1315,8 +1316,8 @@ void thermal_zone_device_unregister(struct
>>> thermal_zone_device *tz)
>>>   	release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
>>>   	idr_destroy(&tz->idr);
>>>   	mutex_destroy(&tz->lock);
>>> +	thermal_zone_remove_device_groups(tz);
>>>   	device_unregister(&tz->device);
>>> -	kfree(tz->device.groups);
>>>   }
>>>   EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
>>>
>>> diff --git a/drivers/thermal/thermal_core.h
>>> b/drivers/thermal/thermal_core.h
>>> index 2412b37..e3a60db 100644
>>> --- a/drivers/thermal/thermal_core.h
>>> +++ b/drivers/thermal/thermal_core.h
>>> @@ -70,6 +70,7 @@ void thermal_zone_device_unbind_exception(struct
>>> thermal_zone_device *,
>>>   int thermal_build_list_of_policies(char *buf);
>>>
>>>   /* sysfs I/F */
>>> +void thermal_zone_remove_device_groups(struct thermal_zone_device
>>> *tz);
>>>   int thermal_zone_create_device_groups(struct thermal_zone_device
>>> *,
>>> int);
>>>   void thermal_cooling_device_setup_sysfs(struct
>>> thermal_cooling_device *);
>>>   /* used only at binding time */
>>> diff --git a/drivers/thermal/thermal_sysfs.c
>>> b/drivers/thermal/thermal_sysfs.c
>>> index a694de9..3dfd29b 100644
>>> --- a/drivers/thermal/thermal_sysfs.c
>>> +++ b/drivers/thermal/thermal_sysfs.c
>>> @@ -605,6 +605,12 @@ static int create_trip_attrs(struct
>>> thermal_zone_device *tz, int mask)
>>>   	return 0;
>>>   }
>>>
>>> +void thermal_zone_remove_device_groups(struct thermal_zone_device
>>> *tz)
>>> +{
>>> +	kfree(tz->device.groups);
>>> +	tz->device.groups = NULL;
>>> +}
>>> +
>>>   int thermal_zone_create_device_groups(struct thermal_zone_device
>>> *tz,
>>>   				      int mask)
>>>   {
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pm"
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-01-05 16:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-15 21:47 [PATCH] thermal: add thermal_zone_remove_device_groups() Yasuaki Ishimatsu
2016-12-20  4:21 ` Eduardo Valentin
2016-12-20 15:31   ` Yasuaki Ishimatsu
2017-01-04  4:35 ` Zhang Rui
2017-01-04  4:45   ` Zhang Rui
2017-01-05 16:58     ` Yasuaki Ishimatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).