From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751281AbdADEsR (ORCPT ); Tue, 3 Jan 2017 23:48:17 -0500 Received: from mga14.intel.com ([192.55.52.115]:4640 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751101AbdADEsQ (ORCPT ); Tue, 3 Jan 2017 23:48:16 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,458,1477983600"; d="scan'208";a="918693011" Message-ID: <1483505107.2320.50.camel@intel.com> Subject: Re: [PATCH] thermal: add thermal_zone_remove_device_groups() From: Zhang Rui To: Yasuaki Ishimatsu , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org Cc: edubezval@gmail.com Date: Wed, 04 Jan 2017 12:45:07 +0800 In-Reply-To: <1483504508.2320.41.camel@intel.com> References: <6c5a1a95-9b03-6f43-8de8-1ab4ac27cc78@gmail.com> <1483504508.2320.41.camel@intel.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.18.5.2-0ubuntu3 Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2017-01-04 at 12:35 +0800, Zhang Rui wrote: > On Thu, 2016-12-15 at 16:47 -0500, Yasuaki Ishimatsu wrote: > > > > When offlining all cores on a CPU, the following system panic > > occurs: > > > > BUG: unable to handle kernel NULL pointer dereference at (null) > > IP: strlen+0x0/0x20 > > > > Call Trace: > >   ? kernfs_name_hash+0x17/0x80 > >   kernfs_find_ns+0x3f/0xd0 > >   kernfs_remove_by_name_ns+0x36/0xa0 > >   remove_files.isra.1+0x36/0x70 > >   sysfs_remove_group+0x44/0x90 > >   sysfs_remove_groups+0x2e/0x50 > >   device_remove_attrs+0x5e/0x90 > >   device_del+0x1ea/0x350 > >   device_unregister+0x1a/0x60 > >   thermal_zone_device_unregister+0x1f2/0x210 > >   pkg_thermal_cpu_offline+0x14f/0x1a0 [x86_pkg_temp_thermal] > >   ? kzalloc.constprop.2+0x10/0x10 [x86_pkg_temp_thermal] > >   cpuhp_invoke_callback+0x8d/0x3f0 > >   cpuhp_down_callbacks+0x42/0x80 > >   cpuhp_thread_fun+0x8b/0xf0 > >   smpboot_thread_fn+0x110/0x160 > >   kthread+0x101/0x140 > >   ? sort_range+0x30/0x30 > >   ? kthread_park+0x90/0x90 > >   ret_from_fork+0x25/0x30 > > > > thermal_zone_create_device_group() sets attribute_groups in > > thermal_zone_attribute_groups[] to tz->device.groups. But these > > attributes_groups do not have name argument. > > > I'm a little confused here, in remove_files(), > it is the (struct attribute *)->name which is passed into > kernfs_remove_by_name, instead of attributes_groups->name. > > IMO, a NULL-name attribute group won't bring any problem. > hah, I see what the problem is here. Just like the problem illustrated in this one https://patchwork.kernel.org/patch/9492439/ The root cause is that, the trip point attributes are cleared and freed BEFORE device_unregister(), this results in NULL trip point attributes when removing the thermal zone device sysfs groups. And I believe https://patchwork.kernel.org/patch/9492439/ should fix the problem for you, right? thanks, rui> > > > So when offlining all cores on CPU and executing > > thermal_zone_device_unregister(), the panic occurs in strlen() > > called from kernfs_name_hash() because name argument is NULL. > > > > The patch adds thermal_zone_remove_device_groups() to free > > tz->device.groups and set NULL pointer. > > > > Signed-off-by: Yasuaki Ishimatsu > > CC: Zhang Rui > > CC: Eduardo Valentin > > --- > >   drivers/thermal/thermal_core.c  | 3 ++- > >   drivers/thermal/thermal_core.h  | 1 + > >   drivers/thermal/thermal_sysfs.c | 6 ++++++ > >   3 files changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/thermal/thermal_core.c > > b/drivers/thermal/thermal_core.c > > index 641faab..926e385 100644 > > --- a/drivers/thermal/thermal_core.c > > +++ b/drivers/thermal/thermal_core.c > > @@ -1251,6 +1251,7 @@ struct thermal_zone_device * > > > >   unregister: > >    release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id); > > + thermal_zone_remove_device_groups(tz); > >    device_unregister(&tz->device); > >    return ERR_PTR(result); > >   } > > @@ -1315,8 +1316,8 @@ void thermal_zone_device_unregister(struct > > thermal_zone_device *tz) > >    release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id); > >    idr_destroy(&tz->idr); > >    mutex_destroy(&tz->lock); > > + thermal_zone_remove_device_groups(tz); > >    device_unregister(&tz->device); > > - kfree(tz->device.groups); > >   } > >   EXPORT_SYMBOL_GPL(thermal_zone_device_unregister); > > > > diff --git a/drivers/thermal/thermal_core.h > > b/drivers/thermal/thermal_core.h > > index 2412b37..e3a60db 100644 > > --- a/drivers/thermal/thermal_core.h > > +++ b/drivers/thermal/thermal_core.h > > @@ -70,6 +70,7 @@ void thermal_zone_device_unbind_exception(struct > > thermal_zone_device *, > >   int thermal_build_list_of_policies(char *buf); > > > >   /* sysfs I/F */ > > +void thermal_zone_remove_device_groups(struct thermal_zone_device > > *tz); > >   int thermal_zone_create_device_groups(struct thermal_zone_device > > *, > > int); > >   void thermal_cooling_device_setup_sysfs(struct > > thermal_cooling_device *); > >   /* used only at binding time */ > > diff --git a/drivers/thermal/thermal_sysfs.c > > b/drivers/thermal/thermal_sysfs.c > > index a694de9..3dfd29b 100644 > > --- a/drivers/thermal/thermal_sysfs.c > > +++ b/drivers/thermal/thermal_sysfs.c > > @@ -605,6 +605,12 @@ static int create_trip_attrs(struct > > thermal_zone_device *tz, int mask) > >    return 0; > >   } > > > > +void thermal_zone_remove_device_groups(struct thermal_zone_device > > *tz) > > +{ > > + kfree(tz->device.groups); > > + tz->device.groups = NULL; > > +} > > + > >   int thermal_zone_create_device_groups(struct thermal_zone_device > > *tz, > >          int mask) > >   { > -- > To unsubscribe from this list: send the line "unsubscribe linux-pm" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at  http://vger.kernel.org/majordomo-info.html