* [PATCH 0/3] Thermal: thermal enhancements for boot and system sleep

From: Zhang Rui @ 2015-03-24 5:21 UTC
To: linux-pm; +Cc: Zhang Rui

Currently, there are a couple of problems in the thermal core framework
after boot and after resume from a system sleep state, because the
thermal zone devices are not put into a proper state in these cases.
Details of the problems are described in the patch changelogs.

Altogether, these patches fix three bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=78201
https://bugzilla.kernel.org/show_bug.cgi?id=91411
https://bugzilla.kernel.org/show_bug.cgi?id=92431

Bug 78201 needs patches 1/3 and 2/3.
Bugs 91411 and 92431 are regressions caused by

commit 19593a1fb1f6718406afca5b867dab184289d406
Author: Aaron Lu <aaron.lu@intel.com>
Date:   Tue Nov 19 16:59:20 2013 +0800

    ACPI / fan: convert to platform driver

    Convert ACPI fan driver to a platform driver for the purpose of phasing
    out ACPI bus.

    Signed-off-by: Aaron Lu <aaron.lu@intel.com>
    Signed-off-by: Zhang Rui <rui.zhang@intel.com>

which shipped in 3.18.
Bug 91411 needs patches 1/3 and 2/3 to fix, while 92431 needs all three
patches.

If possible, I'd like to push these patches into 4.0-rc and the
3.18/3.19 stable kernels, because they fix a regression introduced in 3.18.

Any comments are welcome.

thanks,
rui
* [PATCH 1/3] Thermal: initialize thermal zone device correctly

From: Zhang Rui @ 2015-03-24 5:21 UTC
To: linux-pm; +Cc: Zhang Rui, stable

After a thermal zone device is registered, we have not read any
temperature yet, so tz->temperature should not be 0, which actually
means 0C, and the thermal trend is not available.
In this case, we need special handling for the first
thermal_zone_device_update().

Both the thermal core framework and the step_wise governor are enhanced
to handle this.

CC: <stable@vger.kernel.org> #3.18+
Tested-by: Manuel Krause <manuelkrause@netscape.net>
Tested-by: szegad <szegadlo@poczta.onet.pl>
Tested-by: prash <prash.n.rao@gmail.com>
Tested-by: amish <ammdispose-arch@yahoo.com>
Tested-by: Matthias <morpheusxyz123@yahoo.de>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/thermal/step_wise.c    | 15 +++++++++++++--
 drivers/thermal/thermal_core.c | 19 +++++++++++++++++--
 drivers/thermal/thermal_core.h |  1 +
 include/linux/thermal.h        |  3 +++
 4 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/step_wise.c b/drivers/thermal/step_wise.c
index 5a0f12d..c2bb37c 100644
--- a/drivers/thermal/step_wise.c
+++ b/drivers/thermal/step_wise.c
@@ -63,6 +63,16 @@ static unsigned long get_target_state(struct thermal_instance *instance,
 	next_target = instance->target;
 	dev_dbg(&cdev->device, "cur_state=%ld\n", cur_state);
 
+	if (!instance->initialized) {
+		if (throttle) {
+			next_target = (cur_state + 1) >= instance->upper ?
+					instance->upper :
+					((cur_state + 1) < instance->lower ?
+					instance->lower : (cur_state + 1));
+		} else
+			next_target = THERMAL_NO_TARGET;
+	}
+
 	switch (trend) {
 	case THERMAL_TREND_RAISING:
 		if (throttle) {
@@ -149,7 +159,8 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
 		dev_dbg(&instance->cdev->device, "old_target=%d, target=%d\n",
 					old_target, (int)instance->target);
 
-		if (old_target == instance->target)
+		if (instance->initialized &&
+		    old_target == instance->target)
 			continue;
 
 		/* Activate a passive thermal instance */
@@ -161,7 +172,7 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
 			instance->target == THERMAL_NO_TARGET)
 			update_passive_instance(tz, trip_type, -1);
 
-
+		instance->initialized = true;
 		instance->cdev->updated = false; /* cdev needs update */
 	}
 
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 174d3bc..9d6f71b 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -469,8 +469,22 @@ static void update_temperature(struct thermal_zone_device *tz)
 	mutex_unlock(&tz->lock);
 
 	trace_thermal_temperature(tz);
-	dev_dbg(&tz->device, "last_temperature=%d, current_temperature=%d\n",
-				tz->last_temperature, tz->temperature);
+	if (tz->last_temperature == THERMAL_TEMP_INVALID)
+		dev_dbg(&tz->device, "last_temperature N/A, current_temperature=%d\n",
+			tz->temperature);
+	else
+		dev_dbg(&tz->device, "last_temperature=%d, current_temperature=%d\n",
+			tz->last_temperature, tz->temperature);
+}
+
+static void thermal_zone_device_reset(struct thermal_zone_device *tz)
+{
+	struct thermal_instance *pos;
+
+	tz->temperature = THERMAL_TEMP_INVALID;
+	tz->passive = 0;
+	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
+		pos->initialized = false;
 }
 
 void thermal_zone_device_update(struct thermal_zone_device *tz)
@@ -1574,6 +1588,7 @@ struct thermal_zone_device *thermal_zone_device_register(const char *type,
 	if (!tz->ops->get_temp)
 		thermal_zone_device_set_polling(tz, 0);
 
+	thermal_zone_device_reset(tz);
 	thermal_zone_device_update(tz);
 
 	return tz;
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 0531c75..6d9ffa5 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -41,6 +41,7 @@ struct thermal_instance {
 	struct thermal_zone_device *tz;
 	struct thermal_cooling_device *cdev;
 	int trip;
+	bool initialized;
 	unsigned long upper;	/* Highest cooling state for this trip point */
 	unsigned long lower;	/* Lowest cooling state for this trip point */
 	unsigned long target;	/* expected cooling state */
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 5eac316..8650b0b 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -40,6 +40,9 @@
 /* No upper/lower limit requirement */
 #define THERMAL_NO_LIMIT	((u32)~0)
 
+/* Invalid/uninitialized temperature */
+#define THERMAL_TEMP_INVALID	-27400
+
 /* Unit conversion macros */
 #define KELVIN_TO_CELSIUS(t)	(long)(((long)t-2732 >= 0) ?	\
 				((long)t-2732+5)/10 : ((long)t-2732-5)/10)
--
1.9.1
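On the first update after registration, the step_wise hunk above does not consult the trend. When throttling, it picks cur_state + 1 clamped into [instance->lower, instance->upper]. Otherwise it picks THERMAL_NO_TARGET. The user-space sketch below isolates that selection. first_update_target() and SKETCH_NO_TARGET are hypothetical names: the sentinel is a stand-in, because this thread does not show the kernel's definition of THERMAL_NO_TARGET.

```c
#include <stdbool.h>

/* Stand-in for THERMAL_NO_TARGET; an assumption for this sketch only. */
#define SKETCH_NO_TARGET (~0UL)

/*
 * Hypothetical helper mirroring the "!instance->initialized" branch of
 * get_target_state(): with no valid trend yet, step one state above the
 * current cooling state when throttling, clamped into [lower, upper].
 */
static unsigned long first_update_target(unsigned long cur_state,
					 unsigned long lower,
					 unsigned long upper,
					 bool throttle)
{
	unsigned long next = cur_state + 1;

	if (!throttle)
		return SKETCH_NO_TARGET;	/* no cooling requested */
	if (next >= upper)
		return upper;			/* never exceed the trip's ceiling */
	if (next < lower)
		return lower;			/* never go below the trip's floor */
	return next;
}
```

Stepping up by one state, rather than jumping straight to upper, reflects the trade-off Rui explains later in the thread. It gives extra cooling for safety without forcing processors into their lowest frequency.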
* Re: [PATCH 1/3] Thermal: initialize thermal zone device correctly

From: Eduardo Valentin @ 2015-03-24 15:00 UTC
To: Zhang Rui; +Cc: linux-pm, stable

Rui,

A couple of comments.

On Tue, Mar 24, 2015 at 01:21:28PM +0800, Zhang Rui wrote:
> After a thermal zone device is registered, we have not read any
> temperature yet, so tz->temperature should not be 0, which actually
> means 0C, and the thermal trend is not available.
> In this case, we need special handling for the first
> thermal_zone_device_update().
>
> Both the thermal core framework and the step_wise governor are enhanced
> to handle this.
>
> CC: <stable@vger.kernel.org> #3.18+
> Tested-by: Manuel Krause <manuelkrause@netscape.net>
> Tested-by: szegad <szegadlo@poczta.onet.pl>
> Tested-by: prash <prash.n.rao@gmail.com>
> Tested-by: amish <ammdispose-arch@yahoo.com>
> Tested-by: Matthias <morpheusxyz123@yahoo.de>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> ---
>  drivers/thermal/step_wise.c    | 15 +++++++++++++--
>  drivers/thermal/thermal_core.c | 19 +++++++++++++++++--
>  drivers/thermal/thermal_core.h |  1 +
>  include/linux/thermal.h        |  3 +++
>  4 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/thermal/step_wise.c b/drivers/thermal/step_wise.c

Should this patch also include changes in other governors?

> index 5a0f12d..c2bb37c 100644
> --- a/drivers/thermal/step_wise.c
> +++ b/drivers/thermal/step_wise.c
> @@ -63,6 +63,16 @@ static unsigned long get_target_state(struct thermal_instance *instance,
>  	next_target = instance->target;
>  	dev_dbg(&cdev->device, "cur_state=%ld\n", cur_state);
>
> +	if (!instance->initialized) {
> +		if (throttle) {
> +			next_target = (cur_state + 1) >= instance->upper ?
> +					instance->upper :
> +					((cur_state + 1) < instance->lower ?
> +					instance->lower : (cur_state + 1));

Why does it make sense to change the next state if an instance is
uninitialized?

> +		} else
> +			next_target = THERMAL_NO_TARGET;
> +	}
> +
>  	switch (trend) {
>  	case THERMAL_TREND_RAISING:
>  		if (throttle) {
> @@ -149,7 +159,8 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
>  		dev_dbg(&instance->cdev->device, "old_target=%d, target=%d\n",
>  					old_target, (int)instance->target);
>
> -		if (old_target == instance->target)
> +		if (instance->initialized &&
> +		    old_target == instance->target)
>  			continue;
>
>  		/* Activate a passive thermal instance */
> @@ -161,7 +172,7 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
>  			instance->target == THERMAL_NO_TARGET)
>  			update_passive_instance(tz, trip_type, -1);
>
> -
> +		instance->initialized = true;
>  		instance->cdev->updated = false; /* cdev needs update */
>  	}
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 174d3bc..9d6f71b 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -469,8 +469,22 @@ static void update_temperature(struct thermal_zone_device *tz)
>  	mutex_unlock(&tz->lock);
>
>  	trace_thermal_temperature(tz);
> -	dev_dbg(&tz->device, "last_temperature=%d, current_temperature=%d\n",
> -				tz->last_temperature, tz->temperature);
> +	if (tz->last_temperature == THERMAL_TEMP_INVALID)
> +		dev_dbg(&tz->device, "last_temperature N/A, current_temperature=%d\n",
> +			tz->temperature);
> +	else
> +		dev_dbg(&tz->device, "last_temperature=%d, current_temperature=%d\n",
> +			tz->last_temperature, tz->temperature);

Should we also teach the tracing facility about THERMAL_TEMP_INVALID?

> +}
> +
> +static void thermal_zone_device_reset(struct thermal_zone_device *tz)
> +{
> +	struct thermal_instance *pos;
> +
> +	tz->temperature = THERMAL_TEMP_INVALID;
> +	tz->passive = 0;
> +	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
> +		pos->initialized = false;
>  }
>
>  void thermal_zone_device_update(struct thermal_zone_device *tz)
> @@ -1574,6 +1588,7 @@ struct thermal_zone_device *thermal_zone_device_register(const char *type,
>  	if (!tz->ops->get_temp)
>  		thermal_zone_device_set_polling(tz, 0);
>
> +	thermal_zone_device_reset(tz);
>  	thermal_zone_device_update(tz);
>
>  	return tz;
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index 0531c75..6d9ffa5 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -41,6 +41,7 @@ struct thermal_instance {
>  	struct thermal_zone_device *tz;
>  	struct thermal_cooling_device *cdev;
>  	int trip;
> +	bool initialized;
>  	unsigned long upper;	/* Highest cooling state for this trip point */
>  	unsigned long lower;	/* Lowest cooling state for this trip point */
>  	unsigned long target;	/* expected cooling state */
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index 5eac316..8650b0b 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -40,6 +40,9 @@
>  /* No upper/lower limit requirement */
>  #define THERMAL_NO_LIMIT	((u32)~0)
>
> +/* Invalid/uninitialized temperature */
> +#define THERMAL_TEMP_INVALID	-27400
> +
>  /* Unit conversion macros */
>  #define KELVIN_TO_CELSIUS(t)	(long)(((long)t-2732 >= 0) ?	\
>  				((long)t-2732+5)/10 : ((long)t-2732-5)/10)
> --
> 1.9.1
* Re: [PATCH 1/3] Thermal: initialize thermal zone device correctly

From: Javi Merino @ 2015-03-24 17:20 UTC
To: Eduardo Valentin; +Cc: Punit Agrawal, Zhang Rui, linux-pm, stable

On Tue, Mar 24, 2015 at 03:00:06PM +0000, Eduardo Valentin wrote:
> Rui,
>
> A couple of comments.
>
> On Tue, Mar 24, 2015 at 01:21:28PM +0800, Zhang Rui wrote:
> > After a thermal zone device is registered, we have not read any
> > temperature yet, so tz->temperature should not be 0, which actually
> > means 0C, and the thermal trend is not available.
> > In this case, we need special handling for the first
> > thermal_zone_device_update().
> >
> > Both the thermal core framework and the step_wise governor are enhanced
> > to handle this.
> >
> > CC: <stable@vger.kernel.org> #3.18+
> > Tested-by: Manuel Krause <manuelkrause@netscape.net>
> > Tested-by: szegad <szegadlo@poczta.onet.pl>
> > Tested-by: prash <prash.n.rao@gmail.com>
> > Tested-by: amish <ammdispose-arch@yahoo.com>
> > Tested-by: Matthias <morpheusxyz123@yahoo.de>
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> >  drivers/thermal/step_wise.c    | 15 +++++++++++++--
> >  drivers/thermal/thermal_core.c | 19 +++++++++++++++++--
> >  drivers/thermal/thermal_core.h |  1 +
> >  include/linux/thermal.h        |  3 +++
> >  4 files changed, 34 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/thermal/step_wise.c b/drivers/thermal/step_wise.c
>
> Should this patch also include changes in other governors?

If I understand it correctly, instance->initialized actually means "is
the trend valid?". As step_wise is the only governor that uses the
trend, it's the only one that needs updating.

> > index 5a0f12d..c2bb37c 100644
> > --- a/drivers/thermal/step_wise.c
> > +++ b/drivers/thermal/step_wise.c
> > @@ -63,6 +63,16 @@ static unsigned long get_target_state(struct thermal_instance *instance,
> >  	next_target = instance->target;
> >  	dev_dbg(&cdev->device, "cur_state=%ld\n", cur_state);
> >
> > +	if (!instance->initialized) {
> > +		if (throttle) {
> > +			next_target = (cur_state + 1) >= instance->upper ?
> > +					instance->upper :
> > +					((cur_state + 1) < instance->lower ?
> > +					instance->lower : (cur_state + 1));
>
> Why does it make sense to change the next state if an instance is
> uninitialized?
>
> > +		} else
> > +			next_target = THERMAL_NO_TARGET;

CodingStyle says that if one branch of an if statement needs braces,
all branches must have braces:

	if (condition) {
		do_this();
		do_that();
	} else {
		otherwise();
	}

> > +	}
> > +

Does this really work? The update of next_target will probably be
overwritten by the switch below. Shouldn't you "return next_target;"
at the end of the "if (!instance->initialized)" block?

> >  	switch (trend) {
> >  	case THERMAL_TREND_RAISING:
> >  		if (throttle) {
> > @@ -149,7 +159,8 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
> >  		dev_dbg(&instance->cdev->device, "old_target=%d, target=%d\n",
> >  					old_target, (int)instance->target);
> >
> > -		if (old_target == instance->target)
> > +		if (instance->initialized &&
> > +		    old_target == instance->target)
> >  			continue;
> >
> >  		/* Activate a passive thermal instance */
> > @@ -161,7 +172,7 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
> >  			instance->target == THERMAL_NO_TARGET)
> >  			update_passive_instance(tz, trip_type, -1);
> >
> > -
> > +		instance->initialized = true;
> >  		instance->cdev->updated = false; /* cdev needs update */
> >  	}
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 174d3bc..9d6f71b 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -469,8 +469,22 @@ static void update_temperature(struct thermal_zone_device *tz)
> >  	mutex_unlock(&tz->lock);
> >
> >  	trace_thermal_temperature(tz);
> > -	dev_dbg(&tz->device, "last_temperature=%d, current_temperature=%d\n",
> > -				tz->last_temperature, tz->temperature);
> > +	if (tz->last_temperature == THERMAL_TEMP_INVALID)
> > +		dev_dbg(&tz->device, "last_temperature N/A, current_temperature=%d\n",
> > +			tz->temperature);
> > +	else
> > +		dev_dbg(&tz->device, "last_temperature=%d, current_temperature=%d\n",
> > +			tz->last_temperature, tz->temperature);
>
> Should we also teach the tracing facility about THERMAL_TEMP_INVALID?

I don't think there's a good way of putting this information in the
trace. The format string is fixed, and playing with it like we do here
is not an option. In practical terms, the trace will record a weird
"-27400" for the previous temperature, so it's not too bad. I guess it
would help if the invalid temperature were something more obvious, like
INT_MIN.

> > +}
> > +
> > +static void thermal_zone_device_reset(struct thermal_zone_device *tz)
> > +{
> > +	struct thermal_instance *pos;
> > +
> > +	tz->temperature = THERMAL_TEMP_INVALID;
> > +	tz->passive = 0;
> > +	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
> > +		pos->initialized = false;
> >  }
> >
> >  void thermal_zone_device_update(struct thermal_zone_device *tz)
> > @@ -1574,6 +1588,7 @@ struct thermal_zone_device *thermal_zone_device_register(const char *type,
> >  	if (!tz->ops->get_temp)
> >  		thermal_zone_device_set_polling(tz, 0);
> >
> > +	thermal_zone_device_reset(tz);
> >  	thermal_zone_device_update(tz);
> >
> >  	return tz;
> > diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> > index 0531c75..6d9ffa5 100644
> > --- a/drivers/thermal/thermal_core.h
> > +++ b/drivers/thermal/thermal_core.h
> > @@ -41,6 +41,7 @@ struct thermal_instance {
> >  	struct thermal_zone_device *tz;
> >  	struct thermal_cooling_device *cdev;
> >  	int trip;
> > +	bool initialized;

This could be more specific. If I understand it correctly, this flag
indicates whether the trend is valid. Can we call it "valid_trend"
instead?

> >  	unsigned long upper;	/* Highest cooling state for this trip point */
> >  	unsigned long lower;	/* Lowest cooling state for this trip point */
> >  	unsigned long target;	/* expected cooling state */
> > diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> > index 5eac316..8650b0b 100644
> > --- a/include/linux/thermal.h
> > +++ b/include/linux/thermal.h
> > @@ -40,6 +40,9 @@
> >  /* No upper/lower limit requirement */
> >  #define THERMAL_NO_LIMIT	((u32)~0)
> >
> > +/* Invalid/uninitialized temperature */
> > +#define THERMAL_TEMP_INVALID	-27400

Out of curiosity, why -27400? Why not INT_MIN?

Cheers,
Javi
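Javi makes two points about the step_wise hunk: use braces on every branch, and return before the trend switch so that the switch cannot overwrite the first-update target. Both can be illustrated with a reduced, hypothetical model. The struct, the two-value trend enum, the simplified "dropping" rule, and SKETCH_NO_TARGET are all assumptions made for this sketch. It is not the real get_target_state(), which handles more trends and more edge cases.

```c
#include <stdbool.h>

/* Stand-in for THERMAL_NO_TARGET; an assumption for this sketch only. */
#define SKETCH_NO_TARGET (~0UL)

/* Hypothetical, reduced trend set; the real governor handles more cases. */
enum sketch_trend { SKETCH_TREND_STABLE, SKETCH_TREND_DROPPING };

struct sketch_instance {
	bool valid_trend;	/* the rename Javi proposes for "initialized" */
	unsigned long lower;
	unsigned long upper;
	unsigned long target;
};

static unsigned long sketch_get_target(const struct sketch_instance *inst,
				       unsigned long cur_state,
				       enum sketch_trend trend, bool throttle)
{
	unsigned long next_target = inst->target;

	if (!inst->valid_trend) {
		/* Braces on both branches, per CodingStyle. */
		if (throttle) {
			next_target = cur_state + 1;
			if (next_target >= inst->upper)
				next_target = inst->upper;
			else if (next_target < inst->lower)
				next_target = inst->lower;
		} else {
			next_target = SKETCH_NO_TARGET;
		}
		/* Return early so the trend switch cannot overwrite it. */
		return next_target;
	}

	switch (trend) {
	case SKETCH_TREND_DROPPING:
		/* Simplified: step down one state, never below lower. */
		if (cur_state > inst->lower)
			next_target = cur_state - 1;
		break;
	default:
		break;
	}
	return next_target;
}
```

In this model, an instance without a valid trend that sits at state 2 moves up to 3 even when the trend reports "dropping". Without the early return, the switch would step it down to 1.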
* RE: [PATCH 1/3] Thermal: initialize thermal zone device correctly

From: Zhang, Rui @ 2015-03-25 2:14 UTC
To: Eduardo Valentin; +Cc: linux-pm, stable

> -----Original Message-----
> From: Eduardo Valentin [mailto:edubezval@gmail.com]
> Sent: Tuesday, March 24, 2015 11:00 PM
> To: Zhang, Rui
> Cc: linux-pm@vger.kernel.org; stable@vger.kernel.org
> Subject: Re: [PATCH 1/3] Thermal: initialize thermal zone device correctly
> Importance: High
>
> Rui,
>
> A couple of comments.
>
> On Tue, Mar 24, 2015 at 01:21:28PM +0800, Zhang Rui wrote:
> > After a thermal zone device is registered, we have not read any
> > temperature yet, so tz->temperature should not be 0, which actually
> > means 0C, and the thermal trend is not available.
> > In this case, we need special handling for the first
> > thermal_zone_device_update().
> >
> > Both the thermal core framework and the step_wise governor are enhanced
> > to handle this.
> >
> > CC: <stable@vger.kernel.org> #3.18+
> > Tested-by: Manuel Krause <manuelkrause@netscape.net>
> > Tested-by: szegad <szegadlo@poczta.onet.pl>
> > Tested-by: prash <prash.n.rao@gmail.com>
> > Tested-by: amish <ammdispose-arch@yahoo.com>
> > Tested-by: Matthias <morpheusxyz123@yahoo.de>
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> >  drivers/thermal/step_wise.c    | 15 +++++++++++++--
> >  drivers/thermal/thermal_core.c | 19 +++++++++++++++++--
> >  drivers/thermal/thermal_core.h |  1 +
> >  include/linux/thermal.h        |  3 +++
> >  4 files changed, 34 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/thermal/step_wise.c b/drivers/thermal/step_wise.c
>
> Should this patch also include changes in other governors?
>
No, I've checked the code: the step_wise/bang_bang/user_space governors
do not have this problem.

> > index 5a0f12d..c2bb37c 100644
> > --- a/drivers/thermal/step_wise.c
> > +++ b/drivers/thermal/step_wise.c
> > @@ -63,6 +63,16 @@ static unsigned long get_target_state(struct thermal_instance *instance,
> >  	next_target = instance->target;
> >  	dev_dbg(&cdev->device, "cur_state=%ld\n", cur_state);
> >
> > +	if (!instance->initialized) {
> > +		if (throttle) {
> > +			next_target = (cur_state + 1) >= instance->upper ?
> > +					instance->upper :
> > +					((cur_state + 1) < instance->lower ?
> > +					instance->lower : (cur_state + 1));
>
> Why does it make sense to change the next state if an instance is uninitialized?
>
For thermal safety reasons, I prefer to use a higher cooling state,
because the system is overheating at the current cooling state.
I did consider using instance->upper directly, but in that case cooling
devices like processors are put into the lowest frequency, and processors
on ACPI-based platforms are put into the lowest T-state, which is overkill.

> > +		} else
> > +			next_target = THERMAL_NO_TARGET;
> > +	}
> > +
> >  	switch (trend) {
> >  	case THERMAL_TREND_RAISING:
> >  		if (throttle) {
> > @@ -149,7 +159,8 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
> >  		dev_dbg(&instance->cdev->device, "old_target=%d, target=%d\n",
> >  					old_target, (int)instance->target);
> >
> > -		if (old_target == instance->target)
> > +		if (instance->initialized &&
> > +		    old_target == instance->target)
> >  			continue;
> >
> >  		/* Activate a passive thermal instance */
> > @@ -161,7 +172,7 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
> >  			instance->target == THERMAL_NO_TARGET)
> >  			update_passive_instance(tz, trip_type, -1);
> >
> > -
> > +		instance->initialized = true;
> >  		instance->cdev->updated = false; /* cdev needs update */
> >  	}
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 174d3bc..9d6f71b 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -469,8 +469,22 @@ static void update_temperature(struct thermal_zone_device *tz)
> >  	mutex_unlock(&tz->lock);
> >
> >  	trace_thermal_temperature(tz);
> > -	dev_dbg(&tz->device, "last_temperature=%d, current_temperature=%d\n",
> > -				tz->last_temperature, tz->temperature);
> > +	if (tz->last_temperature == THERMAL_TEMP_INVALID)
> > +		dev_dbg(&tz->device, "last_temperature N/A, current_temperature=%d\n",
> > +			tz->temperature);
> > +	else
> > +		dev_dbg(&tz->device, "last_temperature=%d, current_temperature=%d\n",
> > +			tz->last_temperature, tz->temperature);
>
> Should we also teach the tracing facility about THERMAL_TEMP_INVALID?
>
Hmm, I don't quite understand your question.

Thanks,
rui

> > +}
> > +
> > +static void thermal_zone_device_reset(struct thermal_zone_device *tz)
> > +{
> > +	struct thermal_instance *pos;
> > +
> > +	tz->temperature = THERMAL_TEMP_INVALID;
> > +	tz->passive = 0;
> > +	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
> > +		pos->initialized = false;
> >  }
> >
> >  void thermal_zone_device_update(struct thermal_zone_device *tz)
> > @@ -1574,6 +1588,7 @@ struct thermal_zone_device *thermal_zone_device_register(const char *type,
> >  	if (!tz->ops->get_temp)
> >  		thermal_zone_device_set_polling(tz, 0);
> >
> > +	thermal_zone_device_reset(tz);
> >  	thermal_zone_device_update(tz);
> >
> >  	return tz;
> > diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> > index 0531c75..6d9ffa5 100644
> > --- a/drivers/thermal/thermal_core.h
> > +++ b/drivers/thermal/thermal_core.h
> > @@ -41,6 +41,7 @@ struct thermal_instance {
> >  	struct thermal_zone_device *tz;
> >  	struct thermal_cooling_device *cdev;
> >  	int trip;
> > +	bool initialized;
> >  	unsigned long upper;	/* Highest cooling state for this trip point */
> >  	unsigned long lower;	/* Lowest cooling state for this trip point */
> >  	unsigned long target;	/* expected cooling state */
> > diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> > index 5eac316..8650b0b 100644
> > --- a/include/linux/thermal.h
> > +++ b/include/linux/thermal.h
> > @@ -40,6 +40,9 @@
> >  /* No upper/lower limit requirement */
> >  #define THERMAL_NO_LIMIT	((u32)~0)
> >
> > +/* Invalid/uninitialized temperature */
> > +#define THERMAL_TEMP_INVALID	-27400
> > +
> >  /* Unit conversion macros */
> >  #define KELVIN_TO_CELSIUS(t)	(long)(((long)t-2732 >= 0) ?	\
> >  				((long)t-2732+5)/10 : ((long)t-2732-5)/10)
> > --
> > 1.9.1
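This note is about the sentinel value Javi questions. It assumes that tz->temperature is in millidegrees Celsius, the unit in which thermal zone temperatures are reported through sysfs. On that assumption, THERMAL_TEMP_INVALID as defined in this patch (-27400) is -27.4 C, a reading a real sensor could produce. Absolute zero is -273150 millidegrees. The check below makes the arithmetic concrete. ABS_ZERO_MILLICELSIUS and below_absolute_zero() are hypothetical names for this sketch.

```c
#include <stdbool.h>

/* Absolute zero, -273.15 C, expressed in millidegrees Celsius. */
#define ABS_ZERO_MILLICELSIUS (-273150)

/* True if a millidegree-Celsius value cannot be a genuine reading. */
static bool below_absolute_zero(int millicelsius)
{
	return millicelsius < ABS_ZERO_MILLICELSIUS;
}
```

Any sentinel below ABS_ZERO_MILLICELSIUS, such as -274000 or the INT_MIN Javi suggests, cannot collide with a real temperature. -27400 can.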
* [PATCH 2/3] Thermal: handle thermal zone device properly during system sleep

From: Zhang Rui @ 2015-03-24 5:21 UTC
To: linux-pm; +Cc: Zhang Rui, stable

The current thermal code does not handle system sleep well because
1. the cooling state of a cooling device may be changed during suspend;
2. the previous temperature reading becomes invalid after resume, because
   it was taken before system sleep;
3. updating thermal zone devices during suspend/resume is wrong, because
   some devices may already have been suspended or may not have been
   resumed yet.

Thus, the proper way to handle this is to cancel all thermal zone device
update requests during suspend/resume and, after all the devices have
been resumed, reset and update every registered thermal zone device.

This also fixes a regression introduced by

commit 19593a1fb1f6718406afca5b867dab184289d406
Author: Aaron Lu <aaron.lu@intel.com>
Date:   Tue Nov 19 16:59:20 2013 +0800

    ACPI / fan: convert to platform driver

    Convert ACPI fan driver to a platform driver for the purpose of phasing
    out ACPI bus.

    Signed-off-by: Aaron Lu <aaron.lu@intel.com>
    Signed-off-by: Zhang Rui <rui.zhang@intel.com>

With that commit applied, all the fan devices are attached to the
acpi_general_pm_domain, and the pm_domain turns them on automatically
after resume, without the thermal core being aware of it.

CC: <stable@vger.kernel.org> #3.18+
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411
Tested-by: Manuel Krause <manuelkrause@netscape.net>
Tested-by: szegad <szegadlo@poczta.onet.pl>
Tested-by: prash <prash.n.rao@gmail.com>
Tested-by: amish <ammdispose-arch@yahoo.com>
Tested-by: Matthias <morpheusxyz123@yahoo.de>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/thermal/thermal_core.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 9d6f71b..9c03561 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -37,6 +37,7 @@
 #include <linux/of.h>
 #include <net/netlink.h>
 #include <net/genetlink.h>
+#include <linux/suspend.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/thermal.h>
@@ -59,6 +60,9 @@ static LIST_HEAD(thermal_governor_list);
 static DEFINE_MUTEX(thermal_list_lock);
 static DEFINE_MUTEX(thermal_governor_lock);
 
+static struct notifier_block thermal_pm_nb;
+static bool no_thermal_update;
+
 static struct thermal_governor *def_governor;
 
 static struct thermal_governor *__find_governor(const char *name)
@@ -491,6 +495,9 @@ void thermal_zone_device_update(struct thermal_zone_device *tz)
 {
 	int count;
 
+	if (no_thermal_update)
+		return;
+
 	if (!tz->ops->get_temp)
 		return;
 
@@ -1823,6 +1830,33 @@ static void thermal_unregister_governors(void)
 	thermal_gov_user_space_unregister();
 }
 
+static int thermal_notify(struct notifier_block *nb,
+			  unsigned long mode, void *_unused)
+{
+	struct thermal_zone_device *tz;
+
+	switch (mode) {
+	case PM_HIBERNATION_PREPARE:
+	case PM_RESTORE_PREPARE:
+	case PM_SUSPEND_PREPARE:
+		no_thermal_update = true;
+		break;
+	case PM_POST_HIBERNATION:
+	case PM_POST_RESTORE:
+	case PM_POST_SUSPEND:
+		no_thermal_update = false;
+		list_for_each_entry(tz, &thermal_tz_list, node) {
+			thermal_zone_device_reset(tz);
+			thermal_zone_device_update(tz);
+		}
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+
 static int __init thermal_init(void)
 {
 	int result;
@@ -1843,6 +1877,9 @@ static int __init thermal_init(void)
 	if (result)
 		goto exit_netlink;
 
+	thermal_pm_nb.notifier_call = thermal_notify;
+	register_pm_notifier(&thermal_pm_nb);
+
 	return 0;
 
 exit_netlink:
--
1.9.1
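The notifier added by this patch behaves like a small state machine. The *_PREPARE events set no_thermal_update, so thermal_zone_device_update() returns early. The POST_* events clear the flag, then reset and re-evaluate every zone on thermal_tz_list. The sketch below models that control flow in user space. Several parts are hypothetical: the event names, a caller-supplied array standing in for thermal_tz_list, a `valid` flag standing in for the invalid-temperature sentinel, and a single `reading` shared by all zones. The real code is driven by register_pm_notifier() and walks the list with list_for_each_entry().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PM_*_PREPARE / PM_POST_* events. */
enum sketch_pm_event {
	SKETCH_PM_PREPARE,
	SKETCH_PM_POST,
	SKETCH_PM_OTHER,
};

struct sketch_zone {
	int temperature;	/* millidegrees Celsius */
	bool valid;		/* false after a reset, until the next read */
	int updates;		/* number of updates that actually ran */
};

/* Models the no_thermal_update flag added by the patch. */
static bool no_update;

/* Models the early return added to thermal_zone_device_update(). */
static void sketch_zone_update(struct sketch_zone *tz, int reading)
{
	if (no_update)
		return;
	tz->temperature = reading;
	tz->valid = true;
	tz->updates++;
}

/* Models thermal_zone_device_reset(): forget the pre-sleep reading. */
static void sketch_zone_reset(struct sketch_zone *tz)
{
	tz->valid = false;
}

/* Models thermal_notify(): block updates before sleep, refresh after. */
static void sketch_pm_notify(enum sketch_pm_event ev,
			     struct sketch_zone *zones, size_t n, int reading)
{
	switch (ev) {
	case SKETCH_PM_PREPARE:
		no_update = true;
		break;
	case SKETCH_PM_POST:
		no_update = false;
		for (size_t i = 0; i < n; i++) {
			sketch_zone_reset(&zones[i]);
			sketch_zone_update(&zones[i], reading);
		}
		break;
	default:
		break;
	}
}
```

An update attempted between PREPARE and POST is silently dropped, and every zone is refreshed once after resume. This single-threaded model does not address Eduardo's question, raised in his review, about whether no_thermal_update needs a lock.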
* Re: [PATCH 2/3] Thermal: handle thermal zone device properly during system sleep
From: Eduardo Valentin @ 2015-03-24 15:06 UTC
To: Zhang Rui; +Cc: linux-pm, stable

Hey Rui

On Tue, Mar 24, 2015 at 01:21:29PM +0800, Zhang Rui wrote:
> Current thermal code does not handle system sleep well because
> 1. the cooling device cooling state may be changed during suspend
> 2. the previous temperature reading becomes invalid after resumed because
>    it is got before system sleep
> 3. updating thermal zone device during suspending/resuming
>    is wrong because some devices may have already been suspended
>    or may have not been resumed.
>
> Thus, the proper way to do this is to cancel all thermal zone
> device update requirements during suspend/resume, and after all
> the devices have been resumed, reset and update every registered
> thermal zone devices.
>
> This also fixes a regression introduced by
> commit 19593a1fb1f6718406afca5b867dab184289d406
> Author: Aaron Lu <aaron.lu@intel.com>
> Date:   Tue Nov 19 16:59:20 2013 +0800
>
>     ACPI / fan: convert to platform driver
>
>     Convert ACPI fan driver to a platform driver for the purpose of phasing
>     out ACPI bus.
>
>     Signed-off-by: Aaron Lu <aaron.lu@intel.com>
>     Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>
> Because, with the commit applied, all the fan devices are attached
> to the acpi_general_pm_domain, and they are turned on by the pm_domain
> automatically after resume, without the awareness of thermal core.
>
> CC: <stable@vger.kernel.org> #3.18+
> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201
> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411
> Tested-by: Manuel Krause <manuelkrause@netscape.net>
> Tested-by: szegad <szegadlo@poczta.onet.pl>
> Tested-by: prash <prash.n.rao@gmail.com>
> Tested-by: amish <ammdispose-arch@yahoo.com>
> Tested-by: Matthias <morpheusxyz123@yahoo.de>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> ---
>  drivers/thermal/thermal_core.c | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 9d6f71b..9c03561 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -37,6 +37,7 @@
>  #include <linux/of.h>
>  #include <net/netlink.h>
>  #include <net/genetlink.h>
> +#include <linux/suspend.h>
>
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/thermal.h>
> @@ -59,6 +60,9 @@ static LIST_HEAD(thermal_governor_list);
>  static DEFINE_MUTEX(thermal_list_lock);
>  static DEFINE_MUTEX(thermal_governor_lock);
>
> +static struct notifier_block thermal_pm_nb;
> +static bool no_thermal_update;

Should this variable be considered to be accessed using a lock?

> +
>  static struct thermal_governor *def_governor;
>
>  static struct thermal_governor *__find_governor(const char *name)
> @@ -491,6 +495,9 @@ void thermal_zone_device_update(struct thermal_zone_device *tz)
>  {
>  	int count;
>
> +	if (no_thermal_update)
> +		return;
> +
>  	if (!tz->ops->get_temp)
>  		return;
>
> @@ -1823,6 +1830,33 @@ static void thermal_unregister_governors(void)
>  	thermal_gov_user_space_unregister();
>  }
>
> +static int thermal_notify(struct notifier_block *nb,
> +			  unsigned long mode, void *_unused)

I believe thermal_pm_notify sounds a better naming for this case.

> +{
> +	struct thermal_zone_device *tz;
> +
> +	switch (mode) {
> +	case PM_HIBERNATION_PREPARE:
> +	case PM_RESTORE_PREPARE:
> +	case PM_SUSPEND_PREPARE:
> +		no_thermal_update = true;
> +		break;
> +	case PM_POST_HIBERNATION:
> +	case PM_POST_RESTORE:
> +	case PM_POST_SUSPEND:
> +		no_thermal_update = false;
> +		list_for_each_entry(tz, &thermal_tz_list, node) {
> +			thermal_zone_device_reset(tz);
> +			thermal_zone_device_update(tz);
> +		}
> +		break;
> +	default:
> +		break;
> +	}
> +	return 0;
> +}
> +
> +
>  static int __init thermal_init(void)
>  {
>  	int result;
> @@ -1843,6 +1877,9 @@ static int __init thermal_init(void)
>  	if (result)
>  		goto exit_netlink;
>
> +	thermal_pm_nb.notifier_call = thermal_notify;

I believe you can declare thermal_pm_nb already with the callback
initialized:

static struct notifier_block thermal_pm_nb = {
	.notifier_call = thermal_notify,
};

just put it after the thermal_notify function.

> +	register_pm_notifier(&thermal_pm_nb);
> +
>  	return 0;
>
>  exit_netlink:
> --
> 1.9.1
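The notifier flow under review, together with Eduardo's suggestion to bind the callback in the initializer, can be illustrated with a small userspace C model. This is a sketch under stated assumptions, not the kernel code: struct notifier_block is reduced to its callback, EV_SUSPEND_PREPARE and EV_POST_SUSPEND are local stand-ins for the PM constants in <linux/suspend.h>, the zone type and the zone_update()/zone_reset() helpers are hypothetical, and registration through register_pm_notifier() is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the kernel PM notifier events (hypothetical values). */
enum pm_event {
	EV_SUSPEND_PREPARE,
	EV_POST_SUSPEND,
	EV_OTHER,
};

/* Simplified stand-in for struct notifier_block: only the callback. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long mode,
			     void *data);
};

/* Hypothetical, minimal thermal zone that counts resets and updates. */
struct tz {
	int resets;
	int updates;
};

#define NR_ZONES 2
static struct tz zones[NR_ZONES];
static bool no_thermal_update;

static void zone_update(struct tz *tz)
{
	if (no_thermal_update)	/* devices may be suspended: do nothing */
		return;
	tz->updates++;
}

static void zone_reset(struct tz *tz)
{
	tz->resets++;		/* forget the pre-suspend reading/trend */
}

static int thermal_pm_notify(struct notifier_block *nb, unsigned long mode,
			     void *unused)
{
	(void)nb;
	(void)unused;
	switch (mode) {
	case EV_SUSPEND_PREPARE:
		no_thermal_update = true;
		break;
	case EV_POST_SUSPEND:
		no_thermal_update = false;
		for (size_t i = 0; i < NR_ZONES; i++) {
			zone_reset(&zones[i]);
			zone_update(&zones[i]);
		}
		break;
	default:
		break;
	}
	return 0;
}

/* Callback bound at definition time, so init needs no assignment. */
static struct notifier_block thermal_pm_nb = {
	.notifier_call = thermal_pm_notify,
};
```

Calling thermal_pm_nb.notifier_call() with EV_SUSPEND_PREPARE turns zone_update() into a no-op; EV_POST_SUSPEND clears the flag, then resets and updates every zone exactly once. Defining the notifier_block after the callback is what lets the designated initializer refer to it.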
* RE: [PATCH 2/3] Thermal: handle thermal zone device properly during system sleep 2015-03-24 15:06 ` Eduardo Valentin @ 2015-03-25 2:25 ` Zhang, Rui 2015-03-25 14:40 ` Eduardo Valentin 0 siblings, 1 reply; 32+ messages in thread From: Zhang, Rui @ 2015-03-25 2:25 UTC (permalink / raw) To: Eduardo Valentin; +Cc: linux-pm, stable > -----Original Message----- > From: linux-pm-owner@vger.kernel.org [mailto:linux-pm- > owner@vger.kernel.org] On Behalf Of Eduardo Valentin > Sent: Tuesday, March 24, 2015 11:07 PM > To: Zhang, Rui > Cc: linux-pm@vger.kernel.org; stable@vger.kernel.org > Subject: Re: [PATCH 2/3] Thermal: handle thermal zone device properly during > system sleep > Importance: High > > Hey Rui > > On Tue, Mar 24, 2015 at 01:21:29PM +0800, Zhang Rui wrote: > > Current thermal code does not handle system sleep well because 1. the > > cooling device cooling state may be changed during suspend 2. the > > previous temperature reading becomes invalid after resumed because > > it is got before system sleep > > 3. updating thermal zone device during suspending/resuming > > is wrong because some devices may have already been suspended > > or may have not been resumed. > > > > Thus, the proper way to do this is to cancel all thermal zone device > > update requirements during suspend/resume, and after all the devices > > have been resumed, reset and update every registered thermal zone > > devices. > > > > This also fixes a regression introduced by commit > > 19593a1fb1f6718406afca5b867dab184289d406 > > Author: Aaron Lu <aaron.lu@intel.com> > > Date: Tue Nov 19 16:59:20 2013 +0800 > > > > ACPI / fan: convert to platform driver > > > > Convert ACPI fan driver to a platform driver for the purpose of phasing > > out ACPI bus. 
> > > > Signed-off-by: Aaron Lu <aaron.lu@intel.com> > > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > > > > Because, with the commit applied, all the fan devices are attached to > > the acpi_general_pm_domain, and they are turned on by the pm_domain > > automatically after resume, without the awareness of thermal core. > > > > CC: <stable@vger.kernel.org> #3.18+ > > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 > > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 > > Tested-by: Manuel Krause <manuelkrause@netscape.net> > > Tested-by: szegad <szegadlo@poczta.onet.pl> > > Tested-by: prash <prash.n.rao@gmail.com> > > Tested-by: amish <ammdispose-arch@yahoo.com> > > Tested-by: Matthias <morpheusxyz123@yahoo.de> > > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > > --- > > drivers/thermal/thermal_core.c | 37 > > +++++++++++++++++++++++++++++++++++++ > > 1 file changed, 37 insertions(+) > > > > diff --git a/drivers/thermal/thermal_core.c > > b/drivers/thermal/thermal_core.c index 9d6f71b..9c03561 100644 > > --- a/drivers/thermal/thermal_core.c > > +++ b/drivers/thermal/thermal_core.c > > @@ -37,6 +37,7 @@ > > #include <linux/of.h> > > #include <net/netlink.h> > > #include <net/genetlink.h> > > +#include <linux/suspend.h> > > > > #define CREATE_TRACE_POINTS > > #include <trace/events/thermal.h> > > @@ -59,6 +60,9 @@ static LIST_HEAD(thermal_governor_list); static > > DEFINE_MUTEX(thermal_list_lock); static > > DEFINE_MUTEX(thermal_governor_lock); > > > > +static struct notifier_block thermal_pm_nb; static bool > > +no_thermal_update; > > Should this variable be considered to be accessed using a lock? > Hmmm, why? It is set once when entering suspend, and cleared once when resuming, and this whole process is protected by the pm_mutex lock, right? 
> > + > > static struct thermal_governor *def_governor; > > > > static struct thermal_governor *__find_governor(const char *name) @@ > > -491,6 +495,9 @@ void thermal_zone_device_update(struct > > thermal_zone_device *tz) { > > int count; > > > > + if (no_thermal_update) > > + return; > > + > > if (!tz->ops->get_temp) > > return; > > > > @@ -1823,6 +1830,33 @@ static void thermal_unregister_governors(void) > > thermal_gov_user_space_unregister(); > > } > > > > +static int thermal_notify(struct notifier_block *nb, > > + unsigned long mode, void *_unused) > > I believe thermal_pm_notify sounds a better naming for this case. > Okay, will change it to thermal_pm_notify in next version. > > +{ > > + struct thermal_zone_device *tz; > > + > > + switch (mode) { > > + case PM_HIBERNATION_PREPARE: > > + case PM_RESTORE_PREPARE: > > + case PM_SUSPEND_PREPARE: > > + no_thermal_update = true; > > + break; > > + case PM_POST_HIBERNATION: > > + case PM_POST_RESTORE: > > + case PM_POST_SUSPEND: > > + no_thermal_update = false; > > + list_for_each_entry(tz, &thermal_tz_list, node) { > > + thermal_zone_device_reset(tz); > > + thermal_zone_device_update(tz); > > + } > > + break; > > + default: > > + break; > > + } > > + return 0; > > +} > > + > > + > > static int __init thermal_init(void) > > { > > int result; > > @@ -1843,6 +1877,9 @@ static int __init thermal_init(void) > > if (result) > > goto exit_netlink; > > > > + thermal_pm_nb.notifier_call = thermal_notify; > > I believe you can declare thermal_pm_nb already with the callback > initialized: > > > > static struct notifier_block thermal_pm_nb = { > .notifier_call = thermal_notify, > }; > Yes, will do this. Thanks, rui > > just put it after the thermal_notify function. 
> > > + register_pm_notifier(&thermal_pm_nb); > > + > > return 0; > > > > exit_netlink: > > -- > > 1.9.1 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-pm" in > > the body of a message to majordomo@vger.kernel.org More majordomo info > > at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/3] Thermal: handle thermal zone device properly during system sleep 2015-03-25 2:25 ` Zhang, Rui @ 2015-03-25 14:40 ` Eduardo Valentin 0 siblings, 0 replies; 32+ messages in thread From: Eduardo Valentin @ 2015-03-25 14:40 UTC (permalink / raw) To: Zhang, Rui; +Cc: linux-pm, stable [-- Attachment #1: Type: text/plain, Size: 6129 bytes --] On Wed, Mar 25, 2015 at 02:25:06AM +0000, Zhang, Rui wrote: > > > > -----Original Message----- > > From: linux-pm-owner@vger.kernel.org [mailto:linux-pm- > > owner@vger.kernel.org] On Behalf Of Eduardo Valentin > > Sent: Tuesday, March 24, 2015 11:07 PM > > To: Zhang, Rui > > Cc: linux-pm@vger.kernel.org; stable@vger.kernel.org > > Subject: Re: [PATCH 2/3] Thermal: handle thermal zone device properly during > > system sleep > > Importance: High > > > > Hey Rui > > > > On Tue, Mar 24, 2015 at 01:21:29PM +0800, Zhang Rui wrote: > > > Current thermal code does not handle system sleep well because 1. the > > > cooling device cooling state may be changed during suspend 2. the > > > previous temperature reading becomes invalid after resumed because > > > it is got before system sleep > > > 3. updating thermal zone device during suspending/resuming > > > is wrong because some devices may have already been suspended > > > or may have not been resumed. > > > > > > Thus, the proper way to do this is to cancel all thermal zone device > > > update requirements during suspend/resume, and after all the devices > > > have been resumed, reset and update every registered thermal zone > > > devices. > > > > > > This also fixes a regression introduced by commit > > > 19593a1fb1f6718406afca5b867dab184289d406 > > > Author: Aaron Lu <aaron.lu@intel.com> > > > Date: Tue Nov 19 16:59:20 2013 +0800 > > > > > > ACPI / fan: convert to platform driver > > > > > > Convert ACPI fan driver to a platform driver for the purpose of phasing > > > out ACPI bus. 
> > > > > > Signed-off-by: Aaron Lu <aaron.lu@intel.com> > > > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > > > > > > Because, with the commit applied, all the fan devices are attached to > > > the acpi_general_pm_domain, and they are turned on by the pm_domain > > > automatically after resume, without the awareness of thermal core. > > > > > > CC: <stable@vger.kernel.org> #3.18+ > > > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 > > > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 > > > Tested-by: Manuel Krause <manuelkrause@netscape.net> > > > Tested-by: szegad <szegadlo@poczta.onet.pl> > > > Tested-by: prash <prash.n.rao@gmail.com> > > > Tested-by: amish <ammdispose-arch@yahoo.com> > > > Tested-by: Matthias <morpheusxyz123@yahoo.de> > > > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > > > --- > > > drivers/thermal/thermal_core.c | 37 > > > +++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 37 insertions(+) > > > > > > diff --git a/drivers/thermal/thermal_core.c > > > b/drivers/thermal/thermal_core.c index 9d6f71b..9c03561 100644 > > > --- a/drivers/thermal/thermal_core.c > > > +++ b/drivers/thermal/thermal_core.c > > > @@ -37,6 +37,7 @@ > > > #include <linux/of.h> > > > #include <net/netlink.h> > > > #include <net/genetlink.h> > > > +#include <linux/suspend.h> > > > > > > #define CREATE_TRACE_POINTS > > > #include <trace/events/thermal.h> > > > @@ -59,6 +60,9 @@ static LIST_HEAD(thermal_governor_list); static > > > DEFINE_MUTEX(thermal_list_lock); static > > > DEFINE_MUTEX(thermal_governor_lock); > > > > > > +static struct notifier_block thermal_pm_nb; static bool > > > +no_thermal_update; > > > > Should this variable be considered to be accessed using a lock? > > > Hmmm, why? Because you access the variable out of the suspend path. > It is set once when entering suspend, and cleared once when resuming, > and this whole process is protected by the pm_mutex lock, right? 
> yeah, if you would be accessing it only inside the suspend path, but you have an extra reader... > > > + > > > static struct thermal_governor *def_governor; > > > > > > static struct thermal_governor *__find_governor(const char *name) @@ > > > -491,6 +495,9 @@ void thermal_zone_device_update(struct > > > thermal_zone_device *tz) { > > > int count; > > > > > > + if (no_thermal_update) > > > + return; > > > + .. right here. > > > if (!tz->ops->get_temp) > > > return; > > > > > > @@ -1823,6 +1830,33 @@ static void thermal_unregister_governors(void) > > > thermal_gov_user_space_unregister(); > > > } > > > > > > +static int thermal_notify(struct notifier_block *nb, > > > + unsigned long mode, void *_unused) > > > > I believe thermal_pm_notify sounds a better naming for this case. > > > Okay, will change it to thermal_pm_notify in next version. > > > > +{ > > > + struct thermal_zone_device *tz; > > > + > > > + switch (mode) { > > > + case PM_HIBERNATION_PREPARE: > > > + case PM_RESTORE_PREPARE: > > > + case PM_SUSPEND_PREPARE: > > > + no_thermal_update = true; > > > + break; > > > + case PM_POST_HIBERNATION: > > > + case PM_POST_RESTORE: > > > + case PM_POST_SUSPEND: > > > + no_thermal_update = false; > > > + list_for_each_entry(tz, &thermal_tz_list, node) { > > > + thermal_zone_device_reset(tz); > > > + thermal_zone_device_update(tz); > > > + } > > > + break; > > > + default: > > > + break; > > > + } > > > + return 0; > > > +} > > > + > > > + > > > static int __init thermal_init(void) > > > { > > > int result; > > > @@ -1843,6 +1877,9 @@ static int __init thermal_init(void) > > > if (result) > > > goto exit_netlink; > > > > > > + thermal_pm_nb.notifier_call = thermal_notify; > > > > I believe you can declare thermal_pm_nb already with the callback > > initialized: > > > > > > > > static struct notifier_block thermal_pm_nb = { > > .notifier_call = thermal_notify, > > }; > > > Yes, will do this. > > Thanks, > rui > > > > just put it after the thermal_notify function. 
> > > > > + register_pm_notifier(&thermal_pm_nb); > > > + > > > return 0; > > > > > > exit_netlink: > > > -- > > > 1.9.1 > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-pm" in > > > the body of a message to majordomo@vger.kernel.org More majordomo info > > > at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
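Eduardo's concern is that the flag has a reader, thermal_zone_device_update(), that can run outside the PM notifier path, so pm_mutex on the writer side alone does not order that read. One lightweight way to make the cross-context read well defined is an atomic flag. The sketch below is a userspace model using C11 <stdatomic.h> and the hypothetical name in_suspend (the positive name Javi proposes in this thread); kernel code would use its own atomic primitives instead. An atomic flag only makes the read itself race-free: an update that has already passed the check can still be running when suspend starts, and excluding that case would need a lock held across the whole update.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag; static atomics start out zero (false). */
static atomic_bool in_suspend;

/* Counts updates that actually ran; single-threaded stand-in for the
 * real work, not itself protected against concurrent updaters. */
static int updates_done;

/* Writer side: called from the PM notifier. */
static void pm_prepare(void)
{
	atomic_store(&in_suspend, true);
}

static void pm_post(void)
{
	atomic_store(&in_suspend, false);
}

/* Reader side: may run outside the PM path, e.g. from polling work. */
static bool zone_update(void)
{
	if (atomic_load(&in_suspend))
		return false;	/* update skipped while suspending */
	updates_done++;
	return true;
}
```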
* Re: [PATCH 2/3] Thermal: handle thermal zone device properly during system sleep 2015-03-24 5:21 ` [PATCH 2/3] Thermal: handle thermal zone device properly during system sleep Zhang Rui 2015-03-24 15:06 ` Eduardo Valentin @ 2015-03-24 16:39 ` Javi Merino 2015-03-25 2:28 ` Zhang, Rui 1 sibling, 1 reply; 32+ messages in thread From: Javi Merino @ 2015-03-24 16:39 UTC (permalink / raw) To: Zhang Rui; +Cc: linux-pm, stable One minor nit On Tue, Mar 24, 2015 at 05:21:29AM +0000, Zhang Rui wrote: > Current thermal code does not handle system sleep well because > 1. the cooling device cooling state may be changed during suspend > 2. the previous temperature reading becomes invalid after resumed because > it is got before system sleep > 3. updating thermal zone device during suspending/resuming > is wrong because some devices may have already been suspended > or may have not been resumed. > > Thus, the proper way to do this is to cancel all thermal zone > device update requirements during suspend/resume, and after all > the devices have been resumed, reset and update every registered > thermal zone devices. > > This also fixes a regression introduced by > commit 19593a1fb1f6718406afca5b867dab184289d406 > Author: Aaron Lu <aaron.lu@intel.com> > Date: Tue Nov 19 16:59:20 2013 +0800 > > ACPI / fan: convert to platform driver > > Convert ACPI fan driver to a platform driver for the purpose of phasing > out ACPI bus. > > Signed-off-by: Aaron Lu <aaron.lu@intel.com> > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > > Because, with the commit applied, all the fan devices are attached > to the acpi_general_pm_domain, and they are turned on by the pm_domain > automatically after resume, without the awareness of thermal core. 
> > CC: <stable@vger.kernel.org> #3.18+ > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 > Tested-by: Manuel Krause <manuelkrause@netscape.net> > Tested-by: szegad <szegadlo@poczta.onet.pl> > Tested-by: prash <prash.n.rao@gmail.com> > Tested-by: amish <ammdispose-arch@yahoo.com> > Tested-by: Matthias <morpheusxyz123@yahoo.de> > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > --- > drivers/thermal/thermal_core.c | 37 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 37 insertions(+) > > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c > index 9d6f71b..9c03561 100644 > --- a/drivers/thermal/thermal_core.c > +++ b/drivers/thermal/thermal_core.c > @@ -37,6 +37,7 @@ > #include <linux/of.h> > #include <net/netlink.h> > #include <net/genetlink.h> > +#include <linux/suspend.h> > > #define CREATE_TRACE_POINTS > #include <trace/events/thermal.h> > @@ -59,6 +60,9 @@ static LIST_HEAD(thermal_governor_list); > static DEFINE_MUTEX(thermal_list_lock); > static DEFINE_MUTEX(thermal_governor_lock); > > +static struct notifier_block thermal_pm_nb; > +static bool no_thermal_update; Can this have a name without a negative? It's a bit hard to read the double-negative in "no_thermal_update = false". Maybe "in_suspend" is better? 
Cheers, Javi > + > static struct thermal_governor *def_governor; > > static struct thermal_governor *__find_governor(const char *name) > @@ -491,6 +495,9 @@ void thermal_zone_device_update(struct thermal_zone_device *tz) > { > int count; > > + if (no_thermal_update) > + return; > + > if (!tz->ops->get_temp) > return; > > @@ -1823,6 +1830,33 @@ static void thermal_unregister_governors(void) > thermal_gov_user_space_unregister(); > } > > +static int thermal_notify(struct notifier_block *nb, > + unsigned long mode, void *_unused) > +{ > + struct thermal_zone_device *tz; > + > + switch (mode) { > + case PM_HIBERNATION_PREPARE: > + case PM_RESTORE_PREPARE: > + case PM_SUSPEND_PREPARE: > + no_thermal_update = true; > + break; > + case PM_POST_HIBERNATION: > + case PM_POST_RESTORE: > + case PM_POST_SUSPEND: > + no_thermal_update = false; > + list_for_each_entry(tz, &thermal_tz_list, node) { > + thermal_zone_device_reset(tz); > + thermal_zone_device_update(tz); > + } > + break; > + default: > + break; > + } > + return 0; > +} > + > + > static int __init thermal_init(void) > { > int result; ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH 2/3] Thermal: handle thermal zone device properly during system sleep 2015-03-24 16:39 ` Javi Merino @ 2015-03-25 2:28 ` Zhang, Rui 0 siblings, 0 replies; 32+ messages in thread From: Zhang, Rui @ 2015-03-25 2:28 UTC (permalink / raw) To: Javi Merino; +Cc: linux-pm, stable > -----Original Message----- > From: Javi Merino [mailto:javi.merino@arm.com] > Sent: Wednesday, March 25, 2015 12:39 AM > To: Zhang, Rui > Cc: linux-pm@vger.kernel.org; stable@vger.kernel.org > Subject: Re: [PATCH 2/3] Thermal: handle thermal zone device properly during > system sleep > Importance: High > > One minor nit > > On Tue, Mar 24, 2015 at 05:21:29AM +0000, Zhang Rui wrote: > > Current thermal code does not handle system sleep well because 1. the > > cooling device cooling state may be changed during suspend 2. the > > previous temperature reading becomes invalid after resumed because > > it is got before system sleep > > 3. updating thermal zone device during suspending/resuming > > is wrong because some devices may have already been suspended > > or may have not been resumed. > > > > Thus, the proper way to do this is to cancel all thermal zone device > > update requirements during suspend/resume, and after all the devices > > have been resumed, reset and update every registered thermal zone > > devices. > > > > This also fixes a regression introduced by commit > > 19593a1fb1f6718406afca5b867dab184289d406 > > Author: Aaron Lu <aaron.lu@intel.com> > > Date: Tue Nov 19 16:59:20 2013 +0800 > > > > ACPI / fan: convert to platform driver > > > > Convert ACPI fan driver to a platform driver for the purpose of phasing > > out ACPI bus. > > > > Signed-off-by: Aaron Lu <aaron.lu@intel.com> > > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > > > > Because, with the commit applied, all the fan devices are attached to > > the acpi_general_pm_domain, and they are turned on by the pm_domain > > automatically after resume, without the awareness of thermal core. 
> > > > CC: <stable@vger.kernel.org> #3.18+ > > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 > > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 > > Tested-by: Manuel Krause <manuelkrause@netscape.net> > > Tested-by: szegad <szegadlo@poczta.onet.pl> > > Tested-by: prash <prash.n.rao@gmail.com> > > Tested-by: amish <ammdispose-arch@yahoo.com> > > Tested-by: Matthias <morpheusxyz123@yahoo.de> > > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > > --- > > drivers/thermal/thermal_core.c | 37 > > +++++++++++++++++++++++++++++++++++++ > > 1 file changed, 37 insertions(+) > > > > diff --git a/drivers/thermal/thermal_core.c > > b/drivers/thermal/thermal_core.c index 9d6f71b..9c03561 100644 > > --- a/drivers/thermal/thermal_core.c > > +++ b/drivers/thermal/thermal_core.c > > @@ -37,6 +37,7 @@ > > #include <linux/of.h> > > #include <net/netlink.h> > > #include <net/genetlink.h> > > +#include <linux/suspend.h> > > > > #define CREATE_TRACE_POINTS > > #include <trace/events/thermal.h> > > @@ -59,6 +60,9 @@ static LIST_HEAD(thermal_governor_list); static > > DEFINE_MUTEX(thermal_list_lock); static > > DEFINE_MUTEX(thermal_governor_lock); > > > > +static struct notifier_block thermal_pm_nb; static bool > > +no_thermal_update; > > Can this have a name without a negative? It's a bit hard to read the double- > negative in "no_thermal_update = false". Maybe "in_suspend" is better? > Sounds reasonable, will do it in next version. 
Thanks, Rui > Cheers, > Javi > > > + > > static struct thermal_governor *def_governor; > > > > static struct thermal_governor *__find_governor(const char *name) @@ > > -491,6 +495,9 @@ void thermal_zone_device_update(struct > > thermal_zone_device *tz) { > > int count; > > > > + if (no_thermal_update) > > + return; > > + > > if (!tz->ops->get_temp) > > return; > > > > @@ -1823,6 +1830,33 @@ static void thermal_unregister_governors(void) > > thermal_gov_user_space_unregister(); > > } > > > > +static int thermal_notify(struct notifier_block *nb, > > + unsigned long mode, void *_unused) { > > + struct thermal_zone_device *tz; > > + > > + switch (mode) { > > + case PM_HIBERNATION_PREPARE: > > + case PM_RESTORE_PREPARE: > > + case PM_SUSPEND_PREPARE: > > + no_thermal_update = true; > > + break; > > + case PM_POST_HIBERNATION: > > + case PM_POST_RESTORE: > > + case PM_POST_SUSPEND: > > + no_thermal_update = false; > > + list_for_each_entry(tz, &thermal_tz_list, node) { > > + thermal_zone_device_reset(tz); > > + thermal_zone_device_update(tz); > > + } > > + break; > > + default: > > + break; > > + } > > + return 0; > > +} > > + > > + > > static int __init thermal_init(void) > > { > > int result; ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered
From: Zhang Rui @ 2015-03-24 5:21 UTC
To: linux-pm; +Cc: Zhang Rui, stable

When a new cooling device is registered, we need to update the
thermal zone to set the new registered cooling device to a proper
state.

This fixes a problem that the system is cool, while the fan devices are left
running on full speed after boot, if fan device is registered after
thermal zone device.

CC: <stable@vger.kernel.org> #3.18+
Reference:https://bugzilla.kernel.org/show_bug.cgi?id=92431
Tested-by: Manuel Krause <manuelkrause@netscape.net>
Tested-by: szegad <szegadlo@poczta.onet.pl>
Tested-by: prash <prash.n.rao@gmail.com>
Tested-by: amish <ammdispose-arch@yahoo.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/thermal/thermal_core.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 9c03561..7cef579 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1141,6 +1141,7 @@ __thermal_cooling_device_register(struct device_node *np,
 				  const struct thermal_cooling_device_ops *ops)
 {
 	struct thermal_cooling_device *cdev;
+	struct thermal_instance *pos, *next;
 	int result;
 
 	if (type && strlen(type) >= THERMAL_NAME_LENGTH)
@@ -1185,6 +1186,15 @@ __thermal_cooling_device_register(struct device_node *np,
 	/* Update binding information for 'this' new cdev */
 	bind_cdev(cdev);
 
+	list_for_each_entry_safe(pos, next, &cdev->thermal_instances, cdev_node) {
+		if (next->cdev_node.next == &cdev->thermal_instances) {
+			thermal_zone_device_update(next->tz);
+			break;
+		}
+		if (pos->tz != next->tz)
+			thermal_zone_device_update(pos->tz);
+	}
+
 	return cdev;
 }
 
-- 
1.9.1
* Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered
From: Eduardo Valentin @ 2015-03-24 15:12 UTC
To: Zhang Rui; +Cc: linux-pm, stable

Hi,

On Tue, Mar 24, 2015 at 01:21:30PM +0800, Zhang Rui wrote:
> When a new cooling device is registered, we need to update the
> thermal zone to set the new registered cooling device to a proper
> state.
>
> This fixes a problem that the system is cool, while the fan devices are left
> running on full speed after boot, if fan device is registered after
> thermal zone device.
>
> CC: <stable@vger.kernel.org> #3.18+
> Reference:https://bugzilla.kernel.org/show_bug.cgi?id=92431
> Tested-by: Manuel Krause <manuelkrause@netscape.net>
> Tested-by: szegad <szegadlo@poczta.onet.pl>
> Tested-by: prash <prash.n.rao@gmail.com>
> Tested-by: amish <ammdispose-arch@yahoo.com>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> ---
>  drivers/thermal/thermal_core.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 9c03561..7cef579 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1141,6 +1141,7 @@ __thermal_cooling_device_register(struct device_node *np,
>  				  const struct thermal_cooling_device_ops *ops)
>  {
>  	struct thermal_cooling_device *cdev;
> +	struct thermal_instance *pos, *next;
>  	int result;
>
>  	if (type && strlen(type) >= THERMAL_NAME_LENGTH)
> @@ -1185,6 +1186,15 @@ __thermal_cooling_device_register(struct device_node *np,
>  	/* Update binding information for 'this' new cdev */
>  	bind_cdev(cdev);
>
> +	list_for_each_entry_safe(pos, next, &cdev->thermal_instances, cdev_node) {
> +		if (next->cdev_node.next == &cdev->thermal_instances) {
> +			thermal_zone_device_update(next->tz);
> +			break;
> +		}
> +		if (pos->tz != next->tz)
> +			thermal_zone_device_update(pos->tz);

Shouldn't we simply trigger a thermal_zone_device_update(pos->tz) ? I mean,
we are adding a new cooling device to the zone, so, it might make sense to
update it anyway.

> +	}
> +
>  	return cdev;
>  }
>
> --
> 1.9.1
* RE: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered
From: Zhang, Rui @ 2015-03-25 2:27 UTC
To: Eduardo Valentin; +Cc: linux-pm, stable

> -----Original Message-----
> From: linux-pm-owner@vger.kernel.org [mailto:linux-pm-owner@vger.kernel.org] On Behalf Of Eduardo Valentin
> Sent: Tuesday, March 24, 2015 11:13 PM
> To: Zhang, Rui
> Cc: linux-pm@vger.kernel.org; stable@vger.kernel.org
> Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered
> Importance: High
>
> Hi,
>
> On Tue, Mar 24, 2015 at 01:21:30PM +0800, Zhang Rui wrote:
> > When a new cooling device is registered, we need to update the thermal
> > zone to set the new registered cooling device to a proper state.
> >
> > This fixes a problem that the system is cool, while the fan devices
> > are left running on full speed after boot, if fan device is registered
> > after thermal zone device.
> >
> > CC: <stable@vger.kernel.org> #3.18+
> > Reference:https://bugzilla.kernel.org/show_bug.cgi?id=92431
> > Tested-by: Manuel Krause <manuelkrause@netscape.net>
> > Tested-by: szegad <szegadlo@poczta.onet.pl>
> > Tested-by: prash <prash.n.rao@gmail.com>
> > Tested-by: amish <ammdispose-arch@yahoo.com>
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> >  drivers/thermal/thermal_core.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 9c03561..7cef579 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -1141,6 +1141,7 @@ __thermal_cooling_device_register(struct device_node *np,
> >  				  const struct thermal_cooling_device_ops *ops)
> >  {
> >  	struct thermal_cooling_device *cdev;
> > +	struct thermal_instance *pos, *next;
> >  	int result;
> >
> >  	if (type && strlen(type) >= THERMAL_NAME_LENGTH)
> > @@ -1185,6 +1186,15 @@ __thermal_cooling_device_register(struct device_node *np,
> >  	/* Update binding information for 'this' new cdev */
> >  	bind_cdev(cdev);
> >
> > +	list_for_each_entry_safe(pos, next, &cdev->thermal_instances, cdev_node) {
> > +		if (next->cdev_node.next == &cdev->thermal_instances) {
> > +			thermal_zone_device_update(next->tz);
> > +			break;
> > +		}
> > +		if (pos->tz != next->tz)
> > +			thermal_zone_device_update(pos->tz);
>
> Shouldn't we simply trigger a thermal_zone_device_update(pos->tz) ? I mean,
> we are adding a new cooling device to the zone, so, it might make sense to
> update it anyway.
>
We may have several thermal instances for the same cdev and thermal zone,
but for different trips. The code above skips the duplicate
thermal_zone_device_update() calls for the same thermal zone.

Thanks,
rui
> > +	}
> > +
> >  	return cdev;
> >  }
> >
> > --
> > 1.9.1
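Rui's explanation relies on instances for the same zone sitting next to each other in cdev->thermal_instances, one per bound trip. Under that assumption, one simple way to update each zone once is to update the zone of an entry whenever it is the last entry or the next entry belongs to a different zone. The sketch below flattens the list into an array and uses a hypothetical zone_update() that only records zone ids; it is an alternative formulation, not the patch's loop. Traced by hand on a list holding zone 1 at two trips followed by zone 2, the pos/next loop in the patch appears to update only zone 2, because it breaks as soon as `next` is the last entry, before comparing `pos` with `next`.

```c
#include <stddef.h>

/* Hypothetical flattened view of cdev->thermal_instances:
 * one entry per (zone, trip) binding, in list order. */
struct instance {
	int tz;		/* zone id */
	int trip;	/* trip point index */
};

/* Stand-in for thermal_zone_device_update(): records which zones ran. */
static int updated[16];
static size_t n_updated;

static void zone_update(int tz)
{
	if (n_updated < sizeof(updated) / sizeof(updated[0]))
		updated[n_updated++] = tz;
}

/*
 * Update each zone once, assuming entries for the same zone are
 * adjacent: an entry "closes" its zone's run when it is the last
 * entry or the following entry belongs to a different zone.
 */
static void update_bound_zones(const struct instance *inst, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (i + 1 == n || inst[i].tz != inst[i + 1].tz)
			zone_update(inst[i].tz);
}
```

Handling the last entry through the `i + 1 == n` test, rather than peeking at a successor that may not exist, keeps the one-entry and empty-list cases correct without special code.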
* [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered @ 2015-09-27 5:48 Chen Yu 2015-09-28 14:29 ` Javi Merino 0 siblings, 1 reply; 32+ messages in thread
From: Chen Yu @ 2015-09-27 5:48 UTC
To: linux-pm, edubezval, javi.merino; +Cc: rui.zhang, linux-kernel, stable

From: Zhang Rui <rui.zhang@intel.com>

When a new cooling device is registered, we need to update the
thermal zone to set the new registered cooling device to a proper
state.

This fixes a problem that the system is cool, while the fan devices
are left running on full speed after boot, if fan device is registered
after thermal zone device.

CC: <stable@vger.kernel.org> #3.18+
Reference:https://bugzilla.kernel.org/show_bug.cgi?id=92431
Tested-by: Manuel Krause <manuelkrause@netscape.net>
Tested-by: szegad <szegadlo@poczta.onet.pl>
Tested-by: prash <prash.n.rao@gmail.com>
Tested-by: amish <ammdispose-arch@yahoo.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 drivers/thermal/thermal_core.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index c3bdb48..09c78a4 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1450,6 +1450,7 @@ __thermal_cooling_device_register(struct device_node *np,
 				  const struct thermal_cooling_device_ops *ops)
 {
 	struct thermal_cooling_device *cdev;
+	struct thermal_instance *pos, *next;
 	int result;
 
 	if (type && strlen(type) >= THERMAL_NAME_LENGTH)
@@ -1494,6 +1495,15 @@ __thermal_cooling_device_register(struct device_node *np,
 	/* Update binding information for 'this' new cdev */
 	bind_cdev(cdev);
 
+	list_for_each_entry_safe(pos, next, &cdev->thermal_instances, cdev_node) {
+		if (next->cdev_node.next == &cdev->thermal_instances) {
+			thermal_zone_device_update(next->tz);
+			break;
+		}
+		if (pos->tz != next->tz)
+			thermal_zone_device_update(pos->tz);
+	}
+
 	return cdev;
 }
 
-- 
1.8.4.2
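The loop added by this patch is meant to call thermal_zone_device_update() once for every thermal zone the new cooling device ends up bound to. The sketch below is a userspace model, not kernel code. It assumes the entries of cdev->thermal_instances can be represented as an array of zone ids in list order, and the helper name zones_to_update() is hypothetical. An entry triggers an update when it is the last one or when the following entry belongs to a different zone. Tracing the posted loop by hand on a two-entry list [A, B] with different zones suggests it may update only B's zone, because the first `if` fires on `next` and breaks before `pos` is considered. The model therefore tests each entry itself, the same shape as the list_is_last() form suggested later in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model, not kernel code: tz_ids[i] stands for pos->tz of the
 * i-th entry on cdev->thermal_instances, in list order.  Each zone that
 * would be passed to thermal_zone_device_update() is written to out[]
 * (which needs room for n entries); the number written is returned.
 */
size_t zones_to_update(const int *tz_ids, size_t n, int *out)
{
	size_t count = 0;

	assert(n == 0 || (tz_ids != NULL && out != NULL));

	for (size_t i = 0; i < n; i++) {
		/* Last entry: nothing follows it, so its zone is always updated. */
		int is_last = (i + 1 == n);

		/*
		 * Otherwise update only at the end of a run of entries sharing a
		 * zone; this relies on same-zone entries being adjacent.
		 */
		if (is_last || tz_ids[i] != tz_ids[i + 1])
			out[count++] = tz_ids[i];
	}
	return count;
}
```

For instance, entries for zones {1, 1, 2} yield two updates, for zones 1 and 2, rather than three.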
* Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered 2015-09-27 5:48 Chen Yu @ 2015-09-28 14:29 ` Javi Merino 2015-09-28 17:52 ` Chen, Yu C 0 siblings, 1 reply; 32+ messages in thread
From: Javi Merino @ 2015-09-28 14:29 UTC
To: Chen Yu; +Cc: linux-pm, edubezval, rui.zhang, linux-kernel, stable

On Sun, Sep 27, 2015 at 06:48:44AM +0100, Chen Yu wrote:
> From: Zhang Rui <rui.zhang@intel.com>
>
> When a new cooling device is registered, we need to update the
> thermal zone to set the new registered cooling device to a proper
> state.
>
> This fixes a problem that the system is cool, while the fan devices
> are left running on full speed after boot, if fan device is registered
> after thermal zone device.
>
> CC: <stable@vger.kernel.org> #3.18+
> Reference:https://bugzilla.kernel.org/show_bug.cgi?id=92431
> Tested-by: Manuel Krause <manuelkrause@netscape.net>
> Tested-by: szegad <szegadlo@poczta.onet.pl>
> Tested-by: prash <prash.n.rao@gmail.com>
> Tested-by: amish <ammdispose-arch@yahoo.com>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
>  drivers/thermal/thermal_core.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index c3bdb48..09c78a4 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1450,6 +1450,7 @@ __thermal_cooling_device_register(struct device_node *np,
>  				  const struct thermal_cooling_device_ops *ops)
>  {
>  	struct thermal_cooling_device *cdev;
> +	struct thermal_instance *pos, *next;
>  	int result;
>  
>  	if (type && strlen(type) >= THERMAL_NAME_LENGTH)
> @@ -1494,6 +1495,15 @@ __thermal_cooling_device_register(struct device_node *np,
>  	/* Update binding information for 'this' new cdev */
>  	bind_cdev(cdev);
>  

I think you need to hold cdev->lock here, to make sure that no thermal
zone is added or removed from cdev->thermal_instances while you are
looping.

> +	list_for_each_entry_safe(pos, next, &cdev->thermal_instances, cdev_node) {

Why list_for_each_entry_safe() ?  You are not going to remove any
entry, so you can just use list_for_each_entry()

> +		if (next->cdev_node.next == &cdev->thermal_instances) {
> +			thermal_zone_device_update(next->tz);
> +			break;
> +		}
> +		if (pos->tz != next->tz)
> +			thermal_zone_device_update(pos->tz);
> +	}

Why is this so complicated?  Can't you just do:

	list_for_each_entry(pos, &cdev->thermal_instances, cdev_node)
		thermal_zone_device_update(pos->tz);

Cheers,
Javi
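Javi's question about list_for_each_entry_safe() comes down to whether the loop body can unlink the entry it is visiting. The userspace sketch below uses a small hand-rolled circular list whose names loosely follow <linux/list.h>; it is a re-implementation for illustration, not the kernel's code, and struct instance is a hypothetical stand-in for struct thermal_instance. A read-only walk can follow `p->next` directly, while a walk that deletes has to save the successor before the body runs, which is the extra step the _safe variant performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-rolled circular doubly linked list, loosely modelled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del(struct list_head *node)
{
	assert(node->next != NULL && node->prev != NULL); /* must be linked */
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL; /* later use of node->next would crash */
}

/* Hypothetical stand-in for struct thermal_instance. */
struct instance {
	int tz;
	struct list_head cdev_node;
};

/* Read-only walk: following p->next after the body is safe. */
int count_entries(struct list_head *head)
{
	int n = 0;

	for (struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Deleting walk: n is saved before the body may unlink p. */
int remove_zone(struct list_head *head, int tz)
{
	int removed = 0;

	for (struct list_head *p = head->next, *n = p->next; p != head;
	     p = n, n = p->next) {
		struct instance *inst = container_of(p, struct instance, cdev_node);

		if (inst->tz == tz) {
			list_del(p);
			removed++;
		}
	}
	return removed;
}
```

In the patch the loop body only calls thermal_zone_device_update() and never unlinks an instance, which is why the plain list_for_each_entry() form is sufficient there.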
* RE: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered @ 2015-09-28 17:52 ` Chen, Yu C 0 siblings, 0 replies; 32+ messages in thread
From: Chen, Yu C @ 2015-09-28 17:52 UTC
To: Javi Merino; +Cc: linux-pm, edubezval, Zhang, Rui, linux-kernel, stable

Hi, Javi,

> -----Original Message-----
> From: Javi Merino [mailto:javi.merino@arm.com]
> Sent: Monday, September 28, 2015 10:29 PM
> To: Chen, Yu C
> Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux-
> kernel@vger.kernel.org; stable@vger.kernel.org
> Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling
> device registered
>
> On Sun, Sep 27, 2015 at 06:48:44AM +0100, Chen Yu wrote:
> > From: Zhang Rui <rui.zhang@intel.com>
> >
> >
>
> I think you need to hold cdev->lock here, to make sure that no thermal zone
> is added or removed from cdev->thermal_instances while you are looping.
>
Ah right, will add. If I add the cdev ->lock here, will there be a AB-BA lock with
thermal_zone_unbind_cooling_device?
>
> Why list_for_each_entry_safe() ?  You are not going to remove any entry, so
> you can just use list_for_each_entry()
>
>
> Why is this so complicated?  Can't you just do:
>
> 	list_for_each_entry(pos, &cdev->thermal_instances, cdev_node)
> 		thermal_zone_device_update(pos->tz);
>
This is an optimization here:
Ignore thermal instance that refers to the same thermal zone in this loop,
this works because bind_cdev() always binds the cooling device to one
thermal zone first, and then binds to the next thermal zone.

Best Regards,
Yu
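Yu's argument is that the two `if`s skip duplicate updates only because bind_cdev() is expected to add all instances for one zone before moving on to the next, so same-zone entries end up adjacent in cdev->thermal_instances. The userspace sketch below contrasts that adjacent-run rule with an order-independent check; arrays of zone ids stand in for the list, and both function names are hypothetical. If the adjacency assumption ever failed, the adjacent-run rule would still update every zone, just more than once, so the cost would be an extra update rather than a missed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Adjacent-run rule (the optimization discussed above): one update per run
 * of equal zone ids.  Minimal only if same-zone entries are contiguous.
 */
size_t updates_adjacent(const int *tz_ids, size_t n)
{
	size_t count = 0;

	assert(n == 0 || tz_ids != NULL);
	for (size_t i = 0; i < n; i++)
		if (i + 1 == n || tz_ids[i] != tz_ids[i + 1])
			count++;
	return count;
}

/*
 * Order-independent alternative: update a zone only at its first
 * occurrence.  O(n^2), assumed acceptable because a cooling device is
 * expected to be bound to only a few zones.
 */
size_t updates_unique(const int *tz_ids, size_t n)
{
	size_t count = 0;

	assert(n == 0 || tz_ids != NULL);
	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		for (size_t j = 0; j < i && !seen; j++)
			seen = (tz_ids[j] == tz_ids[i]);
		if (!seen)
			count++;
	}
	return count;
}
```

With grouped entries {1, 1, 2, 2} both rules give two updates; with interleaved entries {1, 2, 1, 2} the adjacent-run rule gives four while the order-independent one still gives two.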
* Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered 2015-09-28 17:52 ` Chen, Yu C (?) @ 2015-09-29 16:01 ` Javi Merino 2015-10-12 9:23 ` Chen, Yu C -1 siblings, 1 reply; 32+ messages in thread
From: Javi Merino @ 2015-09-29 16:01 UTC
To: Chen, Yu C; +Cc: linux-pm, edubezval, Zhang, Rui, linux-kernel, stable

Hi Yu,

On Mon, Sep 28, 2015 at 06:52:00PM +0100, Chen, Yu C wrote:
> Hi, Javi,
>
> > -----Original Message-----
> > From: Javi Merino [mailto:javi.merino@arm.com]
> > Sent: Monday, September 28, 2015 10:29 PM
> > To: Chen, Yu C
> > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux-
> > kernel@vger.kernel.org; stable@vger.kernel.org
> > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling
> > device registered
> >
> > On Sun, Sep 27, 2015 at 06:48:44AM +0100, Chen Yu wrote:
> > > From: Zhang Rui <rui.zhang@intel.com>
> > >
> > >
> >
> > I think you need to hold cdev->lock here, to make sure that no thermal zone
> > is added or removed from cdev->thermal_instances while you are looping.
> >
> Ah right, will add. If I add the cdev ->lock here, will there be a AB-BA lock with
> thermal_zone_unbind_cooling_device?

You're right, it could lead to a deadlock.  The locks can't be swapped
because that won't work in step_wise.

The best way that I can think of accessing thermal_instances
atomically is by making it RCU protected instead of with mutexes.
What do you think?

> > Why list_for_each_entry_safe() ?  You are not going to remove any entry, so
> > you can just use list_for_each_entry()
> >
> >
> > Why is this so complicated?  Can't you just do:
> >
> > 	list_for_each_entry(pos, &cdev->thermal_instances, cdev_node)
> > 		thermal_zone_device_update(pos->tz);
> >
>
> This is an optimization here:
> Ignore thermal instance that refers to the same thermal zone in this loop,
> this works because bind_cdev() always binds the cooling device to one
> thermal zone first, and then binds to the next thermal zone.

It has taken me a while to understand this optimization.  Please
document both "if"s in the code.  For the first "if" maybe you can use
list_is_last() to make it easier to understand that you're looking for
the last element in the list:

		if (list_is_last(&pos->cdev_node, &cdev->thermal_instances)) {
			thermal_zone_device_update(pos->tz);

For the second "if" you can say that you only need to run
thermal_zone_device_update() once per thermal zone, even though
multiple thermal instances may refer to the same thermal zone.

Cheers,
Javi
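Javi's list_is_last() suggestion makes the "last entry" case explicit and tests `pos` rather than `next`. Below is a minimal userspace sketch of that loop shape with both `if`s documented as he asks. The list helpers and struct instance are small stand-ins written for this example; only the list_is_last() test itself (`node->next == head`) is meant to match the kernel's, and recording zone ids in out[] stands in for calling thermal_zone_device_update().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-ins for <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Same test as the kernel's list_is_last(): is @node the final entry? */
static bool list_is_last(const struct list_head *node,
			 const struct list_head *head)
{
	return node->next == head;
}

/* Hypothetical stand-in for struct thermal_instance. */
struct instance {
	int tz;
	struct list_head cdev_node;
};

/* Records each zone that would be passed to thermal_zone_device_update(). */
size_t walk_instances(struct list_head *head, int *out)
{
	size_t count = 0;

	assert(head != NULL && out != NULL);
	for (struct list_head *p = head->next; p != head; p = p->next) {
		struct instance *pos = container_of(p, struct instance, cdev_node);

		/* Last instance: its zone is always updated. */
		if (list_is_last(p, head)) {
			out[count++] = pos->tz;
			break;
		}
		/*
		 * Update each zone once, even if several instances refer to it;
		 * relies on instances of one zone being bound consecutively.
		 */
		struct instance *next = container_of(p->next, struct instance,
						     cdev_node);
		if (pos->tz != next->tz)
			out[count++] = pos->tz;
	}
	return count;
}
```

For instances bound to zones {1, 1, 2}, the walk records zones 1 and 2; an empty list records nothing and a single instance records its own zone.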
* RE: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered @ 2015-10-12 9:23 ` Chen, Yu C 0 siblings, 0 replies; 32+ messages in thread
From: Chen, Yu C @ 2015-10-12 9:23 UTC
To: Javi Merino
Cc: linux-pm, edubezval, Zhang, Rui, linux-kernel, stable, Pandruvada, Srinivas

Hi, Javi
Sorry for my late response,

> -----Original Message-----
> From: Javi Merino [mailto:javi.merino@arm.com]
> Sent: Wednesday, September 30, 2015 12:02 AM
> To: Chen, Yu C
> Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux-
> kernel@vger.kernel.org; stable@vger.kernel.org
> Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling
> device registered
>
> Hi Yu,
>
> On Mon, Sep 28, 2015 at 06:52:00PM +0100, Chen, Yu C wrote:
> > Hi, Javi,
> >
> > > -----Original Message-----
> > > From: Javi Merino [mailto:javi.merino@arm.com]
> > > Sent: Monday, September 28, 2015 10:29 PM
> > > To: Chen, Yu C
> > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui;
> > > linux- kernel@vger.kernel.org; stable@vger.kernel.org
> > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a
> > > cooling device registered
> > >
> > > On Sun, Sep 27, 2015 at 06:48:44AM +0100, Chen Yu wrote:
> > > > From: Zhang Rui <rui.zhang@intel.com>
> > > >
> > > >
> > >
> > > I think you need to hold cdev->lock here, to make sure that no
> > > thermal zone is added or removed from cdev->thermal_instances while
> you are looping.
> > >
> > Ah right, will add. If I add the cdev ->lock here, will there be a
> > AB-BA lock with thermal_zone_unbind_cooling_device?
>
> You're right, it could lead to a deadlock. The locks can't be swapped because
> that won't work in step_wise.
>
> The best way that I can think of accessing thermal_instances atomically is by
> making it RCU protected instead of with mutexes.
> What do you think?
>
RCU would need extra spinlocks to protect the list, and need to sync_rcu after we delete
one instance from thermal_instance list, I think it is too complicated for me to rewrite: (
How about using thermal_list_lock instead of cdev ->lock?
This guy should be big enough to protect the device.thermal_instance list.
>
> > > Why list_for_each_entry_safe() ? You are not going to remove any
> > > entry, so you can just use list_for_each_entry()
> > >
> > >
> > > Why is this so complicated? Can't you just do:
> > >
> > > 	list_for_each_entry(pos, &cdev->thermal_instances, cdev_node)
> > > 		thermal_zone_device_update(pos->tz);
> > >
> >
> > This is an optimization here:
> > Ignore thermal instance that refers to the same thermal zone in this
> > loop, this works because bind_cdev() always binds the cooling device
> > to one thermal zone first, and then binds to the next thermal zone.
>
> It has taken me a while to understand this optimization. Please document
> both "if"s in the code. For the first "if" maybe you can use
> list_is_last() to make it easier to understand that you're looking for the last
> element in the list:
>
> 		if (list_is_last(&pos->cdev_node, &cdev-
> >thermal_instances)) {
> 			thermal_zone_device_update(pos->tz);
>
Sure, ok
> For the second "if" you can say that you only need to run
> thermal_zone_device_update() once per thermal zone, even though
> multiple thermal instances may refer to the same thermal zone.
>
OK

Best Regards,
Yu
* Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered 2015-10-12 9:23 ` Chen, Yu C (?) @ 2015-10-14 17:07 ` Javi Merino 2015-10-14 19:21 ` Chen, Yu C 2015-10-14 19:23 ` Chen, Yu C -1 siblings, 2 replies; 32+ messages in thread
From: Javi Merino @ 2015-10-14 17:07 UTC
To: Chen, Yu C
Cc: linux-pm, edubezval, Zhang, Rui, linux-kernel, stable, Pandruvada, Srinivas

On Mon, Oct 12, 2015 at 09:23:28AM +0000, Chen, Yu C wrote:
> Hi, Javi
> Sorry for my late response,
>
> > -----Original Message-----
> > From: Javi Merino [mailto:javi.merino@arm.com]
> > Sent: Wednesday, September 30, 2015 12:02 AM
> > To: Chen, Yu C
> > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux-
> > kernel@vger.kernel.org; stable@vger.kernel.org
> > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling
> > device registered
> >
> > Hi Yu,
> >
> > On Mon, Sep 28, 2015 at 06:52:00PM +0100, Chen, Yu C wrote:
> > > Hi, Javi,
> > >
> > > > -----Original Message-----
> > > > From: Javi Merino [mailto:javi.merino@arm.com]
> > > > Sent: Monday, September 28, 2015 10:29 PM
> > > > To: Chen, Yu C
> > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui;
> > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org
> > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a
> > > > cooling device registered
> > > >
> > > > On Sun, Sep 27, 2015 at 06:48:44AM +0100, Chen Yu wrote:
> > > > > From: Zhang Rui <rui.zhang@intel.com>
> > > > >
> > > > >
> > > >
> > > > I think you need to hold cdev->lock here, to make sure that no
> > > > thermal zone is added or removed from cdev->thermal_instances while
> > you are looping.
> > > >
> > > Ah right, will add. If I add the cdev ->lock here, will there be a
> > > AB-BA lock with thermal_zone_unbind_cooling_device?
> >
> > You're right, it could lead to a deadlock. The locks can't be swapped because
> > that won't work in step_wise.
> >
> > The best way that I can think of accessing thermal_instances atomically is by
> > making it RCU protected instead of with mutexes.
> > What do you think?
> >
> RCU would need extra spinlocks to protect the list, and need to sync_rcu after we delete
> one instance from thermal_instance list, I think it is too complicated for me to rewrite: (
> How about using thermal_list_lock instead of cdev ->lock?
> This guy should be big enough to protect the device.thermal_instance list.

thermal_list_lock protects thermal_tz_list and thermal_cdev_list, but
it doesn't protect the thermal_instances list.  For example,
thermal_zone_bind_cooling_device() adds a cooling device to the
cdev->thermal_instances list without taking thermal_tz_list.

To sum up, you have to protect accessing the cdev->thermal_instances
list but with the current locking scheme, you would create an AB-BA
deadlock.  As I see it you would have to change the locking scheme to
either RCU or add a new mutex that protects the cdev->thermal_instances
and tz->thermal_instances lists and change all accesses to them to
make sure they comply with the new locking scheme.

Is there a better way of solving this?

Cheers,
Javi
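The AB-BA concern in this exchange can be stated as a lock-ordering rule: any two code paths that hold one lock while acquiring another must take the pair in the same order. The checker below is a userspace sketch, not lockdep, and the function names are hypothetical. The acquisition orders in the usage example are assumptions distilled from this thread and should be checked against the actual source: the update loop holding cdev->lock and then taking tz->lock inside thermal_zone_device_update(), versus thermal_zone_unbind_cooling_device() taking tz->lock before cdev->lock; and, for Yu's alternative, thermal_list_lock held around the bind path as in his bind_cdev() outline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Index of @lock in an acquisition sequence, or -1 if it is never taken. */
static int position(const char *const *seq, size_t n, const char *lock)
{
	assert(lock != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(seq[i], lock) == 0)
			return (int)i;
	return -1;
}

/*
 * Returns true if two paths take some pair of locks in opposite orders
 * (A then B on one, B then A on the other): the AB-BA pattern that can
 * deadlock when both run concurrently.  Each sequence lists locks in the
 * order they are acquired while all of them remain held, and each lock is
 * assumed to appear at most once per path.
 */
bool has_abba(const char *const *p1, size_t n1,
	      const char *const *p2, size_t n2)
{
	for (size_t i = 0; i < n1; i++) {
		for (size_t j = i + 1; j < n1; j++) {
			int a = position(p2, n2, p1[i]);
			int b = position(p2, n2, p1[j]);

			if (a >= 0 && b >= 0 && b < a)
				return true;
		}
	}
	return false;
}
```

Under those assumed orders, {cdev->lock, tz->lock} against {tz->lock, cdev->lock} is reported as an inversion, while {thermal_list_lock, tz->lock} against {thermal_list_lock, tz->lock, cdev->lock} is not. Ordering is only half of the objection, though: the checker cannot tell whether every writer of cdev->thermal_instances actually holds the chosen lock.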
* RE: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered @ 2015-10-14 19:21 ` Chen, Yu C 0 siblings, 0 replies; 32+ messages in thread
From: Chen, Yu C @ 2015-10-14 19:21 UTC
To: Javi Merino
Cc: linux-pm, edubezval, Zhang, Rui, linux-kernel, stable, Pandruvada, Srinivas

Hi Javi,

> -----Original Message-----
> From: Javi Merino [mailto:javi.merino@arm.com]
> Sent: Thursday, October 15, 2015 1:08 AM
> To: Chen, Yu C
> Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux-
> kernel@vger.kernel.org; stable@vger.kernel.org; Pandruvada, Srinivas
> Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling
> device registered
>
> On Mon, Oct 12, 2015 at 09:23:28AM +0000, Chen, Yu C wrote:
> > Hi, Javi
> > Sorry for my late response,
> >
> > > -----Original Message-----
> > > From: Javi Merino [mailto:javi.merino@arm.com]
> > > Sent: Wednesday, September 30, 2015 12:02 AM
> > > To: Chen, Yu C
> > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui;
> > > linux- kernel@vger.kernel.org; stable@vger.kernel.org
> > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a
> > > cooling device registered
> > >
> > > Hi Yu,
> > >
> > > On Mon, Sep 28, 2015 at 06:52:00PM +0100, Chen, Yu C wrote:
> > > > Hi, Javi,
> > > >
> > > > > -----Original Message-----
> > > > > From: Javi Merino [mailto:javi.merino@arm.com]
> > > > > Sent: Monday, September 28, 2015 10:29 PM
> > > > > To: Chen, Yu C
> > > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui;
> > > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org
> > > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a
> > > > > cooling device registered
> > > > >
> > > > > On Sun, Sep 27, 2015 at 06:48:44AM +0100, Chen Yu wrote:
> > > > > > From: Zhang Rui <rui.zhang@intel.com>
> > > > > >
> > > > > >
> > > > >
> > > > > I think you need to hold cdev->lock here, to make sure that no
> > > > > thermal zone is added or removed from cdev->thermal_instances
> > > > > while
> > > you are looping.
> > > > >
> > > > Ah right, will add. If I add the cdev ->lock here, will there be a
> > > > AB-BA lock with thermal_zone_unbind_cooling_device?
> > >
> > > You're right, it could lead to a deadlock. The locks can't be
> > > swapped because that won't work in step_wise.
> > >
> > > The best way that I can think of accessing thermal_instances
> > > atomically is by making it RCU protected instead of with mutexes.
> > > What do you think?
> > >
> > RCU would need extra spinlocks to protect the list, and need to
> > sync_rcu after we delete one instance from thermal_instance list, I
> > think it is too complicated for me to rewrite: ( How about using
> thermal_list_lock instead of cdev ->lock?
> > This guy should be big enough to protect the device.thermal_instance list.
>
> thermal_list_lock protects thermal_tz_list and thermal_cdev_list, but it
> doesn't protect the thermal_instances list. For example,
> thermal_zone_bind_cooling_device() adds a cooling device to the
> cdev->thermal_instances list without taking thermal_tz_list.
>
Before thermal_zone_bind_cooling_device is invoked, the thermal_list_lock will be firstly gripped:

static void bind_cdev(struct thermal_cooling_device *cdev)
{
	mutex_lock(&thermal_list_lock);
	either tz->ops->bind : thermal_zone_bind_cooling_device
	or __bind() : thermal_zone_bind_cooling_device
	mutex_unlock(&thermal_list_lock);
}

And it is the same as in passive_store. So when code is trying to add/delete thermal_instance of cdev,
he has already hold thermal_list_lock IMO.
Or do I miss anything?

Best Regards,
Yu
* RE: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered @ 2015-10-14 19:23 ` Chen, Yu C 0 siblings, 0 replies; 32+ messages in thread From: Chen, Yu C @ 2015-10-14 19:23 UTC (permalink / raw) To: Javi Merino Cc: linux-pm, edubezval, Zhang, Rui, linux-kernel, stable, Pandruvada, Srinivas Hi,Javi > -----Original Message----- > From: Javi Merino [mailto:javi.merino@arm.com] > Sent: Thursday, October 15, 2015 1:08 AM > To: Chen, Yu C > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux- > kernel@vger.kernel.org; stable@vger.kernel.org; Pandruvada, Srinivas > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling > device registered > > On Mon, Oct 12, 2015 at 09:23:28AM +0000, Chen, Yu C wrote: > > Hi, Javi > > Sorry for my late response, > > > > > -----Original Message----- > > > From: Javi Merino [mailto:javi.merino@arm.com] > > > Sent: Wednesday, September 30, 2015 12:02 AM > > > To: Chen, Yu C > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a > > > cooling device registered > > > > > > Hi Yu, > > > > > > On Mon, Sep 28, 2015 at 06:52:00PM +0100, Chen, Yu C wrote: > > > > Hi, Javi, > > > > > > > > > -----Original Message----- > > > > > From: Javi Merino [mailto:javi.merino@arm.com] > > > > > Sent: Monday, September 28, 2015 10:29 PM > > > > > To: Chen, Yu C > > > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; > > > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org > > > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a > > > > > cooling device registered > > > > > > > > > > On Sun, Sep 27, 2015 at 06:48:44AM +0100, Chen Yu wrote: > > > > > > From: Zhang Rui <rui.zhang@intel.com> > > > > > > > > > > > > > > > > > > > > > > I think you need to hold cdev->lock here, to make sure that no > > > > > thermal zone is 
added or removed from cdev->thermal_instances > > > > > while > > > you are looping. > > > > > > > > > Ah right, will add. If I add the cdev ->lock here, will there be a > > > > AB-BA lock with thermal_zone_unbind_cooling_device? > > > > > > You're right, it could lead to a deadlock. The locks can't be > > > swapped because that won't work in step_wise. > > > > > > The best way that I can think of accessing thermal_instances > > > atomically is by making it RCU protected instead of with mutexes. > > > What do you think? > > > > > RCU would need extra spinlocks to protect the list, and need to > > sync_rcu after we delete one instance from thermal_instance list, I > > think it is too complicated for me to rewrite: ( How about using > thermal_list_lock instead of cdev ->lock? > > This guy should be big enough to protect the device.thermal_instance list. > > thermal_list_lock protects thermal_tz_list and thermal_cdev_list, but it > doesn't protect the thermal_instances list. For example, > thermal_zone_bind_cooling_device() adds a cooling device to the > cdev->thermal_instances list without taking thermal_tz_list. >
Before thermal_zone_bind_cooling_device() is invoked, thermal_list_lock is acquired first:

static void bind_cdev(struct thermal_cooling_device *cdev)
{
	mutex_lock(&thermal_list_lock);
	/* either tz->ops->bind() or __bind(), both of which
	 * call thermal_zone_bind_cooling_device() */
	mutex_unlock(&thermal_list_lock);
}

The same holds for passive_store(). So whenever code adds or deletes a thermal_instance of a cdev, it already holds thermal_list_lock, IMO. Or am I missing anything?

Best Regards,
Yu
^ permalink raw reply [flat|nested] 32+ messages in thread
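The AB-BA concern quoted above can be pictured with a toy lock-order checker, loosely in the spirit of the kernel's lockdep. This is a hypothetical userspace model, not kernel code: `model_lock()`, `bind_path()` and `update_under_cdev_lock()` are invented names. It assumes, as the thread implies, that the bind/unbind paths take tz->lock before cdev->lock, while the proposed loop would hold cdev->lock and then call thermal_zone_device_update(), which takes tz->lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum lock_class { TZ_LOCK, CDEV_LOCK, NR_LOCKS };

static const char *const lock_name[NR_LOCKS] = { "tz->lock", "cdev->lock" };

/* order[a][b] becomes true once lock b has been taken while lock a was held. */
static bool order[NR_LOCKS][NR_LOCKS];
static bool held[NR_LOCKS];

static void model_reset(void)
{
	memset(order, 0, sizeof(order));
	memset(held, 0, sizeof(held));
}

/* Record the acquisition; return false if it inverts an order seen before. */
static bool model_lock(enum lock_class l)
{
	bool ok = true;

	for (int h = 0; h < NR_LOCKS; h++) {
		if (!held[h])
			continue;
		if (order[l][h]) {
			printf("possible AB-BA: %s taken while holding %s\n",
			       lock_name[l], lock_name[h]);
			ok = false;
		}
		order[h][l] = true;
	}
	held[l] = true;
	return ok;
}

static void model_unlock(enum lock_class l)
{
	assert(held[l]);
	held[l] = false;
}

/* Modelled bind/unbind path: tz->lock, then cdev->lock. */
static bool bind_path(void)
{
	bool ok = model_lock(TZ_LOCK);

	ok &= model_lock(CDEV_LOCK);
	model_unlock(CDEV_LOCK);
	model_unlock(TZ_LOCK);
	return ok;
}

/* Modelled proposal: hold cdev->lock while the zone update takes tz->lock. */
static bool update_under_cdev_lock(void)
{
	bool ok = model_lock(CDEV_LOCK);

	ok &= model_lock(TZ_LOCK);
	model_unlock(TZ_LOCK);
	model_unlock(CDEV_LOCK);
	return ok;
}
```

Running `bind_path()` and then `update_under_cdev_lock()` reports the inversion even though neither call deadlocks on its own; the real lockdep tracks lock classes and full dependency chains, which this sketch does not attempt.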
* Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered 2015-10-14 19:23 ` Chen, Yu C (?) @ 2015-10-15 14:05 ` Javi Merino 2015-10-20 1:05 ` Chen, Yu C 2015-10-20 1:44 ` Chen, Yu C -1 siblings, 2 replies; 32+ messages in thread From: Javi Merino @ 2015-10-15 14:05 UTC (permalink / raw) To: Chen, Yu C Cc: linux-pm, edubezval, Zhang, Rui, linux-kernel, stable, Pandruvada, Srinivas On Wed, Oct 14, 2015 at 07:23:55PM +0000, Chen, Yu C wrote: > > -----Original Message----- > > From: Javi Merino [mailto:javi.merino@arm.com] > > Sent: Thursday, October 15, 2015 1:08 AM > > To: Chen, Yu C > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux- > > kernel@vger.kernel.org; stable@vger.kernel.org; Pandruvada, Srinivas > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling > > device registered > > > > On Mon, Oct 12, 2015 at 09:23:28AM +0000, Chen, Yu C wrote: > > > Hi, Javi > > > Sorry for my late response, > > > > > > > -----Original Message----- > > > > From: Javi Merino [mailto:javi.merino@arm.com] > > > > Sent: Wednesday, September 30, 2015 12:02 AM > > > > To: Chen, Yu C > > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; > > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org > > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a > > > > cooling device registered > > > > > > > > Hi Yu, > > > > > > > > On Mon, Sep 28, 2015 at 06:52:00PM +0100, Chen, Yu C wrote: > > > > > Hi, Javi, > > > > > > > > > > > -----Original Message----- > > > > > > From: Javi Merino [mailto:javi.merino@arm.com] > > > > > > Sent: Monday, September 28, 2015 10:29 PM > > > > > > To: Chen, Yu C > > > > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; > > > > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org > > > > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a > > > > > > cooling device registered > > > > > > > > > > > > On Sun, Sep 27, 2015 
at 06:48:44AM +0100, Chen Yu wrote: > > > > > > > From: Zhang Rui <rui.zhang@intel.com> > > > > > > > > > > > > > > > > > > > > > > > > > > I think you need to hold cdev->lock here, to make sure that no > > > > > > thermal zone is added or removed from cdev->thermal_instances > > > > > > while > > > > you are looping. > > > > > > > > > > > Ah right, will add. If I add the cdev ->lock here, will there be a > > > > > AB-BA lock with thermal_zone_unbind_cooling_device? > > > > > > > > You're right, it could lead to a deadlock. The locks can't be > > > > swapped because that won't work in step_wise. > > > > > > > > The best way that I can think of accessing thermal_instances > > > > atomically is by making it RCU protected instead of with mutexes. > > > > What do you think? > > > > > > > RCU would need extra spinlocks to protect the list, and need to > > > sync_rcu after we delete one instance from thermal_instance list, I > > > think it is too complicated for me to rewrite: ( How about using > > thermal_list_lock instead of cdev ->lock? > > > This guy should be big enough to protect the device.thermal_instance list. > > > > thermal_list_lock protects thermal_tz_list and thermal_cdev_list, but it > > doesn't protect the thermal_instances list. For example, > > thermal_zone_bind_cooling_device() adds a cooling device to the > > cdev->thermal_instances list without taking thermal_tz_list. > > > Before thermal_zone_bind_cooling_device is invoked, > the thermal_list_lock will be firstly gripped: > > static void bind_cdev(struct thermal_cooling_device *cdev) > { > mutex_lock(&thermal_list_lock); > either tz->ops->bind : thermal_zone_bind_cooling_device > or __bind() : thermal_zone_bind_cooling_device > mutex_unlock(&thermal_list_lock); > } > > And it is the same as in passive_store. > So when code is trying to add/delete thermal_instance of cdev, > he has already hold thermal_list_lock IMO. Or do I miss anything? 
thermal_zone_bind_cooling_device() is exported, so you can't really rely on the static thermal_list_lock being acquired in every single call.

thermal_list_lock protects the lists thermal_tz_list and thermal_cdev_list. Making it implicitly protect the cooling device's and thermal zone device's instances lists as well, on the grounds that no sensible code would call thermal_zone_bind_cooling_device() outside of a bind function, is just asking for trouble. Locking is hard to understand and easy to get wrong, so let's keep it simple.

Cheers,
Javi
^ permalink raw reply [flat|nested] 32+ messages in thread
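The principle argued above, that a list should be protected by a lock the exported function takes itself rather than by whatever lock its callers happen to hold, can be sketched as a small userspace model. All names (`struct cdev_model`, `model_bind()`, `model_count()`, `binder()`, `model_free()`) are hypothetical, and pthread mutexes stand in for the kernel's `struct mutex`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct instance {
	struct instance *next;
	int trip;
};

struct cdev_model {
	pthread_mutex_t lock;        /* protects instances and nr_instances */
	struct instance *instances;
	int nr_instances;
};

/* "Exported" bind: takes cdev->lock itself instead of trusting the caller. */
static int model_bind(struct cdev_model *cdev, int trip)
{
	struct instance *inst = malloc(sizeof(*inst));

	if (!inst)
		return -1;
	inst->trip = trip;

	pthread_mutex_lock(&cdev->lock);
	inst->next = cdev->instances;
	cdev->instances = inst;
	cdev->nr_instances++;
	pthread_mutex_unlock(&cdev->lock);
	return 0;
}

/* Walkers take the same lock, so they never observe a half-updated list. */
static int model_count(struct cdev_model *cdev)
{
	int n = 0;

	pthread_mutex_lock(&cdev->lock);
	for (const struct instance *i = cdev->instances; i; i = i->next)
		n++;
	pthread_mutex_unlock(&cdev->lock);
	return n;
}

/* Thread-compatible helper: binds 1000 instances to the cdev in @arg. */
static void *binder(void *arg)
{
	for (int trip = 0; trip < 1000; trip++)
		if (model_bind(arg, trip))
			break;
	return NULL;
}

static void model_free(struct cdev_model *cdev)
{
	pthread_mutex_lock(&cdev->lock);
	while (cdev->instances) {
		struct instance *next = cdev->instances->next;

		free(cdev->instances);
		cdev->instances = next;
	}
	cdev->nr_instances = 0;
	pthread_mutex_unlock(&cdev->lock);
}
```

Because `model_bind()` takes `cdev->lock` itself, it stays correct when called from several threads at once (for example via `pthread_create(&t, NULL, binder, &c)`), which is the guarantee an exported function cannot get from a caller-held static lock.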
* RE: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered @ 2015-10-20 1:05 ` Chen, Yu C 0 siblings, 0 replies; 32+ messages in thread From: Chen, Yu C @ 2015-10-20 1:05 UTC (permalink / raw) To: Javi Merino Cc: linux-pm, edubezval, Zhang, Rui, linux-kernel, stable, Pandruvada, Srinivas, manuelkrause Hi, > -----Original Message----- > From: Javi Merino [mailto:javi.merino@arm.com] > Sent: Thursday, October 15, 2015 10:05 PM > To: Chen, Yu C > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux- > kernel@vger.kernel.org; stable@vger.kernel.org; Pandruvada, Srinivas > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling > device registered > > On Wed, Oct 14, 2015 at 07:23:55PM +0000, Chen, Yu C wrote: > > > -----Original Message----- > > > From: Javi Merino [mailto:javi.merino@arm.com] > > > Sent: Thursday, October 15, 2015 1:08 AM > > > To: Chen, Yu C > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org; Pandruvada, > > > Srinivas > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a > > > cooling device registered > > > > > > On Mon, Oct 12, 2015 at 09:23:28AM +0000, Chen, Yu C wrote: > > > > Hi, Javi > > > > Sorry for my late response, > > > > > > > > > -----Original Message----- > > > > > From: Javi Merino [mailto:javi.merino@arm.com] > > > > > Sent: Wednesday, September 30, 2015 12:02 AM > > > > > To: Chen, Yu C > > > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; > > > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org > > > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a > > > > > cooling device registered > > > > > > > > > > Hi Yu, > > > > > > > > > > On Mon, Sep 28, 2015 at 06:52:00PM +0100, Chen, Yu C wrote: > > > > > > Hi, Javi, > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Javi Merino [mailto:javi.merino@arm.com] > > > > > > > 
Sent: Monday, September 28, 2015 10:29 PM > > > > > > > To: Chen, Yu C > > > > > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, > > > > > > > Rui; > > > > > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org > > > > > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update > > > > > > > after a cooling device registered > > > > > > > > > > > > > > On Sun, Sep 27, 2015 at 06:48:44AM +0100, Chen Yu wrote: > > > > > > > > From: Zhang Rui <rui.zhang@intel.com> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think you need to hold cdev->lock here, to make sure that > > > > > > > no thermal zone is added or removed from > > > > > > > cdev->thermal_instances while > > > > > you are looping. > > > > > > > > > > > > > Ah right, will add. If I add the cdev ->lock here, will there > > > > > > be a AB-BA lock with thermal_zone_unbind_cooling_device? > > > > > > > > > > You're right, it could lead to a deadlock. The locks can't be > > > > > swapped because that won't work in step_wise. > > > > > > > > > > The best way that I can think of accessing thermal_instances > > > > > atomically is by making it RCU protected instead of with mutexes. > > > > > What do you think? > > > > > > > > > RCU would need extra spinlocks to protect the list, and need to > > > > sync_rcu after we delete one instance from thermal_instance list, > > > > I think it is too complicated for me to rewrite: ( How about using > > > thermal_list_lock instead of cdev ->lock? > > > > This guy should be big enough to protect the device.thermal_instance > list. > > > > > > thermal_list_lock protects thermal_tz_list and thermal_cdev_list, > > > but it doesn't protect the thermal_instances list. For example, > > > thermal_zone_bind_cooling_device() adds a cooling device to the > > > cdev->thermal_instances list without taking thermal_tz_list. 
> > > > > Before thermal_zone_bind_cooling_device is invoked, the > > thermal_list_lock will be firstly gripped: > > > > static void bind_cdev(struct thermal_cooling_device *cdev) { > > mutex_lock(&thermal_list_lock); > > either tz->ops->bind : thermal_zone_bind_cooling_device > > or __bind() : thermal_zone_bind_cooling_device > > mutex_unlock(&thermal_list_lock); > > } > > > > And it is the same as in passive_store. > > So when code is trying to add/delete thermal_instance of cdev, he has > > already hold thermal_list_lock IMO. Or do I miss anything? > > thermal_zone_bind_cooling_device() is exported, so you can't really rely on > the static thermal_list_lock being acquired in every single call. > > thermal_list_lock and protects the lists thermal_tz_list and thermal_cdev_list. > Making it implicitly protect the cooling device's and thermal zone device's > instances list because no sensible code would call > thermal_zone_bind_cooling_device() outside of a bind function is just asking > for trouble. >
Yes, from this point of view, it is true.
> Locking is hard to understand and easy to get wrong so let's keep it simple. >
How about one of the following two methods:

1. Avoid accessing the cooling device's thermal_instances list and walk all thermal zone devices directly instead. This is somewhat redundant, since some thermal zones do not need to be updated, but it avoids taking cdev->lock:

	mutex_lock(&thermal_list_lock);
	list_for_each_entry(tz, &thermal_tz_list, node)
		thermal_zone_device_update(tz);
	mutex_unlock(&thermal_list_lock);

or,

2. When we bind the new cooling device to a thermal zone device, record that thermal zone device and update only it.

BTW, since thermal_zone_device_update() is not atomic, we might need another patch to make it atomic or similar, but for now I think these three patches are only meant to fix the regressions.

Thanks
Best Regards,
Yu
^ permalink raw reply [flat|nested] 32+ messages in thread
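Method 1 can be modelled in userspace to show the lock order it implies: the global list lock is taken first, then each zone's own lock inside the update, and no cooling-device lock is involved at all. This is a sketch under assumptions: `struct tz_model`, `tz_register()`, `tz_update()` and `update_all_zones()` are hypothetical stand-ins for thermal_zone_device, zone registration, thermal_zone_device_update() and the loop in the mail, and the "update" merely counts calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct tz_model {
	struct tz_model *next;
	pthread_mutex_t lock;   /* per-zone lock, taken by the update */
	int updates;            /* number of times the zone was re-evaluated */
};

static pthread_mutex_t tz_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct tz_model *tz_list;   /* protected by tz_list_lock */

static void tz_register(struct tz_model *tz)
{
	pthread_mutex_init(&tz->lock, NULL);
	tz->updates = 0;
	pthread_mutex_lock(&tz_list_lock);
	tz->next = tz_list;
	tz_list = tz;
	pthread_mutex_unlock(&tz_list_lock);
}

/* Stands in for thermal_zone_device_update(): only the zone lock is taken. */
static void tz_update(struct tz_model *tz)
{
	pthread_mutex_lock(&tz->lock);
	tz->updates++;
	pthread_mutex_unlock(&tz->lock);
}

/* Method 1: after a cooling device registers, re-evaluate every zone.
 * Lock order is always list lock -> zone lock. */
static void update_all_zones(void)
{
	pthread_mutex_lock(&tz_list_lock);
	for (struct tz_model *tz = tz_list; tz; tz = tz->next)
		tz_update(tz);
	pthread_mutex_unlock(&tz_list_lock);
}
```

The cost is the redundancy mentioned in the mail: zones unrelated to the new cooling device are re-evaluated too.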
* RE: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered @ 2015-10-20 1:44 ` Chen, Yu C 0 siblings, 0 replies; 32+ messages in thread From: Chen, Yu C @ 2015-10-20 1:44 UTC (permalink / raw) To: Javi Merino Cc: linux-pm, edubezval, Zhang, Rui, linux-kernel, stable, Pandruvada, Srinivas (resend for broken display) Hi, > -----Original Message----- > From: Javi Merino [mailto:javi.merino@arm.com] > Sent: Thursday, October 15, 2015 10:05 PM > To: Chen, Yu C > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux- > kernel@vger.kernel.org; stable@vger.kernel.org; Pandruvada, Srinivas > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a > cooling device registered > > On Wed, Oct 14, 2015 at 07:23:55PM +0000, Chen, Yu C wrote: > > > -----Original Message----- > > > From: Javi Merino [mailto:javi.merino@arm.com] > > > Sent: Thursday, October 15, 2015 1:08 AM > > > To: Chen, Yu C > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org; Pandruvada, > > > Srinivas > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a > > > cooling device registered > > > > > > On Mon, Oct 12, 2015 at 09:23:28AM +0000, Chen, Yu C wrote: > > > > Hi, Javi > > > > Sorry for my late response, > > > > > > > > > -----Original Message----- > > > > > From: Javi Merino [mailto:javi.merino@arm.com] > > > > > Sent: Wednesday, September 30, 2015 12:02 AM > > > > > To: Chen, Yu C > > > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; > > > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org > > > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after > > > > > a cooling device registered > > > > > > > > > > Hi Yu, > > > > > > > > > > On Mon, Sep 28, 2015 at 06:52:00PM +0100, Chen, Yu C wrote: > > > > > > Hi, Javi, > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Javi Merino [mailto:javi.merino@arm.com] 
> > > > > > > Sent: Monday, September 28, 2015 10:29 PM > > > > > > > To: Chen, Yu C > > > > > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, > > > > > > > Rui; > > > > > > > linux- kernel@vger.kernel.org; stable@vger.kernel.org > > > > > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update > > > > > > > after a cooling device registered > > > > > > > > > > > > > > On Sun, Sep 27, 2015 at 06:48:44AM +0100, Chen Yu wrote: > > > > > > > > From: Zhang Rui <rui.zhang@intel.com> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think you need to hold cdev->lock here, to make sure > > > > > > > that no thermal zone is added or removed from > > > > > > > cdev->thermal_instances while > > > > > you are looping. > > > > > > > > > > > > > Ah right, will add. If I add the cdev ->lock here, will > > > > > > there be a AB-BA lock with thermal_zone_unbind_cooling_device? > > > > > > > > > > You're right, it could lead to a deadlock. The locks can't be > > > > > swapped because that won't work in step_wise. > > > > > > > > > > The best way that I can think of accessing thermal_instances > > > > > atomically is by making it RCU protected instead of with mutexes. > > > > > What do you think? > > > > > > > > > RCU would need extra spinlocks to protect the list, and need to > > > > sync_rcu after we delete one instance from thermal_instance > > > > list, I think it is too complicated for me to rewrite: ( How > > > > about using > > > thermal_list_lock instead of cdev ->lock? > > > > This guy should be big enough to protect the > > > > device.thermal_instance > list. > > > > > > thermal_list_lock protects thermal_tz_list and thermal_cdev_list, > > > but it doesn't protect the thermal_instances list. For example, > > > thermal_zone_bind_cooling_device() adds a cooling device to the > > > cdev->thermal_instances list without taking thermal_tz_list. 
> > > > > Before thermal_zone_bind_cooling_device is invoked, the > > thermal_list_lock will be firstly gripped: > > > > static void bind_cdev(struct thermal_cooling_device *cdev) { > > mutex_lock(&thermal_list_lock); > > either tz->ops->bind : thermal_zone_bind_cooling_device > > or __bind() : thermal_zone_bind_cooling_device > > mutex_unlock(&thermal_list_lock); > > } > > > > And it is the same as in passive_store. > > So when code is trying to add/delete thermal_instance of cdev, he > > has already hold thermal_list_lock IMO. Or do I miss anything? > > thermal_zone_bind_cooling_device() is exported, so you can't really > rely on the static thermal_list_lock being acquired in every single call. > > thermal_list_lock and protects the lists thermal_tz_list and thermal_cdev_list. > Making it implicitly protect the cooling device's and thermal zone > device's instances list because no sensible code would call > thermal_zone_bind_cooling_device() outside of a bind function is just > asking for trouble. >
Yes, from this point of view, it is true.
> Locking is hard to understand and easy to get wrong so let's keep it simple. >
How about one of the following two methods:

1. Avoid accessing the cooling device's thermal_instances list and walk all thermal zone devices directly instead. This is somewhat redundant, since some thermal zones do not need to be updated, but it avoids taking cdev->lock:

	mutex_lock(&thermal_list_lock);
	list_for_each_entry(tz, &thermal_tz_list, node)
		thermal_zone_device_update(tz);
	mutex_unlock(&thermal_list_lock);

or,

2. When we bind the new cooling device to a thermal zone device, record that thermal zone device and update only it. The code would be:

	mutex_lock(&thermal_list_lock);
	list_for_each_entry(tz, &thermal_tz_list, node) {
		if (tz->need_update)	/* new flag, set at bind time */
			thermal_zone_device_update(tz);
	}
	mutex_unlock(&thermal_list_lock);

BTW, since thermal_zone_device_update() is not atomic, we might need another patch to make it atomic or similar, but for now I think these three patches are only meant to fix the regressions.

Thanks
Best Regards,
Yu
^ permalink raw reply [flat|nested] 32+ messages in thread
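Method 2, with a per-zone flag recorded at bind time, can be sketched the same way. One assumption goes beyond the mail: the sketch clears the flag with an atomic test-and-clear (`atomic_exchange()` from C11 `<stdatomic.h>`), so each bind triggers a single update and a zone is not re-evaluated on every later registration; the snippet in the mail only tests `need_update`. `struct tz_model`, `tz_register()`, `mark_bound()` and `update_marked_zones()` are hypothetical names, and the counter stands in for thermal_zone_device_update().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct tz_model {
	struct tz_model *next;
	atomic_int need_update;  /* hypothetical flag, set when a cdev is bound */
	int updates;             /* protected by tz_list_lock in this model */
};

static pthread_mutex_t tz_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct tz_model *tz_list;

static void tz_register(struct tz_model *tz)
{
	atomic_init(&tz->need_update, 0);
	tz->updates = 0;
	pthread_mutex_lock(&tz_list_lock);
	tz->next = tz_list;
	tz_list = tz;
	pthread_mutex_unlock(&tz_list_lock);
}

/* Called from the modelled bind path once a cdev is bound to @tz. */
static void mark_bound(struct tz_model *tz)
{
	atomic_store(&tz->need_update, 1);
}

/* Method 2: update only the zones that were marked since the last scan. */
static void update_marked_zones(void)
{
	pthread_mutex_lock(&tz_list_lock);
	for (struct tz_model *tz = tz_list; tz; tz = tz->next) {
		/* test-and-clear: one update per bind, not one per later scan */
		if (atomic_exchange(&tz->need_update, 0))
			tz->updates++;   /* stands in for thermal_zone_device_update(tz) */
	}
	pthread_mutex_unlock(&tz_list_lock);
}
```

Compared with method 1, unrelated zones are skipped; the price is one extra field per zone and the need to set it on every bind path.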
* Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered
  2015-10-20 1:44 ` Chen, Yu C
@ 2015-10-20 9:47 ` Javi Merino
  0 replies; 32+ messages in thread
From: Javi Merino @ 2015-10-20 9:47 UTC
  To: Chen, Yu C
  Cc: linux-pm, edubezval, Zhang, Rui, linux-kernel, stable, Pandruvada, Srinivas

Hi Yu,

On Tue, Oct 20, 2015 at 01:44:20AM +0000, Chen, Yu C wrote:
> > -----Original Message-----
> > From: Javi Merino [mailto:javi.merino@arm.com]
> > Sent: Thursday, October 15, 2015 10:05 PM
> > To: Chen, Yu C
> > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux-kernel@vger.kernel.org; stable@vger.kernel.org; Pandruvada, Srinivas
> > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered
> >
> > On Wed, Oct 14, 2015 at 07:23:55PM +0000, Chen, Yu C wrote:
> > > > -----Original Message-----
> > > > From: Javi Merino [mailto:javi.merino@arm.com]
> > > > Sent: Thursday, October 15, 2015 1:08 AM
> > > > To: Chen, Yu C
> > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux-kernel@vger.kernel.org; stable@vger.kernel.org; Pandruvada, Srinivas
> > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered
> > > >
> > > > On Mon, Oct 12, 2015 at 09:23:28AM +0000, Chen, Yu C wrote:
> > > > > Hi, Javi
> > > > > Sorry for my late response,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Javi Merino [mailto:javi.merino@arm.com]
> > > > > > Sent: Wednesday, September 30, 2015 12:02 AM
> > > > > > To: Chen, Yu C
> > > > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux-kernel@vger.kernel.org; stable@vger.kernel.org
> > > > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered
> > > > > >
> > > > > > Hi Yu,
> > > > > >
> > > > > > On Mon, Sep 28, 2015 at 06:52:00PM +0100, Chen, Yu C wrote:
> > > > > > > Hi, Javi,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Javi Merino [mailto:javi.merino@arm.com]
> > > > > > > > Sent: Monday, September 28, 2015 10:29 PM
> > > > > > > > To: Chen, Yu C
> > > > > > > > Cc: linux-pm@vger.kernel.org; edubezval@gmail.com; Zhang, Rui; linux-kernel@vger.kernel.org; stable@vger.kernel.org
> > > > > > > > Subject: Re: [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered
> > > > > > > >
> > > > > > > > On Sun, Sep 27, 2015 at 06:48:44AM +0100, Chen Yu wrote:
> > > > > > > > > From: Zhang Rui <rui.zhang@intel.com>
> > > > > > > > >
> > > > > > > >
> > > > > > > > I think you need to hold cdev->lock here, to make sure that no
> > > > > > > > thermal zone is added or removed from cdev->thermal_instances
> > > > > > > > while you are looping.
> > > > > > > >
> > > > > > > Ah right, will add. If I add the cdev->lock here, will there be
> > > > > > > an AB-BA deadlock with thermal_zone_unbind_cooling_device()?
> > > > > >
> > > > > > You're right, it could lead to a deadlock. The locks can't be
> > > > > > swapped because that won't work in step_wise.
> > > > > >
> > > > > > The best way that I can think of accessing thermal_instances
> > > > > > atomically is by making it RCU protected instead of with mutexes.
> > > > > > What do you think?
> > > > > >
> > > > > RCU would need extra spinlocks to protect the list, and would need a
> > > > > synchronize_rcu() after we delete an instance from the thermal_instances
> > > > > list; I think it is too complicated for me to rewrite :( How about
> > > > > using thermal_list_lock instead of cdev->lock? This lock should be
> > > > > big enough to protect the device's thermal_instances list.
> > > >
> > > > thermal_list_lock protects thermal_tz_list and thermal_cdev_list,
> > > > but it doesn't protect the thermal_instances list. For example,
> > > > thermal_zone_bind_cooling_device() adds a cooling device to the
> > > > cdev->thermal_instances list without taking thermal_list_lock.
> > > >
> > > Before thermal_zone_bind_cooling_device() is invoked, thermal_list_lock
> > > has already been grabbed:
> > >
> > > static void bind_cdev(struct thermal_cooling_device *cdev)
> > > {
> > > 	mutex_lock(&thermal_list_lock);
> > > 	either tz->ops->bind : thermal_zone_bind_cooling_device
> > > 	or __bind() : thermal_zone_bind_cooling_device
> > > 	mutex_unlock(&thermal_list_lock);
> > > }
> > >
> > > And it is the same in passive_store().
> > > So when code is trying to add/delete a thermal_instance of a cdev, it
> > > already holds thermal_list_lock, IMO. Or am I missing anything?
> >
> > thermal_zone_bind_cooling_device() is exported, so you can't really
> > rely on the static thermal_list_lock being acquired in every single call.
> >
> > thermal_list_lock protects the lists thermal_tz_list and thermal_cdev_list.
> > Making it implicitly protect the cooling device's and thermal zone
> > device's instances list because no sensible code would call
> > thermal_zone_bind_cooling_device() outside of a bind function is just
> > asking for trouble.
> >
> Yes, from this point of view, it is true.
>
> > Locking is hard to understand and easy to get wrong so let's keep it simple.
> >
> How about the following 2 methods:
> 1. Avoid accessing the device's thermal_instances and
>    access every thermal_zone_device directly instead.
>    There might be some redundancy, since some thermal
>    zones do not need to be updated, but we avoid
>    grabbing cdev->lock:
>
> 	mutex_lock(&thermal_list_lock);
> 	list_for_each_entry(tz, &thermal_tz_list, node)
> 		thermal_zone_device_update(tz);
> 	mutex_unlock(&thermal_list_lock);
>
> or,
> 2. Once we bind the new device with the thermal_zone_device,
>    we can record that thermal_zone_device and update that
>    thermal_zone_device alone; the code would be:
>
> 	mutex_lock(&thermal_list_lock);
> 	list_for_each_entry(tz, &thermal_tz_list, node) {
> 		if (tz->need_update)
> 			thermal_zone_device_update(tz);
> 	}
> 	mutex_unlock(&thermal_list_lock);

This sounds like a better alternative to me. I was wondering whether
we could call thermal_zone_device_update() directly in bind_cdev() to
avoid the need_update field, but I don't think it's any better: you
would have to put it in two places (for the bind() and tbp.match()
paths). With the solution you propose above you only have to put it
in __thermal_cooling_device_register(), which is simpler.

I vote for your solution (2) above.

> BTW, since thermal_zone_device_update is not atomic,
> we might need another patch to make it atomic or
> something like that, but for now, I think these three patches
> are just for fixing the regressions.

Yeah, we can fix that in another series.

Cheers,
Javi
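The following hypothetical userspace sketch models the placement Javi argues for in the reply above. The bind paths only mark zones, and the registration function issues the update from a single spot. All names are illustrative stand-ins, not kernel APIs or real signatures:

- `model_bind_cdev()` stands in for `bind_cdev()`.
- `model_cooling_device_register()` stands in for `__thermal_cooling_device_register()`.
- The `bind` callback stands in for `tz->ops->bind()`.
- The `match_type` string comparison is a simplified stand-in for the thermal_bind_params `match()` path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model, not kernel code. Two simplified binding
 * paths stand in for tz->ops->bind() and the thermal_bind_params match()
 * path; model_cooling_device_register() stands in for
 * __thermal_cooling_device_register().
 */
struct zone;
typedef bool (*zone_bind_fn)(struct zone *tz, const char *cdev_type);

struct zone {
	const char *name;
	zone_bind_fn bind;       /* path 1: per-zone bind callback, or NULL */
	const char *match_type;  /* path 2: cooling-device type to match, or NULL */
	bool need_update;        /* set by either path when binding succeeds */
	int updates;             /* number of times zone_update() has run */
	struct zone *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;  /* ~thermal_list_lock */
static struct zone *zone_list;                                  /* ~thermal_tz_list */

static void zone_add(struct zone *tz)
{
	pthread_mutex_lock(&list_lock);
	tz->next = zone_list;
	zone_list = tz;
	pthread_mutex_unlock(&list_lock);
}

/* Example bind callback: this zone accepts only cooling devices of type "Fan". */
static bool bind_if_fan(struct zone *tz, const char *cdev_type)
{
	(void)tz;
	return strcmp(cdev_type, "Fan") == 0;
}

static void zone_update(struct zone *tz)
{
	tz->updates++;
}

/* Both binding paths only mark the zone; neither calls zone_update(). */
static void model_bind_cdev(const char *cdev_type)
{
	pthread_mutex_lock(&list_lock);
	for (struct zone *tz = zone_list; tz != NULL; tz = tz->next) {
		bool bound = false;

		if (tz->bind != NULL)
			bound = tz->bind(tz, cdev_type);
		else if (tz->match_type != NULL)
			bound = strcmp(tz->match_type, cdev_type) == 0;
		if (bound)
			tz->need_update = true;
	}
	pthread_mutex_unlock(&list_lock);
}

/* The single call site for the update, run after binding has finished. */
static void model_cooling_device_register(const char *cdev_type)
{
	assert(cdev_type != NULL);
	/* Adding the device to a cooling-device list is omitted here. */
	model_bind_cdev(cdev_type);

	pthread_mutex_lock(&list_lock);
	for (struct zone *tz = zone_list; tz != NULL; tz = tz->next) {
		if (tz->need_update) {
			zone_update(tz);
			tz->need_update = false;
		}
	}
	pthread_mutex_unlock(&list_lock);
}
```

In this shape, adding another binding path only requires setting `need_update`. The update logic stays in one function instead of being duplicated in each path, which is the simplification the reply points out.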
end of thread, other threads:[~2015-10-20 9:48 UTC | newest]

Thread overview: 32+ messages
2015-03-24 5:21 [PATCH 0/3] Thermal: thermal enhancements for boot and system sleep Zhang Rui
2015-03-24 5:21 ` [PATCH 1/3] Thermal: initialize thermal zone device correctly Zhang Rui
2015-03-24 15:00 ` Eduardo Valentin
2015-03-24 17:20 ` Javi Merino
2015-03-25 2:14 ` Zhang, Rui
2015-03-24 5:21 ` [PATCH 2/3] Thermal: handle thermal zone device properly during system sleep Zhang Rui
2015-03-24 15:06 ` Eduardo Valentin
2015-03-25 2:25 ` Zhang, Rui
2015-03-25 14:40 ` Eduardo Valentin
2015-03-24 16:39 ` Javi Merino
2015-03-25 2:28 ` Zhang, Rui
2015-03-24 5:21 ` [PATCH 3/3] Thermal: do thermal zone update after a cooling device registered Zhang Rui
2015-03-24 15:12 ` Eduardo Valentin
2015-03-25 2:27 ` Zhang, Rui
2015-09-27 5:48 Chen Yu
2015-09-28 14:29 ` Javi Merino
2015-09-28 17:52 ` Chen, Yu C
2015-09-28 17:52 ` Chen, Yu C
2015-09-29 16:01 ` Javi Merino
2015-10-12 9:23 ` Chen, Yu C
2015-10-12 9:23 ` Chen, Yu C
2015-10-14 17:07 ` Javi Merino
2015-10-14 19:21 ` Chen, Yu C
2015-10-14 19:21 ` Chen, Yu C
2015-10-14 19:23 ` Chen, Yu C
2015-10-14 19:23 ` Chen, Yu C
2015-10-15 14:05 ` Javi Merino
2015-10-20 1:05 ` Chen, Yu C
2015-10-20 1:05 ` Chen, Yu C
2015-10-20 1:44 ` Chen, Yu C
2015-10-20 1:44 ` Chen, Yu C
2015-10-20 9:47 ` Javi Merino