linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
@ 2016-03-16 22:27 Laura Abbott
  2016-03-16 22:46 ` Greg Kroah-Hartman
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Laura Abbott @ 2016-03-16 22:27 UTC (permalink / raw)
  To: Zhang Rui, Javi Merino, Chen Yu
  Cc: Greg Kroah-Hartman, Manuel Krause, szegad, prash, amish,
	Matthias, linux-pm, Linux Kernel Mailing List

Hi,

Fedora received a bug report (https://bugzilla.redhat.com/show_bug.cgi?id=1317190)
of a major performance drop on various bench marks and general system
sluggishness with the 4.4.4 kernel update. The benchmarks were showing
a reduction to about 18% performance (not minor).

Bisection showed the first bad commit was

commit 774ac8b7eff69e0786970157de2157e68b22f456
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Fri Oct 30 16:31:47 2015 +0800

     Thermal: initialize thermal zone device correctly
     
     commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 upstream.
     
     After thermal zone device registered, as we have not read any
     temperature before, thus tz->temperature should not be 0,
     which actually means 0C, and thermal trend is not available.
     In this case, we need specially handling for the first
     thermal_zone_device_update().
     
     Both thermal core framework and step_wise governor is
     enhanced to handle this. And since the step_wise governor
     is the only one that uses trends, so it's the only thermal
     governor that needs to be updated.
     
     Tested-by: Manuel Krause <manuelkrause@netscape.net>
     Tested-by: szegad <szegadlo@poczta.onet.pl>
     Tested-by: prash <prash.n.rao@gmail.com>
     Tested-by: amish <ammdispose-arch@yahoo.com>
     Tested-by: Matthias <morpheusxyz123@yahoo.de>
     Reviewed-by: Javi Merino <javi.merino@arm.com>
     Signed-off-by: Zhang Rui <rui.zhang@intel.com>
     Signed-off-by: Chen Yu <yu.c.chen@intel.com>
     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>



Reverting this plus to other commits in the series (a67208e94d94
"Thermal: handle thermal zone device properly during system sleep"
and 27f356149d59 "Thermal: do thermal zone update after a cooling
device registered") confirmed the performance was back to normal.

Bugzilla has the full discussion but this comment from one of the
reporters sums it up:

"In 4.4.3 and prior, my 2.40 MHz processor would fluctuate between
1000 and 3400 MHz.  In 4.4.4, the processor would fluctuate between
400 and 700 MHz, according to /proc/cpuinfo.

Setting /sys/devices/system/cpu/cpufreq/policy0/scaling_governor to
performance, instead of the default "powersave" forces the CPU to
2400 MHz, and improves performance greatly, but still not to the
same level as in 4.4.3."

Any ideas?

Thanks,
Laura

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
  2016-03-16 22:27 [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop Laura Abbott
@ 2016-03-16 22:46 ` Greg Kroah-Hartman
  2016-03-17  0:00   ` Laura Abbott
  2016-03-17  0:17 ` Zhang, Rui
  2016-03-19  4:43 ` Zhang Rui
  2 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2016-03-16 22:46 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Zhang Rui, Javi Merino, Chen Yu, Manuel Krause, szegad, prash,
	amish, Matthias, linux-pm, Linux Kernel Mailing List

On Wed, Mar 16, 2016 at 03:27:57PM -0700, Laura Abbott wrote:
> Hi,
> 
> Fedora received a bug report (https://bugzilla.redhat.com/show_bug.cgi?id=1317190)
> of a major performance drop on various bench marks and general system
> sluggishness with the 4.4.4 kernel update. The benchmarks were showing
> a reduction to about 18% performance (not minor).
> 
> Bisection showed the first bad commit was
> 
> commit 774ac8b7eff69e0786970157de2157e68b22f456
> Author: Zhang Rui <rui.zhang@intel.com>
> Date:   Fri Oct 30 16:31:47 2015 +0800
> 
>     Thermal: initialize thermal zone device correctly
>     commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 upstream.
>     After thermal zone device registered, as we have not read any
>     temperature before, thus tz->temperature should not be 0,
>     which actually means 0C, and thermal trend is not available.
>     In this case, we need specially handling for the first
>     thermal_zone_device_update().
>     Both thermal core framework and step_wise governor is
>     enhanced to handle this. And since the step_wise governor
>     is the only one that uses trends, so it's the only thermal
>     governor that needs to be updated.
>     Tested-by: Manuel Krause <manuelkrause@netscape.net>
>     Tested-by: szegad <szegadlo@poczta.onet.pl>
>     Tested-by: prash <prash.n.rao@gmail.com>
>     Tested-by: amish <ammdispose-arch@yahoo.com>
>     Tested-by: Matthias <morpheusxyz123@yahoo.de>
>     Reviewed-by: Javi Merino <javi.merino@arm.com>
>     Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>     Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> 
> 
> Reverting this plus to other commits in the series (a67208e94d94
> "Thermal: handle thermal zone device properly during system sleep"
> and 27f356149d59 "Thermal: do thermal zone update after a cooling
> device registered") confirmed the performance was back to normal.
> 
> Bugzilla has the full discussion but this comment from one of the
> reporters sums it up:
> 
> "In 4.4.3 and prior, my 2.40 MHz processor would fluctuate between
> 1000 and 3400 MHz.  In 4.4.4, the processor would fluctuate between
> 400 and 700 MHz, according to /proc/cpuinfo.
> 
> Setting /sys/devices/system/cpu/cpufreq/policy0/scaling_governor to
> performance, instead of the default "powersave" forces the CPU to
> 2400 MHz, and improves performance greatly, but still not to the
> same level as in 4.4.3."
> 
> Any ideas?

Is this same "slowdown" also seen in 4.5?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
  2016-03-16 22:46 ` Greg Kroah-Hartman
@ 2016-03-17  0:00   ` Laura Abbott
  2016-03-17  0:11     ` Greg Kroah-Hartman
  2016-03-17  0:20     ` Pandruvada, Srinivas
  0 siblings, 2 replies; 11+ messages in thread
From: Laura Abbott @ 2016-03-17  0:00 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Zhang Rui, Javi Merino, Chen Yu, Manuel Krause, szegad, prash,
	amish, Matthias, linux-pm, Linux Kernel Mailing List

On 03/16/2016 03:46 PM, Greg Kroah-Hartman wrote:
> On Wed, Mar 16, 2016 at 03:27:57PM -0700, Laura Abbott wrote:
>> Hi,
>>
>> Fedora received a bug report (https://bugzilla.redhat.com/show_bug.cgi?id=1317190)
>> of a major performance drop on various bench marks and general system
>> sluggishness with the 4.4.4 kernel update. The benchmarks were showing
>> a reduction to about 18% performance (not minor).
>>
>> Bisection showed the first bad commit was
>>
>> commit 774ac8b7eff69e0786970157de2157e68b22f456
>> Author: Zhang Rui <rui.zhang@intel.com>
>> Date:   Fri Oct 30 16:31:47 2015 +0800
>>
>>      Thermal: initialize thermal zone device correctly
>>      commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 upstream.
>>      After thermal zone device registered, as we have not read any
>>      temperature before, thus tz->temperature should not be 0,
>>      which actually means 0C, and thermal trend is not available.
>>      In this case, we need specially handling for the first
>>      thermal_zone_device_update().
>>      Both thermal core framework and step_wise governor is
>>      enhanced to handle this. And since the step_wise governor
>>      is the only one that uses trends, so it's the only thermal
>>      governor that needs to be updated.
>>      Tested-by: Manuel Krause <manuelkrause@netscape.net>
>>      Tested-by: szegad <szegadlo@poczta.onet.pl>
>>      Tested-by: prash <prash.n.rao@gmail.com>
>>      Tested-by: amish <ammdispose-arch@yahoo.com>
>>      Tested-by: Matthias <morpheusxyz123@yahoo.de>
>>      Reviewed-by: Javi Merino <javi.merino@arm.com>
>>      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>>      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>
>>
>>
>> Reverting this plus to other commits in the series (a67208e94d94
>> "Thermal: handle thermal zone device properly during system sleep"
>> and 27f356149d59 "Thermal: do thermal zone update after a cooling
>> device registered") confirmed the performance was back to normal.
>>
>> Bugzilla has the full discussion but this comment from one of the
>> reporters sums it up:
>>
>> "In 4.4.3 and prior, my 2.40 MHz processor would fluctuate between
>> 1000 and 3400 MHz.  In 4.4.4, the processor would fluctuate between
>> 400 and 700 MHz, according to /proc/cpuinfo.
>>
>> Setting /sys/devices/system/cpu/cpufreq/policy0/scaling_governor to
>> performance, instead of the default "powersave" forces the CPU to
>> 2400 MHz, and improves performance greatly, but still not to the
>> same level as in 4.4.3."
>>
>> Any ideas?
>
> Is this same "slowdown" also seen in 4.5?
>
> thanks,
>
> greg k-h
>

Yes, the same issue is seen on 4.5 according to the reporter.

Thanks,
Laura

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
  2016-03-17  0:00   ` Laura Abbott
@ 2016-03-17  0:11     ` Greg Kroah-Hartman
  2016-03-17  0:20     ` Pandruvada, Srinivas
  1 sibling, 0 replies; 11+ messages in thread
From: Greg Kroah-Hartman @ 2016-03-17  0:11 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Zhang Rui, Javi Merino, Chen Yu, Manuel Krause, szegad, prash,
	amish, Matthias, linux-pm, Linux Kernel Mailing List

On Wed, Mar 16, 2016 at 05:00:07PM -0700, Laura Abbott wrote:
> On 03/16/2016 03:46 PM, Greg Kroah-Hartman wrote:
> >On Wed, Mar 16, 2016 at 03:27:57PM -0700, Laura Abbott wrote:
> >>Hi,
> >>
> >>Fedora received a bug report (https://bugzilla.redhat.com/show_bug.cgi?id=1317190)
> >>of a major performance drop on various bench marks and general system
> >>sluggishness with the 4.4.4 kernel update. The benchmarks were showing
> >>a reduction to about 18% performance (not minor).
> >>
> >>Bisection showed the first bad commit was
> >>
> >>commit 774ac8b7eff69e0786970157de2157e68b22f456
> >>Author: Zhang Rui <rui.zhang@intel.com>
> >>Date:   Fri Oct 30 16:31:47 2015 +0800
> >>
> >>     Thermal: initialize thermal zone device correctly
> >>     commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 upstream.
> >>     After thermal zone device registered, as we have not read any
> >>     temperature before, thus tz->temperature should not be 0,
> >>     which actually means 0C, and thermal trend is not available.
> >>     In this case, we need specially handling for the first
> >>     thermal_zone_device_update().
> >>     Both thermal core framework and step_wise governor is
> >>     enhanced to handle this. And since the step_wise governor
> >>     is the only one that uses trends, so it's the only thermal
> >>     governor that needs to be updated.
> >>     Tested-by: Manuel Krause <manuelkrause@netscape.net>
> >>     Tested-by: szegad <szegadlo@poczta.onet.pl>
> >>     Tested-by: prash <prash.n.rao@gmail.com>
> >>     Tested-by: amish <ammdispose-arch@yahoo.com>
> >>     Tested-by: Matthias <morpheusxyz123@yahoo.de>
> >>     Reviewed-by: Javi Merino <javi.merino@arm.com>
> >>     Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> >>     Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> >>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >>
> >>
> >>
> >>Reverting this plus to other commits in the series (a67208e94d94
> >>"Thermal: handle thermal zone device properly during system sleep"
> >>and 27f356149d59 "Thermal: do thermal zone update after a cooling
> >>device registered") confirmed the performance was back to normal.
> >>
> >>Bugzilla has the full discussion but this comment from one of the
> >>reporters sums it up:
> >>
> >>"In 4.4.3 and prior, my 2.40 MHz processor would fluctuate between
> >>1000 and 3400 MHz.  In 4.4.4, the processor would fluctuate between
> >>400 and 700 MHz, according to /proc/cpuinfo.
> >>
> >>Setting /sys/devices/system/cpu/cpufreq/policy0/scaling_governor to
> >>performance, instead of the default "powersave" forces the CPU to
> >>2400 MHz, and improves performance greatly, but still not to the
> >>same level as in 4.4.3."
> >>
> >>Any ideas?
> >
> >Is this same "slowdown" also seen in 4.5?
> >
> >thanks,
> >
> >greg k-h
> >
> 
> Yes, the same issue is seen on 4.5 according to the reporter.

Great, we are "bug compatible" :)

Can you please work to get this resolved in Linus's tree and then we can
backport the needed changes into the 4.4-stable release.

Zhang, any ideas here?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
  2016-03-16 22:27 [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop Laura Abbott
  2016-03-16 22:46 ` Greg Kroah-Hartman
@ 2016-03-17  0:17 ` Zhang, Rui
  2016-03-19  4:43 ` Zhang Rui
  2 siblings, 0 replies; 11+ messages in thread
From: Zhang, Rui @ 2016-03-17  0:17 UTC (permalink / raw)
  To: Laura Abbott, Javi Merino, Chen, Yu C
  Cc: Greg Kroah-Hartman, Manuel Krause, szegad, prash, amish,
	Matthias, linux-pm, Linux Kernel Mailing List

Can you send me the output of "grep . /sys/class/thermal/*/*" both w/ and w/o the broken patch series?

Thanks,
rui

> -----Original Message-----
> From: Laura Abbott [mailto:labbott@redhat.com]
> Sent: Thursday, March 17, 2016 6:28 AM
> To: Zhang, Rui <rui.zhang@intel.com>; Javi Merino <javi.merino@arm.com>;
> Chen, Yu C <yu.c.chen@intel.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>; Manuel Krause
> <manuelkrause@netscape.net>; szegad <szegadlo@poczta.onet.pl>; prash
> <prash.n.rao@gmail.com>; amish <ammdispose-arch@yahoo.com>; Matthias
> <morpheusxyz123@yahoo.de>; linux-pm@vger.kernel.org; Linux Kernel
> Mailing List <linux-kernel@vger.kernel.org>
> Subject: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device
> correctly") causes performance drop
> Importance: High
> 
> Hi,
> 
> Fedora received a bug report
> (https://bugzilla.redhat.com/show_bug.cgi?id=1317190)
> of a major performance drop on various bench marks and general system
> sluggishness with the 4.4.4 kernel update. The benchmarks were showing a
> reduction to about 18% performance (not minor).
> 
> Bisection showed the first bad commit was
> 
> commit 774ac8b7eff69e0786970157de2157e68b22f456
> Author: Zhang Rui <rui.zhang@intel.com>
> Date:   Fri Oct 30 16:31:47 2015 +0800
> 
>      Thermal: initialize thermal zone device correctly
> 
>      commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 upstream.
> 
>      After thermal zone device registered, as we have not read any
>      temperature before, thus tz->temperature should not be 0,
>      which actually means 0C, and thermal trend is not available.
>      In this case, we need specially handling for the first
>      thermal_zone_device_update().
> 
>      Both thermal core framework and step_wise governor is
>      enhanced to handle this. And since the step_wise governor
>      is the only one that uses trends, so it's the only thermal
>      governor that needs to be updated.
> 
>      Tested-by: Manuel Krause <manuelkrause@netscape.net>
>      Tested-by: szegad <szegadlo@poczta.onet.pl>
>      Tested-by: prash <prash.n.rao@gmail.com>
>      Tested-by: amish <ammdispose-arch@yahoo.com>
>      Tested-by: Matthias <morpheusxyz123@yahoo.de>
>      Reviewed-by: Javi Merino <javi.merino@arm.com>
>      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> 
> 
> Reverting this plus to other commits in the series (a67208e94d94
> "Thermal: handle thermal zone device properly during system sleep"
> and 27f356149d59 "Thermal: do thermal zone update after a cooling device
> registered") confirmed the performance was back to normal.
> 
> Bugzilla has the full discussion but this comment from one of the reporters
> sums it up:
> 
> "In 4.4.3 and prior, my 2.40 MHz processor would fluctuate between
> 1000 and 3400 MHz.  In 4.4.4, the processor would fluctuate between
> 400 and 700 MHz, according to /proc/cpuinfo.
> 
> Setting /sys/devices/system/cpu/cpufreq/policy0/scaling_governor to
> performance, instead of the default "powersave" forces the CPU to
> 2400 MHz, and improves performance greatly, but still not to the same level
> as in 4.4.3."
> 
> Any ideas?
> 
> Thanks,
> Laura

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
  2016-03-17  0:00   ` Laura Abbott
  2016-03-17  0:11     ` Greg Kroah-Hartman
@ 2016-03-17  0:20     ` Pandruvada, Srinivas
  2016-03-18 22:28       ` Laura Abbott
  1 sibling, 1 reply; 11+ messages in thread
From: Pandruvada, Srinivas @ 2016-03-17  0:20 UTC (permalink / raw)
  To: labbott, gregkh
  Cc: ammdispose-arch, linux-kernel, Zhang, Rui, Chen, Yu C,
	manuelkrause, linux-pm, szegadlo, prash.n.rao, javi.merino,
	morpheusxyz123

On Wed, 2016-03-16 at 17:00 -0700, Laura Abbott wrote:
> On 03/16/2016 03:46 PM, Greg Kroah-Hartman wrote:
> > On Wed, Mar 16, 2016 at 03:27:57PM -0700, Laura Abbott wrote:
> > > Hi,
> > > 
> > > Fedora received a bug report (https://bugzilla.redhat.com/show_bu
> > > g.cgi?id=1317190)
> > > of a major performance drop on various bench marks and general
> > > system
> > > sluggishness with the 4.4.4 kernel update. The benchmarks were
> > > showing
> > > a reduction to about 18% performance (not minor).
> > > 
> > > Bisection showed the first bad commit was
> > > 
> > > commit 774ac8b7eff69e0786970157de2157e68b22f456
> > > Author: Zhang Rui <rui.zhang@intel.com>
> > > Date:   Fri Oct 30 16:31:47 2015 +0800
> > > 
> > >      Thermal: initialize thermal zone device correctly
> > >      commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 upstream.
> > >      After thermal zone device registered, as we have not read
> > > any
> > >      temperature before, thus tz->temperature should not be 0,
> > >      which actually means 0C, and thermal trend is not available.
> > >      In this case, we need specially handling for the first
> > >      thermal_zone_device_update().
> > >      Both thermal core framework and step_wise governor is
> > >      enhanced to handle this. And since the step_wise governor
> > >      is the only one that uses trends, so it's the only thermal
> > >      governor that needs to be updated.
> > >      Tested-by: Manuel Krause <manuelkrause@netscape.net>
> > >      Tested-by: szegad <szegadlo@poczta.onet.pl>
> > >      Tested-by: prash <prash.n.rao@gmail.com>
> > >      Tested-by: amish <ammdispose-arch@yahoo.com>
> > >      Tested-by: Matthias <morpheusxyz123@yahoo.de>
> > >      Reviewed-by: Javi Merino <javi.merino@arm.com>
> > >      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > >      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> > >      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.or
> > > g>
> > > 
> > > 
> > > 
> > > Reverting this plus to other commits in the series (a67208e94d94
> > > "Thermal: handle thermal zone device properly during system
> > > sleep"
> > > and 27f356149d59 "Thermal: do thermal zone update after a cooling
> > > device registered") confirmed the performance was back to normal.
> > > 
> > > Bugzilla has the full discussion but this comment from one of the
> > > reporters sums it up:
> > > 
> > > "In 4.4.3 and prior, my 2.40 MHz processor would fluctuate
> > > between
> > > 1000 and 3400 MHz.  In 4.4.4, the processor would fluctuate
> > > between
> > > 400 and 700 MHz, according to /proc/cpuinfo.
> > > 
> > > Setting /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > to
> > > performance, instead of the default "powersave" forces the CPU to
> > > 2400 MHz, and improves performance greatly, but still not to the
> > > same level as in 4.4.3."
> > > 
> > > Any ideas?
> > 
> > Is this same "slowdown" also seen in 4.5?
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Yes, the same issue is seen on 4.5 according to the reporter.
What does it show here when performance drops?
grep . /sys/devices/system/cpu/intel_pstate/*

Is the problem still occurs if you set 
/sys/class/thermal/thermal_zone*/mode to "disabled" 

Thanks,
Srinivas

> 
> Thanks,
> Laura
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
  2016-03-17  0:20     ` Pandruvada, Srinivas
@ 2016-03-18 22:28       ` Laura Abbott
  2016-03-18 23:36         ` Pandruvada, Srinivas
  0 siblings, 1 reply; 11+ messages in thread
From: Laura Abbott @ 2016-03-18 22:28 UTC (permalink / raw)
  To: Pandruvada, Srinivas, gregkh
  Cc: ammdispose-arch, linux-kernel, Zhang, Rui, Chen, Yu C,
	manuelkrause, linux-pm, szegadlo, prash.n.rao, javi.merino,
	morpheusxyz123, frolvlad

(bringing this back to the main thread)

On 03/16/2016 05:20 PM, Pandruvada, Srinivas wrote:
> On Wed, 2016-03-16 at 17:00 -0700, Laura Abbott wrote:
>> On 03/16/2016 03:46 PM, Greg Kroah-Hartman wrote:
>>> On Wed, Mar 16, 2016 at 03:27:57PM -0700, Laura Abbott wrote:
>>>> Hi,
>>>>
>>>> Fedora received a bug report (https://bugzilla.redhat.com/show_bu
>>>> g.cgi?id=1317190)
>>>> of a major performance drop on various bench marks and general
>>>> system
>>>> sluggishness with the 4.4.4 kernel update. The benchmarks were
>>>> showing
>>>> a reduction to about 18% performance (not minor).
>>>>
>>>> Bisection showed the first bad commit was
>>>>
>>>> commit 774ac8b7eff69e0786970157de2157e68b22f456
>>>> Author: Zhang Rui <rui.zhang@intel.com>
>>>> Date:   Fri Oct 30 16:31:47 2015 +0800
>>>>
>>>>       Thermal: initialize thermal zone device correctly
>>>>       commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 upstream.
>>>>       After thermal zone device registered, as we have not read
>>>> any
>>>>       temperature before, thus tz->temperature should not be 0,
>>>>       which actually means 0C, and thermal trend is not available.
>>>>       In this case, we need specially handling for the first
>>>>       thermal_zone_device_update().
>>>>       Both thermal core framework and step_wise governor is
>>>>       enhanced to handle this. And since the step_wise governor
>>>>       is the only one that uses trends, so it's the only thermal
>>>>       governor that needs to be updated.
>>>>       Tested-by: Manuel Krause <manuelkrause@netscape.net>
>>>>       Tested-by: szegad <szegadlo@poczta.onet.pl>
>>>>       Tested-by: prash <prash.n.rao@gmail.com>
>>>>       Tested-by: amish <ammdispose-arch@yahoo.com>
>>>>       Tested-by: Matthias <morpheusxyz123@yahoo.de>
>>>>       Reviewed-by: Javi Merino <javi.merino@arm.com>
>>>>       Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>>       Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>>>>       Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.or
>>>> g>
>>>>
>>>>
>>>>
>>>> Reverting this plus to other commits in the series (a67208e94d94
>>>> "Thermal: handle thermal zone device properly during system
>>>> sleep"
>>>> and 27f356149d59 "Thermal: do thermal zone update after a cooling
>>>> device registered") confirmed the performance was back to normal.
>>>>
>>>> Bugzilla has the full discussion but this comment from one of the
>>>> reporters sums it up:
>>>>
>>>> "In 4.4.3 and prior, my 2.40 MHz processor would fluctuate
>>>> between
>>>> 1000 and 3400 MHz.  In 4.4.4, the processor would fluctuate
>>>> between
>>>> 400 and 700 MHz, according to /proc/cpuinfo.
>>>>
>>>> Setting /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
>>>> to
>>>> performance, instead of the default "powersave" forces the CPU to
>>>> 2400 MHz, and improves performance greatly, but still not to the
>>>> same level as in 4.4.3."
>>>>
>>>> Any ideas?
>>>
>>> Is this same "slowdown" also seen in 4.5?
>>>
>>> thanks,
>>>
>>> greg k-h
>>>
>>
>> Yes, the same issue is seen on 4.5 according to the reporter.
> What does it show here when performance drops?
> grep . /sys/devices/system/cpu/intel_pstate/*
>
> Is the problem still occurs if you set
> /sys/class/thermal/thermal_zone*/mode to "disabled"
>
> Thanks,
> Srinivas
>

A separate thread was started which gave this insight:

"I think
the problem is your device has a passive trip temp of 0
/sys/class/thermal/thermal_zone0/trip_point_2_temp:0
/sys/class/thermal/thermal_zone0/trip_point_2_type:passive

Which triggers a false throttle = true. I think we should this trip as
invalid in the case of
if (tz->temperature >= trip_temp) {} check
in thermal_zone_trip_update()."

So would something like the following work?

diff --git a/drivers/thermal/step_wise.c b/drivers/thermal/step_wise.c
index ea9366a..1228797 100644
--- a/drivers/thermal/step_wise.c
+++ b/drivers/thermal/step_wise.c
@@ -143,7 +143,7 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
  
         trend = get_tz_trend(tz, trip);
  
-       if (tz->temperature >= trip_temp) {
+       if (trip_type != THERMAL_TRIPS_NONE && tz->temperature >= trip_temp) {
                 throttle = true;
                 trace_thermal_zone_trip(tz, trip, trip_type);
         }

(completely untested, no idea if I'm even close)

Thanks,
Laura

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
  2016-03-18 22:28       ` Laura Abbott
@ 2016-03-18 23:36         ` Pandruvada, Srinivas
  0 siblings, 0 replies; 11+ messages in thread
From: Pandruvada, Srinivas @ 2016-03-18 23:36 UTC (permalink / raw)
  To: labbott, gregkh
  Cc: ammdispose-arch, linux-kernel, Chen, Yu C, Zhang, Rui,
	manuelkrause, linux-pm, frolvlad, szegadlo, prash.n.rao,
	javi.merino, morpheusxyz123

On Fri, 2016-03-18 at 15:28 -0700, Laura Abbott wrote:
> (bringing this back to the main thread)
> 
> On 03/16/2016 05:20 PM, Pandruvada, Srinivas wrote:
> > On Wed, 2016-03-16 at 17:00 -0700, Laura Abbott wrote:
> > > On 03/16/2016 03:46 PM, Greg Kroah-Hartman wrote:
> > > > On Wed, Mar 16, 2016 at 03:27:57PM -0700, Laura Abbott wrote:
> > > > > Hi,
> > > > > 
> > > > > Fedora received a bug report (https://bugzilla.redhat.com/sho
> > > > > w_bu
> > > > > g.cgi?id=1317190)
> > > > > of a major performance drop on various bench marks and
> > > > > general
> > > > > system
> > > > > sluggishness with the 4.4.4 kernel update. The benchmarks
> > > > > were
> > > > > showing
> > > > > a reduction to about 18% performance (not minor).
> > > > > 
> > > > > Bisection showed the first bad commit was
> > > > > 
> > > > > commit 774ac8b7eff69e0786970157de2157e68b22f456
> > > > > Author: Zhang Rui <rui.zhang@intel.com>
> > > > > Date:   Fri Oct 30 16:31:47 2015 +0800
> > > > > 
> > > > >       Thermal: initialize thermal zone device correctly
> > > > >       commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461
> > > > > upstream.
> > > > >       After thermal zone device registered, as we have not
> > > > > read
> > > > > any
> > > > >       temperature before, thus tz->temperature should not be
> > > > > 0,
> > > > >       which actually means 0C, and thermal trend is not
> > > > > available.
> > > > >       In this case, we need specially handling for the first
> > > > >       thermal_zone_device_update().
> > > > >       Both thermal core framework and step_wise governor is
> > > > >       enhanced to handle this. And since the step_wise
> > > > > governor
> > > > >       is the only one that uses trends, so it's the only
> > > > > thermal
> > > > >       governor that needs to be updated.
> > > > >       Tested-by: Manuel Krause <manuelkrause@netscape.net>
> > > > >       Tested-by: szegad <szegadlo@poczta.onet.pl>
> > > > >       Tested-by: prash <prash.n.rao@gmail.com>
> > > > >       Tested-by: amish <ammdispose-arch@yahoo.com>
> > > > >       Tested-by: Matthias <morpheusxyz123@yahoo.de>
> > > > >       Reviewed-by: Javi Merino <javi.merino@arm.com>
> > > > >       Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > > > >       Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> > > > >       Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundati
> > > > > on.or
> > > > > g>
> > > > > 
> > > > > 
> > > > > 
> > > > > Reverting this plus to other commits in the series
> > > > > (a67208e94d94
> > > > > "Thermal: handle thermal zone device properly during system
> > > > > sleep"
> > > > > and 27f356149d59 "Thermal: do thermal zone update after a
> > > > > cooling
> > > > > device registered") confirmed the performance was back to
> > > > > normal.
> > > > > 
> > > > > Bugzilla has the full discussion but this comment from one of
> > > > > the
> > > > > reporters sums it up:
> > > > > 
> > > > > "In 4.4.3 and prior, my 2.40 MHz processor would fluctuate
> > > > > between
> > > > > 1000 and 3400 MHz.  In 4.4.4, the processor would fluctuate
> > > > > between
> > > > > 400 and 700 MHz, according to /proc/cpuinfo.
> > > > > 
> > > > > Setting
> > > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
> > > > > to
> > > > > performance, instead of the default "powersave" forces the
> > > > > CPU to
> > > > > 2400 MHz, and improves performance greatly, but still not to
> > > > > the
> > > > > same level as in 4.4.3."
> > > > > 
> > > > > Any ideas?
> > > > 
> > > > Is this same "slowdown" also seen in 4.5?
> > > > 
> > > > thanks,
> > > > 
> > > > greg k-h
> > > > 
> > > 
> > > Yes, the same issue is seen on 4.5 according to the reporter.
> > What does it show here when performance drops?
> > grep . /sys/devices/system/cpu/intel_pstate/*
> > 
> > Is the problem still occurs if you set
> > /sys/class/thermal/thermal_zone*/mode to "disabled"
> > 
> > Thanks,
> > Srinivas
> > 
> 
> A separate thread was started which gave this insight:
> 
> "I think
> the problem is your device has a passive trip temp of 0
> /sys/class/thermal/thermal_zone0/trip_point_2_temp:0
> /sys/class/thermal/thermal_zone0/trip_point_2_type:passive
> 
> Which triggers a false throttle = true. I think we should this trip
> as
> invalid in the case of
> if (tz->temperature >= trip_temp) {} check
> in thermal_zone_trip_update()."
> 
> So would something like the following work?
> 
> diff --git a/drivers/thermal/step_wise.c
> b/drivers/thermal/step_wise.c
> index ea9366a..1228797 100644
> --- a/drivers/thermal/step_wise.c
> +++ b/drivers/thermal/step_wise.c
> @@ -143,7 +143,7 @@ static void thermal_zone_trip_update(struct
> thermal_zone_device *tz, int trip)
>   
>          trend = get_tz_trend(tz, trip);
>   
> -       if (tz->temperature >= trip_temp) {
> +       if (trip_type != THERMAL_TRIPS_NONE && tz->temperature >=
> trip_temp) {
	if (trip_temp && tz->temperature >= trip_temp) {
                 throttle = true;
>                  trace_thermal_zone_trip(tz, trip, trip_type);
>          }
> 
I think Rui is working on some change.

Thanks,
Srinivas

> (completely untested, no idea if I'm even close)
> 
> Thanks,
> Laura

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
  2016-03-16 22:27 [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop Laura Abbott
  2016-03-16 22:46 ` Greg Kroah-Hartman
  2016-03-17  0:17 ` Zhang, Rui
@ 2016-03-19  4:43 ` Zhang Rui
       [not found]   ` <CAJABK0MXJRh9wtud+RQrL-==afymcqVovRC8FAZ6=qG4Sob+Tw@mail.gmail.com>
  2 siblings, 1 reply; 11+ messages in thread
From: Zhang Rui @ 2016-03-19  4:43 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Javi Merino, Chen Yu, Greg Kroah-Hartman, Manuel Krause, szegad,
	prash, amish, Matthias, linux-pm, Linux Kernel Mailing List,
	frolvlad


Hi, all,

On Wed, 2016-03-16 at 15:27 -0700, Laura Abbott wrote:
> Hi,
> 
> Fedora received a bug report (https://bugzilla.redhat.com/show_bug.cgi?id=1317190)
> of a major performance drop on various bench marks and general system
> sluggishness with the 4.4.4 kernel update. The benchmarks were showing
> a reduction to about 18% performance (not minor).
> 
> Bisection showed the first bad commit was
> 
> commit 774ac8b7eff69e0786970157de2157e68b22f456
> Author: Zhang Rui <rui.zhang@intel.com>
> Date:   Fri Oct 30 16:31:47 2015 +0800
> 
>      Thermal: initialize thermal zone device correctly
>      
>      commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 upstream.
>      
>      After thermal zone device registered, as we have not read any
>      temperature before, thus tz->temperature should not be 0,
>      which actually means 0C, and thermal trend is not available.
>      In this case, we need specially handling for the first
>      thermal_zone_device_update().
>      
>      Both thermal core framework and step_wise governor is
>      enhanced to handle this. And since the step_wise governor
>      is the only one that uses trends, so it's the only thermal
>      governor that needs to be updated.
>      
>      Tested-by: Manuel Krause <manuelkrause@netscape.net>
>      Tested-by: szegad <szegadlo@poczta.onet.pl>
>      Tested-by: prash <prash.n.rao@gmail.com>
>      Tested-by: amish <ammdispose-arch@yahoo.com>
>      Tested-by: Matthias <morpheusxyz123@yahoo.de>
>      Reviewed-by: Javi Merino <javi.merino@arm.com>
>      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

According to https://bugzilla.redhat.com/show_bug.cgi?id=1317190#c30
and https://bugzilla.kernel.org/show_bug.cgi?id=114551,
the root cause of the problem is that
1. BIOS reports bogus passive trip point value, which is 0 degree
Celsius
2. Without the thermal patches, thermal core does nothing upon driver
probe, thus processor cooling states are not changed.
3. With the thermal patches applied, thermal core checks the new
registered thermal zone driver when it is probed, and set the cooling
devices to proper state, for example, fan device could be spinning
during boot, even if the system is cool.
4. Thus, on these Lenovo laptops, processor cooling devices are
throttled because the current temperature (around 50C) is higher than
the passive trip point (0C), and the thermal zone is considered as
overheating.

In order to workaround this bogus BIOS, we should disable those invalid
trip points by checking the trip point value, like the patch below,
Laura and Vlad,
can you please try the patch below and confirm if it fixes the problem
for you?

>From 81ad4276b505e987dd8ebbdf63605f92cd172b52 Mon Sep 17 00:00:00 2001
From: Zhang Rui <rui.zhang@intel.com>
Date: Fri, 18 Mar 2016 10:03:24 +0800
Subject: [PATCH] Thermal: Ignore invalid trip points

In some cases, platform thermal driver may report invalid trip points,
thermal core should not take any action for these trip points.

This fixed a regression that bogus trip point starts to screw up thermal
control on some Lenovo laptops, after
commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Fri Oct 30 16:31:47 2015 +0800

    Thermal: initialize thermal zone device correctly

    After thermal zone device registered, as we have not read any
    temperature before, thus tz->temperature should not be 0,
    which actually means 0C, and thermal trend is not available.
    In this case, we need specially handling for the first
    thermal_zone_device_update().

    Both thermal core framework and step_wise governor is
    enhanced to handle this. And since the step_wise governor
    is the only one that uses trends, so it's the only thermal
    governor that needs to be updated.

    Tested-by: Manuel Krause <manuelkrause@netscape.net>
    Tested-by: szegad <szegadlo@poczta.onet.pl>
    Tested-by: prash <prash.n.rao@gmail.com>
    Tested-by: amish <ammdispose-arch@yahoo.com>
    Tested-by: Matthias <morpheusxyz123@yahoo.de>
    Reviewed-by: Javi Merino <javi.merino@arm.com>
    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Signed-off-by: Chen Yu <yu.c.chen@intel.com>

CC: <stable@vger.kernel.org> #3.18+
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1317190
Link: https://bugzilla.kernel.org/show_bug.cgi?id=114551
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/thermal/thermal_core.c | 13 ++++++++++++-
 include/linux/thermal.h        |  2 ++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index a0a8fd1..d4b5465 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -454,6 +454,10 @@ static void handle_thermal_trip(struct thermal_zone_device *tz, int trip)
 {
 	enum thermal_trip_type type;
 
+	/* Ignore disabled trip points */
+	if (test_bit(trip, &tz->trips_disabled))
+		return;
+
 	tz->ops->get_trip_type(tz, trip, &type);
 
 	if (type == THERMAL_TRIP_CRITICAL || type == THERMAL_TRIP_HOT)
@@ -1800,6 +1804,7 @@ struct thermal_zone_device *thermal_zone_device_register(const char *type,
 {
 	struct thermal_zone_device *tz;
 	enum thermal_trip_type trip_type;
+	int trip_temp;
 	int result;
 	int count;
 	int passive = 0;
@@ -1871,9 +1876,15 @@ struct thermal_zone_device *thermal_zone_device_register(const char *type,
 		goto unregister;
 
 	for (count = 0; count < trips; count++) {
-		tz->ops->get_trip_type(tz, count, &trip_type);
+		if (tz->ops->get_trip_type(tz, count, &trip_type))
+			set_bit(count, &tz->trips_disabled);
 		if (trip_type == THERMAL_TRIP_PASSIVE)
 			passive = 1;
+		if (tz->ops->get_trip_temp(tz, count, &trip_temp))
+			set_bit(count, &tz->trips_disabled);
+		/* Check for bogus trip points */
+		if (trip_temp == 0)
+			set_bit(count, &tz->trips_disabled);
 	}
 
 	if (!passive) {
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 9c48199..a55d052 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -156,6 +156,7 @@ struct thermal_attr {
  * @trip_hyst_attrs:	attributes for trip points for sysfs: trip hysteresis
  * @devdata:	private pointer for device private data
  * @trips:	number of trip points the thermal zone supports
+ * @trips_disabled;	bitmap for disabled trips
  * @passive_delay:	number of milliseconds to wait between polls when
  *			performing passive cooling.
  * @polling_delay:	number of milliseconds to wait between polls when
@@ -191,6 +192,7 @@ struct thermal_zone_device {
 	struct thermal_attr *trip_hyst_attrs;
 	void *devdata;
 	int trips;
+	unsigned long trips_disabled;	/* bitmap for disabled trips */
 	int passive_delay;
 	int polling_delay;
 	int temperature;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
       [not found]   ` <CAJABK0MXJRh9wtud+RQrL-==afymcqVovRC8FAZ6=qG4Sob+Tw@mail.gmail.com>
@ 2016-03-21 11:14     ` Vladyslav Frolov
  2016-03-21 12:14       ` Vladyslav Frolov
  0 siblings, 1 reply; 11+ messages in thread
From: Vladyslav Frolov @ 2016-03-21 11:14 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Laura Abbott, Javi Merino, Chen Yu, Greg Kroah-Hartman,
	Manuel Krause, szegad, prash, amish, Matthias, linux-pm,
	Linux Kernel Mailing List

Hi Rui,

I'm sorry for the delay; Reading the e-mail the first time I have
missed the part about the patch, so I'm building the kernel now.

By the way, the patch reported a slight offset missmatch:

patching file drivers/thermal/thermal_core.c
Hunk #2 succeeded at 1800 (offset -4 lines).
Hunk #3 succeeded at 1872 (offset -4 lines).

Is this fine?


On 21 March 2016 at 18:09, Vladyslav Frolov <frolvlad@gmail.com> wrote:
> Hi Rui,
>
> I'm sorry for the delay; Reading the e-mail the first time I have missed the
> part about the patch, so I'm building the kernel now.
>
> By the way, the patch reported a slight offset missmatch:
>
> patching file drivers/thermal/thermal_core.c
> Hunk #2 succeeded at 1800 (offset -4 lines).
> Hunk #3 succeeded at 1872 (offset -4 lines).
>
> Is this fine?
>
>
>
> On 19 March 2016 at 11:43, Zhang Rui <rui.zhang@intel.com> wrote:
>>
>>
>> Hi, all,
>>
>> On Wed, 2016-03-16 at 15:27 -0700, Laura Abbott wrote:
>> > Hi,
>> >
>> > Fedora received a bug report
>> > (https://bugzilla.redhat.com/show_bug.cgi?id=1317190)
>> > of a major performance drop on various bench marks and general system
>> > sluggishness with the 4.4.4 kernel update. The benchmarks were showing
>> > a reduction to about 18% performance (not minor).
>> >
>> > Bisection showed the first bad commit was
>> >
>> > commit 774ac8b7eff69e0786970157de2157e68b22f456
>> > Author: Zhang Rui <rui.zhang@intel.com>
>> > Date:   Fri Oct 30 16:31:47 2015 +0800
>> >
>> >      Thermal: initialize thermal zone device correctly
>> >
>> >      commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 upstream.
>> >
>> >      After thermal zone device registered, as we have not read any
>> >      temperature before, thus tz->temperature should not be 0,
>> >      which actually means 0C, and thermal trend is not available.
>> >      In this case, we need specially handling for the first
>> >      thermal_zone_device_update().
>> >
>> >      Both thermal core framework and step_wise governor is
>> >      enhanced to handle this. And since the step_wise governor
>> >      is the only one that uses trends, so it's the only thermal
>> >      governor that needs to be updated.
>> >
>> >      Tested-by: Manuel Krause <manuelkrause@netscape.net>
>> >      Tested-by: szegad <szegadlo@poczta.onet.pl>
>> >      Tested-by: prash <prash.n.rao@gmail.com>
>> >      Tested-by: amish <ammdispose-arch@yahoo.com>
>> >      Tested-by: Matthias <morpheusxyz123@yahoo.de>
>> >      Reviewed-by: Javi Merino <javi.merino@arm.com>
>> >      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>> >      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>> >      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>
>> According to https://bugzilla.redhat.com/show_bug.cgi?id=1317190#c30
>> and https://bugzilla.kernel.org/show_bug.cgi?id=114551,
>> the root cause of the problem is that
>> 1. BIOS reports bogus passive trip point value, which is 0 degree
>> Celsius
>> 2. Without the thermal patches, thermal core does nothing upon driver
>> probe, thus processor cooling states are not changed.
>> 3. With the thermal patches applied, thermal core checks the new
>> registered thermal zone driver when it is probed, and set the cooling
>> devices to proper state, for example, fan device could be spinning
>> during boot, even if the system is cool.
>> 4. Thus, on these Lenovo laptops, processor cooling devices are
>> throttled because the current temperature (around 50C) is higher than
>> the passive trip point (0C), and the thermal zone is considered as
>> overheating.
>>
>> In order to workaround this bogus BIOS, we should disable those invalid
>> trip points by checking the trip point value, like the patch below,
>> Laura and Vlad,
>> can you please try the patch below and confirm if it fixes the problem
>> for you?
>>
>> From 81ad4276b505e987dd8ebbdf63605f92cd172b52 Mon Sep 17 00:00:00 2001
>> From: Zhang Rui <rui.zhang@intel.com>
>> Date: Fri, 18 Mar 2016 10:03:24 +0800
>> Subject: [PATCH] Thermal: Ignore invalid trip points
>>
>> In some cases, platform thermal driver may report invalid trip points,
>> thermal core should not take any action for these trip points.
>>
>> This fixed a regression that bogus trip point starts to screw up thermal
>> control on some Lenovo laptops, after
>> commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461
>> Author: Zhang Rui <rui.zhang@intel.com>
>> Date:   Fri Oct 30 16:31:47 2015 +0800
>>
>>     Thermal: initialize thermal zone device correctly
>>
>>     After thermal zone device registered, as we have not read any
>>     temperature before, thus tz->temperature should not be 0,
>>     which actually means 0C, and thermal trend is not available.
>>     In this case, we need specially handling for the first
>>     thermal_zone_device_update().
>>
>>     Both thermal core framework and step_wise governor is
>>     enhanced to handle this. And since the step_wise governor
>>     is the only one that uses trends, so it's the only thermal
>>     governor that needs to be updated.
>>
>>     Tested-by: Manuel Krause <manuelkrause@netscape.net>
>>     Tested-by: szegad <szegadlo@poczta.onet.pl>
>>     Tested-by: prash <prash.n.rao@gmail.com>
>>     Tested-by: amish <ammdispose-arch@yahoo.com>
>>     Tested-by: Matthias <morpheusxyz123@yahoo.de>
>>     Reviewed-by: Javi Merino <javi.merino@arm.com>
>>     Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>     Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>>
>> CC: <stable@vger.kernel.org> #3.18+
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1317190
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=114551
>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>> ---
>>  drivers/thermal/thermal_core.c | 13 ++++++++++++-
>>  include/linux/thermal.h        |  2 ++
>>  2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/thermal/thermal_core.c
>> b/drivers/thermal/thermal_core.c
>> index a0a8fd1..d4b5465 100644
>> --- a/drivers/thermal/thermal_core.c
>> +++ b/drivers/thermal/thermal_core.c
>> @@ -454,6 +454,10 @@ static void handle_thermal_trip(struct
>> thermal_zone_device *tz, int trip)
>>  {
>>         enum thermal_trip_type type;
>>
>> +       /* Ignore disabled trip points */
>> +       if (test_bit(trip, &tz->trips_disabled))
>> +               return;
>> +
>>         tz->ops->get_trip_type(tz, trip, &type);
>>
>>         if (type == THERMAL_TRIP_CRITICAL || type == THERMAL_TRIP_HOT)
>> @@ -1800,6 +1804,7 @@ struct thermal_zone_device
>> *thermal_zone_device_register(const char *type,
>>  {
>>         struct thermal_zone_device *tz;
>>         enum thermal_trip_type trip_type;
>> +       int trip_temp;
>>         int result;
>>         int count;
>>         int passive = 0;
>> @@ -1871,9 +1876,15 @@ struct thermal_zone_device
>> *thermal_zone_device_register(const char *type,
>>                 goto unregister;
>>
>>         for (count = 0; count < trips; count++) {
>> -               tz->ops->get_trip_type(tz, count, &trip_type);
>> +               if (tz->ops->get_trip_type(tz, count, &trip_type))
>> +                       set_bit(count, &tz->trips_disabled);
>>                 if (trip_type == THERMAL_TRIP_PASSIVE)
>>                         passive = 1;
>> +               if (tz->ops->get_trip_temp(tz, count, &trip_temp))
>> +                       set_bit(count, &tz->trips_disabled);
>> +               /* Check for bogus trip points */
>> +               if (trip_temp == 0)
>> +                       set_bit(count, &tz->trips_disabled);
>>         }
>>
>>         if (!passive) {
>> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
>> index 9c48199..a55d052 100644
>> --- a/include/linux/thermal.h
>> +++ b/include/linux/thermal.h
>> @@ -156,6 +156,7 @@ struct thermal_attr {
>>   * @trip_hyst_attrs:   attributes for trip points for sysfs: trip
>> hysteresis
>>   * @devdata:   private pointer for device private data
>>   * @trips:     number of trip points the thermal zone supports
>> + * @trips_disabled;    bitmap for disabled trips
>>   * @passive_delay:     number of milliseconds to wait between polls when
>>   *                     performing passive cooling.
>>   * @polling_delay:     number of milliseconds to wait between polls when
>> @@ -191,6 +192,7 @@ struct thermal_zone_device {
>>         struct thermal_attr *trip_hyst_attrs;
>>         void *devdata;
>>         int trips;
>> +       unsigned long trips_disabled;   /* bitmap for disabled trips */
>>         int passive_delay;
>>         int polling_delay;
>>         int temperature;
>> --
>> 1.9.1
>>
>>
>>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop
  2016-03-21 11:14     ` Vladyslav Frolov
@ 2016-03-21 12:14       ` Vladyslav Frolov
  0 siblings, 0 replies; 11+ messages in thread
From: Vladyslav Frolov @ 2016-03-21 12:14 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Laura Abbott, Javi Merino, Chen Yu, Greg Kroah-Hartman,
	Manuel Krause, szegad, prash, amish, Matthias, linux-pm,
	Linux Kernel Mailing List

Hi Rui,

The patch seems to work fine! I have tried both `rmmod & modprobe
thermal` and suspend & resume. Great!

On 21 March 2016 at 18:14, Vladyslav Frolov <frolvlad@gmail.com> wrote:
> Hi Rui,
>
> I'm sorry for the delay; Reading the e-mail the first time I have
> missed the part about the patch, so I'm building the kernel now.
>
> By the way, the patch reported a slight offset missmatch:
>
> patching file drivers/thermal/thermal_core.c
> Hunk #2 succeeded at 1800 (offset -4 lines).
> Hunk #3 succeeded at 1872 (offset -4 lines).
>
> Is this fine?
>
>
> On 21 March 2016 at 18:09, Vladyslav Frolov <frolvlad@gmail.com> wrote:
>> Hi Rui,
>>
>> I'm sorry for the delay; Reading the e-mail the first time I have missed the
>> part about the patch, so I'm building the kernel now.
>>
>> By the way, the patch reported a slight offset missmatch:
>>
>> patching file drivers/thermal/thermal_core.c
>> Hunk #2 succeeded at 1800 (offset -4 lines).
>> Hunk #3 succeeded at 1872 (offset -4 lines).
>>
>> Is this fine?
>>
>>
>>
>> On 19 March 2016 at 11:43, Zhang Rui <rui.zhang@intel.com> wrote:
>>>
>>>
>>> Hi, all,
>>>
>>> On Wed, 2016-03-16 at 15:27 -0700, Laura Abbott wrote:
>>> > Hi,
>>> >
>>> > Fedora received a bug report
>>> > (https://bugzilla.redhat.com/show_bug.cgi?id=1317190)
>>> > of a major performance drop on various bench marks and general system
>>> > sluggishness with the 4.4.4 kernel update. The benchmarks were showing
>>> > a reduction to about 18% performance (not minor).
>>> >
>>> > Bisection showed the first bad commit was
>>> >
>>> > commit 774ac8b7eff69e0786970157de2157e68b22f456
>>> > Author: Zhang Rui <rui.zhang@intel.com>
>>> > Date:   Fri Oct 30 16:31:47 2015 +0800
>>> >
>>> >      Thermal: initialize thermal zone device correctly
>>> >
>>> >      commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461 upstream.
>>> >
>>> >      After thermal zone device registered, as we have not read any
>>> >      temperature before, thus tz->temperature should not be 0,
>>> >      which actually means 0C, and thermal trend is not available.
>>> >      In this case, we need specially handling for the first
>>> >      thermal_zone_device_update().
>>> >
>>> >      Both thermal core framework and step_wise governor is
>>> >      enhanced to handle this. And since the step_wise governor
>>> >      is the only one that uses trends, so it's the only thermal
>>> >      governor that needs to be updated.
>>> >
>>> >      Tested-by: Manuel Krause <manuelkrause@netscape.net>
>>> >      Tested-by: szegad <szegadlo@poczta.onet.pl>
>>> >      Tested-by: prash <prash.n.rao@gmail.com>
>>> >      Tested-by: amish <ammdispose-arch@yahoo.com>
>>> >      Tested-by: Matthias <morpheusxyz123@yahoo.de>
>>> >      Reviewed-by: Javi Merino <javi.merino@arm.com>
>>> >      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>> >      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>>> >      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>
>>> According to https://bugzilla.redhat.com/show_bug.cgi?id=1317190#c30
>>> and https://bugzilla.kernel.org/show_bug.cgi?id=114551,
>>> the root cause of the problem is that
>>> 1. BIOS reports bogus passive trip point value, which is 0 degree
>>> Celsius
>>> 2. Without the thermal patches, thermal core does nothing upon driver
>>> probe, thus processor cooling states are not changed.
>>> 3. With the thermal patches applied, thermal core checks the new
>>> registered thermal zone driver when it is probed, and set the cooling
>>> devices to proper state, for example, fan device could be spinning
>>> during boot, even if the system is cool.
>>> 4. Thus, on these Lenovo laptops, processor cooling devices are
>>> throttled because the current temperature (around 50C) is higher than
>>> the passive trip point (0C), and the thermal zone is considered as
>>> overheating.
>>>
>>> In order to workaround this bogus BIOS, we should disable those invalid
>>> trip points by checking the trip point value, like the patch below,
>>> Laura and Vlad,
>>> can you please try the patch below and confirm if it fixes the problem
>>> for you?
>>>
>>> From 81ad4276b505e987dd8ebbdf63605f92cd172b52 Mon Sep 17 00:00:00 2001
>>> From: Zhang Rui <rui.zhang@intel.com>
>>> Date: Fri, 18 Mar 2016 10:03:24 +0800
>>> Subject: [PATCH] Thermal: Ignore invalid trip points
>>>
>>> In some cases, platform thermal driver may report invalid trip points,
>>> thermal core should not take any action for these trip points.
>>>
>>> This fixed a regression that bogus trip point starts to screw up thermal
>>> control on some Lenovo laptops, after
>>> commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461
>>> Author: Zhang Rui <rui.zhang@intel.com>
>>> Date:   Fri Oct 30 16:31:47 2015 +0800
>>>
>>>     Thermal: initialize thermal zone device correctly
>>>
>>>     After thermal zone device registered, as we have not read any
>>>     temperature before, thus tz->temperature should not be 0,
>>>     which actually means 0C, and thermal trend is not available.
>>>     In this case, we need specially handling for the first
>>>     thermal_zone_device_update().
>>>
>>>     Both thermal core framework and step_wise governor is
>>>     enhanced to handle this. And since the step_wise governor
>>>     is the only one that uses trends, so it's the only thermal
>>>     governor that needs to be updated.
>>>
>>>     Tested-by: Manuel Krause <manuelkrause@netscape.net>
>>>     Tested-by: szegad <szegadlo@poczta.onet.pl>
>>>     Tested-by: prash <prash.n.rao@gmail.com>
>>>     Tested-by: amish <ammdispose-arch@yahoo.com>
>>>     Tested-by: Matthias <morpheusxyz123@yahoo.de>
>>>     Reviewed-by: Javi Merino <javi.merino@arm.com>
>>>     Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>     Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>>>
>>> CC: <stable@vger.kernel.org> #3.18+
>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1317190
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=114551
>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>> ---
>>>  drivers/thermal/thermal_core.c | 13 ++++++++++++-
>>>  include/linux/thermal.h        |  2 ++
>>>  2 files changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/thermal/thermal_core.c
>>> b/drivers/thermal/thermal_core.c
>>> index a0a8fd1..d4b5465 100644
>>> --- a/drivers/thermal/thermal_core.c
>>> +++ b/drivers/thermal/thermal_core.c
>>> @@ -454,6 +454,10 @@ static void handle_thermal_trip(struct
>>> thermal_zone_device *tz, int trip)
>>>  {
>>>         enum thermal_trip_type type;
>>>
>>> +       /* Ignore disabled trip points */
>>> +       if (test_bit(trip, &tz->trips_disabled))
>>> +               return;
>>> +
>>>         tz->ops->get_trip_type(tz, trip, &type);
>>>
>>>         if (type == THERMAL_TRIP_CRITICAL || type == THERMAL_TRIP_HOT)
>>> @@ -1800,6 +1804,7 @@ struct thermal_zone_device
>>> *thermal_zone_device_register(const char *type,
>>>  {
>>>         struct thermal_zone_device *tz;
>>>         enum thermal_trip_type trip_type;
>>> +       int trip_temp;
>>>         int result;
>>>         int count;
>>>         int passive = 0;
>>> @@ -1871,9 +1876,15 @@ struct thermal_zone_device
>>> *thermal_zone_device_register(const char *type,
>>>                 goto unregister;
>>>
>>>         for (count = 0; count < trips; count++) {
>>> -               tz->ops->get_trip_type(tz, count, &trip_type);
>>> +               if (tz->ops->get_trip_type(tz, count, &trip_type))
>>> +                       set_bit(count, &tz->trips_disabled);
>>>                 if (trip_type == THERMAL_TRIP_PASSIVE)
>>>                         passive = 1;
>>> +               if (tz->ops->get_trip_temp(tz, count, &trip_temp))
>>> +                       set_bit(count, &tz->trips_disabled);
>>> +               /* Check for bogus trip points */
>>> +               if (trip_temp == 0)
>>> +                       set_bit(count, &tz->trips_disabled);
>>>         }
>>>
>>>         if (!passive) {
>>> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
>>> index 9c48199..a55d052 100644
>>> --- a/include/linux/thermal.h
>>> +++ b/include/linux/thermal.h
>>> @@ -156,6 +156,7 @@ struct thermal_attr {
>>>   * @trip_hyst_attrs:   attributes for trip points for sysfs: trip
>>> hysteresis
>>>   * @devdata:   private pointer for device private data
>>>   * @trips:     number of trip points the thermal zone supports
>>> + * @trips_disabled;    bitmap for disabled trips
>>>   * @passive_delay:     number of milliseconds to wait between polls when
>>>   *                     performing passive cooling.
>>>   * @polling_delay:     number of milliseconds to wait between polls when
>>> @@ -191,6 +192,7 @@ struct thermal_zone_device {
>>>         struct thermal_attr *trip_hyst_attrs;
>>>         void *devdata;
>>>         int trips;
>>> +       unsigned long trips_disabled;   /* bitmap for disabled trips */
>>>         int passive_delay;
>>>         int polling_delay;
>>>         int temperature;
>>> --
>>> 1.9.1
>>>
>>>
>>>
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-03-21 12:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-16 22:27 [REGRESSION] 774ac8b7eff6 ("Thermal: initialize thermal zone device correctly") causes performance drop Laura Abbott
2016-03-16 22:46 ` Greg Kroah-Hartman
2016-03-17  0:00   ` Laura Abbott
2016-03-17  0:11     ` Greg Kroah-Hartman
2016-03-17  0:20     ` Pandruvada, Srinivas
2016-03-18 22:28       ` Laura Abbott
2016-03-18 23:36         ` Pandruvada, Srinivas
2016-03-17  0:17 ` Zhang, Rui
2016-03-19  4:43 ` Zhang Rui
     [not found]   ` <CAJABK0MXJRh9wtud+RQrL-==afymcqVovRC8FAZ6=qG4Sob+Tw@mail.gmail.com>
2016-03-21 11:14     ` Vladyslav Frolov
2016-03-21 12:14       ` Vladyslav Frolov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).