* 3.13.?: Strange / dangerous fan policy...
@ 2014-03-07 19:33 Manuel Krause
  2014-03-07 20:55   ` [lm-sensors] " Guenter Roeck
  0 siblings, 1 reply; 45+ messages in thread
From: Manuel Krause @ 2014-03-07 19:33 UTC (permalink / raw)
  To: linux-kernel, linux-pm

Please have a short look at the following BUG report + the 
comments -- this message here is a kind of FWD-ing it:
https://bugs.archlinux.org/task/39005

I only got around to testing kernel 3.13 with 3.13.5, as that was 
when the related -CK/BFS patch became available.

I'm not using Archlinux, but openSUSE, and my problems are much 
the same, especially the smell of melting plastic.

My own reports went to Con Kolivas' Blog first:
"I get weird temperatures and abrupt 100% fan actions with 
vanilla 3.13.5 with this CK and most recent BFQ at my HP Notebook.
In gkrellm the highest T had been @74°C, so far (3.12.13), and is 
now growing to 94°C. Then, the fan goes to 100% for 10~30secs 
cooling it to approx. 82°C.
That is not good, if I compare 74 to 94 °C.
Have I missed a .CONFIG option for 3.13, especially?"

I'd get the same without (Con's && BFQ's) patches.

Machine:           HP Notebook with Core2Duo CPU (Penryn)
Distro:            openSUSE 13.1, 64bit, continuously updated
Desktop:           KDE 4.12.3
MESA & drm & Xorg: most recent ones from:
http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/

Current kernel:    3.13.6 vanilla from openSUSE repos, with
                    -ck1 and BFQ patches
Same behaviour:    without these patches

Last good kernel:  3.12.13 vanilla + CK2 + BFQ


Please, _always_CC_me_ -- as I'm not on the linux-kernel / 
linux-pm mailing lists.

And please, if you know any person in charge of this -- lead this 
message to him/her.

Thank you in advance and best regards,
Manuel Krause



* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-07 19:33 3.13.?: Strange / dangerous fan policy Manuel Krause
@ 2014-03-07 20:55   ` Guenter Roeck
  0 siblings, 0 replies; 45+ messages in thread
From: Guenter Roeck @ 2014-03-07 20:55 UTC (permalink / raw)
  To: Manuel Krause; +Cc: linux-kernel, linux-pm, lm-sensors

On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
> Please have a short look at the following BUG report + the comments
> -- this message here is a kind of FWD-ing it:
> https://bugs.archlinux.org/task/39005
> 
> I came late to test kernel 3.13 with the .5 one, as it was the time
> that the related -CK/BFS patch became available.
> 
> I'm not using Archlinux, but openSUSE, and my problems are quite the
> same. Especially these with smelling melting plastics.
> 
> My own reports went to Con Kolivas' Blog first:
> "I get weird temperatures and abrupt 100% fan actions with vanilla
> 3.13.5 with this CK and most recent BFQ at my HP Notebook.
> In gkrellm the highest T had been @74°C, so far (3.12.13), and is
> now growing to 94°C. Then, the fan goes to 100% for 10~30secs
> cooling it to approx. 82°C.
> That is not good, if I compare 74 to 94 °C.
> Have I missed a .CONFIG option for 3.13, especially?"
> 
> I'd get the same without (Con's && BFQ's) patches.
> 
> Machine:           HP Notebook with Core2Duo CPU (Penryn)
> Distro:            openSUSE 13.1, 64bit, continuously updated
> Desktop:           KDE 4.12.3
> MESA & drm & Xorg: most recent ones from:
> http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
> 
> Current kernel:    3.13.6 vanilla from openSUSE repos, with
>                    -ck1 and BFQ patches
> Same behaviour:    without these patches
> 
> Last good kernel:  3.12.13 vanilla + CK2 + BFQ
> 

Can you add more information about your fan control policy ?
Do you rely on the hardware for automatic fan speed control,
or do you run the fancontrol script ?

What is the output from the 'sensors' command ?

Thanks,
Guenter


* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-07 20:55   ` [lm-sensors] " Guenter Roeck
@ 2014-03-07 22:04     ` Manuel Krause
  -1 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-03-07 22:04 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-kernel, linux-pm, lm-sensors

On 2014-03-07 21:55, Guenter Roeck wrote:
> On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
>> Please have a short look at the following BUG report + the comments
>> -- this message here is a kind of FWD-ing it:
>> https://bugs.archlinux.org/task/39005
>>
>> I came late to test kernel 3.13 with the .5 one, as it was the time
>> that the related -CK/BFS patch became available.
>>
>> I'm not using Archlinux, but openSUSE, and my problems are quite the
>> same. Especially these with smelling melting plastics.
>>
>> My own reports went to Con Kolivas' Blog first:
>> "I get weird temperatures and abrupt 100% fan actions with vanilla
>> 3.13.5 with this CK and most recent BFQ at my HP Notebook.
>> In gkrellm the highest T had been @74°C, so far (3.12.13), and is
>> now growing to 94°C. Then, the fan goes to 100% for 10~30secs
>> cooling it to approx. 82°C.
>> That is not good, if I compare 74 to 94 °C.
>> Have I missed a .CONFIG option for 3.13, especially?"
>>
>> I'd get the same without (Con's && BFQ's) patches.
>>
>> Machine:           HP Notebook with Core2Duo CPU (Penryn)
>> Distro:            openSUSE 13.1, 64bit, continuously updated
>> Desktop:           KDE 4.12.3
>> MESA & drm & Xorg: most recent ones from:
>> http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
>>
>> Current kernel:    3.13.6 vanilla from openSUSE repos, with
>>                     -ck1 and BFQ patches
>> Same behaviour:    without these patches
>>
>> Last good kernel:  3.12.13 vanilla + CK2 + BFQ
>>
>
> Can you add more information about your fan control policy ?
> Do you rely on the hardware for automatic fan speed control,
> or do you run the fancontrol script ?
>
> What is the output from the 'sensors' command ?
>
> Thanks,
> Guenter
>

Hi, and thanks for the quick response!
No special fancy "fan control policy". 'fancontrol' isn't up or 
running.
Vanilla kernels 3.11.* and 3.12.* worked here without any extra 
setup.
--
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +71.0°C  (crit = +256.0°C)
temp2:        +69.0°C  (crit = +110.0°C)
temp3:        +52.0°C  (crit = +105.0°C)
temp4:        +25.0°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
--
My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
This is with 3.12.13 with my normal workload.

Please trust my above-mentioned values of 94 °C vs. 74 °C, as I 
don't want to boot 3.13.6 anymore, to avoid harm to the 
notebook's casing.

But I would do so to test any patch that might improve this.

Manuel Krause




* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-07 22:04     ` [lm-sensors] " Manuel Krause
@ 2014-03-07 22:52       ` Guenter Roeck
  -1 siblings, 0 replies; 45+ messages in thread
From: Guenter Roeck @ 2014-03-07 22:52 UTC (permalink / raw)
  To: Manuel Krause; +Cc: linux-kernel, linux-pm, lm-sensors

On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> On 2014-03-07 21:55, Guenter Roeck wrote:
> >On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
> >>Please have a short look at the following BUG report + the comments
> >>-- this message here is a kind of FWD-ing it:
> >>https://bugs.archlinux.org/task/39005
> >>
> >>I came late to test kernel 3.13 with the .5 one, as it was the time
> >>that the related -CK/BFS patch became available.
> >>
> >>I'm not using Archlinux, but openSUSE, and my problems are quite the
> >>same. Especially these with smelling melting plastics.
> >>
> >>My own reports went to Con Kolivas' Blog first:
> >>"I get weird temperatures and abrupt 100% fan actions with vanilla
> >>3.13.5 with this CK and most recent BFQ at my HP Notebook.
> >>In gkrellm the highest T had been @74°C, so far (3.12.13), and is
> >>now growing to 94°C. Then, the fan goes to 100% for 10~30secs
> >>cooling it to approx. 82°C.
> >>That is not good, if I compare 74 to 94 °C.
> >>Have I missed a .CONFIG option for 3.13, especially?"
> >>
> >>I'd get the same without (Con's && BFQ's) patches.
> >>
> >>Machine:           HP Notebook with Core2Duo CPU (Penryn)
> >>Distro:            openSUSE 13.1, 64bit, continuously updated
> >>Desktop:           KDE 4.12.3
> >>MESA & drm & Xorg: most recent ones from:
> >>http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
> >>
> >>Current kernel:    3.13.6 vanilla from openSUSE repos, with
> >>                    -ck1 and BFQ patches
> >>Same behaviour:    without these patches
> >>
> >>Last good kernel:  3.12.13 vanilla + CK2 + BFQ
> >>
> >
> >Can you add more information about your fan control policy ?
> >Do you rely on the hardware for automatic fan speed control,
> >or do you run the fancontrol script ?
> >
> >What is the output from the 'sensors' command ?
> >
> >Thanks,
> >Guenter
> >
> 
> Hi, and thanks for the quick response!
> No special fancy "fan control policy". 'fancontrol' isn't up or
> running.
> Vanilla kernels 3.11.* and 3.12.* had been working on here without
> any extra work.
> --
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +71.0°C  (crit = +256.0°C)
> temp2:        +69.0°C  (crit = +110.0°C)
> temp3:        +52.0°C  (crit = +105.0°C)
> temp4:        +25.0°C  (crit = +110.0°C)
> temp5:        +58.0°C  (crit = +110.0°C)
> 
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
> --
> My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> This is with 3.12.13 with my normal workload.
> 
> Please, trust my above mentioned values of 94 °C vs. 74°C as I
> don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> casing.
> 
Understood. Unfortunately, we'll need to get information
from the new kernel to be able to track down the problem.

> But I'd do to test any improvement-patch.
> 
So far I have no idea what is going on. I don't see anything in the
drivers providing above data that would explain the behavior,
but I might be missing something.

Of course, if output is different in 3.13, that would be important
to know. Maybe someone else can post related information for both
kernel versions on an affected system.

Guenter
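
For anyone who can boot both kernels on an affected machine, the
comparison asked for above could be scripted. What follows is a minimal
Python sketch, not something posted in the original thread: it parses
text in the format printed by 'sensors' (as shown earlier in this
thread) and lists the readings that moved by at least a threshold. The
names parse_sensors and diff_readings and the 5 °C default threshold
are illustrative assumptions.

```python
import re

# Matches lines such as "temp1:        +71.0°C  (crit = +256.0°C)".
READING = re.compile(r"^(?P<label>[^:]+):\s+\+?(?P<value>-?\d+(?:\.\d+)?)°C")


def parse_sensors(text):
    """Return {"chip/label": temperature in °C} from 'sensors' output."""
    readings = {}
    chip = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            chip = None          # a blank line ends the current chip block
            continue
        if chip is None:
            chip = line          # first line of a block is the chip name
            continue
        match = READING.match(line)
        if match:                # "Adapter: ..." lines do not match
            key = f"{chip}/{match.group('label').strip()}"
            readings[key] = float(match.group("value"))
    return readings


def diff_readings(old, new, threshold=5.0):
    """Return {key: (old, new)} for readings that moved by >= threshold."""
    return {
        key: (old[key], new[key])
        for key in sorted(old.keys() & new.keys())
        if abs(new[key] - old[key]) >= threshold
    }
```

One way to use it would be to save the 'sensors' output under each
kernel to a file, under a comparable workload, and pass both file
contents through parse_sensors before calling diff_readings.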


* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
  2014-03-07 22:52       ` [lm-sensors] " Guenter Roeck
@ 2014-03-08 11:08         ` Jean Delvare
  -1 siblings, 0 replies; 45+ messages in thread
From: Jean Delvare @ 2014-03-08 11:08 UTC (permalink / raw)
  To: Manuel Krause; +Cc: Guenter Roeck, lm-sensors, linux-kernel, linux-pm

On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> > Hi, and thanks for the quick response!
> > No special fancy "fan control policy". 'fancontrol' isn't up or
> > running.
> > Vanilla kernels 3.11.* and 3.12.* had been working on here without
> > any extra work.
> > --
> > # sensors
> > acpitz-virtual-0
> > Adapter: Virtual device
> > temp1:        +71.0°C  (crit = +256.0°C)
> > temp2:        +69.0°C  (crit = +110.0°C)
> > temp3:        +52.0°C  (crit = +105.0°C)
> > temp4:        +25.0°C  (crit = +110.0°C)
> > temp5:        +58.0°C  (crit = +110.0°C)
> > 
> > coretemp-isa-0000
> > Adapter: ISA adapter
> > Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
> > Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
> > --
> > My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> > This is with 3.12.13 with my normal workload.
> > 
> > Please, trust my above mentioned values of 94 °C vs. 74°C as I
> > don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> > casing.
> 
> Understood. Unfortunately, we'll need to get information
> from the new kernel to be able to track down the problem.

Indeed. Not only the run-time temperatures, but also the high and crit
limits.

> > But I'd do to test any improvement-patch.
> 
> So far I have no idea what is going on. I don't see anything in the
> drivers providing above data that would explain the behavior,
> but I might be missing something.

Looks like a regression in the acpi subsystem or in power management,
not hwmon. Hwmon is merely reporting the temperatures, it's not
responsible for the actual temperatures.

A bisection would certainly help, but of course that would require
booting to a bad kernel half of the time, which I understand Manuel
wouldn't enjoy.

The only two components which I think can reach such high temperatures
in a laptop are the CPU and the GPU. I suppose that the "94 °C vs.
74°C" refers to acpitz's temp1? If the temperatures reported by
coretemp remain the same, then I can only suppose that temp1 is the GPU
temperature. Please tell us which GPU is in this laptop, and which
driver you're using.

-- 
Jean Delvare
SUSE L3 Support
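
To give a sense of what a bisection costs: the search halves the range
of candidate commits on every test boot, so N candidates need roughly
log2(N) boots. The Python sketch below only illustrates the search
that 'git bisect' automates and is not a replacement for it. The names
first_bad and is_bad are hypothetical, and the sketch assumes that the
first candidate is known good, the last is known bad, and everything
after the first bad candidate is also bad.

```python
def first_bad(candidates, is_bad):
    """Binary search for the first bad entry.

    Assumes candidates[0] is good, candidates[-1] is bad, and that every
    entry after the first bad one is also bad.
    Returns (first bad entry, number of tests run).
    """
    good, bad = 0, len(candidates) - 1
    tests = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        tests += 1
        if is_bad(candidates[middle]):
            bad = middle      # regression is at or before the midpoint
        else:
            good = middle     # regression is after the midpoint
    return candidates[bad], tests
```

Here is_bad stands for "build this commit, boot it, and check whether
the temperatures climb", which is the step that cannot be automated
safely on the affected notebook.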


* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
  2014-03-08 11:08         ` Jean Delvare
@ 2014-03-08 12:36           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2014-03-08 12:36 UTC (permalink / raw)
  To: Jean Delvare, Manuel Krause
  Cc: Guenter Roeck, lm-sensors, linux-kernel, linux-pm

On Saturday, March 08, 2014 12:08:31 PM Jean Delvare wrote:
> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> > On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> > > Hi, and thanks for the quick response!
> > > No special fancy "fan control policy". 'fancontrol' isn't up or
> > > running.
> > > Vanilla kernels 3.11.* and 3.12.* had been working on here without
> > > any extra work.
> > > --
> > > # sensors
> > > acpitz-virtual-0
> > > Adapter: Virtual device
> > > temp1:        +71.0°C  (crit = +256.0°C)
> > > temp2:        +69.0°C  (crit = +110.0°C)
> > > temp3:        +52.0°C  (crit = +105.0°C)
> > > temp4:        +25.0°C  (crit = +110.0°C)
> > > temp5:        +58.0°C  (crit = +110.0°C)
> > > 
> > > coretemp-isa-0000
> > > Adapter: ISA adapter
> > > Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
> > > Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
> > > --
> > > My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> > > This is with 3.12.13 with my normal workload.
> > > 
> > > Please, trust my above mentioned values of 94 °C vs. 74°C as I
> > > don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> > > casing.
> > 
> > Understood. Unfortunately, we'll need to get information
> > from the new kernel to be able to track down the problem.
> 
> Indeed. Not only the run-time temperatures, but also the high and crit
> limits.
> 
> > > But I'd do to test any improvement-patch.
> > 
> > So far I have no idea what is going on. I don't see anything in the
> > drivers providing above data that would explain the behavior,
> > but I might be missing something.
> 
> Looks like a regression in the acpi subsystem or in power management,
> not hwmon. Hwmon is merely reporting the temperatures, it's not
> responsible for the actual temperatures.
> 
> A bisection would certainly help, but of course that would require
> booting to a bad kernel half of the time, which I understand Manuel
> wouldn't enjoy.
> 
> The only two components which I think can reach such high temperatures
> in a laptop are the CPU and the GPU. I suppose that the "94 °C vs.
> 74°C" refers to acpitz's temp1? If the temperatures reported by
> coretemp remain the same, then I can only suppose that temp1 is the GPU
> temperature. Please tell us which GPU is in this laptop, and which
> driver you're using.

Also it would be good to know which cpufreq and cpuidle drivers are in use
and whether or not 3.14-rc5 has the problem.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
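
The cpufreq and cpuidle driver names can usually be read from sysfs.
The following Python sketch is an illustration added for readers, not
part of the original message. It assumes the standard sysfs locations
cpu0/cpufreq/scaling_driver and cpuidle/current_driver under
/sys/devices/system/cpu; the helper names read_driver and pm_drivers
are made up for this example. A missing file is reported as None, for
instance when no cpufreq driver is loaded.

```python
from pathlib import Path


def read_driver(path):
    """Return the stripped content of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def pm_drivers(sysfs_cpu="/sys/devices/system/cpu"):
    """Collect the cpufreq and cpuidle driver names for the running kernel."""
    root = Path(sysfs_cpu)
    return {
        "cpufreq": read_driver(root / "cpu0" / "cpufreq" / "scaling_driver"),
        "cpuidle": read_driver(root / "cpuidle" / "current_driver"),
    }
```

Calling pm_drivers() once under the good kernel and once under the bad
one would show whether a different driver was selected. The sysfs_cpu
parameter exists only so the function can be pointed at a test
directory.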


* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
  2014-03-08 11:08         ` Jean Delvare
@ 2014-03-08 15:59           ` Guenter Roeck
  -1 siblings, 0 replies; 45+ messages in thread
From: Guenter Roeck @ 2014-03-08 15:59 UTC (permalink / raw)
  To: Jean Delvare, Manuel Krause; +Cc: lm-sensors, linux-kernel, linux-pm

On 03/08/2014 03:08 AM, Jean Delvare wrote:
> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>> Hi, and thanks for the quick response!
>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>> running.
>>> Vanilla kernels 3.11.* and 3.12.* had been working on here without
>>> any extra work.
>>> --
>>> # sensors
>>> acpitz-virtual-0
>>> Adapter: Virtual device
>>> temp1:        +71.0°C  (crit = +256.0°C)
>>> temp2:        +69.0°C  (crit = +110.0°C)
>>> temp3:        +52.0°C  (crit = +105.0°C)
>>> temp4:        +25.0°C  (crit = +110.0°C)
>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>
>>> coretemp-isa-0000
>>> Adapter: ISA adapter
>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>> --
>>> My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
>>> This is with 3.12.13 with my normal workload.
>>>
>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
>>> don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
>>> casing.
>>
>> Understood. Unfortunately, we'll need to get information
>> from the new kernel to be able to track down the problem.
>
> Indeed. Not only the run-time temperatures, but also the high and crit
> limits.
>
>>> But I'd do to test any improvement-patch.
>>
>> So far I have no idea what is going on. I don't see anything in the
>> drivers providing above data that would explain the behavior,
>> but I might be missing something.
>
> Looks like a regression in the acpi subsystem or in power management,
> not hwmon. Hwmon is merely reporting the temperatures, it's not
> responsible for the actual temperatures.
>

I would agree. I don't think we have enough information to be sure,
though. There might be some unintended interaction or interference.

The GPU is a good hint. For example, look at commit b9ed919f1c8
(drm/nouveau/drm/pm: remove everything except the hwmon interfaces
to THERM). nouveau does export pwm and fan control information,
so any change in that code may have unintended side effects.
Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
use devm_hwmon_register_with_groups) could have the observed impact,
as it is purely passive, but I would rather be safe than sorry.

This problem has now been filed in bugzilla as
https://bugzilla.kernel.org/show_bug.cgi?id=71711.

Guenter


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-08 15:59           ` Guenter Roeck
  (?)
@ 2014-03-09  0:10             ` Manuel Krause
  -1 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-03-09  0:10 UTC (permalink / raw)
  To: Guenter Roeck, linux-kernel, linux-pm
  Cc: Jean Delvare, lm-sensors, Rafael J. Wysocki

On 2014-03-08 16:59, Guenter Roeck wrote:
> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>> Hi, and thanks for the quick response!
>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>> running.
>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>> without
>>>> any extra work.
>>>> --
>>>> # sensors
>>>> acpitz-virtual-0
>>>> Adapter: Virtual device
>>>> temp1:        +71.0°C  (crit = +256.0°C)
>>>> temp2:        +69.0°C  (crit = +110.0°C)
>>>> temp3:        +52.0°C  (crit = +105.0°C)
>>>> temp4:        +25.0°C  (crit = +110.0°C)
>>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>>
>>>> coretemp-isa-0000
>>>> Adapter: ISA adapter
>>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>>> --
>>>> My notebook (HP/Compaq 6730b) does not have a separate fan
>>>> sensor.
>>>> This is with 3.12.13 with my normal workload.
>>>>
>>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>> notebook's
>>>> casing.
>>>
>>> Understood. Unfortunately, we'll need to get information
>>> from the new kernel to be able to track down the problem.
>>
>> Indeed. Not only the run-time temperatures, but also the high
>> and crit
>> limits.
>>
>>>> But I'd do to test any improvement-patch.
>>>
>>> So far I have no idea what is going on. I don't see anything
>>> in the
>>> drivers providing above data that would explain the behavior,
>>> but I might be missing something.
>>
>> Looks like a regression in the acpi subsystem or in power
>> management,
>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>> responsible for the actual temperatures.
>>
>
> I would agree. I don't think we have enough information to be sure,
> though. There might be some unintended interaction or interference.
>
> gpu is a good hint ... for example, look at commit b9ed919f1c8
> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
> to THERM). nouveau does export pwm and fan control information,
> so any change in that code may have unintended side effects.
> Similar, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
> use devm_hwmon_register_with_groups) could have the observed impact,
> as it is purely passive, but I prefer to be rather safe than sorry.
>
> This problem has now been submitted into bugzilla as
> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>
> Guenter
>

Sorry for being late; I had to search for and collect a lot of info
for you.
I hope you don't mind that I put it all into one answer, CCing all of you.

My GFX is an Intel GM45 (mobile) with shared memory, running the
open-source Mesa drivers/extensions.
kernel-module: i915

According to the output of 'cpupower', I have:
CPUidle driver: acpi_idle
CPUidle governor: menu

CPUfreq:
   driver: acpi-cpufreq
   available cpufreq governors: ondemand, performance
-
And "ondemand" is running.
--
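
For anyone who wants to collect the same cpuidle/cpufreq details without
'cpupower', here is a minimal sketch that reads them from sysfs. The
function names and the `sysfs_root` parameter are my own (the parameter
only exists so the code can be pointed at a test directory), and I am
assuming the usual sysfs file locations; files missing on a given kernel
simply come back as None.

```python
from pathlib import Path


def read_first(paths):
    """Return the stripped contents of the first readable file, else None."""
    for path in paths:
        try:
            return Path(path).read_text().strip()
        except OSError:
            continue
    return None


def pm_drivers(sysfs_root="/sys"):
    """Collect the cpuidle and cpufreq driver/governor names for cpu0."""
    cpu = Path(sysfs_root) / "devices/system/cpu"
    return {
        "cpuidle_driver": read_first([cpu / "cpuidle/current_driver"]),
        # Assumption: the governor file name differs between kernels,
        # so try the read-only name first and fall back to the other.
        "cpuidle_governor": read_first([cpu / "cpuidle/current_governor_ro",
                                        cpu / "cpuidle/current_governor"]),
        "cpufreq_driver": read_first([cpu / "cpu0/cpufreq/scaling_driver"]),
        "cpufreq_governor": read_first([cpu / "cpu0/cpufreq/scaling_governor"]),
    }


if __name__ == "__main__":
    for key, value in pm_drivers().items():
        print(f"{key}: {value}")
```

Running it on both the good and the bad kernel and diffing the output
would show whether a driver or governor changed between them.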

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +41.0°C  (crit = +256.0°C)
temp2:        +92.0°C  (crit = +110.0°C)
temp3:        +71.0°C  (crit = +105.0°C)
temp4:        +26.5°C  (crit = +110.0°C)
temp5:        +25.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)

This is from a critical "smelly" situation today: kernel compilation,
fan @100%.
--
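
To make the difference between the two 'sensors' dumps in this thread
easier to see, here is a small sketch that parses both and prints the
change per sensor. The readings are copied from the dumps above (the
3.12.13 one under normal workload, today's one during kernel
compilation, so the workloads are not identical); the regular
expression and the function names are my own.

```python
import re

# Matches reading lines such as "temp2:        +92.0°C  (crit = +110.0°C)".
READING = re.compile(r"^([^:]+):\s+([+-]?\d+(?:\.\d+)?)°C")


def parse_sensors(text):
    """Map each sensor label to its current reading in °C."""
    readings = {}
    for line in text.splitlines():
        match = READING.match(line.strip())
        if match:
            readings[match.group(1)] = float(match.group(2))
    return readings


def deltas(before, after):
    """Return after - before for every label present in both dumps."""
    return {label: after[label] - before[label]
            for label in before if label in after}


OLD = """\
temp1:        +71.0°C  (crit = +256.0°C)
temp2:        +69.0°C  (crit = +110.0°C)
temp3:        +52.0°C  (crit = +105.0°C)
temp4:        +25.0°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)
Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
"""

NEW = """\
temp1:        +41.0°C  (crit = +256.0°C)
temp2:        +92.0°C  (crit = +110.0°C)
temp3:        +71.0°C  (crit = +105.0°C)
temp4:        +26.5°C  (crit = +110.0°C)
temp5:        +25.0°C  (crit = +110.0°C)
Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)
"""

for label, diff in deltas(parse_sensors(OLD), parse_sensors(NEW)).items():
    print(f"{label:8s} {diff:+6.1f}")
```

temp2 and both cores come out more than 20°C hotter, while temp5 is
33°C lower in the second dump.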

Additional findings:

Identification from bootup ACPI initialisation vs. sensors:
temp1 = DTSZ
temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
temp3 = SKNZ
temp4 = BATZ "Battery Zone", always calm, about 6°C above ambient T
temp5 = FDTZ --- in 3.12.13 it represents the cooling-fan level
(25 - 45 - 58 - max?)
Core 0 & Core 1 are the internal CPU T sensors.
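
The identification above can be sketched in code, with two loud
assumptions: that the boot log contains lines of the form
"ACPI: Thermal Zone [CPUZ] (45 C)" (the format as I recall it for
kernels of this era, not checked against this machine), and that the
zones appear in the same order as temp1..tempN in 'sensors'. The
function names are mine.

```python
import re
import sys

# Assumed boot-message format: "ACPI: Thermal Zone [NAME] (NN C)".
ZONE_LINE = re.compile(r"ACPI: Thermal Zone \[(\w+)\] \((-?\d+) C\)")


def acpi_zones(log_text):
    """Return (zone name, boot-time temperature in °C) in log order."""
    return [(name, int(temp)) for name, temp in ZONE_LINE.findall(log_text)]


def label_zones(zones):
    """Pair zones with temp1..tempN, assuming both use the same order."""
    return {f"temp{i}": name for i, (name, _temp) in enumerate(zones, start=1)}


if __name__ == "__main__":
    # Usage sketch: dmesg | python3 this_script.py
    found = acpi_zones(sys.stdin.read())
    for label, name in label_zones(found).items():
        print(f"{label} = {name}")
```

If the order assumption does not hold, the boot-time temperatures
returned by acpi_zones() can still be compared by hand with the first
'sensors' reading after bootup.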

With the 3.13.x (.5+) kernels, the cooling settings first gathered at
bootup stay in effect forever. That means rebooting a hot system gets
FDTZ @45°C+ and causes no problems, as it cools enough (even for
kernel compiling here). If FDTZ gets 25°C @bootup, the system goes
into emergency cooling at some point. The same happens after a
suspend/resume.

Kernel 3.12.13 adjusts the cooling on its own, and appropriately.
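
One way to check whether the cooling state really stays at its bootup
value would be to watch the thermal cooling devices in sysfs. This is
only a sketch: I am assuming the cooling_device*/type, cur_state and
max_state files are present on this machine, and the function names,
the interval and the number of rounds are made up for illustration.

```python
import time
from pathlib import Path


def cooling_states(sysfs_root="/sys"):
    """Return {device: (type, cur_state, max_state)} for cooling devices."""
    states = {}
    base = Path(sysfs_root) / "class/thermal"
    for dev in sorted(base.glob("cooling_device*")):
        try:
            kind = (dev / "type").read_text().strip()
            cur = int((dev / "cur_state").read_text())
            top = int((dev / "max_state").read_text())
        except (OSError, ValueError):
            continue  # skip devices that cannot be read
        states[dev.name] = (kind, cur, top)
    return states


def watch(interval=5.0, rounds=12, sysfs_root="/sys"):
    """Print a timestamped snapshot whenever any cooling state changes."""
    previous = None
    for _ in range(rounds):
        current = cooling_states(sysfs_root)
        if current != previous:
            print(time.strftime("%H:%M:%S"), current)
            previous = current
        time.sleep(interval)
```

Calling watch() while the temperatures rise should print new lines on
a kernel that adjusts the cooling; a single line followed by silence
would match the "stays forever" behaviour described above.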


Thank you all for your engagement, best regards,
Manuel Krause.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-09  0:10             ` Manuel Krause
@ 2014-03-09 17:28               ` Guenter Roeck
  -1 siblings, 0 replies; 45+ messages in thread
From: Guenter Roeck @ 2014-03-09 17:28 UTC (permalink / raw)
  To: Manuel Krause, linux-kernel, linux-pm
  Cc: Jean Delvare, lm-sensors, Rafael J. Wysocki

On 03/08/2014 04:10 PM, Manuel Krause wrote:
> On 2014-03-08 16:59, Guenter Roeck wrote:
>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>>> Hi, and thanks for the quick response!
>>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>>> running.
>>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>>> without
>>>>> any extra work.
>>>>> --
>>>>> # sensors
>>>>> acpitz-virtual-0
>>>>> Adapter: Virtual device
>>>>> temp1:        +71.0°C  (crit = +256.0°C)
>>>>> temp2:        +69.0°C  (crit = +110.0°C)
>>>>> temp3:        +52.0°C  (crit = +105.0°C)
>>>>> temp4:        +25.0°C  (crit = +110.0°C)
>>>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>>>
>>>>> coretemp-isa-0000
>>>>> Adapter: ISA adapter
>>>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>> --
>>>>> My notebook (HP/Compaq 6730b) does not have a separate fan
>>>>> sensor.
>>>>> This is with 3.12.13 with my normal workload.
>>>>>
>>>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
>>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>>> notebook's
>>>>> casing.
>>>>
>>>> Understood. Unfortunately, we'll need to get information
>>>> from the new kernel to be able to track down the problem.
>>>
>>> Indeed. Not only the run-time temperatures, but also the high
>>> and crit
>>> limits.
>>>
>>>>> But I'd do to test any improvement-patch.
>>>>
>>>> So far I have no idea what is going on. I don't see anything
>>>> in the
>>>> drivers providing above data that would explain the behavior,
>>>> but I might be missing something.
>>>
>>> Looks like a regression in the acpi subsystem or in power
>>> management,
>>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>>> responsible for the actual temperatures.
>>>
>>
>> I would agree. I don't think we have enough information to be sure,
>> though. There might be some unintended interaction or interference.
>>
>> gpu is a good hint ... for example, look at commit b9ed919f1c8
>> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
>> to THERM). nouveau does export pwm and fan control information,
>> so any change in that code may have unintended side effects.
>> Similar, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
>> use devm_hwmon_register_with_groups) could have the observed impact,
>> as it is purely passive, but I prefer to be rather safe than sorry.
>>
>> This problem has now been submitted into bugzilla as
>> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>>
>> Guenter
>>
>
> Sorry for being late; I had to search for and collect a lot of info for you.
> I hope you don't mind that I put it all into one answer, CCing all of you.
>
> My GFX is an Intel GM45 (mobile) with shared memory, running the open-source Mesa drivers/extensions.
> kernel-module: i915
>
> According to the output of 'cpupower', I have:
> CPUidle driver: acpi_idle
> CPUidle governor: menu
>
> CPUfreq:
>    driver: acpi-cpufreq
>    available cpufreq governors: ondemand, performance
> -
> And "ondemand" is running.
> --
>
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +41.0°C  (crit = +256.0°C)
> temp2:        +92.0°C  (crit = +110.0°C)
> temp3:        +71.0°C  (crit = +105.0°C)
> temp4:        +26.5°C  (crit = +110.0°C)
> temp5:        +25.0°C  (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)
>
> This is from a critical "smelly" situation today: kernel compilation, fan @100%.
> --
>
> Additional findings:
>
> Identification from bootup ACPI initialisation vs. sensors:
> temp1 = DTSZ
> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
> temp3 = SKNZ
> temp4 = BATZ "Battery Zone", always calm, about 6°C above ambient T
> temp5 = FDTZ --- in 3.12.13 it represents the cooling-fan level (25 - 45 - 58 - max?)
> Core 0 & Core 1 are the internal CPU T sensors.
>
> With the 3.13.x (.5+) kernels, the cooling settings first gathered at
> bootup stay in effect forever. That means rebooting a hot system gets
> FDTZ @45°C+ and causes no problems, as it cools enough (even for
> kernel compiling here). If FDTZ gets 25°C @bootup, the system goes
> into emergency cooling at some point. The same happens after a
> suspend/resume.
>
> Kernel 3.12.13 adjusts the cooling on its own, and appropriately.
>

Hi Manuel,

Thanks a lot for the additional information.

I added this exchange to bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=71711).
This is pretty much all I can do at this point; I have no idea what
is going on. Some change in ACPI would be my guess, but nothing
caught my eye when I looked through the ACPI code.

Guenter


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-09  0:10             ` Manuel Krause
@ 2014-03-09 17:58               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2014-03-09 17:58 UTC (permalink / raw)
  To: Manuel Krause
  Cc: Guenter Roeck, linux-kernel, linux-pm, Jean Delvare, lm-sensors,
	rui.zhang

On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
> On 2014-03-08 16:59, Guenter Roeck wrote:
> > On 03/08/2014 03:08 AM, Jean Delvare wrote:
> >> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> >>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> >>>> Hi, and thanks for the quick response!
> >>>> No special fancy "fan control policy". 'fancontrol' isn't up or
> >>>> running.
> >>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
> >>>> without
> >>>> any extra work.
> >>>> --
> >>>> # sensors
> >>>> acpitz-virtual-0
> >>>> Adapter: Virtual device
> >>>> temp1:        +71.0°C  (crit = +256.0°C)
> >>>> temp2:        +69.0°C  (crit = +110.0°C)
> >>>> temp3:        +52.0°C  (crit = +105.0°C)
> >>>> temp4:        +25.0°C  (crit = +110.0°C)
> >>>> temp5:        +58.0°C  (crit = +110.0°C)
> >>>>
> >>>> coretemp-isa-0000
> >>>> Adapter: ISA adapter
> >>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
> >>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
> >>>> --
> >>>> My notebook (HP/Compaq 6730b) does not have a separate fan
> >>>> sensor.
> >>>> This is with 3.12.13 with my normal workload.
> >>>>
> >>>> Please trust my above-mentioned values of 94°C vs. 74°C, as I
> >>>> don't like to boot 3.13.6 anymore, to avoid harm to the
> >>>> notebook's
> >>>> casing.
> >>>
> >>> Understood. Unfortunately, we'll need to get information
> >>> from the new kernel to be able to track down the problem.
> >>
> >> Indeed. Not only the run-time temperatures, but also the high
> >> and crit
> >> limits.
> >>
> >>>> But I'd be willing to test any improvement patch.
> >>>
> >>> So far I have no idea what is going on. I don't see anything
> >>> in the
> >>> drivers providing above data that would explain the behavior,
> >>> but I might be missing something.
> >>
> >> Looks like a regression in the acpi subsystem or in power
> >> management,
> >> not hwmon. Hwmon is merely reporting the temperatures, it's not
> >> responsible for the actual temperatures.
> >>
> >
> > I would agree. I don't think we have enough information to be sure,
> > though. There might be some unintended interaction or interference.
> >
> > gpu is a good hint ... for example, look at commit b9ed919f1c8
> > (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
> > to THERM). nouveau does export pwm and fan control information,
> > so any change in that code may have unintended side effects.
> > Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
> > use devm_hwmon_register_with_groups) could have the observed impact,
> > as it is purely passive, but I would rather be safe than sorry.
> >
> > This problem has now been submitted into bugzilla as
> > https://bugzilla.kernel.org/show_bug.cgi?id=71711.
> >
> > Guenter
> >
> 
> Sorry for being late, I had to search for/accumulate much info
> for you...
> I hope you don't mind me putting it into one answer, CCing you all.
> 
> My GFX is a GM45 Intel (mobile), shared memory, running the 
> opensource Mesa drivers/extensions.
> kernel-module: i915
> 
> According to the output of 'cpupower': I have
> CPUidle driver: acpi_idle
> CPUidle governor: menu
> 
> CPUfreq:
>    driver: acpi-cpufreq
>    available cpufreq governors: ondemand, performance
> -
> And "ondemand" is running.
> --
> 
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +41.0°C  (crit = +256.0°C)
> temp2:        +92.0°C  (crit = +110.0°C)
> temp3:        +71.0°C  (crit = +105.0°C)
> temp4:        +26.5°C  (crit = +110.0°C)
> temp5:        +25.0°C  (crit = +110.0°C)
> 
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)
> 
> FROM a critical "smelly" situation today, kernel-compilation, fan 
> @100%.
> --
> 
> Additional findings:
> 
> Identification from bootup ACPI initialisation vs. sensors:
> temp1 = DTSZ
> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
> temp3 = SKNZ
> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan 
> (25 - 45 - 58 - max?)
> Core 0 & Core 1 are the internal CPU T sensors.
> 
> With the 3.13.x (.5+) kernels the first gathered cooling
> settings from bootup stay forever. That means rebooting a hot
> system will get an FDTZ @45°C+ and won't cause any problems, as it
> cools enough (even for kernel compiling on here). If it gets
> 25°C @bootup, the system goes into emergency cooling at some point.
> The same happens with a suspend/resume.
>
> Kernel 3.12.13 adjusts the cooling on its own, but appropriately.

This almost certainly is an ACPI regression, but I'm not sure whether
thermal management or CPU power management is broken on your system.

Can you compare the contents of /sys/class/thermal/ from working and
not working kernels, please?
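
One way to gather this is sketched below in Python, assuming the usual
sysfs layout where each entry under /sys/class/thermal/ is a directory of
small text attribute files. The function name snapshot() and the "n.a."
placeholder are illustrative choices only, not anything the kernel defines.
Run it under each kernel, save the output to a file, and diff the two files.

```python
import os


def snapshot(root="/sys/class/thermal"):
    """Map 'entry/attribute' to the attribute's text for every plain file
    directly inside each entry of root (subdirectories are skipped)."""
    result = {}
    for entry in sorted(os.listdir(root)):
        entry_path = os.path.join(root, entry)
        if not os.path.isdir(entry_path):
            continue
        for name in sorted(os.listdir(entry_path)):
            path = os.path.join(entry_path, name)
            if not os.path.isfile(path):
                continue  # skip subdirs and links to directories
            try:
                with open(path) as f:
                    result[f"{entry}/{name}"] = f.read().strip()
            except OSError:
                # Hypothetical placeholder for unreadable attributes.
                result[f"{entry}/{name}"] = "n.a."
    return result


if __name__ == "__main__" and os.path.isdir("/sys/class/thermal"):
    for key, value in sorted(snapshot().items()):
        print(f"{key} = {value}")
```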

Rafael


^ permalink raw reply	[flat|nested] 45+ messages in thread


* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-09 17:58               ` [lm-sensors] " Rafael J. Wysocki
@ 2014-03-10  1:49                 ` Manuel Krause
  -1 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-03-10  1:49 UTC (permalink / raw)
  To: Rafael J. Wysocki, linux-kernel, linux-pm
  Cc: Guenter Roeck, Jean Delvare, lm-sensors, rui.zhang

On 2014-03-09 18:58, Rafael J. Wysocki wrote:
> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>>>> Hi, and thanks for the quick response!
>>>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>>>> running.
>>>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>>>> without
>>>>>> any extra work.
>>>>>> --
>>>>>> # sensors
>>>>>> acpitz-virtual-0
>>>>>> Adapter: Virtual device
>>>>>> temp1:        +71.0°C  (crit = +256.0°C)
>>>>>> temp2:        +69.0°C  (crit = +110.0°C)
>>>>>> temp3:        +52.0°C  (crit = +105.0°C)
>>>>>> temp4:        +25.0°C  (crit = +110.0°C)
>>>>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>>>>
>>>>>> coretemp-isa-0000
>>>>>> Adapter: ISA adapter
>>>>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>>> --
>>>>>> My notebook (HP/Compaq 6730b) does not have a separate fan
>>>>>> sensor.
>>>>>> This is with 3.12.13 with my normal workload.
>>>>>>
>>>>>> Please trust my above-mentioned values of 94°C vs. 74°C, as I
>>>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>>>> notebook's
>>>>>> casing.
>>>>>
>>>>> Understood. Unfortunately, we'll need to get information
>>>>> from the new kernel to be able to track down the problem.
>>>>
>>>> Indeed. Not only the run-time temperatures, but also the high
>>>> and crit
>>>> limits.
>>>>
>>>>>> But I'd be willing to test any improvement patch.
>>>>>
>>>>> So far I have no idea what is going on. I don't see anything
>>>>> in the
>>>>> drivers providing above data that would explain the behavior,
>>>>> but I might be missing something.
>>>>
>>>> Looks like a regression in the acpi subsystem or in power
>>>> management,
>>>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>>>> responsible for the actual temperatures.
>>>>
>>>
>>> I would agree. I don't think we have enough information to be sure,
>>> though. There might be some unintended interaction or interference.
>>>
>>> gpu is a good hint ... for example, look at commit b9ed919f1c8
>>> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
>>> to THERM). nouveau does export pwm and fan control information,
>>> so any change in that code may have unintended side effects.
>>> Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
>>> use devm_hwmon_register_with_groups) could have the observed impact,
>>> as it is purely passive, but I would rather be safe than sorry.
>>>
>>> This problem has now been submitted into bugzilla as
>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>>>
>>> Guenter
>>>
>>
>> Sorry for being late, I had to search for/accumulate much info
>> for you...
>> I hope you don't mind me putting it into one answer, CCing you all.
>>
>> My GFX is a GM45 Intel (mobile), shared memory, running the
>> opensource Mesa drivers/extensions.
>> kernel-module: i915
>>
>> According to the output of 'cpupower': I have
>> CPUidle driver: acpi_idle
>> CPUidle governor: menu
>>
>> CPUfreq:
>>     driver: acpi-cpufreq
>>     available cpufreq governors: ondemand, performance
>> -
>> And "ondemand" is running.
>> --
>>
>> # sensors
>> acpitz-virtual-0
>> Adapter: Virtual device
>> temp1:        +41.0°C  (crit = +256.0°C)
>> temp2:        +92.0°C  (crit = +110.0°C)
>> temp3:        +71.0°C  (crit = +105.0°C)
>> temp4:        +26.5°C  (crit = +110.0°C)
>> temp5:        +25.0°C  (crit = +110.0°C)
>>
>> coretemp-isa-0000
>> Adapter: ISA adapter
>> Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
>> Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)
>>
>> FROM a critical "smelly" situation today, kernel-compilation, fan
>> @100%.
>> --
>>
>> Additional findings:
>>
>> Identification from bootup ACPI initialisation vs. sensors:
>> temp1 = DTSZ
>> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
>> temp3 = SKNZ
>> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
>> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan
>> (25 - 45 - 58 - max?)
>> Core 0 & Core 1 are the internal CPU T sensors.
>>
>> With the 3.13.x (.5+) kernels the first gathered cooling
>> settings from bootup stay forever. That means rebooting a hot
>> system will get an FDTZ @45°C+ and won't cause any problems, as it
>> cools enough (even for kernel compiling on here). If it gets
>> 25°C @bootup, the system goes into emergency cooling at some point.
>> The same happens with a suspend/resume.
>>
>> Kernel 3.12.13 adjusts the cooling on its own, but appropriately.
>
> This almost certainly is an ACPI regression, but I'm not sure whether
> thermal management or CPU power management is broken on your system.
>
> Can you compare the contents of /sys/class/thermal/ from working and
> not working kernels, please?
>
> Rafael
>

Hi again,
unfortunately you didn't specify how deeply I should dig into 
/sys/class/thermal. So you get the lines from # BOF # to # EOF # 
below. I hope they're readable without more comments.

The most remarkable changes, in my eyes, are within 
"thermal_zone1".

Best regards,
Manuel Krause


# BOF #
The following are all from /sys/class/thermal/, whose entries are links 
to -> ../../devices/virtual/thermal/

I've listed the directories in sections of cooling_devices and 
thermal_zones separately for each bad/good kernel, for emailing 
purposes only. You can merge them into a spreadsheet for your own 
evaluation. I've left out some subdirs and subdir values that 
_really_ didn't seem to need attention.

Also, I've collected the "# sensors" output for each readout, 
having reproduced nearly the same workload, represented by the 
"Fan speed" (thermal_zone4==FDTZ).

And I've done my very best to not produce typos or c&p errors.


  3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir             |-
                  /type       /cur_state  /max_state
cooling_device0  Processor    0          10
cooling_device1  Processor    0          10
cooling_device2  Fan          0           1
cooling_device3  Fan          1           1
cooling_device4  Fan          0           1
cooling_device5  Fan          0           1
cooling_device6  Fan          0           1
cooling_device7  LCD          0          24

  3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir             |-
                  /type       /cur_state  /max_state
cooling_device0  Processor    0          10
cooling_device1  Processor    0          10
cooling_device2  Fan          0           1
cooling_device3  Fan          1           1
cooling_device4  Fan          1           1
cooling_device5  Fan          1           1
cooling_device6  Fan          1           1
cooling_device7  LCD          0          24


  3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir          |-
               /passive /temp  |-     /cdev?_  /trip_   /trip_
                                       trip_    point_   point_
                                       point    ?_temp   ?_type
thermal_zone0  0        68000   ?=0    n.a.   256000   critical
thermal_zone1   n.a.    70000 |-
                                 ?=0   6       110000   critical
                                 ?=1   5       107000   passive
                                 ?=2   4        90000   active
                                 ?=3   3        75000   active
                                 ?=4   2        55000   active
                                 ?=5   1        45000   active
                                 ?=6   1        30000   active
thermal_zone2   n.a.    54000 |-
                                 ?=0   1       105000   critical
                                 ?=1   1        95000   passive
thermal_zone3   n.a.    25800 |-
                                 ?=0   1       110000   critical
                                 ?=1   1        60000   passive
thermal_zone4  0        58000   ?=0    n.a.   110000   critical


  3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir          |-
               /passive /temp  |-     /cdev?_  /trip_   /trip_
                                       trip_    point_   point_
                                       point    ?_temp   ?_type
thermal_zone0  0        50000   ?=0    n.a.   256000   critical
thermal_zone1   n.a.    70000 |-
                                 ?=0   1       110000   critical
                                 ?=1   1       107000   passive
                                 ?=2   2        90000   active
                                 ?=3   3        67000   active
                                 ?=4   4        55000   active
                                 ?=5   5        45000   active
                                 ?=6   6        30000   active
thermal_zone2   n.a.    53000 |-
                                 ?=0   1       105000   critical
                                 ?=1   1        95000   passive
thermal_zone3   n.a.    25600 |-
                                 ?=0   1       110000   critical
                                 ?=1   1        60000   passive
thermal_zone4  0        58000   ?=0    n.a.   110000   critical

---
Legend here:
        /type  is always  acpitz
        /mode             enabled
        /policy           step_wise

       - from kernel ACPI initialisation: thermal_zone0==DTSZ,
          thermal_zone1==CPUZ, thermal_zone2==SKNZ,
          thermal_zone3==BATZ, thermal_zone4==FDTZ
       - n.a. means      file or value is not available
___
Legend in general:
              /power/control          is always  auto
              /power/runtime_status              unsupported
              /uevent                            ''==empty
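
To make the thermal_zone1 (CPUZ) differences easier to spot, the two
tables above can be compared mechanically. The following Python sketch
uses only the values transcribed from those tables; the names BAD_3_13_5,
GOOD_3_12_13 and diff_rows are made up for illustration.

```python
# thermal_zone1 (CPUZ) rows ?=0..6, transcribed from the tables above.
# Each tuple is (cdev?_trip_point, trip_point_?_temp in millidegrees C).
BAD_3_13_5 = [(6, 110000), (5, 107000), (4, 90000), (3, 75000),
              (2, 55000), (1, 45000), (1, 30000)]
GOOD_3_12_13 = [(1, 110000), (1, 107000), (2, 90000), (3, 67000),
                (4, 55000), (5, 45000), (6, 30000)]

FIELDS = ("cdev_trip_point", "trip_point_temp")


def diff_rows(bad, good):
    """Return (row, field, bad_value, good_value) for each differing field."""
    changes = []
    for row, (bad_row, good_row) in enumerate(zip(bad, good)):
        for field, bad_value, good_value in zip(FIELDS, bad_row, good_row):
            if bad_value != good_value:
                changes.append((row, field, bad_value, good_value))
    return changes


if __name__ == "__main__":
    for row, field, bad_value, good_value in diff_rows(BAD_3_13_5, GOOD_3_12_13):
        print(f"?={row}: {field} bad={bad_value} good={good_value}")
```

On these numbers it reports that the cdev?_trip_point column differs in
every row except ?=3, and that trip_point_3_temp is 75000 in 3.13.5
versus 67000 in 3.12.13.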

----------------------------------------------------------------

  3.13.5 -- 20140309 -- 20:52 -- bad
=============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +68.0°C  (crit = +256.0°C)
temp2:        +70.0°C  (crit = +110.0°C)
temp3:        +54.0°C  (crit = +105.0°C)
temp4:        +25.8°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +66.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +63.0°C  (high = +105.0°C, crit = +105.0°C)


  3.12.13 -- 20140310 -- 00:26 -- good
==============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +50.0°C  (crit = +256.0°C)
temp2:        +70.0°C  (crit = +110.0°C)
temp3:        +53.0°C  (crit = +105.0°C)
temp4:        +25.6°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +65.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +61.0°C  (high = +105.0°C, crit = +105.0°C)
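
As a cross-check, the /temp values in the thermal_zone tables are in
millidegrees Celsius and line up with the acpitz readings from sensors.
A small Python sketch, with the values copied from the readouts above and
the helper names as_sensors and mismatches invented for illustration:

```python
def as_sensors(millideg):
    """Format a sysfs millidegree-Celsius value the way sensors prints it."""
    return f"{millideg / 1000:+.1f}°C"


# thermal_zone0..4 /temp values, copied from the tables above.
SYSFS = {
    "3.13.5": [68000, 70000, 54000, 25800, 58000],
    "3.12.13": [50000, 70000, 53000, 25600, 58000],
}
# temp1..temp5 from the matching "# sensors" output above.
SENSORS = {
    "3.13.5": ["+68.0°C", "+70.0°C", "+54.0°C", "+25.8°C", "+58.0°C"],
    "3.12.13": ["+50.0°C", "+70.0°C", "+53.0°C", "+25.6°C", "+58.0°C"],
}


def mismatches():
    """List (kernel, zone index) pairs where sysfs and sensors disagree."""
    bad = []
    for kernel, values in SYSFS.items():
        for zone, millideg in enumerate(values):
            if as_sensors(millideg) != SENSORS[kernel][zone]:
                bad.append((kernel, zone))
    return bad


if __name__ == "__main__":
    print(mismatches())
```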

# EOF #



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
@ 2014-03-10  1:49                 ` Manuel Krause
  0 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-03-10  1:49 UTC (permalink / raw)
  To: Rafael J. Wysocki, linux-kernel, linux-pm
  Cc: Guenter Roeck, Jean Delvare, lm-sensors, rui.zhang

On 2014-03-09 18:58, Rafael J. Wysocki wrote:
> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>>>> Hi, and thanks for the quick response!
>>>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>>>> running.
>>>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>>>> without
>>>>>> any extra work.
>>>>>> --
>>>>>> # sensors
>>>>>> acpitz-virtual-0
>>>>>> Adapter: Virtual device
>>>>>> temp1:        +71.0°C  (crit = +256.0°C)
>>>>>> temp2:        +69.0°C  (crit = +110.0°C)
>>>>>> temp3:        +52.0°C  (crit = +105.0°C)
>>>>>> temp4:        +25.0°C  (crit = +110.0°C)
>>>>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>>>>
>>>>>> coretemp-isa-0000
>>>>>> Adapter: ISA adapter
>>>>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>>> --
>>>>>> My notebook (HP/Compaq 6730b) does not have a seperate fan
>>>>>> sensor.
>>>>>> This is with 3.12.13 with my normal workload.
>>>>>>
>>>>>> Please, trust my above mentionned values of 94 °C vs. 74°C as I
>>>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>>>> notebook's
>>>>>> casing.
>>>>>
>>>>> Understood. Unfortunately, we'll need to get information
>>>>> from the new kernel to be able to track down the problem.
>>>>
>>>> Indeed. Not only the run-time temperatures, but also the high
>>>> and crit
>>>> limits.
>>>>
>>>>>> But I'd do to test any improvement-patch.
>>>>>
>>>>> So far I have no idea what is going on. I don't see anything
>>>>> in the
>>>>> drivers providing above data that would explain the behavior,
>>>>> but I might be missing something.
>>>>
>>>> Looks like a regression in the acpi subsystem or in power
>>>> management,
>>>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>>>> responsible for the actual temperatures.
>>>>
>>>
>>> I would agree. I don't think we have enough information to be sure,
>>> though. There might be some unintended interaction or interference.
>>>
>>> gpu is a good hint ... for example, look at commit b9ed919f1c8
>>> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
>>> to THERM). nouveau does export pwm and fan control information,
>>> so any change in that code may have unintended side effects.
>>> Similar, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
>>> use devm_hwmon_register_with_groups) could have the observed impact,
>>> as it is purely passive, but I prefer to be rather safe than sorry.
>>>
>>> This problem has now been submitted into bugzilla as
>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>>>
>>> Guenter
>>>
>>
>> Sorry, for beeing late, had to search for/accumulate much info
>> for you...
>> I hope, you like me to put it into one answer to you all CCing you.
>>
>> My GFX is a GM45 Intel (mobile), shared memory, running the
>> opensource Mesa drivers/extensions.
>> kernel-module: i915
>>
>> According to the output of 'cpupower': I have
>> CPUidle driver: acpi_idle
>> CPUidle governor: menu
>>
>> CPUfreq:
>>     driver: acpi-cpufreq
>>     available cpufreq governors: ondemand, performance
>> -
>> And "ondemand" is running.
>> --
>>
>> # sensors
>> acpitz-virtual-0
>> Adapter: Virtual device
>> temp1:        +41.0°C  (crit = +256.0°C)
>> temp2:        +92.0°C  (crit = +110.0°C)
>> temp3:        +71.0°C  (crit = +105.0°C)
>> temp4:        +26.5°C  (crit = +110.0°C)
>> temp5:        +25.0°C  (crit = +110.0°C)
>>
>> coretemp-isa-0000
>> Adapter: ISA adapter
>> Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
>> Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)
>>
>> FROM a critical "smelly" situation today, kernel-compilation, fan
>> @100%.
>> --
>>
>> Additional findings:
>>
>> Identification from bootup ACPI initialisation vs. sensors:
>> temp1 = DTSZ
>> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
>> temp3 = SKNZ
>> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
>> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan
>> (25 - 45 - 58 - max?)
>> Core 0 & Core 1 are the internal CPU T sensors.
>>
>> With the 3.13.x (.5+) kernels the first gatherered cooling
>> settings from bootup do stay forever. Means, rebooting a hot
>> system will get a FDTZ @45°C+ and won't make any problems, as it
>> does cool enough (even for kernel compiling on here). If it gets
>> 25°C @bootup the system goes into emergency cooling somewhen.
>> Same is with a suspend/resume.
>>
>> Kernel 3.12.13 adjusts the cooling on it's own, but appropriately.
>
> This almost certainly is an ACPI regression, but I'm not sure whether
> thermal management or CPU power management is broken on your system.
>
> Can you compare the contents of /sys/class/thermal/ from working and
> not working kernels, please?
>
> Rafael
>

Hi again,
unfortunately you didn't specify how deeply I should dig into 
/sys/class/thermal. So you get the lines from # BOF # to # EOF # 
below. I hope they're readable without more comments.

The most remarkable changes, in my eyes, had happened within 
"thermal_zone1".

Best regards,
Manuel Krause


# BOF #
Following ones are all from /sys/class/thermal/ which are links 
to -> ../../devices/virtual/thermal/

I've listed the directories in sections of cooling_devices and 
thermal_zones separately for each bad/good kernel. For Emailing 
purposes only. You can merge them into a spreadsheet for your 
evaluation on your own. I've left out reporting some subdirs and 
subdir's values that _really_ didn't seem to need attention.

Also, I've had collected the #sensors output for each readout, 
having reproduced nearly the same workload, represented by the 
"Fan speed" (thermal_zone4==FDTZ).

And I've done my very best to not produce typos or c&p errors.


  3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir             |-
                  /type       /cur_state  /max_state
cooling_device0  Processor    0          10
cooling_device1  Processor    0          10
cooling_device2  Fan          0           1
cooling_device3  Fan          1           1
cooling_device4  Fan          0           1
cooling_device5  Fan          0           1
cooling_device6  Fan          0           1
cooling_device7  LCD          0          24

  3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir             |-
                  /type       /cur_state  /max_state
cooling_device0  Processor    0          10
cooling_device1  Processor    0          10
cooling_device2  Fan          0           1
cooling_device3  Fan          1           1
cooling_device4  Fan          1           1
cooling_device5  Fan          1           1
cooling_device6  Fan          1           1
cooling_device7  LCD          0          24


  3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir          |-
               /passive /temp  |-     /cdev?_  /trip_   /trip_
                                       trip_    point_   point_
                                       point    ?_temp   ?_type
thermal_zone0  0        68000   ?=0    n.a.   256000   critical
thermal_zone1   n.a.    70000 |-
                                 ?=0   6       110000   critical
                                 ?=1   5       107000   passive
                                 ?=2   4        90000   active
                                 ?=3   3        75000   active
                                 ?=4   2        55000   active
                                 ?=5   1        45000   active
                                 ?=6   1        30000   active
thermal_zone2   n.a.    54000 |-
                                 ?=0   1       105000   critical
                                 ?=1   1        95000   passive
thermal_zone3   n.a.    25800 |-
                                 ?=0   1       110000   critical
                                 ?=1   1        60000   passive
thermal_zone4  0        58000   ?=0    n.a.   110000   critical


  3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir          |-
               /passive /temp  |-     /cdev?_  /trip_   /trip_
                                       trip_    point_   point_
                                       point    ?_temp   ?_type
thermal_zone0  0        50000   ?=0    n.a.   256000   critical
thermal_zone1   n.a.    70000 |-
                                 ?=0   1       110000   critical
                                 ?=1   1       107000   passive
                                 ?=2   2        90000   active
                                 ?=3   3        67000   active
                                 ?=4   4        55000   active
                                 ?=5   5        45000   active
                                 ?=6   6        30000   active
thermal_zone2   n.a.    53000 |-
                                 ?=0   1       105000   critical
                                 ?=1   1        95000   passive
thermal_zone3   n.a.    25600 |-
                                 ?=0   1       110000   critical
                                 ?=1   1        60000   passive
thermal_zone4  0        58000   ?=0    n.a.   110000   critical

---
Legend here:
        /type  is always  acpitz
        /mode             enabled
        /policy           step_wise

       - from kernel ACPI initialisation: thermal_zone0==DTSZ,
          thermal_zone1==CPUZ, thermal_zone2==SKNZ,
          thermal_zone3==BATZ, thermal_zone4==FDTZ
       - n.a. means      file or value is not available
___
Legend in general:
              /power/control          is always  auto
              /power/runtime_status              unsupported
              /uevent                            ''==empty
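
To make the thermal_zone1 (CPUZ) differences easier to spot, here is a
minimal Python sketch that compares the two tables cell by cell. The
numbers are copied from the tables above; the names (GOOD_3_12_13,
BAD_3_13_5, diff_rows) are made up for this illustration and are not
part of any kernel or lm-sensors tool.

```python
# Rows copied from the thermal_zone1 (CPUZ) tables above.
# Each row: (index "?", cdev?_trip_point, trip_point_?_temp in
# millidegrees Celsius, trip_point_?_type)
GOOD_3_12_13 = [
    (0, 1, 110000, "critical"),
    (1, 1, 107000, "passive"),
    (2, 2, 90000, "active"),
    (3, 3, 67000, "active"),
    (4, 4, 55000, "active"),
    (5, 5, 45000, "active"),
    (6, 6, 30000, "active"),
]
BAD_3_13_5 = [
    (0, 6, 110000, "critical"),
    (1, 5, 107000, "passive"),
    (2, 4, 90000, "active"),
    (3, 3, 75000, "active"),
    (4, 2, 55000, "active"),
    (5, 1, 45000, "active"),
    (6, 1, 30000, "active"),
]

COLUMNS = ("cdev_trip_point", "trip_point_temp", "trip_point_type")


def diff_rows(good, bad):
    """Return (index, column, good value, bad value) for each differing cell."""
    changes = []
    for g, b in zip(good, bad):
        for name, gv, bv in zip(COLUMNS, g[1:], b[1:]):
            if gv != bv:
                changes.append((g[0], name, gv, bv))
    return changes


if __name__ == "__main__":
    for idx, name, gv, bv in diff_rows(GOOD_3_12_13, BAD_3_13_5):
        print(f"?={idx} {name}: {gv} (3.12.13) -> {bv} (3.13.5)")
```

With the values as listed, the differences are the cdev?_trip_point
column (every row except ?=3) and the trip_point_3_temp value.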

----------------------------------------------------------------

  3.13.5 -- 20140309 -- 20:52 -- bad
=============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +68.0°C  (crit = +256.0°C)
temp2:        +70.0°C  (crit = +110.0°C)
temp3:        +54.0°C  (crit = +105.0°C)
temp4:        +25.8°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +66.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +63.0°C  (high = +105.0°C, crit = +105.0°C)


  3.12.13 -- 20140310 -- 00:26 -- good
==============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +50.0°C  (crit = +256.0°C)
temp2:        +70.0°C  (crit = +110.0°C)
temp3:        +53.0°C  (crit = +105.0°C)
temp4:        +25.6°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +65.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +61.0°C  (high = +105.0°C, crit = +105.0°C)

# EOF #




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-10  1:49                 ` [lm-sensors] " Manuel Krause
@ 2014-03-11 21:59                   ` Manuel Krause
  -1 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-03-11 21:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang
  Cc: Guenter Roeck, Jean Delvare, lm-sensors

On 2014-03-10 02:49, Manuel Krause wrote:
> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>>>>> Hi, and thanks for the quick response!
>>>>>>> No special fancy "fan control policy". 'fancontrol' isn't
>>>>>>> up or
>>>>>>> running.
>>>>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>>>>> without
>>>>>>> any extra work.
>>>>>>> --
>>>>>>> # sensors
>>>>>>> acpitz-virtual-0
>>>>>>> Adapter: Virtual device
>>>>>>> temp1:        +71.0°C  (crit = +256.0°C)
>>>>>>> temp2:        +69.0°C  (crit = +110.0°C)
>>>>>>> temp3:        +52.0°C  (crit = +105.0°C)
>>>>>>> temp4:        +25.0°C  (crit = +110.0°C)
>>>>>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>>>>>
>>>>>>> coretemp-isa-0000
>>>>>>> Adapter: ISA adapter
>>>>>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>>>> --
>>>>>>> My notebook (HP/Compaq 6730b) does not have a seperate fan
>>>>>>> sensor.
>>>>>>> This is with 3.12.13 with my normal workload.
>>>>>>>
>>>>>>> Please, trust my above mentionned values of 94 °C vs. 74°C
>>>>>>> as I
>>>>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>>>>> notebook's
>>>>>>> casing.
>>>>>>
>>>>>> Understood. Unfortunately, we'll need to get information
>>>>>> from the new kernel to be able to track down the problem.
>>>>>
>>>>> Indeed. Not only the run-time temperatures, but also the high
>>>>> and crit
>>>>> limits.
>>>>>
>>>>>>> But I'd do to test any improvement-patch.
>>>>>>
>>>>>> So far I have no idea what is going on. I don't see anything
>>>>>> in the
>>>>>> drivers providing above data that would explain the behavior,
>>>>>> but I might be missing something.
>>>>>
>>>>> Looks like a regression in the acpi subsystem or in power
>>>>> management,
>>>>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>>>>> responsible for the actual temperatures.
>>>>>
>>>>
>>>> I would agree. I don't think we have enough information to be
>>>> sure,
>>>> though. There might be some unintended interaction or
>>>> interference.
>>>>
>>>> gpu is a good hint ... for example, look at commit b9ed919f1c8
>>>> (drm/nouveau/drm/pm: remove everything except the hwmon
>>>> interfaces
>>>> to THERM). nouveau does export pwm and fan control information,
>>>> so any change in that code may have unintended side effects.
>>>> Similar, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
>>>> use devm_hwmon_register_with_groups) could have the observed
>>>> impact,
>>>> as it is purely passive, but I prefer to be rather safe than
>>>> sorry.
>>>>
>>>> This problem has now been submitted into bugzilla as
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>>>>
>>>> Guenter
>>>>
>>>
>>> Sorry, for beeing late, had to search for/accumulate much info
>>> for you...
>>> I hope, you like me to put it into one answer to you all CCing
>>> you.
>>>
>>> My GFX is a GM45 Intel (mobile), shared memory, running the
>>> opensource Mesa drivers/extensions.
>>> kernel-module: i915
>>>
>>> According to the output of 'cpupower': I have
>>> CPUidle driver: acpi_idle
>>> CPUidle governor: menu
>>>
>>> CPUfreq:
>>>     driver: acpi-cpufreq
>>>     available cpufreq governors: ondemand, performance
>>> -
>>> And "ondemand" is running.
>>> --
>>>
>>> # sensors
>>> acpitz-virtual-0
>>> Adapter: Virtual device
>>> temp1:        +41.0°C  (crit = +256.0°C)
>>> temp2:        +92.0°C  (crit = +110.0°C)
>>> temp3:        +71.0°C  (crit = +105.0°C)
>>> temp4:        +26.5°C  (crit = +110.0°C)
>>> temp5:        +25.0°C  (crit = +110.0°C)
>>>
>>> coretemp-isa-0000
>>> Adapter: ISA adapter
>>> Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
>>> Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)
>>>
>>> FROM a critical "smelly" situation today, kernel-compilation, fan
>>> @100%.
>>> --
>>>
>>> Additional findings:
>>>
>>> Identification from bootup ACPI initialisation vs. sensors:
>>> temp1 = DTSZ
>>> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
>>> temp3 = SKNZ
>>> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
>>> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan
>>> (25 - 45 - 58 - max?)
>>> Core 0 & Core 1 are the internal CPU T sensors.
>>>
>>> With the 3.13.x (.5+) kernels the first gatherered cooling
>>> settings from bootup do stay forever. Means, rebooting a hot
>>> system will get a FDTZ @45°C+ and won't make any problems, as it
>>> does cool enough (even for kernel compiling on here). If it gets
>>> 25°C @bootup the system goes into emergency cooling somewhen.
>>> Same is with a suspend/resume.
>>>
>>> Kernel 3.12.13 adjusts the cooling on it's own, but
>>> appropriately.
>>
>> This almost certainly is an ACPI regression, but I'm not sure
>> whether
>> thermal management or CPU power management is broken on your
>> system.
>>
>> Can you compare the contents of /sys/class/thermal/ from
>> working and
>> not working kernels, please?
>>
>> Rafael
>>
>
> Hi again,
> unfortunately you didn't specify how deeply I should dig into
> /sys/class/thermal. So you get the lines from # BOF # to # EOF #
> below. I hope they're readable without more comments.
>
> The most remarkable changes, in my eyes, had happened within
> "thermal_zone1".
>
> Best regards,
> Manuel Krause
>
>
> # BOF #
> Following ones are all from /sys/class/thermal/ which are links
> to -> ../../devices/virtual/thermal/
>
> I've listed the directories in sections of cooling_devices and
> thermal_zones separately for each bad/good kernel. For Emailing
> purposes only. You can merge them into a spreadsheet for your
> evaluation on your own. I've left out reporting some subdirs and
> subdir's values that _really_ didn't seem to need attention.
>
> Also, I've had collected the #sensors output for each readout,
> having reproduced nearly the same workload, represented by the
> "Fan speed" (thermal_zone4==FDTZ).
>
> And I've done my very best to not produce typos or c&p errors.
>
>
>   3.13.5 -- 20140309 -- 20:52 -- bad
> =============================
> dir             |-
>                   /type       /cur_state  /max_state
> cooling_device0  Processor    0          10
> cooling_device1  Processor    0          10
> cooling_device2  Fan          0           1
> cooling_device3  Fan          1           1
> cooling_device4  Fan          0           1
> cooling_device5  Fan          0           1
> cooling_device6  Fan          0           1
> cooling_device7  LCD          0          24
>
>   3.12.13 -- 20140310 -- 00:26 -- good
> ==============================
> dir             |-
>                   /type       /cur_state  /max_state
> cooling_device0  Processor    0          10
> cooling_device1  Processor    0          10
> cooling_device2  Fan          0           1
> cooling_device3  Fan          1           1
> cooling_device4  Fan          1           1
> cooling_device5  Fan          1           1
> cooling_device6  Fan          1           1
> cooling_device7  LCD          0          24
>
>
>   3.13.5 -- 20140309 -- 20:52 -- bad
> =============================
> dir          |-
>                /passive /temp  |-     /cdev?_  /trip_   /trip_
>                                        trip_    point_   point_
>                                        point    ?_temp   ?_type
> thermal_zone0  0        68000   ?=0    n.a.   256000   critical
> thermal_zone1   n.a.    70000 |-
>                                  ?=0   6       110000   critical
>                                  ?=1   5       107000   passive
>                                  ?=2   4        90000   active
>                                  ?=3   3        75000   active
>                                  ?=4   2        55000   active
>                                  ?=5   1        45000   active
>                                  ?=6   1        30000   active
> thermal_zone2   n.a.    54000 |-
>                                  ?=0   1       105000   critical
>                                  ?=1   1        95000   passive
> thermal_zone3   n.a.    25800 |-
>                                  ?=0   1       110000   critical
>                                  ?=1   1        60000   passive
> thermal_zone4  0        58000   ?=0    n.a.   110000   critical
>
>
>   3.12.13 -- 20140310 -- 00:26 -- good
> ==============================
> dir          |-
>                /passive /temp  |-     /cdev?_  /trip_   /trip_
>                                        trip_    point_   point_
>                                        point    ?_temp   ?_type
> thermal_zone0  0        50000   ?=0    n.a.   256000   critical
> thermal_zone1   n.a.    70000 |-
>                                  ?=0   1       110000   critical
>                                  ?=1   1       107000   passive
>                                  ?=2   2        90000   active
>                                  ?=3   3        67000   active
>                                  ?=4   4        55000   active
>                                  ?=5   5        45000   active
>                                  ?=6   6        30000   active
> thermal_zone2   n.a.    53000 |-
>                                  ?=0   1       105000   critical
>                                  ?=1   1        95000   passive
> thermal_zone3   n.a.    25600 |-
>                                  ?=0   1       110000   critical
>                                  ?=1   1        60000   passive
> thermal_zone4  0        58000   ?=0    n.a.   110000   critical
>
> ---
> Legend here:
>         /type  is always  acpitz
>         /mode             enabled
>         /policy           step_wise
>
>        - from kernel ACPI initialisation: thermal_zone0==DTSZ,
>           thermal_zone1==CPUZ, thermal_zone2==SKNZ,
>           thermal_zone3==BATZ, thermal_zone4==FDTZ
>        - n.a. means      file or value is not available
> ___
> Legend in general:
>               /power/control          is always  auto
>               /power/runtime_status              unsupported
>               /uevent                            ''==empty
>
> ----------------------------------------------------------------
>
>   3.13.5 -- 20140309 -- 20:52 -- bad
> =============================
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +68.0°C  (crit = +256.0°C)
> temp2:        +70.0°C  (crit = +110.0°C)
> temp3:        +54.0°C  (crit = +105.0°C)
> temp4:        +25.8°C  (crit = +110.0°C)
> temp5:        +58.0°C  (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +66.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +63.0°C  (high = +105.0°C, crit = +105.0°C)
>
>
>   3.12.13 -- 20140310 -- 00:26 -- good
> ==============================
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +50.0°C  (crit = +256.0°C)
> temp2:        +70.0°C  (crit = +110.0°C)
> temp3:        +53.0°C  (crit = +105.0°C)
> temp4:        +25.6°C  (crit = +110.0°C)
> temp5:        +58.0°C  (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +65.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +61.0°C  (high = +105.0°C, crit = +105.0°C)
>
> # EOF #
>
>

Hi, and thank you for your attention ^^

At the bottom of this email you'll find the current values for the
new 3.12.14 kernel at two different levels of usage and ambient
temperature.
As you can see, in kernel 3.12.14 the /cdev?_trip_point enumeration
has changed to match that of 3.13.?, and so has one
/trip_point_?_temp value. But 3.12.14 works as well as 3.12.13. (So
the first thing that caught my eye didn't lead anywhere useful.)
I'm not capable of finding or understanding the related code,
but please let me present an idea of what MAY be going on:

In 3.12.13+, on my system, the effective cooling fan speed seems
to be an accumulation, maybe bitwise, of
cooling_device[2-6]/cur_state. Each of them gets activated (=1) at
a certain temperature value or level, and each
cooling_device[2-6]/cur_state stays at 1 as long as the temperature
does not fall below its reference temperature. On my system the
temperature compared against these reference values is most likely
temp2 == thermal_zone1/temp [CPUZ].
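
As a small illustration of this guess, here is a Python sketch that
"accumulates" the cur_state values of cooling_device2..6 from the two
cooling_device tables quoted above. The function names are made up for
this example, and whether the firmware really sums the states this way
is only my assumption.

```python
# cur_state of cooling_device2..cooling_device6 (all of type "Fan"),
# copied from the two cooling_device tables quoted above.
FAN_CUR_STATE = {
    "3.12.13 (good)": [0, 1, 1, 1, 1],
    "3.13.5 (bad)": [0, 1, 0, 0, 0],
}


def accumulated_level(states):
    """Count the fan cooling devices that are switched on (cur_state == 1).

    This is only the guessed "accumulation"; it is an assumption, not
    something confirmed by the kernel code or the firmware.
    """
    return sum(1 for s in states if s == 1)


def as_bitmask(states):
    """Show the same states as a bit string, cooling_device2 first."""
    return "".join(str(s) for s in states)


if __name__ == "__main__":
    for kernel, states in FAN_CUR_STATE.items():
        print(kernel, as_bitmask(states), accumulated_level(states))
```

Under this guess the good kernel has four fan devices on at that
readout and the bad kernel only one, although FDTZ reads 58°C in both.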

In 3.13.? only one of the cooling_device[2-6]/cur_state values
seems to get set to 1, while the others are left at or rewritten
to 0. The fan speed algorithm then accumulates only a single 1,
without taking into account the [_LEVEL_] number of
cooling_device[2-6]... or re-requesting the related trigger
temperature.
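
For anyone who wants to check this on their own machine, here is a
minimal Python sketch that reads the same three files I listed by hand
(type, cur_state, max_state) for every cooling_device under
/sys/class/thermal. The function name read_cooling_devices and its
base parameter are made up for this example; running it repeatedly
under both kernels should show whether only one Fan device is ever 1.

```python
from pathlib import Path
import re


def _read(path):
    """Return the stripped file content, or "n.a." if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return "n.a."


def read_cooling_devices(base="/sys/class/thermal"):
    """Return sorted (number, type, cur_state, max_state) tuples.

    "base" is a parameter added for this sketch so the function can be
    pointed at a copy of the directory tree for testing.
    """
    rows = []
    for entry in Path(base).glob("cooling_device*"):
        match = re.fullmatch(r"cooling_device(\d+)", entry.name)
        if not match:
            continue
        rows.append((
            int(match.group(1)),
            _read(entry / "type"),
            _read(entry / "cur_state"),
            _read(entry / "max_state"),
        ))
    return sorted(rows)


if __name__ == "__main__":
    for number, dev_type, cur, maximum in read_cooling_devices():
        print(f"cooling_device{number}  {dev_type:<10} {cur:>3} {maximum:>3}")
```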

I hope this brings you developers closer to a conclusion on how to
fix it.
Best regards, Manuel Krause

_____________________________
3.12.14 -- 20140311 -- 19:07 -- changed, not broken -- normal use
=============================
/sys/class/thermal/*  which
are links to -> ../../devices/virtual/thermal/*

dir             |-
                  /type       /cur_state  /max_state  Maybe
                                                       trigger
                                                       /PWM
...
cooling_device2  Fan          0           1          not yet
                                                       observed
cooling_device3  Fan          0           1          FDTZ==58°C
cooling_device4  Fan          1           1          FDTZ==45°C
cooling_device5  Fan          1           1          FDTZ==34°C
cooling_device6  Fan          1           1          FDTZ==25°C
...

dir          |-
               /passive /temp  |-     /cdev?_  /trip_   /trip_
                                       trip_    point_   point_
                                       point    ?_temp   ?_type
...
thermal_zone1   n.a.    73000 |-  (CPUZ)
                                 ?=0   6       110000   critical
                                 ?=1   5       107000   passive
                                 ?=2   4        90000   active
                                 ?=3   3        75000   active
                                 ?=4   2        55000   active
                                 ?=5   1        45000   active
                                 ?=6   1        30000   active
...
thermal_zone4   n.a.    45000   ?=0    n.a.   110000   critical (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +46.0°C  (crit = +256.0°C)
temp2:        +73.0°C  (crit = +110.0°C)
temp3:        +57.0°C  (crit = +105.0°C)
temp4:        +26.3°C  (crit = +110.0°C)
temp5:        +45.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +68.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +66.0°C  (high = +105.0°C, crit = +105.0°C)


_____________________________
3.12.14 -- 20140311 -- 21:09 -- changed, not broken -- idle state
=============================

dir             |-
                  /type       /cur_state  /max_state  Maybe
                                                       trigger
                                                       /PWM
...
cooling_device2  Fan          0           1          not yet
                                                       observed
cooling_device3  Fan          0           1          FDTZ==58°C
cooling_device4  Fan          0           1          FDTZ==45°C
cooling_device5  Fan          0           1          FDTZ==34°C
cooling_device6  Fan          1           1          FDTZ==25°C
...

dir          |-
               /passive /temp
thermal_zone1   n.a.    46000 ... (CPUZ)
...
thermal_zone4   n.a.    25000 ... (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +50.0°C  (crit = +256.0°C)
temp2:        +46.0°C  (crit = +110.0°C)
temp3:        +44.0°C  (crit = +105.0°C)
temp4:        +25.7°C  (crit = +110.0°C)
temp5:        +25.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +41.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +41.0°C  (high = +105.0°C, crit = +105.0°C)
_____________________________
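
As a cross-check of the "Maybe trigger" column, here is a small Python
sketch that predicts the FDTZ reading from the cur_state values and
compares it with thermal_zone4/temp in the two 3.12.14 readouts above.
The rule used (FDTZ shows the value belonging to the lowest-numbered
fan device that is on) is only my guess, and the names MAYBE_TRIGGER,
SNAPSHOTS and expected_fdtz are made up for this example.

```python
# "Maybe trigger" column from the 3.12.14 tables above: the FDTZ
# reading (degrees Celsius) that seems to belong to each fan device.
# cooling_device2 is left out because it was not yet observed.
MAYBE_TRIGGER = {3: 58, 4: 45, 5: 34, 6: 25}

# Readouts from above: cur_state per cooling_device number, and the
# thermal_zone4 (FDTZ) temperature reported at the same time.
SNAPSHOTS = {
    "normal use": ({2: 0, 3: 0, 4: 1, 5: 1, 6: 1}, 45),
    "idle state": ({2: 0, 3: 0, 4: 0, 5: 0, 6: 1}, 25),
}


def expected_fdtz(cur_state):
    """Guess the FDTZ reading from the fan cur_state values.

    Assumption: FDTZ reports the value of the lowest-numbered fan
    device that is currently on. Returns None if no known one is on.
    """
    active = [n for n, s in sorted(cur_state.items())
              if s == 1 and n in MAYBE_TRIGGER]
    return MAYBE_TRIGGER[active[0]] if active else None


if __name__ == "__main__":
    for name, (states, reported) in SNAPSHOTS.items():
        guess = expected_fdtz(states)
        verdict = "matches" if guess == reported else "differs"
        print(f"{name}: guessed {guess}, reported {reported}, {verdict}")
```

Both 3.12.14 readouts agree with this guess, but two data points are
of course not proof.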



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
@ 2014-03-11 21:59                   ` Manuel Krause
  0 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-03-11 21:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang
  Cc: Guenter Roeck, Jean Delvare, lm-sensors

On 2014-03-10 02:49, Manuel Krause wrote:
> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>>>>> Hi, and thanks for the quick response!
>>>>>>> No special fancy "fan control policy". 'fancontrol' isn't
>>>>>>> up or
>>>>>>> running.
>>>>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>>>>> without
>>>>>>> any extra work.
>>>>>>> --
>>>>>>> # sensors
>>>>>>> acpitz-virtual-0
>>>>>>> Adapter: Virtual device
>>>>>>> temp1:        +71.0°C  (crit = +256.0°C)
>>>>>>> temp2:        +69.0°C  (crit = +110.0°C)
>>>>>>> temp3:        +52.0°C  (crit = +105.0°C)
>>>>>>> temp4:        +25.0°C  (crit = +110.0°C)
>>>>>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>>>>>
>>>>>>> coretemp-isa-0000
>>>>>>> Adapter: ISA adapter
>>>>>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>>>> --
>>>>>>> My notebook (HP/Compaq 6730b) does not have a seperate fan
>>>>>>> sensor.
>>>>>>> This is with 3.12.13 with my normal workload.
>>>>>>>
>>>>>>> Please, trust my above mentionned values of 94 °C vs. 74°C
>>>>>>> as I
>>>>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>>>>> notebook's
>>>>>>> casing.
>>>>>>
>>>>>> Understood. Unfortunately, we'll need to get information
>>>>>> from the new kernel to be able to track down the problem.
>>>>>
>>>>> Indeed. Not only the run-time temperatures, but also the high
>>>>> and crit
>>>>> limits.
>>>>>
>>>>>>> But I'd do to test any improvement-patch.
>>>>>>
>>>>>> So far I have no idea what is going on. I don't see anything
>>>>>> in the
>>>>>> drivers providing above data that would explain the behavior,
>>>>>> but I might be missing something.
>>>>>
>>>>> Looks like a regression in the acpi subsystem or in power
>>>>> management,
>>>>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>>>>> responsible for the actual temperatures.
>>>>>
>>>>
>>>> I would agree. I don't think we have enough information to be
>>>> sure,
>>>> though. There might be some unintended interaction or
>>>> interference.
>>>>
>>>> gpu is a good hint ... for example, look at commit b9ed919f1c8
>>>> (drm/nouveau/drm/pm: remove everything except the hwmon
>>>> interfaces
>>>> to THERM). nouveau does export pwm and fan control information,
>>>> so any change in that code may have unintended side effects.
>>>> Similar, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
>>>> use devm_hwmon_register_with_groups) could have the observed
>>>> impact,
>>>> as it is purely passive, but I prefer to be rather safe than
>>>> sorry.
>>>>
>>>> This problem has now been submitted into bugzilla as
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>>>>
>>>> Guenter
>>>>
>>>
>>> Sorry, for beeing late, had to search for/accumulate much info
>>> for you...
>>> I hope, you like me to put it into one answer to you all CCing
>>> you.
>>>
>>> My GFX is a GM45 Intel (mobile), shared memory, running the
>>> opensource Mesa drivers/extensions.
>>> kernel-module: i915
>>>
>>> According to the output of 'cpupower': I have
>>> CPUidle driver: acpi_idle
>>> CPUidle governor: menu
>>>
>>> CPUfreq:
>>>     driver: acpi-cpufreq
>>>     available cpufreq governors: ondemand, performance
>>> -
>>> And "ondemand" is running.
>>> --
>>>
>>> # sensors
>>> acpitz-virtual-0
>>> Adapter: Virtual device
>>> temp1:        +41.0°C  (crit = +256.0°C)
>>> temp2:        +92.0°C  (crit = +110.0°C)
>>> temp3:        +71.0°C  (crit = +105.0°C)
>>> temp4:        +26.5°C  (crit = +110.0°C)
>>> temp5:        +25.0°C  (crit = +110.0°C)
>>>
>>> coretemp-isa-0000
>>> Adapter: ISA adapter
>>> Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
>>> Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)
>>>
>>> FROM a critical "smelly" situation today, kernel-compilation, fan
>>> @100%.
>>> --
>>>
>>> Additional findings:
>>>
>>> Identification from bootup ACPI initialisation vs. sensors:
>>> temp1 = DTSZ
>>> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
>>> temp3 = SKNZ
>>> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
>>> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan
>>> (25 - 45 - 58 - max?)
>>> Core 0 & Core 1 are the internal CPU T sensors.
>>>
>>> With the 3.13.x (.5+) kernels the first gatherered cooling
>>> settings from bootup do stay forever. Means, rebooting a hot
>>> system will get a FDTZ @45°C+ and won't make any problems, as it
>>> does cool enough (even for kernel compiling on here). If it gets
>>> 25°C @bootup the system goes into emergency cooling somewhen.
>>> Same is with a suspend/resume.
>>>
>>> Kernel 3.12.13 adjusts the cooling on it's own, but
>>> appropriately.
>>
>> This almost certainly is an ACPI regression, but I'm not sure
>> whether
>> thermal management or CPU power management is broken on your
>> system.
>>
>> Can you compare the contents of /sys/class/thermal/ from
>> working and
>> not working kernels, please?
>>
>> Rafael
>>
>
> Hi again,
> unfortunately you didn't specify how deeply I should dig into
> /sys/class/thermal. So you get the lines from # BOF # to # EOF #
> below. I hope they're readable without more comments.
>
> The most remarkable changes, in my eyes, had happened within
> "thermal_zone1".
>
> Best regards,
> Manuel Krause
>
>
> # BOF #
> Following ones are all from /sys/class/thermal/ which are links
> to -> ../../devices/virtual/thermal/
>
> I've listed the directories in sections of cooling_devices and
> thermal_zones separately for each bad/good kernel. For Emailing
> purposes only. You can merge them into a spreadsheet for your
> evaluation on your own. I've left out reporting some subdirs and
> subdir's values that _really_ didn't seem to need attention.
>
> Also, I've had collected the #sensors output for each readout,
> having reproduced nearly the same workload, represented by the
> "Fan speed" (thermal_zone4==FDTZ).
>
> And I've done my very best to not produce typos or c&p errors.
>
>
>   3.13.5 -- 20140309 -- 20:52 -- bad
> =============================
> dir             |-
>                   /type       /cur_state  /max_state
> cooling_device0  Processor    0          10
> cooling_device1  Processor    0          10
> cooling_device2  Fan          0           1
> cooling_device3  Fan          1           1
> cooling_device4  Fan          0           1
> cooling_device5  Fan          0           1
> cooling_device6  Fan          0           1
> cooling_device7  LCD          0          24
>
>   3.12.13 -- 20140310 -- 00:26 -- good
> ==============================
> dir             |-
>                   /type       /cur_state  /max_state
> cooling_device0  Processor    0          10
> cooling_device1  Processor    0          10
> cooling_device2  Fan          0           1
> cooling_device3  Fan          1           1
> cooling_device4  Fan          1           1
> cooling_device5  Fan          1           1
> cooling_device6  Fan          1           1
> cooling_device7  LCD          0          24
>
>
>   3.13.5 -- 20140309 -- 20:52 -- bad
> =============================
> dir          |-
>                /passive /temp  |-     /cdev?_  /trip_   /trip_
>                                        trip_    point_   point_
>                                        point    ?_temp   ?_type
> thermal_zone0  0        68000   ?=0    n.a.   256000   critical
> thermal_zone1   n.a.    70000 |-
>                                  ?=0   6       110000   critical
>                                  ?=1   5       107000   passive
>                                  ?=2   4        90000   active
>                                  ?=3   3        75000   active
>                                  ?=4   2        55000   active
>                                  ?=5   1        45000   active
>                                  ?=6   1        30000   active
> thermal_zone2   n.a.    54000 |-
>                                  ?=0   1       105000   critical
>                                  ?=1   1        95000   passive
> thermal_zone3   n.a.    25800 |-
>                                  ?=0   1       110000   critical
>                                  ?=1   1        60000   passive
> thermal_zone4  0        58000   ?=0    n.a.   110000   critical
>
>
>   3.12.13 -- 20140310 -- 00:26 -- good
> ==============================
> dir          |-
>                /passive /temp  |-     /cdev?_  /trip_   /trip_
>                                        trip_    point_   point_
>                                        point    ?_temp   ?_type
> thermal_zone0  0        50000   ?=0    n.a.   256000   critical
> thermal_zone1   n.a.    70000 |-
>                                  ?=0   1       110000   critical
>                                  ?=1   1       107000   passive
>                                  ?=2   2        90000   active
>                                  ?=3   3        67000   active
>                                  ?=4   4        55000   active
>                                  ?=5   5        45000   active
>                                  ?=6   6        30000   active
> thermal_zone2   n.a.    53000 |-
>                                  ?=0   1       105000   critical
>                                  ?=1   1        95000   passive
> thermal_zone3   n.a.    25600 |-
>                                  ?=0   1       110000   critical
>                                  ?=1   1        60000   passive
> thermal_zone4  0        58000   ?=0    n.a.   110000   critical
>
> ---
> Legend here:
>         /type  is always  acpitz
>         /mode             enabled
>         /policy           step_wise
>
>        - from kernel ACPI initialisation: thermal_zone0==DTSZ,
>           thermal_zone1==CPUZ, thermal_zone2==SKNZ,
>           thermal_zone3==BATZ, thermal_zone4==FDTZ
>        - n.a. means      file or value is not available
> ___
> Legend in general:
>               /power/control          is always  auto
>               /power/runtime_status              unsupported
>               /uevent                            ''==empty
>
> ----------------------------------------------------------------
>
>   3.13.5 -- 20140309 -- 20:52 -- bad
> =============================
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +68.0°C  (crit = +256.0°C)
> temp2:        +70.0°C  (crit = +110.0°C)
> temp3:        +54.0°C  (crit = +105.0°C)
> temp4:        +25.8°C  (crit = +110.0°C)
> temp5:        +58.0°C  (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +66.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +63.0°C  (high = +105.0°C, crit = +105.0°C)
>
>
>   3.12.13 -- 20140310 -- 00:26 -- good
> ==============================
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +50.0°C  (crit = +256.0°C)
> temp2:        +70.0°C  (crit = +110.0°C)
> temp3:        +53.0°C  (crit = +105.0°C)
> temp4:        +25.6°C  (crit = +110.0°C)
> temp5:        +58.0°C  (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +65.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +61.0°C  (high = +105.0°C, crit = +105.0°C)
>
> # EOF #
>
>

Hi, and thank you for your attention ^^

At the bottom of this email you'll find the current values for the
new 3.12.14 kernel at two different levels of usage and ambient
temperature.
As you can see, in kernel 3.12.14 the /cdev?_trip_point enumeration
has changed to match 3.13.?, and so has one /trip_point_?_temp
value. But 3.12.14 works as well as 3.12.13. (So the first thing
that caught my eye didn't lead anywhere useful.)
I'm not capable of finding or understanding the related code,
but please let me present an idea of what MAY be going on:

In 3.12.13+, on my system, the effective cooling fan speed seems
to be an accumulation, maybe bitwise, of
cooling_device[2-6]/cur_state. Each of them gets activated (=1) at
a certain temperature value or level, and stays at 1 as long as
the temperature does not drop below its reference temperature. On
my system this reference temperature would most likely be taken
from temp2 == thermal_zone1/temp [CPUZ].

In 3.13.? only one of cooling_device[2-6]/cur_state seems to get
set to 1, with the others left at and/or rewritten with 0. The fan
speed algorithm then accumulates only a single 1, without taking
into account the [_LEVEL_] number of cooling_device[2-6]... or
re-requesting the related trigger temperature.
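
The hypothesis above could be checked by reducing each snapshot to a single line that says how many fan devices are switched on. A minimal shell sketch, assuming the sysfs layout used in the tables of this thread (cooling_device*/type and cooling_device*/cur_state under /sys/class/thermal); the function name fan_levels and its optional base-directory argument are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print how many cooling devices of type "Fan" are on.
# $1 is the thermal class directory (default: /sys/class/thermal).
fan_levels() {
    base=${1:-/sys/class/thermal}
    active=0
    total=0
    for dev in "$base"/cooling_device*; do
        # Skip unmatched globs and devices that are not fans.
        [ -r "$dev/type" ] || continue
        [ "$(cat "$dev/type")" = "Fan" ] || continue
        total=$((total + 1))
        # Treat an unreadable cur_state as 0 (off).
        state=$(cat "$dev/cur_state" 2>/dev/null || echo 0)
        if [ "$state" -gt 0 ]; then
            active=$((active + 1))
        fi
    done
    echo "$active of $total fan devices active"
}
```

Calling `fan_levels` once per kernel under the same load would show whether the number of active fan devices differs between a good and a bad kernel. If cooling_device2-6 are the only devices of type Fan, the 3.12.14 "normal use" snapshot below corresponds to three active devices out of five.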

I hope this leads you developers nearer to a conclusion on how to 
fix it,
best regards, Manuel Krause

_____________________________
3.12.14 -- 20140311 -- 19:07 -- changed, not broken -- normal use
=============================
/sys/class/thermal/*  which
are links to -> ../../devices/virtual/thermal/*

dir             |-
                  /type       /cur_state  /max_state  Maybe
                                                       trigger
                                                       /PWM
...
cooling_device2  Fan          0           1          not yet
                                                       observed
cooling_device3  Fan          0           1          FDTZ==58°C
cooling_device4  Fan          1           1          FDTZ==45°C
cooling_device5  Fan          1           1          FDTZ==34°C
cooling_device6  Fan          1           1          FDTZ==25°C
...

dir          |-
               /passive /temp  |-     /cdev?_  /trip_   /trip_
                                       trip_    point_   point_
                                       point    ?_temp   ?_type
...
thermal_zone1   n.a.    73000 |- 
(CPUZ)
                                 ?=0   6       110000   critical
                                 ?=1   5       107000   passive
                                 ?=2   4        90000   active
                                 ?=3   3        75000   active
                                 ?=4   2        55000   active
                                 ?=5   1        45000   active
                                 ?=6   1        30000   active
...
thermal_zone4   n.a.    45000   ?=0    n.a.   110000   critical 
(FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +46.0°C  (crit = +256.0°C)
temp2:        +73.0°C  (crit = +110.0°C)
temp3:        +57.0°C  (crit = +105.0°C)
temp4:        +26.3°C  (crit = +110.0°C)
temp5:        +45.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +68.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +66.0°C  (high = +105.0°C, crit = +105.0°C)


_____________________________
3.12.14 -- 20140311 -- 21:09 -- changed, not broken -- idle state
=============================

dir             |-
                  /type       /cur_state  /max_state  Maybe
                                                       trigger
                                                       /PWM
...
cooling_device2  Fan          0           1          not yet
                                                       observed
cooling_device3  Fan          0           1          FDTZ==58°C
cooling_device4  Fan          0           1          FDTZ==45°C
cooling_device5  Fan          0           1          FDTZ==34°C
cooling_device6  Fan          1           1          FDTZ==25°C
...

dir          |-
               /passive /temp
thermal_zone1   n.a.    46000 ... (CPUZ)
...
thermal_zone4   n.a.    25000 ... (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +50.0°C  (crit = +256.0°C)
temp2:        +46.0°C  (crit = +110.0°C)
temp3:        +44.0°C  (crit = +105.0°C)
temp4:        +25.7°C  (crit = +110.0°C)
temp5:        +25.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +41.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +41.0°C  (high = +105.0°C, crit = +105.0°C)
_____________________________



_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
       [not found]                   ` <532B4DC5.4010705@netscape.net>
@ 2014-03-31 23:37                       ` Manuel Krause
  0 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-03-31 23:37 UTC (permalink / raw)
  To: Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang
  Cc: Guenter Roeck, Jean Delvare, lm-sensors

On 2014-03-20 21:21, Manuel Krause wrote:
> On 2014-03-11 22:59, Manuel Krause wrote:
>> On 2014-03-10 02:49, Manuel Krause wrote:
>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>> wrote:
> [SNIP]
>
> Long time no reply from you... Have I overseen a unwritten
> convention? Or were my charts that unusable for your analysis/work?
>
> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
> persists. "Strange / dangerous fan policy..."
>
> Since kernel 3.13.6 I've managed to 'fix' the potential
> overheating problem by manually issuing a:
> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
> _before_ obviously critical temperatures occur. Remind: This
> particular setting may only work for my system! ...and keeps
> working for 3.14-rc.
>
> In the following I'd like to present you a modified output of my
> /sys/class/thermal, that I've written a script for (for my
> system), that shows the results in the way of
> linux/Documentation/thermal/sysfs-api.txt, point 3:
> {I've uploded the files to pastebin, to not swamp you and the
> lists with so many lines of logs.}
>
> For the last good kernel -- 3.12.14 -- in-use:
>   http://pastebin.com/HL1PNcda
> For my first bad kernel revision 3.13 -- at critical temp:
>   http://pastebin.com/98hgf1a9
> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>   http://pastebin.com/MuTwTnjD
> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>   *) command:
>   http://pastebin.com/2peda54z
>
> Please, have a look at them! And maybe, give me hints on how I
> can help you to further debug this issue, as my manual method
> works but it's annoying.
>
> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
> Email-thread to someone in charge.
>
> Thank you for your work && best regards,
> Manuel Krause
>

This is still BUG 71711
https://bugzilla.kernel.org/show_bug.cgi?id=71711

3.12.15 works very well
3.13.7 fails
3.14.0-rc8 fails

I've now tried the tmon tool, too. Nice eye candy, and good for monitoring!

I've tried reverting all "thermal"-related patches between
3.12.14 and 3.13.7 on top of 3.13.7, but they don't seem to matter.
(The same holds if I apply the reverse patch to 3.12.15.)

So "thermal" is out?

On the failing kernels, not a single reached (active) trip point
triggers even ONE fan action!
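
One way to make that observation reproducible would be to list, for a given zone, the active trip points whose threshold the current temperature has already reached, and then compare the result with the cur_state values of the fan devices. A minimal shell sketch, assuming the trip_point_N_type and trip_point_N_temp files described in Documentation/thermal/sysfs-api.txt; the function name reached_active_trips is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: list the active trip points a zone has reached.
# $1 is the zone directory, e.g. /sys/class/thermal/thermal_zone1.
reached_active_trips() {
    zone=$1
    temp=$(cat "$zone/temp")    # current temperature in millidegrees Celsius
    for type_file in "$zone"/trip_point_*_type; do
        # Skip unmatched globs and non-active trip points.
        [ -r "$type_file" ] || continue
        [ "$(cat "$type_file")" = "active" ] || continue
        # trip_point_N_type -> trip_point_N_temp
        trip=$(cat "${type_file%_type}_temp")
        if [ "$temp" -ge "$trip" ]; then
            idx=${type_file##*/trip_point_}
            idx=${idx%_type}
            echo "trip_point_$idx reached: $trip <= $temp"
        fi
    done
}
```

For the CPUZ values in the 3.12.14 "normal use" table posted earlier in this thread (temp 73000), this would report the 55000, 45000 and 30000 trip points as reached.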

Next to be investigated would be ACPI.

THX for this audience,
Manuel Krause


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-31 23:37                       ` [lm-sensors] " Manuel Krause
@ 2014-03-31 23:47                         ` Guenter Roeck
  -1 siblings, 0 replies; 45+ messages in thread
From: Guenter Roeck @ 2014-03-31 23:47 UTC (permalink / raw)
  To: Manuel Krause, Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang
  Cc: Jean Delvare, lm-sensors

On 03/31/2014 04:37 PM, Manuel Krause wrote:
> On 2014-03-20 21:21, Manuel Krause wrote:
>> On 2014-03-11 22:59, Manuel Krause wrote:
>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>> wrote:
>> [SNIP]
>>
>> Long time no reply from you... Have I overseen a unwritten
>> convention? Or were my charts that unusable for your analysis/work?
>>
>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>> persists. "Strange / dangerous fan policy..."
>>
>> Since kernel 3.13.6 I've managed to 'fix' the potential
>> overheating problem by manually issuing a:
>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>> _before_ obviously critical temperatures occur. Remind: This
>> particular setting may only work for my system! ...and keeps
>> working for 3.14-rc.
>>
>> In the following I'd like to present you a modified output of my
>> /sys/class/thermal, that I've written a script for (for my
>> system), that shows the results in the way of
>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>> {I've uploded the files to pastebin, to not swamp you and the
>> lists with so many lines of logs.}
>>
>> For the last good kernel -- 3.12.14 -- in-use:
>>   http://pastebin.com/HL1PNcda
>> For my first bad kernel revision 3.13 -- at critical temp:
>>   http://pastebin.com/98hgf1a9
>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>   http://pastebin.com/MuTwTnjD
>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>   *) command:
>>   http://pastebin.com/2peda54z
>>
>> Please, have a look at them! And maybe, give me hints on how I
>> can help you to further debug this issue, as my manual method
>> works but it's annoying.
>>
>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>> Email-thread to someone in charge.
>>
>> Thank you for your work && best regards,
>> Manuel Krause
>>
>
> This is still BUG 71711
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> 3.12.15 works very well
> 3.13.7 fails
> 3.14.0-rc8 fails
>

Best you can do would really be to bisect the problem.
Unfortunately only you (or someone else with an affected system)
can do that. Once the culprit is known it would be much easier
to get it fixed.

To answer your earlier question: I don't think you did anything wrong.
I guess everyone else is just as clueless as I am (if not, speak up
and help ;-).

Guenter


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-31 23:47                         ` [lm-sensors] " Guenter Roeck
@ 2014-04-06  2:37                           ` Manuel Krause
  -1 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-04-06  2:37 UTC (permalink / raw)
  To: Guenter Roeck, Rafael J. Wysocki, linux-kernel, linux-pm,
	rui.zhang, Jean Delvare, lm-sensors

On 2014-04-01 01:47, Guenter Roeck wrote:
> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>> On 2014-03-20 21:21, Manuel Krause wrote:
>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>> wrote:
>>> [SNIP]
>>>
>>> Long time no reply from you... Have I overseen a unwritten
>>> convention? Or were my charts that unusable for your
>>> analysis/work?
>>>
>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>> persists. "Strange / dangerous fan policy..."
>>>
>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>> overheating problem by manually issuing a:
>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>> _before_ obviously critical temperatures occur. Remind: This
>>> particular setting may only work for my system! ...and keeps
>>> working for 3.14-rc.
>>>
>>> In the following I'd like to present you a modified output of my
>>> /sys/class/thermal, that I've written a script for (for my
>>> system), that shows the results in the way of
>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>> {I've uploded the files to pastebin, to not swamp you and the
>>> lists with so many lines of logs.}
>>>
>>> For the last good kernel -- 3.12.14 -- in-use:
>>>   http://pastebin.com/HL1PNcda
>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>   http://pastebin.com/98hgf1a9
>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>   http://pastebin.com/MuTwTnjD
>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>   *) command:
>>>   http://pastebin.com/2peda54z
>>>
>>> Please, have a look at them! And maybe, give me hints on how I
>>> can help you to further debug this issue, as my manual method
>>> works but it's annoying.
>>>
>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>> Email-thread to someone in charge.
>>>
>>> Thank you for your work && best regards,
>>> Manuel Krause
>>>
>>
>> This is still BUG 71711
>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>
>> 3.12.15 works very well
>> 3.13.7 fails
>> 3.14.0-rc8 fails
>>
>
> Best you can do would really be to bisect the problem.
> Unfortunately only you (or someone else with an affected system)
> can do that. Once the culprit is known it would be much easier
> to get it fixed.
>
> To answer your earlier question: I don't think you did anything
> wrong.
> I guess everyone else is just as clueless as I am (if not, speak up
> and help ;-).
>
> Guenter
>

I've now bisected twice, from two different kernel origins,
just to be sure, as I'm new to this stupid-and-lengthy method,
and to make sure I didn't give a false positive in between out
of boredom.

In the end it says each time:
# git bisect bad | tee -a /var/log/bisect.log
cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
commit cc8ef52707341e67a12067d6ead991d56ea017ca
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Wed Sep 25 20:39:45 2013 +0800

     ACPI / AC: convert ACPI ac driver to platform bus

     Signed-off-by: Zhang Rui <rui.zhang@intel.com>
     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

:040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 
4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers


Please tell me how I can help debug this further, and please
also read the latest comments at
https://bugzilla.kernel.org/show_bug.cgi?id=71711

Manuel Krause


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-06  2:37                           ` [lm-sensors] " Manuel Krause
@ 2014-04-06  2:43                             ` Guenter Roeck
  -1 siblings, 0 replies; 45+ messages in thread
From: Guenter Roeck @ 2014-04-06  2:43 UTC (permalink / raw)
  To: Manuel Krause, Rafael J. Wysocki, linux-kernel, linux-pm,
	rui.zhang, Jean Delvare, lm-sensors

On 04/05/2014 07:37 PM, Manuel Krause wrote:
> On 2014-04-01 01:47, Guenter Roeck wrote:
>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>>> wrote:
>>>> [SNIP]
>>>>
>>>> Long time no reply from you... Have I overseen a unwritten
>>>> convention? Or were my charts that unusable for your
>>>> analysis/work?
>>>>
>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>>> persists. "Strange / dangerous fan policy..."
>>>>
>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>> overheating problem by manually issuing a:
>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>> _before_ obviously critical temperatures occur. Remind: This
>>>> particular setting may only work for my system! ...and keeps
>>>> working for 3.14-rc.
>>>>
>>>> In the following I'd like to present you a modified output of my
>>>> /sys/class/thermal, that I've written a script for (for my
>>>> system), that shows the results in the way of
>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>> {I've uploded the files to pastebin, to not swamp you and the
>>>> lists with so many lines of logs.}
>>>>
>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>   http://pastebin.com/HL1PNcda
>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>   http://pastebin.com/98hgf1a9
>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>   http://pastebin.com/MuTwTnjD
>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>   *) command:
>>>>   http://pastebin.com/2peda54z
>>>>
>>>> Please, have a look at them! And maybe, give me hints on how I
>>>> can help you to further debug this issue, as my manual method
>>>> works but it's annoying.
>>>>
>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>> Email-thread to someone in charge.
>>>>
>>>> Thank you for your work && best regards,
>>>> Manuel Krause
>>>>
>>>
>>> This is still BUG 71711
>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>
>>> 3.12.15 works very well
>>> 3.13.7 fails
>>> 3.14.0-rc8 fails
>>>
>>
>> Best you can do would really be to bisect the problem.
>> Unfortunately only you (or someone else with an affected system)
>> can do that. Once the culprit is known it would be much easier
>> to get it fixed.
>>
>> To answer your earlier question: I don't think you did anything
>> wrong.
>> I guess everyone else is just as clueless as I am (if not, speak up
>> and help ;-).
>>
>> Guenter
>>
>
> I've now bisected two times. From two different kernel origins, just to be sure, as I'm new to this stupid-and-lengthy method, and, to be sure, I haven't given a false positive inbetween due to boredom.
>

Not really. Keep in mind that you were able to track down the bad commit
among more than 10,000 commits in a reasonably short period of time.
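
The arithmetic behind that remark: each bisection step roughly halves the remaining range of commits, so about log2(N) kernels need to be built and tested. A small shell sketch of the estimate (the function name bisect_steps is made up for illustration, and real `git bisect` step counts can differ slightly because kernel history is not linear):

```shell
#!/bin/sh
# Hypothetical helper: estimate how many test rounds a bisection needs.
# $1 is the number of candidate commits between "good" and "bad".
bisect_steps() {
    n=$1
    steps=0
    # Halve the candidate range (rounding up) until one commit is left.
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$((steps + 1))
    done
    echo "$steps"
}
```

`bisect_steps 10000` prints 14, so a range of 10,000 commits comes down to roughly 14 build-and-boot rounds.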

> In the end it says each time:
> # git bisect bad | tee -a /var/log/bisect.log
> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
> commit cc8ef52707341e67a12067d6ead991d56ea017ca
> Author: Zhang Rui <rui.zhang@intel.com>
> Date:   Wed Sep 25 20:39:45 2013 +0800
>
>      ACPI / AC: convert ACPI ac driver to platform bus
>
>      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
Off to the two of you...

Guenter

> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
>
>
> Please help me, on how I can help debug this more, and please also read the newest from
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> Manuel Krause
>
>
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
@ 2014-04-06  2:43                             ` Guenter Roeck
  0 siblings, 0 replies; 45+ messages in thread
From: Guenter Roeck @ 2014-04-06  2:43 UTC (permalink / raw)
  To: Manuel Krause, Rafael J. Wysocki, linux-kernel, linux-pm,
	rui.zhang, Jean Delvare, lm-sensors

On 04/05/2014 07:37 PM, Manuel Krause wrote:
> On 2014-04-01 01:47, Guenter Roeck wrote:
>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>>> wrote:
>>>> [SNIP]
>>>>
>>>> Long time no reply from you... Have I overseen a unwritten
>>>> convention? Or were my charts that unusable for your
>>>> analysis/work?
>>>>
>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>>> persists. "Strange / dangerous fan policy..."
>>>>
>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>> overheating problem by manually issuing a:
>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>> _before_ obviously critical temperatures occur. Remind: This
>>>> particular setting may only work for my system! ...and keeps
>>>> working for 3.14-rc.
>>>>
>>>> In the following I'd like to present you a modified output of my
>>>> /sys/class/thermal, that I've written a script for (for my
>>>> system), that shows the results in the way of
>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>> {I've uploaded the files to pastebin, to not swamp you and the
>>>> lists with so many lines of logs.}
>>>>
>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>   http://pastebin.com/HL1PNcda
>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>   http://pastebin.com/98hgf1a9
>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>   http://pastebin.com/MuTwTnjD
>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>   *) command:
>>>>   http://pastebin.com/2peda54z
>>>>
>>>> Please, have a look at them! And maybe, give me hints on how I
>>>> can help you to further debug this issue, as my manual method
>>>> works but it's annoying.
>>>>
>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>> Email-thread to someone in charge.
>>>>
>>>> Thank you for your work && best regards,
>>>> Manuel Krause
>>>>
>>>
>>> This is still BUG 71711
>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>
>>> 3.12.15 works very well
>>> 3.13.7 fails
>>> 3.14.0-rc8 fails
>>>
>>
>> Best you can do would really be to bisect the problem.
>> Unfortunately only you (or someone else with an affected system)
>> can do that. Once the culprit is known it would be much easier
>> to get it fixed.
>>
>> To answer your earlier question: I don't think you did anything
>> wrong.
>> I guess everyone else is just as clueless as I am (if not, speak up
>> and help ;-).
>>
>> Guenter
>>
>
> I've now bisected two times. From two different kernel origins, just to be sure, as I'm new to this stupid-and-lengthy method, and, to be sure, I haven't given a false positive inbetween due to boredom.
>

Not really. Keep in mind that you were able to track down the bad commit
among more than 10,000 commits in a reasonably short period of time.
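
As a rough illustration of why this is quick (a sketch only, not part of
the original exchange): each bisection step roughly halves the remaining
range, so the number of kernels to build grows with log2 of the commit
count. The helper name bisect_steps is made up for this example, and real
git history with merges may need a step or two more or less.

```shell
#!/bin/sh
# Hypothetical helper: estimate how many halving steps are needed to
# narrow N candidate commits down to one (i.e. ceil(log2(N))).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))      # keep the larger half, rounding up
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}
```

Calling "bisect_steps 10000" prints 14, i.e. about fourteen build-and-test
rounds for ten thousand commits.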

> In the end it says each time:
> # git bisect bad | tee -a /var/log/bisect.log
> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
> commit cc8ef52707341e67a12067d6ead991d56ea017ca
> Author: Zhang Rui <rui.zhang@intel.com>
> Date:   Wed Sep 25 20:39:45 2013 +0800
>
>      ACPI / AC: convert ACPI ac driver to platform bus
>
>      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
Off to the two of you...

Guenter

> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
>
>
> Please help me, on how I can help debug this more, and please also read the newest from
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> Manuel Krause
>
>
>


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-06  2:43                             ` [lm-sensors] " Guenter Roeck
  (?)
@ 2014-04-06 23:17                               ` Manuel Krause
  -1 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-04-06 23:17 UTC (permalink / raw)
  To: Guenter Roeck, Rafael J. Wysocki, linux-kernel, linux-pm,
	rui.zhang, Jean Delvare, lm-sensors

On 2014-04-06 04:43, Guenter Roeck wrote:
> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>>>> wrote:
>>>>> [SNIP]
>>>>>
>>>>> Long time no reply from you... Have I overseen a unwritten
>>>>> convention? Or were my charts that unusable for your
>>>>> analysis/work?
>>>>>
>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>>>> persists. "Strange / dangerous fan policy..."
>>>>>
>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>> overheating problem by manually issuing a:
>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>> _before_ obviously critical temperatures occur. Remind: This
>>>>> particular setting may only work for my system! ...and keeps
>>>>> working for 3.14-rc.
>>>>>
>>>>> In the following I'd like to present you a modified output
>>>>> of my
>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>> system), that shows the results in the way of
>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>> {I've uploaded the files to pastebin, to not swamp you and the
>>>>> lists with so many lines of logs.}
>>>>>
>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>   http://pastebin.com/HL1PNcda
>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>   http://pastebin.com/98hgf1a9
>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>   http://pastebin.com/MuTwTnjD
>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>   *) command:
>>>>>   http://pastebin.com/2peda54z
>>>>>
>>>>> Please, have a look at them! And maybe, give me hints on how I
>>>>> can help you to further debug this issue, as my manual method
>>>>> works but it's annoying.
>>>>>
>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>> Email-thread to someone in charge.
>>>>>
>>>>> Thank you for your work && best regards,
>>>>> Manuel Krause
>>>>>
>>>>
>>>> This is still BUG 71711
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>
>>>> 3.12.15 works very well
>>>> 3.13.7 fails
>>>> 3.14.0-rc8 fails
>>>>
>>>
>>> Best you can do would really be to bisect the problem.
>>> Unfortunately only you (or someone else with an affected system)
>>> can do that. Once the culprit is known it would be much easier
>>> to get it fixed.
>>>
>>> To answer your earlier question: I don't think you did anything
>>> wrong.
>>> I guess everyone else is just as clueless as I am (if not,
>>> speak up
>>> and help ;-).
>>>
>>> Guenter
>>>
>>
>> I've now bisected two times. From two different kernel origins,
>> just to be sure, as I'm new to this stupid-and-lengthy method,
>> and, to be sure, I haven't given a false positive inbetween due
>> to boredom.
>>
>
> Not really. Keep in mind that you were able to track down the bad
> commit
> among more than 10,000 commits in a reasonably short period of time.
>
>> In the end it says each time:
>> # git bisect bad | tee -a /var/log/bisect.log
>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>> Author: Zhang Rui <rui.zhang@intel.com>
>> Date:   Wed Sep 25 20:39:45 2013 +0800
>>
>>      ACPI / AC: convert ACPI ac driver to platform bus
>>
>>      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
> Off to the two of you...
>
> Guenter
>
>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
>>
>>
>> Please help me, on how I can help debug this more, and please
>> also read the newest from
>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>
>> Manuel Krause
>>
>>
>>
>

Sorry, I forgot to add the following last night: after the first
bisection round I was so glad to have a result that I reverted
the mentioned patch from the 3.13.8 kernel, but that didn't fix
it. The problem must come from something that was added later,
but you all understand more of what you've coded.

Best regards, Manuel Krause


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-06 23:17                               ` Manuel Krause
@ 2014-04-07 11:45                                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2014-04-07 11:45 UTC (permalink / raw)
  To: Manuel Krause
  Cc: Guenter Roeck, linux-kernel, linux-pm, rui.zhang, Jean Delvare,
	lm-sensors

On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
> On 2014-04-06 04:43, Guenter Roeck wrote:
> > On 04/05/2014 07:37 PM, Manuel Krause wrote:
> >> On 2014-04-01 01:47, Guenter Roeck wrote:
> >>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
> >>>> On 2014-03-20 21:21, Manuel Krause wrote:
> >>>>> On 2014-03-11 22:59, Manuel Krause wrote:
> >>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
> >>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
> >>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
> >>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
> >>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
> >>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> >>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
> >>>>>>>>>>>> wrote:
> >>>>> [SNIP]
> >>>>>
> >>>>> Long time no reply from you... Have I overseen a unwritten
> >>>>> convention? Or were my charts that unusable for your
> >>>>> analysis/work?
> >>>>>
> >>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
> >>>>> persists. "Strange / dangerous fan policy..."
> >>>>>
> >>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
> >>>>> overheating problem by manually issuing a:
> >>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
> >>>>> _before_ obviously critical temperatures occur. Remind: This
> >>>>> particular setting may only work for my system! ...and keeps
> >>>>> working for 3.14-rc.
> >>>>>
> >>>>> In the following I'd like to present you a modified output
> >>>>> of my
> >>>>> /sys/class/thermal, that I've written a script for (for my
> >>>>> system), that shows the results in the way of
> >>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
> >>>>> {I've uploaded the files to pastebin, to not swamp you and the
> >>>>> lists with so many lines of logs.}
> >>>>>
> >>>>> For the last good kernel -- 3.12.14 -- in-use:
> >>>>>   http://pastebin.com/HL1PNcda
> >>>>> For my first bad kernel revision 3.13 -- at critical temp:
> >>>>>   http://pastebin.com/98hgf1a9
> >>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
> >>>>>   http://pastebin.com/MuTwTnjD
> >>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
> >>>>>   *) command:
> >>>>>   http://pastebin.com/2peda54z
> >>>>>
> >>>>> Please, have a look at them! And maybe, give me hints on how I
> >>>>> can help you to further debug this issue, as my manual method
> >>>>> works but it's annoying.
> >>>>>
> >>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
> >>>>> Email-thread to someone in charge.
> >>>>>
> >>>>> Thank you for your work && best regards,
> >>>>> Manuel Krause
> >>>>>
> >>>>
> >>>> This is still BUG 71711
> >>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
> >>>>
> >>>> 3.12.15 works very well
> >>>> 3.13.7 fails
> >>>> 3.14.0-rc8 fails
> >>>>
> >>>
> >>> Best you can do would really be to bisect the problem.
> >>> Unfortunately only you (or someone else with an affected system)
> >>> can do that. Once the culprit is known it would be much easier
> >>> to get it fixed.
> >>>
> >>> To answer your earlier question: I don't think you did anything
> >>> wrong.
> >>> I guess everyone else is just as clueless as I am (if not,
> >>> speak up
> >>> and help ;-).
> >>>
> >>> Guenter
> >>>
> >>
> >> I've now bisected two times. From two different kernel origins,
> >> just to be sure, as I'm new to this stupid-and-lengthy method,
> >> and, to be sure, I haven't given a false positive inbetween due
> >> to boredom.
> >>
> >
> > Not really. Keep in mind that you were able to track down the bad
> > commit
> > among more than 10,000 commits in a reasonably short period of time.
> >
> >> In the end it says each time:
> >> # git bisect bad | tee -a /var/log/bisect.log
> >> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
> >> commit cc8ef52707341e67a12067d6ead991d56ea017ca
> >> Author: Zhang Rui <rui.zhang@intel.com>
> >> Date:   Wed Sep 25 20:39:45 2013 +0800
> >>
> >>      ACPI / AC: convert ACPI ac driver to platform bus
> >>
> >>      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> >>      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> > Off to the two of you...
> >
> > Guenter
> >
> >> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
> >> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
> >>
> >>
> >> Please help me, on how I can help debug this more, and please
> >> also read the newest from
> >> https://bugzilla.kernel.org/show_bug.cgi?id=71711
> >>
> >> Manuel Krause
> >>
> >>
> >>
> >
> 
> Sorry, that I've forgotten to add the following last night: After 
> the first bisection round, I was so glad about a result that 
> time, that I reverted this mentioned patch from the 3.13.8 
> kernel, but this didn't fix it.

This means that the commit in question didn't introduce the problem
you're seeing.

Please check out commit 7f2dc5c4bcbf (Merge tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
build a kernel from that and see if you can reproduce the problem with it.
If so, it can be used as your new "first known bad" kernel for bisection.
Otherwise, you can use it as the "first good" one and commit cc8ef52707341
as "first known bad".
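
The two cases above can be sketched as follows. This is only an
illustration under assumptions: the helper name bisect_start_cmd is made
up, and v3.12 as the known-good end of the range is an assumption based on
the 3.12.x kernels reported as working, not something stated in this
message. "git bisect start" takes the bad revision first, then the good
one.

```shell
#!/bin/sh
# Commits named in the message above.
MERGE=7f2dc5c4bcbf        # Merge tag 'dm-3.13-changes'
AC_COMMIT=cc8ef52707341   # ACPI / AC: convert ACPI ac driver to platform bus
KNOWN_GOOD=v3.12          # assumption: mainline tag that still behaves well

# Hypothetical helper: given the test result for a kernel built from
# $MERGE ("bad" or "good"), print the bisect command for the new range.
bisect_start_cmd() {
    case "$1" in
        bad)  echo "git bisect start $MERGE $KNOWN_GOOD" ;;   # merge is the new "first known bad"
        good) echo "git bisect start $AC_COMMIT $MERGE" ;;    # merge is the new "first good"
        *)    echo "usage: bisect_start_cmd bad|good" >&2; return 1 ;;
    esac
}
```

For example, after "git checkout 7f2dc5c4bcbf", building and booting that
kernel, and seeing the fan problem again, "bisect_start_cmd bad" prints the
command to run in the kernel tree.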

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-07 11:45                                 ` [lm-sensors] " Rafael J. Wysocki
@ 2014-04-10 22:51                                   ` Manuel Krause
  -1 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-04-10 22:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Guenter Roeck, linux-kernel, linux-pm, rui.zhang, Jean Delvare,
	lm-sensors

On 2014-04-07 13:45, Rafael J. Wysocki wrote:
> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
>> On 2014-04-06 04:43, Guenter Roeck wrote:
>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>>>>>> wrote:
>>>>>>> [SNIP]
>>>>>>>
>>>>>>> Long time no reply from you... Have I overseen a unwritten
>>>>>>> convention? Or were my charts that unusable for your
>>>>>>> analysis/work?
>>>>>>>
>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>>>>>> persists. "Strange / dangerous fan policy..."
>>>>>>>
>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>>>> overheating problem by manually issuing a:
>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>>>> _before_ obviously critical temperatures occur. Remind: This
>>>>>>> particular setting may only work for my system! ...and keeps
>>>>>>> working for 3.14-rc.
>>>>>>>
>>>>>>> In the following I'd like to present you a modified output
>>>>>>> of my
>>>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>>>> system), that shows the results in the way of
>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>>>> {I've uploaded the files to pastebin, to not swamp you and the
>>>>>>> lists with so many lines of logs.}
>>>>>>>
>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>>>    http://pastebin.com/HL1PNcda
>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>>>    http://pastebin.com/98hgf1a9
>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>>>    http://pastebin.com/MuTwTnjD
>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>>>    *) command:
>>>>>>>    http://pastebin.com/2peda54z
>>>>>>>
>>>>>>> Please, have a look at them! And maybe, give me hints on how I
>>>>>>> can help you to further debug this issue, as my manual method
>>>>>>> works but it's annoying.
>>>>>>>
>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>>>> Email-thread to someone in charge.
>>>>>>>
>>>>>>> Thank you for your work && best regards,
>>>>>>> Manuel Krause
>>>>>>>
>>>>>>
>>>>>> This is still BUG 71711
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>
>>>>>> 3.12.15 works very well
>>>>>> 3.13.7 fails
>>>>>> 3.14.0-rc8 fails
>>>>>>
>>>>>
>>>>> Best you can do would really be to bisect the problem.
>>>>> Unfortunately only you (or someone else with an affected system)
>>>>> can do that. Once the culprit is known it would be much easier
>>>>> to get it fixed.
>>>>>
>>>>> To answer your earlier question: I don't think you did anything
>>>>> wrong.
>>>>> I guess everyone else is just as clueless as I am (if not,
>>>>> speak up
>>>>> and help ;-).
>>>>>
>>>>> Guenter
>>>>>
>>>>
>>>> I've now bisected two times. From two different kernel origins,
>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
>>>> and, to be sure, I haven't given a false positive in between due
>>>> to boredom.
>>>>
>>>
>>> Not really. Keep in mind that you were able to track down the bad
>>> commit
>>> among more than 10,000 commits in a reasonably short period of time.
>>>
>>>> In the end it says each time:
>>>> # git bisect bad | tee -a /var/log/bisect.log
>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> Author: Zhang Rui <rui.zhang@intel.com>
>>>> Date:   Wed Sep 25 20:39:45 2013 +0800
>>>>
>>>>       ACPI / AC: convert ACPI ac driver to platform bus
>>>>
>>>>       Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>>       Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>
>>> Off to the two of you...
>>>
>>> Guenter
>>>
>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
>>>>
>>>>
>>>> Please help me, on how I can help debug this more, and please
>>>> also read the newest from
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>
>>>> Manuel Krause
>>>>
>>>>
>>>>
>>>
>>
>> Sorry that I forgot to add the following last night: After
>> the first bisection round, I was so glad about a result that
>> time, that I reverted this mentioned patch from the 3.13.8
>> kernel, but this didn't fix it.
>
> This means that the commit in question didn't introduce the problem
> you're seeing.
>
> Please check out commit 7f2dc5c4bcbf (Merge tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
> build a kernel from that and see if you can reproduce the problem with it.
> If so, it can be used as your new "first known bad" kernel for bisection.
> Otherwise, you can use it as the "first good" one and commit cc8ef52707341
> as "first known bad".
>
> Thanks!
>

Sorry for any inconvenience, but please disregard my earlier
statement that reverting the patch in question from 3.13.x
didn't fix it. Of course it didn't: the patch doesn't revert
cleanly from the release kernels at all. My mistake!
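
A dry run can show up front whether a commit's diff fits a given tree in
either direction. What follows is only a minimal sketch, not the procedure
used in this thread: the function name patch_fits and its arguments are
hypothetical, and it assumes TREE is the top level of a git work tree that
already contains the commit object. Note that "git apply" is stricter than
the three-way merge done by "git revert", so a failure here only means that
the plain diff does not fit.

```shell
#!/bin/sh
# Hypothetical helper: patch_fits TREE COMMIT [forward|reverse]
# Returns 0 if the diff of COMMIT (against its first parent) would apply
# to the files in TREE in the requested direction, non-zero otherwise.
# Nothing is modified: "git apply --check" only tests whether the diff fits.
patch_fits() {
    tree=$1
    commit=$2
    direction=${3:-reverse}

    case $direction in
        forward) flag= ;;
        reverse) flag=-R ;;
        *)
            echo "usage: patch_fits TREE COMMIT [forward|reverse]" >&2
            return 2
            ;;
    esac

    # Feed the commit's diff to git apply on stdin ("-").
    # $flag is left unquoted on purpose so that an empty value disappears.
    git -C "$tree" diff "$commit^" "$commit" |
        git -C "$tree" apply --check $flag -
}
```

With a hypothetical checkout in ~/linux-stable, a call such as
"patch_fits ~/linux-stable cc8ef52707341e67a12067d6ead991d56ea017ca reverse"
would return non-zero when the reverse diff does not fit that tree.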

Guenter Roeck has guided me through two more bisection sessions
on this, which both pointed to the commit in question.

Some excerpts:
Me:
>>> O.k. I've now followed your latest directions:
>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
>>> => result after rebuild was BAD =>
>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
>>> => result after rebuild was GOOD
>>>
[ ...]
>>> Reverting that commit in question from this very git tree makes the
>>> kernel work as expected.
[ ... ]
Guenter:
>> Report the results you have above. That should show without question
>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
>> and it should be easy to reproduce.

That seems to be all I can do for you for now. Please let me know 
of any preliminary patches to test!
And I want to add special thanks to Guenter Roeck for his 
always-just-in-time assistance over so many days,

Manuel Krause



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-10 22:51                                   ` [lm-sensors] " Manuel Krause
@ 2014-04-13  0:05                                     ` Manuel Krause
  -1 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-04-13  0:05 UTC (permalink / raw)
  To: rui.zhang
  Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm,
	Jean Delvare, lm-sensors

On 2014-04-11 00:51, Manuel Krause wrote:
> On 2014-04-07 13:45, Rafael J. Wysocki wrote:
>> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
>>> On 2014-04-06 04:43, Guenter Roeck wrote:
>>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel
>>>>>>>>>>>>>>> Krause
>>>>>>>>>>>>>>> wrote:
>>>>>>>> [SNIP]
>>>>>>>>
>>>>>>>> Long time no reply from you... Have I overlooked an unwritten
>>>>>>>> convention? Or were my charts that unusable for your
>>>>>>>> analysis/work?
>>>>>>>>
>>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the
>>>>>>>> problem
>>>>>>>> persists. "Strange / dangerous fan policy..."
>>>>>>>>
>>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>>>>> overheating problem by manually issuing a:
>>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>>>>> _before_ obviously critical temperatures occur. Reminder: this
>>>>>>>> particular setting may only work for my system! ...and keeps
>>>>>>>> working for 3.14-rc.
>>>>>>>>
>>>>>>>> In the following I'd like to present you a modified output
>>>>>>>> of my
>>>>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>>>>> system), that shows the results in the way of
>>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>>>>> {I've uploaded the files to pastebin, to not swamp you and
>>>>>>>> the
>>>>>>>> lists with so many lines of logs.}
>>>>>>>>
>>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>>>>    http://pastebin.com/HL1PNcda
>>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>>>>    http://pastebin.com/98hgf1a9
>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>>>>    http://pastebin.com/MuTwTnjD
>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>>>>    *) command:
>>>>>>>>    http://pastebin.com/2peda54z
>>>>>>>>
>>>>>>>> Please, have a look at them! And maybe, give me hints on
>>>>>>>> how I
>>>>>>>> can help you to further debug this issue, as my manual
>>>>>>>> method
>>>>>>>> works but it's annoying.
>>>>>>>>
>>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>>>>> Email-thread to someone in charge.
>>>>>>>>
>>>>>>>> Thank you for your work && best regards,
>>>>>>>> Manuel Krause
>>>>>>>>
>>>>>>>
>>>>>>> This is still BUG 71711
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>>
>>>>>>> 3.12.15 works very well
>>>>>>> 3.13.7 fails
>>>>>>> 3.14.0-rc8 fails
>>>>>>>
>>>>>>
>>>>>> Best you can do would really be to bisect the problem.
>>>>>> Unfortunately only you (or someone else with an affected
>>>>>> system)
>>>>>> can do that. Once the culprit is known it would be much easier
>>>>>> to get it fixed.
>>>>>>
>>>>>> To answer your earlier question: I don't think you did
>>>>>> anything
>>>>>> wrong.
>>>>>> I guess everyone else is just as clueless as I am (if not,
>>>>>> speak up
>>>>>> and help ;-).
>>>>>>
>>>>>> Guenter
>>>>>>
>>>>>
>>>>> I've now bisected two times. From two different kernel origins,
>>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
>>>>> and, to be sure, I haven't given a false positive in between due
>>>>> to boredom.
>>>>>
>>>>
>>>> Not really. Keep in mind that you were able to track down the
>>>> bad
>>>> commit
>>>> among more than 10,000 commits in a reasonably short period
>>>> of time.
>>>>
>>>>> In the end it says each time:
>>>>> # git bisect bad | tee -a /var/log/bisect.log
>>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad
>>>>> commit
>>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>>>>> Author: Zhang Rui <rui.zhang@intel.com>
>>>>> Date:   Wed Sep 25 20:39:45 2013 +0800
>>>>>
>>>>>       ACPI / AC: convert ACPI ac driver to platform bus
>>>>>
>>>>>       Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>>>       Signed-off-by: Rafael J. Wysocki
>>>>> <rafael.j.wysocki@intel.com>
>>>>>
>>>> Off to the two of you...
>>>>
>>>> Guenter
>>>>
>>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
>>>>>
>>>>>
>>>>> Please help me, on how I can help debug this more, and please
>>>>> also read the newest from
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>
>>>>> Manuel Krause
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> Sorry that I forgot to add the following last night: After
>>> the first bisection round, I was so glad about a result that
>>> time, that I reverted this mentioned patch from the 3.13.8
>>> kernel, but this didn't fix it.
>>
>> This means that the commit in question didn't introduce the
>> problem
>> you're seeing.
>>
>> Please check out commit 7f2dc5c4bcbf (Merge tag
>> 'dm-3.13-changes' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
>>
>> build a kernel from that and see if you can reproduce the
>> problem with it.
>> If so, it can be used as your new "first known bad" kernel for
>> bisection.
>> Otherwise, you can use it as the "first good" one and commit
>> cc8ef52707341
>> as "first known bad".
>>
>> Thanks!
>>
>
> Sorry for any inconvenience, but please disregard my earlier
> statement that reverting the patch in question from 3.13.x
> didn't fix it. Of course it didn't: the patch doesn't revert
> cleanly from the release kernels at all. My mistake!
>
> Guenter Roeck has guided me through two more bisection sessions
> on this, which both pointed to the commit in question.
>
> Some excerpts:
> Me:
>>>> O.k. I've now followed your latest directions:
>>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> => result after rebuild was BAD =>
>>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> => result after rebuild was GOOD
>>>>
> [ ...]
>>>> Reverting that commit in question from this very git tree
>>>> makes the
>>>> kernel work as expected.
> [ ... ]
> Guenter:
>>> Report the results you have above. That should show without
>>> question
>>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
>>> and it should be easy to reproduce.
>
> That seems to be all I can do for you for now. Please let me know
> of any preliminary patches to test!
> And I want to add special thanks to Guenter Roeck for his
> always-just-in-time assistance over so many days,
>
> Manuel Krause
>

BTW -- applying the patch in question to a 3.12.17 kernel, which
worked optimally WITHOUT it, makes it FAIL as described for 3.13.x
kernels. (And yes, the patch applied cleanly, compiled fine and
boots nicely.)

Manuel Krause


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-13  0:05                                     ` [lm-sensors] " Manuel Krause
@ 2014-04-16 18:32                                       ` Zhang Rui
  -1 siblings, 0 replies; 45+ messages in thread
From: Zhang Rui @ 2014-04-16 18:32 UTC (permalink / raw)
  To: Manuel Krause
  Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm,
	Jean Delvare, lm-sensors

On Sun, 2014-04-13 at 02:05 +0200, Manuel Krause wrote:
> On 2014-04-11 00:51, Manuel Krause wrote:
> > On 2014-04-07 13:45, Rafael J. Wysocki wrote:
> >> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
> >>> On 2014-04-06 04:43, Guenter Roeck wrote:
> >>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
> >>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
> >>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
> >>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
> >>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
> >>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
> >>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
> >>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
> >>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
> >>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel
> >>>>>>>>>>>>>>> Krause
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>> [SNIP]
> >>>>>>>>
> >>>>>>>> Long time no reply from you... Have I overlooked an unwritten
> >>>>>>>> convention? Or were my charts that unusable for your
> >>>>>>>> analysis/work?
> >>>>>>>>
> >>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the
> >>>>>>>> problem
> >>>>>>>> persists. "Strange / dangerous fan policy..."
> >>>>>>>>
> >>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
> >>>>>>>> overheating problem by manually issuing a:
> >>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
> >>>>>>>> _before_ obviously critical temperatures occur. Reminder: this
> >>>>>>>> particular setting may only work for my system! ...and keeps
> >>>>>>>> working for 3.14-rc.
> >>>>>>>>
> >>>>>>>> In the following I'd like to present you a modified output
> >>>>>>>> of my
> >>>>>>>> /sys/class/thermal, that I've written a script for (for my
> >>>>>>>> system), that shows the results in the way of
> >>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
> >>>>>>>> {I've uploaded the files to pastebin, to not swamp you and
> >>>>>>>> the
> >>>>>>>> lists with so many lines of logs.}
> >>>>>>>>
> >>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
> >>>>>>>>    http://pastebin.com/HL1PNcda
> >>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
> >>>>>>>>    http://pastebin.com/98hgf1a9
> >>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
> >>>>>>>>    http://pastebin.com/MuTwTnjD
> >>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
> >>>>>>>>    *) command:
> >>>>>>>>    http://pastebin.com/2peda54z
> >>>>>>>>
> >>>>>>>> Please, have a look at them! And maybe, give me hints on
> >>>>>>>> how I
> >>>>>>>> can help you to further debug this issue, as my manual
> >>>>>>>> method
> >>>>>>>> works but it's annoying.
> >>>>>>>>
> >>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
> >>>>>>>> Email-thread to someone in charge.
> >>>>>>>>
> >>>>>>>> Thank you for your work && best regards,
> >>>>>>>> Manuel Krause
> >>>>>>>>
> >>>>>>>
> >>>>>>> This is still BUG 71711
> >>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
> >>>>>>>
> >>>>>>> 3.12.15 works very well
> >>>>>>> 3.13.7 fails
> >>>>>>> 3.14.0-rc8 fails
> >>>>>>>
> >>>>>>
> >>>>>> Best you can do would really be to bisect the problem.
> >>>>>> Unfortunately only you (or someone else with an affected
> >>>>>> system)
> >>>>>> can do that. Once the culprit is known it would be much easier
> >>>>>> to get it fixed.
> >>>>>>
> >>>>>> To answer your earlier question: I don't think you did
> >>>>>> anything
> >>>>>> wrong.
> >>>>>> I guess everyone else is just as clueless as I am (if not,
> >>>>>> speak up
> >>>>>> and help ;-).
> >>>>>>
> >>>>>> Guenter
> >>>>>>
> >>>>>
> >>>>> I've now bisected two times. From two different kernel origins,
> >>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
> >>>>> and, to be sure, I haven't given a false positive in between due
> >>>>> to boredom.
> >>>>>
> >>>>
> >>>> Not really. Keep in mind that you were able to track down the
> >>>> bad
> >>>> commit
> >>>> among more than 10,000 commits in a reasonably short period
> >>>> of time.
> >>>>
> >>>>> In the end it says each time:
> >>>>> # git bisect bad | tee -a /var/log/bisect.log
> >>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad
> >>>>> commit
> >>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
> >>>>> Author: Zhang Rui <rui.zhang@intel.com>
> >>>>> Date:   Wed Sep 25 20:39:45 2013 +0800
> >>>>>
> >>>>>       ACPI / AC: convert ACPI ac driver to platform bus
> >>>>>
> >>>>>       Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> >>>>>       Signed-off-by: Rafael J. Wysocki
> >>>>> <rafael.j.wysocki@intel.com>
> >>>>>
> >>>> Off to the two of you...
> >>>>
> >>>> Guenter
> >>>>
> >>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
> >>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
> >>>>>
> >>>>>
> >>>>> Please help me, on how I can help debug this more, and please
> >>>>> also read the newest from
> >>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
> >>>>>
> >>>>> Manuel Krause
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>> Sorry, that I've forgotten to add the following last night: After
> >>> the first bisection round, I was so glad about a result that
> >>> time, that I reverted this mentioned patch from the 3.13.8
> >>> kernel, but this didn't fix it.
> >>
> >> This means that the commit in question didn't introduce the
> >> problem
> >> you're seeing.
> >>
> >> Please check out commit 7f2dc5c4bcbf (Merge tag
> >> 'dm-3.13-changes' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
> >>
> >> build a kernel from that and see if you can reproduce the
> >> problem with it.
> >> If so, it can be used as your new "first known bad" kernel for
> >> bisection.
> >> Otherwise, you can use it as the "first good" one and commit
> >> cc8ef52707341
> >> as "first known bad".
> >>
> >> Thanks!
> >>
> >
> > Sorry, for any inconvenience, but you should forget about what
> > I've written, that reverting the patch in question from 3.13.x
> > didn't fix it. Of course it didn't fix it, as the patch doesn't
> > cleanly revert from release-kernels at all. My mistake!
> >
> > I've been guided by Guenter Roeck through two more bisecting
> > sessions/ways on this, that always pointed to the commit in
> > question.
> >
> > Some citation:
> > Me:
> >>>> O.k. I've now followed your latest directions:
> >>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
> >>>> => result after rebuild was BAD =>
> >>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
> >>>> => result after rebuild was GOOD
> >>>>
> > [ ...]
> >>>> Reverting that commit in question from this very git tree
> >>>> makes the
> >>>> kernel work as expected.
> > [ ... ]
> > Guenter:
> >>> Report the results you have above. That should show without
> >>> question
> >>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
> >>> and it should be easy to reproduce.
> >
> > That seems to be all I can do for you for now. Please let me know
> > of any preliminary patches to test!
> > And I want to add special thanks to Guenter Roeck for his
> > always-just-in-time assistance over so many days,
> >
> > Manuel Krause
> >
> 
> BTW -- applying this patch in question to a 3.12.17 kernel, that 
> worked optimally WITHOUT it, makes it FAIL as described for 3.13.x
> kernels. (And, yes, the patch applied cleanly, compiled fine and 
> boots nicely.)
> 
Could you please apply commit 50a2bc5429f07ec4d53df2d287b03bdbceb281bb
on top of commit cc8ef52707341e67a12067d6ead991d56ea017ca and check whether
the problem still exists in the 3.12.17 kernel?

thanks,
rui
> Manuel Krause
> 



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-16 18:32                                       ` [lm-sensors] " Zhang Rui
@ 2014-04-16 22:17                                         ` Manuel Krause
  -1 siblings, 0 replies; 45+ messages in thread
From: Manuel Krause @ 2014-04-16 22:17 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm,
	Jean Delvare, lm-sensors

On 2014-04-16 20:32, Zhang Rui wrote:
> On Sun, 2014-04-13 at 02:05 +0200, Manuel Krause wrote:
>> On 2014-04-11 00:51, Manuel Krause wrote:
>>> On 2014-04-07 13:45, Rafael J. Wysocki wrote:
>>>> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
>>>>> On 2014-04-06 04:43, Guenter Roeck wrote:
>>>>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>>>>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>>>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel
>>>>>>>>>>>>>>>>> Krause
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>> [SNIP]
>>>>>>>>>>
>>>>>>>>>> Long time no reply from you... Have I overlooked an unwritten
>>>>>>>>>> convention? Or were my charts that unusable for your
>>>>>>>>>> analysis/work?
>>>>>>>>>>
>>>>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the
>>>>>>>>>> problem
>>>>>>>>>> persists. "Strange / dangerous fan policy..."
>>>>>>>>>>
>>>>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>>>>>>> overheating problem by manually issuing a:
>>>>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>>>>>>> _before_ obviously critical temperatures occur. Remind: This
>>>>>>>>>> particular setting may only work for my system! ...and keeps
>>>>>>>>>> working for 3.14-rc.
>>>>>>>>>>
>>>>>>>>>> In the following I'd like to present you a modified output
>>>>>>>>>> of my
>>>>>>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>>>>>>> system), that shows the results in the way of
>>>>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>>>>>>> {I've uploaded the files to pastebin, to not swamp you and
>>>>>>>>>> the
>>>>>>>>>> lists with so many lines of logs.}
>>>>>>>>>>
>>>>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>>>>>>     http://pastebin.com/HL1PNcda
>>>>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>>>>>>     http://pastebin.com/98hgf1a9
>>>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>>>>>>     http://pastebin.com/MuTwTnjD
>>>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>>>>>>     *) command:
>>>>>>>>>>     http://pastebin.com/2peda54z
>>>>>>>>>>
>>>>>>>>>> Please, have a look at them! And maybe, give me hints on
>>>>>>>>>> how I
>>>>>>>>>> can help you to further debug this issue, as my manual
>>>>>>>>>> method
>>>>>>>>>> works but it's annoying.
>>>>>>>>>>
>>>>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>>>>>>> Email-thread to someone in charge.
>>>>>>>>>>
>>>>>>>>>> Thank you for your work && best regards,
>>>>>>>>>> Manuel Krause
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is still BUG 71711
>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>>>>
>>>>>>>>> 3.12.15 works very well
>>>>>>>>> 3.13.7 fails
>>>>>>>>> 3.14.0-rc8 fails
>>>>>>>>>
>>>>>>>>
>>>>>>>> Best you can do would really be to bisect the problem.
>>>>>>>> Unfortunately only you (or someone else with an affected
>>>>>>>> system)
>>>>>>>> can do that. Once the culprit is known it would be much easier
>>>>>>>> to get it fixed.
>>>>>>>>
>>>>>>>> To answer your earlier question: I don't think you did
>>>>>>>> anything
>>>>>>>> wrong.
>>>>>>>> I guess everyone else is just as clueless as I am (if not,
>>>>>>>> speak up
>>>>>>>> and help ;-).
>>>>>>>>
>>>>>>>> Guenter
>>>>>>>>
>>>>>>>
>>>>>>> I've now bisected two times. From two different kernel origins,
>>>>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
>>>>>>> and, to be sure, I haven't given a false positive in between due
>>>>>>> to boredom.
>>>>>>>
>>>>>>
>>>>>> Not really. Keep in mind that you were able to track down the
>>>>>> bad
>>>>>> commit
>>>>>> among more than 10,000 commits in a reasonably short period
>>>>>> of time.
>>>>>>
>>>>>>> In the end it says each time:
>>>>>>> # git bisect bad | tee -a /var/log/bisect.log
>>>>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad
>>>>>>> commit
>>>>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>>>>>>> Author: Zhang Rui <rui.zhang@intel.com>
>>>>>>> Date:   Wed Sep 25 20:39:45 2013 +0800
>>>>>>>
>>>>>>>        ACPI / AC: convert ACPI ac driver to platform bus
>>>>>>>
>>>>>>>        Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>>>>>        Signed-off-by: Rafael J. Wysocki
>>>>>>> <rafael.j.wysocki@intel.com>
>>>>>>>
>>>>>> Off to the two of you...
>>>>>>
>>>>>> Guenter
>>>>>>
>>>>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>>>>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
>>>>>>>
>>>>>>>
>>>>>>> Please help me, on how I can help debug this more, and please
>>>>>>> also read the newest from
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>>
>>>>>>> Manuel Krause
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> Sorry, that I've forgotten to add the following last night: After
>>>>> the first bisection round, I was so glad about a result that
>>>>> time, that I reverted this mentioned patch from the 3.13.8
>>>>> kernel, but this didn't fix it.
>>>>
>>>> This means that the commit in question didn't introduce the
>>>> problem
>>>> you're seeing.
>>>>
>>>> Please check out commit 7f2dc5c4bcbf (Merge tag
>>>> 'dm-3.13-changes' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
>>>>
>>>> build a kernel from that and see if you can reproduce the
>>>> problem with it.
>>>> If so, it can be used as your new "first known bad" kernel for
>>>> bisection.
>>>> Otherwise, you can use it as the "first good" one and commit
>>>> cc8ef52707341
>>>> as "first known bad".
>>>>
>>>> Thanks!
>>>>
>>>
>>> Sorry, for any inconvenience, but you should forget about what
>>> I've written, that reverting the patch in question from 3.13.x
>>> didn't fix it. Of course it didn't fix it, as the patch doesn't
>>> cleanly revert from release-kernels at all. My mistake!
>>>
>>> I've been guided by Guenter Roeck through two more bisecting
>>> sessions/ways on this, that always pointed to the commit in
>>> question.
>>>
>>> Some citation:
>>> Me:
>>>>>> O.k. I've now followed your latest directions:
>>>>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
>>>>>> => result after rebuild was BAD =>
>>>>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
>>>>>> => result after rebuild was GOOD
>>>>>>
>>> [ ...]
>>>>>> Reverting that commit in question from this very git tree
>>>>>> makes the
>>>>>> kernel work as expected.
>>> [ ... ]
>>> Guenter:
>>>>> Report the results you have above. That should show without
>>>>> question
>>>>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
>>>>> and it should be easy to reproduce.
>>>
>>> That seems to be all I can do for you for now. Please let me know
>>> of any preliminary patches to test!
>>> And I want to add special thanks to Guenter Roeck for his
>>> always-just-in-time assistance over so many days,
>>>
>>> Manuel Krause
>>>
>>
>> BTW -- applying this patch in question to a 3.12.17 kernel, that
>> worked optimally WITHOUT it, makes it FAIL as described for 3.13.x
>> kernels. (And, yes, the patch applied cleanly, compiled fine and
>> boots nicely.)
>>
> Could you please apply commit 50a2bc5429f07ec4d53df2d287b03bdbceb281bb
> on top of commit cc8ef52707341e67a12067d6ead991d56ea017ca and check whether
> the problem still exists in the 3.12.17 kernel?
>
> thanks,
> rui

I'm so sorry: 3.12.17 + cc8ef52707341e67a12067d6ead991d56ea017ca 
+ 50a2bc5429f07ec4d53df2d287b03bdbceb281bb does NOT improve the 
situation.
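
For anyone who wants to retrace this test, a minimal sketch of how such a
tree could be assembled follows. It assumes a git clone in which the
v3.12.17 tag and both commits are reachable, and that both commits
cherry-pick cleanly. The helper name apply_on_top and the branch name
acpi-ac-test are made up for illustration, and this is not necessarily
the exact procedure used for the result above.

```shell
#!/bin/sh
# apply_on_top BASE COMMIT...
# Hypothetical helper: create a test branch at BASE and apply each COMMIT
# in the order given. With DRY_RUN set to a non-empty value the git
# commands are only printed, not executed.
apply_on_top() {
    base=$1
    shift
    run=${DRY_RUN:+echo}          # "echo" in dry-run mode, empty otherwise
    # The branch name "acpi-ac-test" is an assumption; any unused name works.
    $run git checkout -b acpi-ac-test "$base" || return 1
    for commit in "$@"; do
        # -x records the original commit id in the new commit message.
        $run git cherry-pick -x "$commit" || return 1
    done
}
```

Called from inside the kernel tree as
"apply_on_top v3.12.17 cc8ef52707341e67a12067d6ead991d56ea017ca
50a2bc5429f07ec4d53df2d287b03bdbceb281bb", followed by the usual rebuild
and reboot. If a cherry-pick stops with conflicts, the function returns
non-zero and the tree is left as it is for manual resolution.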

Thank you for your work,
Manuel


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2014-04-16 22:18 UTC | newest]

Thread overview: 45+ messages
2014-03-07 19:33 3.13.?: Strange / dangerous fan policy Manuel Krause
2014-03-07 20:55 ` Guenter Roeck
2014-03-07 20:55   ` [lm-sensors] " Guenter Roeck
2014-03-07 22:04   ` Manuel Krause
2014-03-07 22:04     ` [lm-sensors] " Manuel Krause
2014-03-07 22:52     ` Guenter Roeck
2014-03-07 22:52       ` [lm-sensors] " Guenter Roeck
2014-03-08 11:08       ` Jean Delvare
2014-03-08 11:08         ` Jean Delvare
2014-03-08 12:36         ` Rafael J. Wysocki
2014-03-08 12:36           ` Rafael J. Wysocki
2014-03-08 15:59         ` Guenter Roeck
2014-03-08 15:59           ` Guenter Roeck
2014-03-09  0:10           ` Manuel Krause
2014-03-09  0:10             ` [lm-sensors] " Manuel Krause
2014-03-09  0:10             ` Manuel Krause
2014-03-09 17:28             ` Guenter Roeck
2014-03-09 17:28               ` [lm-sensors] " Guenter Roeck
2014-03-09 17:58             ` Rafael J. Wysocki
2014-03-09 17:58               ` [lm-sensors] " Rafael J. Wysocki
2014-03-10  1:49               ` Manuel Krause
2014-03-10  1:49                 ` [lm-sensors] " Manuel Krause
2014-03-11 21:59                 ` Manuel Krause
2014-03-11 21:59                   ` [lm-sensors] " Manuel Krause
     [not found]                   ` <532B4DC5.4010705@netscape.net>
2014-03-31 23:37                     ` Manuel Krause
2014-03-31 23:37                       ` [lm-sensors] " Manuel Krause
2014-03-31 23:47                       ` Guenter Roeck
2014-03-31 23:47                         ` [lm-sensors] " Guenter Roeck
2014-04-06  2:37                         ` Manuel Krause
2014-04-06  2:37                           ` [lm-sensors] " Manuel Krause
2014-04-06  2:43                           ` Guenter Roeck
2014-04-06  2:43                             ` [lm-sensors] " Guenter Roeck
2014-04-06 23:17                             ` Manuel Krause
2014-04-06 23:17                               ` [lm-sensors] " Manuel Krause
2014-04-06 23:17                               ` Manuel Krause
2014-04-07 11:45                               ` Rafael J. Wysocki
2014-04-07 11:45                                 ` [lm-sensors] " Rafael J. Wysocki
2014-04-10 22:51                                 ` Manuel Krause
2014-04-10 22:51                                   ` [lm-sensors] " Manuel Krause
2014-04-13  0:05                                   ` Manuel Krause
2014-04-13  0:05                                     ` [lm-sensors] " Manuel Krause
2014-04-16 18:32                                     ` Zhang Rui
2014-04-16 18:32                                       ` [lm-sensors] " Zhang Rui
2014-04-16 22:17                                       ` Manuel Krause
2014-04-16 22:17                                         ` [lm-sensors] " Manuel Krause
