* Using a temperature sensor with 1-bit output for CPU throttling
@ 2015-04-28 11:27 Mason
  2015-04-29 13:47 ` Mason
  0 siblings, 1 reply; 10+ messages in thread
From: Mason @ 2015-04-28 11:27 UTC
  To: Linux PM; +Cc: cpufreq, Zhang Rui, Eduardo Valentin

Hello everyone,

The SoC I'm working on provides a temperature sensor (NXP) in the CPU block.
The sensor seems to be very primitive, so I wanted to ask experienced people
what would be the best way to use it from Linux.

General Description
"The sensor generates an output signal that indicates if the die temperature
exceeds a programmable threshold. This makes it particularly suitable for
detecting overheating."

So it seems that the original purpose of this sensor was to periodically
check that the temperature has not exceeded a given threshold.

- Is the CPU temp higher than 100°C ?
- No.
- OK. Business as usual.

(1 second later)
- Is the CPU temp higher than 100°C ?
- Yes.
- Uh-oh! I need to do something about it.


Basic Functions
"The temp sensor uses a bandgap type of circuit to compare a voltage which
has a negative temperature coefficient with a voltage that is proportional
to absolute temperature. A resistor bank allows 40 different temperature
thresholds to be selected and the logic output 'out_temperature' will then
indicate whether the actual die temperature lies above or below the selected
threshold."

The available thresholds seem to be chosen somewhat arbitrarily:

  -45.1, -39.7, -33.7, -29.4, -24.4, -20.4, -15.4, -10.1,
  -6.4, -1.4, 3.6, 7.6, 12.9, 16.6, 20.6, 25.6, 30.9,
  34.9, 38.6, 43.9, 48.9, 52.9, 57.9, 61.9, 66.9, 70.9,
  76.3, 81.3, 85.3, 90.3, 95.3, 98.9, 102.9, 108.3, 111.9,
  117.3, 122.3, 126.3, 131.3, 135.3, 139.3

The spacing between values seems arbitrary also.
(Is there an underlying physical explanation?)

I'm not sure that there is much point in testing for temperatures lower
than 50°C ? (I'm told that the SoC can reliably function up to 125°C.)

Do higher temperatures shorten the lifespan of a component?
In other words, would a CPU running 24/7 at 100°C "break" sooner
than one running 24/7 at 50°C ?


Characteristics

Symbol       Parameter               Min   Typ   Max  Unit

(Operating conditions)
Tjunc        Junction temperature    -40    25   125  °C
Vdd          Supply voltage          1.0   1.1  1.26  V

(Normal operating mode)
Idd          Supply current                  50    60  μA
Vbandgapref  Ref output voltage     0.72   0.8  0.88  V
∆outtemp     Absolute Temp                   ±2   ±10  °C
             threshold error
T_res        Temp resolution           3   4.5     7  °C
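
The threshold spacing can be checked against the T_res row with a little
arithmetic. The sketch below is only a sanity check, not anything from the
datasheet: it stores the listed thresholds in tenths of a degree (so the
subtraction is exact) and collects the smallest, largest and summed gap
between neighbours. The names threshold_dC, step_stats and
threshold_step_stats are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Thresholds from the list above, in tenths of a degree Celsius. */
static const int threshold_dC[] = {
	-451, -397, -337, -294, -244, -204, -154, -101,
	 -64,  -14,   36,   76,  129,  166,  206,  256,  309,
	 349,  386,  439,  489,  529,  579,  619,  669,  709,
	 763,  813,  853,  903,  953,  989, 1029, 1083, 1119,
	1173, 1223, 1263, 1313, 1353, 1393,
};

#define N_THRESHOLDS (sizeof(threshold_dC) / sizeof(threshold_dC[0]))

struct step_stats {
	int min_dC;	/* smallest gap between neighbouring thresholds */
	int max_dC;	/* largest gap between neighbouring thresholds */
	int sum_dC;	/* sum of all gaps (equals last minus first) */
	size_t count;	/* number of gaps */
};

struct step_stats threshold_step_stats(void)
{
	struct step_stats s;
	size_t i;

	assert(N_THRESHOLDS >= 2);
	s.min_dC = s.max_dC = threshold_dC[1] - threshold_dC[0];
	s.sum_dC = 0;
	s.count = 0;

	for (i = 1; i < N_THRESHOLDS; i++) {
		int step = threshold_dC[i] - threshold_dC[i - 1];

		if (step < s.min_dC)
			s.min_dC = step;
		if (step > s.max_dC)
			s.max_dC = step;
		s.sum_dC += step;
		s.count++;
	}
	return s;
}
```

The gaps run from 3.6 to 6.0 °C and average about 4.6 °C, which falls
inside the 3 / 4.5 / 7 °C range given for T_res. That suggests (without
proving it) a nominally uniform ladder of roughly 4.5 °C steps, with the
irregularities coming from rounding or from the resistor ratios, rather
than hand-picked values. Note also that the list has 41 entries while the
datasheet text speaks of 40 thresholds.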


Given the semantics of the temperature sensor hardware block, I was
tempted to implement something along these lines:

Create a kernel thread that runs periodically (e.g. every second)
to check if the temperature is above 100°C.
- If not, do nothing
- If yes, somehow prevent the CPU from using the highest frequencies
defined in cpufreq's freq table
(They are 1000, 500, 333, 200, 100 MHz)

Is that a sensible approach?
Is there a way to implement this using the thermal framework?

Or am I looking at this wrong, and things should be done a
different way? (I'm using 3.14 by the way.)

I suppose I could perform some kind of binary search to zoom in
on the current threshold (although it might change during the
measurements, so I'd rather not go there.)
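
For reference, the binary search mentioned above could be sketched as
follows. This is a user-space illustration under stated assumptions, not
driver code: the hardware access is hidden behind a hypothetical callback
(above_fn) that answers "is the die hotter than threshold idx?", and
fake_sensor / fake_above are stand-ins that exist only so the search can
be exercised without hardware. It also assumes the answers are monotonic,
which only holds if the temperature does not move during the probes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: true if the die is hotter than threshold idx. */
typedef bool (*above_fn)(size_t idx, void *ctx);

/*
 * Returns how many of the n thresholds are exceeded (0..n), which is also
 * the index of the lowest threshold that is NOT exceeded.
 * Assumes above(i) implies above(j) for every j < i.
 */
size_t count_thresholds_below(above_fn above, void *ctx, size_t n)
{
	size_t lo = 0, hi = n;	/* idx < lo: exceeded; idx >= hi: not exceeded */

	assert(above != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (above(mid, ctx))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Test stand-in for the sensor: compares against a fixed temperature. */
struct fake_sensor {
	const int *threshold_dC;	/* thresholds, tenths of a degree C */
	int temp_dC;			/* simulated die temperature */
	unsigned int probes;		/* number of probes issued so far */
};

bool fake_above(size_t idx, void *ctx)
{
	struct fake_sensor *s = ctx;

	s->probes++;
	return s->temp_dC > s->threshold_dC[idx];
}
```

With 41 thresholds this needs at most 6 probes per estimate instead of up
to 41 for a linear scan. If the temperature drifts across a threshold
between probes the result can be off by a step, which is probably
tolerable given the ±2 to ±10 °C threshold error quoted above.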

Regards.


* Re: Using a temperature sensor with 1-bit output for CPU throttling
  2015-04-28 11:27 Using a temperature sensor with 1-bit output for CPU throttling Mason
@ 2015-04-29 13:47 ` Mason
  2015-04-29 16:36   ` Javi Merino
  2015-05-13  8:02   ` Mason
  0 siblings, 2 replies; 10+ messages in thread
From: Mason @ 2015-04-29 13:47 UTC
  To: Linux PM
  Cc: cpufreq, Zhang Rui, Eduardo Valentin, Amit Daniel, Andrew Lunn,
	Radhesh Fadnis

On 28/04/2015 13:27, Mason wrote:

> The SoC I'm working on provides a temperature sensor (NXP) in the CPU block.
> The sensor seems to be very primitive, so I wanted to ask experienced people
> what would be the best way to use it from Linux.
> 
> General Description
> "The sensor generates an output signal that indicates if the die temperature
> exceeds a programmable threshold. This makes it particularly suitable for
> detecting overheating."
> 
> So it seems that the original purpose of this sensor was to periodically
> check that the temperature has not exceeded a given threshold.
> 
> - Is the CPU temp higher than 100°C ?
> - No.
> - OK. Business as usual.
> 
> (1 second later)
> - Is the CPU temp higher than 100°C ?
> - Yes.
> - Uh-oh! I need to do something about it.
> 
> 
> Basic Functions
> "The temp sensor uses a bandgap type of circuit to compare a voltage which
> has a negative temperature coefficient with a voltage that is proportional
> to absolute temperature. A resistor bank allows 40 different temperature
> thresholds to be selected and the logic output 'out_temperature' will then
> indicate whether the actual die temperature lies above or below the selected
> threshold."
> 
> The available thresholds seem to be chosen somewhat arbitrarily:
> 
>   -45.1, -39.7, -33.7, -29.4, -24.4, -20.4, -15.4, -10.1,
>   -6.4, -1.4, 3.6, 7.6, 12.9, 16.6, 20.6, 25.6, 30.9,
>   34.9, 38.6, 43.9, 48.9, 52.9, 57.9, 61.9, 66.9, 70.9,
>   76.3, 81.3, 85.3, 90.3, 95.3, 98.9, 102.9, 108.3, 111.9,
>   117.3, 122.3, 126.3, 131.3, 135.3, 139.3
> 
> The spacing between values seems arbitrary also.
> (Is there an underlying physical explanation?)
> 
> I'm not sure that there is much point in testing for temperatures lower
> than 50°C ? (I'm told that the SoC can reliably function up to 125°C.)
> 
> Do higher temperatures shorten the lifespan of a component?
> In other words, would a CPU running 24/7 at 100°C "break" sooner
> than one running 24/7 at 50°C ?
> 
> 
> Characteristics
> 
> Symbol      Parameter             Min  Typ  Max  Unit
> 
> (Operating conditions)
> Tjunc      Junction temperature   -40   25   125  °C
> Vdd        Supply voltage         1.0  1.1  1.26   V
> 
> (Normal operating mode)
> Idd         Supply current              50    60  μA
> Vbandgapref Ref output voltage   0.72  0.8  0.88   V
> ∆outtemp    Absolute Temp               ±2   ±10  °C
>             threshold error
> T_res       Temp resolution        3    4.5    7  °C
> 
> 
> Given the semantics of the temperature sensor hardware block, I was
> tempted to implement something along these lines:
> 
> Create a kernel thread that runs periodically (e.g. every second)
> to check if the temperature is above 100°C.
> - If not, do nothing
> - If yes, somehow prevent the CPU from using the highest frequencies
> defined in cpufreq's freq table
> (They are 1000, 500, 333, 200, 100 MHz)
> 
> Is that a sensible approach?
> Is there a way to implement this using the thermal framework?
> 
> Or am I looking at this wrong, and things should be done a
> different way? (I'm using 3.14 by the way.)
> 
> I suppose I could perform some kind of binary search to zoom in
> on the current threshold (although it might change during the
> measurements, so I'd rather not go there.)

I'm aware that I posted many questions. I'd be grateful if someone
would answer even a tiny subset. That would get the ball rolling.

If I understand correctly, if I want to use the CPU throttling
framework, I need to define a "thermal zone device" and a
"cooling device". AFAIU, the cooling device is taken care of
by cpu_cooling.c

  cpufreq_cooling_register(cpu_present_mask);

My temperature sensor would be the thermal zone device?
How do I tie the two devices together?
Is that where a thermal governor comes in play?

I took a look at the dove_thermal driver, because it seems simple
enough to understand (by me).

Looking at ti-soc-thermal/omap?-thermal-data.c
the lookup table looks familiar. Are they using the same kind
of technology as my primitive sensor? (bandgap)
I do note that the precision is much higher though.

Regards.



* Re: Using a temperature sensor with 1-bit output for CPU throttling
  2015-04-29 13:47 ` Mason
@ 2015-04-29 16:36   ` Javi Merino
  2015-07-21  9:10     ` Mason
  2015-05-13  8:02   ` Mason
  1 sibling, 1 reply; 10+ messages in thread
From: Javi Merino @ 2015-04-29 16:36 UTC
  To: Mason
  Cc: Linux PM, cpufreq, Zhang Rui, Eduardo Valentin, Amit Daniel,
	Andrew Lunn, Radhesh Fadnis

On Wed, Apr 29, 2015 at 02:47:13PM +0100, Mason wrote:
> On 28/04/2015 13:27, Mason wrote:
> 
> > The SoC I'm working on provides a temperature sensor (NXP) in the CPU block.
> > The sensor seems to be very primitive, so I wanted to ask experienced people
> > what would be the best way to use it from Linux.
> > 
> > General Description
> > "The sensor generates an output signal that indicates if the die temperature
> > exceeds a programmable threshold. This makes it particularly suitable for
> > detecting overheating."
> > 
> > So it seems that the original purpose of this sensor was to periodically
> > check that the temperature has not exceeded a given threshold.
> > 
> > - Is the CPU temp higher than 100°C ?
> > - No.
> > - OK. Business as usual.
> > 
> > (1 second later)
> > - Is the CPU temp higher than 100°C ?
> > - Yes.
> > - Uh-oh! I need to do something about it.
> > 
> > 
> > Basic Functions
> > "The temp sensor uses a bandgap type of circuit to compare a voltage which
> > has a negative temperature coefficient with a voltage that is proportional
> > to absolute temperature. A resistor bank allows 40 different temperature
> > thresholds to be selected and the logic output 'out_temperature' will then
> > indicate whether the actual die temperature lies above or below the selected
> > threshold."
> > 
> > The available thresholds seem to be chosen somewhat arbitrarily:
> > 
> >   -45.1, -39.7, -33.7, -29.4, -24.4, -20.4, -15.4, -10.1,
> >   -6.4, -1.4, 3.6, 7.6, 12.9, 16.6, 20.6, 25.6, 30.9,
> >   34.9, 38.6, 43.9, 48.9, 52.9, 57.9, 61.9, 66.9, 70.9,
> >   76.3, 81.3, 85.3, 90.3, 95.3, 98.9, 102.9, 108.3, 111.9,
> >   117.3, 122.3, 126.3, 131.3, 135.3, 139.3
> > 
> > The spacing between values seems arbitrary also.
> > (Is there an underlying physical explanation?)
> > 
> > I'm not sure that there is much point in testing for temperatures lower
> > than 50°C ? (I'm told that the SoC can reliably function up to 125°C.)
> > 
> > Do higher temperatures shorten the lifespan of a component?
> > In other words, would a CPU running 24/7 at 100°C "break" sooner
> > than one running 24/7 at 50°C ?
> > 
> > 
> > Characteristics
> > 
> > Symbol      Parameter             Min  Typ  Max  Unit
> > 
> > (Operating conditions)
> > Tjunc      Junction temperature   -40   25   125  °C
> > Vdd        Supply voltage         1.0  1.1  1.26   V
> > 
> > (Normal operating mode)
> > Idd         Supply current              50    60  μA
> > Vbandgapref Ref output voltage   0.72  0.8  0.88   V
> > ∆outtemp    Absolute Temp               ±2   ±10  °C
> >             threshold error
> > T_res       Temp resolution        3    4.5    7  °C
> > 
> > 
> > Given the semantics of the temperature sensor hardware block, I was
> > tempted to implement something along these lines:
> > 
> > Create a kernel thread that runs periodically (e.g. every second)
> > to check if the temperature is above 100°C.
> > - If not, do nothing
> > - If yes, somehow prevent the CPU from using the highest frequencies
> > defined in cpufreq's freq table
> > (They are 1000, 500, 333, 200, 100 MHz)
> > 
> > Is that a sensible approach?
> > Is there a way to implement this using the thermal framework?
> > 
> > Or am I looking at this wrong, and things should be done a
> > different way? (I'm using 3.14 by the way.)
> > 
> > I suppose I could perform some kind of binary search to zoom in
> > on the current threshold (although it might change during the
> > measurements, so I'd rather not go there.)
> 
> I'm aware that I posted many questions. I'd be grateful if someone
> would answer even a tiny subset. That would get the ball rolling.
> 
> If I understand correctly, if I want to use the CPU throttling
> framework, I need to define a "thermal zone device" and a
> "cooling device". AFAIU, the cooling device is taken care of
> by cpu_cooling.c
> 
>   cpufreq_cooling_register(cpu_present_mask);

Correct

> My temperature sensor would be the thermal zone device?
> How do I tie the two devices together?

Your temperature sensor would be the input to the thermal zone
device.  You register it with thermal_zone_device_register().  See
Documentation/thermal/sysfs-api.txt .  Your thermal sensor
doesn't actually report temperature but the thermal framework expects
a temperature, so your thermal zone's get_temp() function should
report a fake temperature.  For example, if you've configured your
sensor for 50C, then you could report 45C if the sensor reads as 0 and
50C if the sensor reads as 1.  It's a hack, but it should work.  Bear
in mind that get_temp() should report in millicelsius.
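
A minimal sketch of that mapping, written as plain C so it can be tried
outside the kernel. The function name fake_temp_mC and the margin
parameter are made up for illustration; in a driver the same expression
would sit in the zone's get_temp() callback and be stored through its
output pointer.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Turn the 1-bit comparator output into a made-up temperature in
 * millicelsius.
 *   sensor_bit:   true if the sensor reports "above threshold"
 *   threshold_mC: the threshold the sensor is programmed to
 *   margin_mC:    how far below the threshold to report when the bit is
 *                 clear (5000 reproduces the 45C/50C example above)
 */
long fake_temp_mC(bool sensor_bit, long threshold_mC, long margin_mC)
{
	assert(margin_mC > 0);
	return sensor_bit ? threshold_mC : threshold_mC - margin_mC;
}
```

One thing worth checking is whether the governor in use compares the
temperature against the trip point with ">=" or ">". Reporting a value
slightly above the threshold when the bit is set would avoid depending on
that detail.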

> Is that where a thermal governor comes in play?

Because you want on/off behavior, the bangbang governor is the
simplest to use and should do the work.  You can choose it as the default
in your kernel configuration or you can choose it by passing it as
part of the tzp that you pass to thermal_zone_device_register()

Put a trip point at the temperature you've set up your sensor to and
bind the cpu cooling device to it.
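
To illustrate what on/off control with a trip point means, here is a
sketch of the general idea in plain C. It is not the kernel governor's
code: bang_bang_step and its parameters are hypothetical names, and the
hysteresis handling is an assumption about how such a controller is
usually built (switch on at the trip point, switch off once the
temperature has dropped a little below it).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * One control step. Returns the new on/off state of the cooling device.
 *   cooling_on: current state of the cooling device
 *   temp_mC:    temperature reported by the zone, in millicelsius
 *   trip_mC:    trip point temperature
 *   hyst_mC:    width of the hysteresis band below the trip point
 */
bool bang_bang_step(bool cooling_on, long temp_mC, long trip_mC, long hyst_mC)
{
	assert(hyst_mC >= 0);
	if (!cooling_on && temp_mC >= trip_mC)
		return true;		/* reached the trip point: start cooling */
	if (cooling_on && temp_mC <= trip_mC - hyst_mC)
		return false;		/* cooled past the band: stop cooling */
	return cooling_on;		/* inside the band: keep the current state */
}
```

With the fake 45C/50C readings suggested above, a trip point at 50C and a
hysteresis of a couple of degrees, the device switches on when the sensor
bit is set and off again as soon as it clears.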

Hope this helps,
Javi


* Re: Using a temperature sensor with 1-bit output for CPU throttling
  2015-04-29 13:47 ` Mason
  2015-04-29 16:36   ` Javi Merino
@ 2015-05-13  8:02   ` Mason
  2015-05-14  9:25     ` Punit Agrawal
  1 sibling, 1 reply; 10+ messages in thread
From: Mason @ 2015-05-13  8:02 UTC
  To: Linux PM; +Cc: cpufreq, Zhang Rui, Eduardo Valentin, Andrew Lunn, Amit Kachhap

On 29/04/2015 15:47, Mason wrote:

> On 28/04/2015 13:27, Mason wrote:
> 
>> The SoC I'm working on provides a temperature sensor (NXP) in the CPU block.
>> The sensor seems to be very primitive, so I wanted to ask experienced people
>> what would be the best way to use it from Linux.
>>
>> General Description
>> "The sensor generates an output signal that indicates if the die temperature
>> exceeds a programmable threshold. This makes it particularly suitable for
>> detecting overheating."
>>
>> So it seems that the original purpose of this sensor was to periodically
>> check that the temperature has not exceeded a given threshold.
>>
>> - Is the CPU temp higher than 100°C ?
>> - No.
>> - OK. Business as usual.
>>
>> (1 second later)
>> - Is the CPU temp higher than 100°C ?
>> - Yes.
>> - Uh-oh! I need to do something about it.
>>
>>
>> Basic Functions
>> "The temp sensor uses a bandgap type of circuit to compare a voltage which
>> has a negative temperature coefficient with a voltage that is proportional
>> to absolute temperature. A resistor bank allows 40 different temperature
>> thresholds to be selected and the logic output 'out_temperature' will then
>> indicate whether the actual die temperature lies above or below the selected
>> threshold."
>>
>> The available thresholds seem to be chosen somewhat arbitrarily:
>>
>>   -45.1, -39.7, -33.7, -29.4, -24.4, -20.4, -15.4, -10.1,
>>   -6.4, -1.4, 3.6, 7.6, 12.9, 16.6, 20.6, 25.6, 30.9,
>>   34.9, 38.6, 43.9, 48.9, 52.9, 57.9, 61.9, 66.9, 70.9,
>>   76.3, 81.3, 85.3, 90.3, 95.3, 98.9, 102.9, 108.3, 111.9,
>>   117.3, 122.3, 126.3, 131.3, 135.3, 139.3
>>
>> The spacing between values seems arbitrary also.
>> (Is there an underlying physical explanation?)
>>
>> I'm not sure that there is much point in testing for temperatures lower
>> than 50°C ? (I'm told that the SoC can reliably function up to 125°C.)
>>
>> Do higher temperatures shorten the lifespan of a component?
>> In other words, would a CPU running 24/7 at 100°C "break" sooner
>> than one running 24/7 at 50°C ?
>>
>>
>> Characteristics
>>
>> Symbol      Parameter             Min  Typ  Max  Unit
>>
>> (Operating conditions)
>> Tjunc      Junction temperature   -40   25   125  °C
>> Vdd        Supply voltage         1.0  1.1  1.26   V
>>
>> (Normal operating mode)
>> Idd         Supply current              50    60  μA
>> Vbandgapref Ref output voltage   0.72  0.8  0.88   V
>> ∆outtemp    Absolute Temp               ±2   ±10  °C
>>             threshold error
>> T_res       Temp resolution        3    4.5    7  °C
>>
>>
>> Given the semantics of the temperature sensor hardware block, I was
>> tempted to implement something along these lines:
>>
>> Create a kernel thread that runs periodically (e.g. every second)
>> to check if the temperature is above 100°C.
>> - If not, do nothing
>> - If yes, somehow prevent the CPU from using the highest frequencies
>> defined in cpufreq's freq table
>> (They are 1000, 500, 333, 200, 100 MHz)
>>
>> Is that a sensible approach?
>> Is there a way to implement this using the thermal framework?
>>
>> Or am I looking at this wrong, and things should be done a
>> different way? (I'm using 3.14 by the way.)
>>
>> I suppose I could perform some kind of binary search to zoom in
>> on the current threshold (although it might change during the
>> measurements, so I'd rather not go there.)
> 
> I'm aware that I posted many questions. I'd be grateful if someone
> would answer even a tiny subset. That would get the ball rolling.
> 
> If I understand correctly, if I want to use the CPU throttling
> framework, I need to define a "thermal zone device" and a
> "cooling device". AFAIU, the cooling device is taken care of
> by cpu_cooling.c
> 
>   cpufreq_cooling_register(cpu_present_mask);
> 
> My temperature sensor would be the thermal zone device?
> How do I tie the two devices together?
> Is that where a thermal governor comes in play?
> 
> I took a look at the dove_thermal driver, because it seems simple
> enough to understand (by me).
> 
> Looking at ti-soc-thermal/omap?-thermal-data.c
> the lookup table looks familiar. Are they using the same kind
> of technology as my primitive sensor? (bandgap)
> I do note that the precision is much higher though.

Hello everyone,

Is there, perhaps, a better place to discuss these issues?
(IRC, web forum, other mailing list, Stack Overflow, ...)

Regards.



* Re: Using a temperature sensor with 1-bit output for CPU throttling
  2015-05-13  8:02   ` Mason
@ 2015-05-14  9:25     ` Punit Agrawal
  2015-05-14  9:46       ` Mason
  0 siblings, 1 reply; 10+ messages in thread
From: Punit Agrawal @ 2015-05-14  9:25 UTC
  To: Mason
  Cc: Linux PM, cpufreq, Zhang Rui, Eduardo Valentin, Andrew Lunn,
	Amit Kachhap

Mason <slash.tmp@free.fr> writes:

> On 29/04/2015 15:47, Mason wrote:
>
>> On 28/04/2015 13:27, Mason wrote:
>> 
>>> The SoC I'm working on provides a temperature sensor (NXP) in the CPU block.
>>> The sensor seems to be very primitive, so I wanted to ask experienced people
>>> what would be the best way to use it from Linux.
>>>
>>> General Description
>>> "The sensor generates an output signal that indicates if the die temperature
>>> exceeds a programmable threshold. This makes it particularly suitable for
>>> detecting overheating."
>>>
>>> So it seems that the original purpose of this sensor was to periodically
>>> check that the temperature has not exceeded a given threshold.
>>>
>>> - Is the CPU temp higher than 100°C ?
>>> - No.
>>> - OK. Business as usual.
>>>
>>> (1 second later)
>>> - Is the CPU temp higher than 100°C ?
>>> - Yes.
>>> - Uh-oh! I need to do something about it.
>>>
>>>
>>> Basic Functions
>>> "The temp sensor uses a bandgap type of circuit to compare a voltage which
>>> has a negative temperature coefficient with a voltage that is proportional
>>> to absolute temperature. A resistor bank allows 40 different temperature
>>> thresholds to be selected and the logic output 'out_temperature' will then
>>> indicate whether the actual die temperature lies above or below the selected
>>> threshold."
>>>
>>> The available thresholds seem to be chosen somewhat arbitrarily:
>>>
>>>   -45.1, -39.7, -33.7, -29.4, -24.4, -20.4, -15.4, -10.1,
>>>   -6.4, -1.4, 3.6, 7.6, 12.9, 16.6, 20.6, 25.6, 30.9,
>>>   34.9, 38.6, 43.9, 48.9, 52.9, 57.9, 61.9, 66.9, 70.9,
>>>   76.3, 81.3, 85.3, 90.3, 95.3, 98.9, 102.9, 108.3, 111.9,
>>>   117.3, 122.3, 126.3, 131.3, 135.3, 139.3
>>>
>>> The spacing between values seems arbitrary also.
>>> (Is there an underlying physical explanation?)
>>>
>>> I'm not sure that there is much point in testing for temperatures lower
>>> than 50°C ? (I'm told that the SoC can reliably function up to 125°C.)
>>>
>>> Do higher temperatures shorten the lifespan of a component?
>>> In other words, would a CPU running 24/7 at 100°C "break" sooner
>>> than one running 24/7 at 50°C ?
>>>
>>>
>>> Characteristics
>>>
>>> Symbol      Parameter             Min  Typ  Max  Unit
>>>
>>> (Operating conditions)
>>> Tjunc      Junction temperature   -40   25   125  °C
>>> Vdd        Supply voltage         1.0  1.1  1.26   V
>>>
>>> (Normal operating mode)
>>> Idd         Supply current              50    60  μA
>>> Vbandgapref Ref output voltage   0.72  0.8  0.88   V
>>> ∆outtemp    Absolute Temp               ±2   ±10  °C
>>>             threshold error
>>> T_res       Temp resolution        3    4.5    7  °C
>>>
>>>
>>> Given the semantics of the temperature sensor hardware block, I was
>>> tempted to implement something along these lines:
>>>
>>> Create a kernel thread that runs periodically (e.g. every second)
>>> to check if the temperature is above 100°C.
>>> - If not, do nothing
>>> - If yes, somehow prevent the CPU from using the highest frequencies
>>> defined in cpufreq's freq table
>>> (They are 1000, 500, 333, 200, 100 MHz)
>>>
>>> Is that a sensible approach?
>>> Is there a way to implement this using the thermal framework?
>>>
>>> Or am I looking at this wrong, and things should be done a
>>> different way? (I'm using 3.14 by the way.)
>>>
>>> I suppose I could perform some kind of binary search to zoom in
>>> on the current threshold (although it might change during the
>>> measurements, so I'd rather not go there.)
>> 
>> I'm aware that I posted many questions. I'd be grateful if someone
>> would answer even a tiny subset. That would get the ball rolling.
>> 
>> If I understand correctly, if I want to use the CPU throttling
>> framework, I need to define a "thermal zone device" and a
>> "cooling device". AFAIU, the cooling device is taken care of
>> by cpu_cooling.c
>> 
>>   cpufreq_cooling_register(cpu_present_mask);
>> 
>> My temperature sensor would be the thermal zone device?
>> How do I tie the two devices together?
>> Is that where a thermal governor comes in play?
>> 
>> I took a look at the dove_thermal driver, because it seems simple
>> enough to understand (by me).
>> 
>> Looking at ti-soc-thermal/omap?-thermal-data.c
>> the lookup table looks familiar. Are they using the same kind
>> of technology as my primitive sensor? (bandgap)
>> I do note that the precision is much higher though.
>
> Hello everyone,
>
> Is there, perhaps, a better place to discuss these issues?
> (IRC, web forum, other mailing list, Stack Overflow, ...)

There is a ##thermal channel on freenode that might be a good place to
discuss linux thermal framework related queries.

>
> Regards.
>


* Re: Using a temperature sensor with 1-bit output for CPU throttling
  2015-05-14  9:25     ` Punit Agrawal
@ 2015-05-14  9:46       ` Mason
  0 siblings, 0 replies; 10+ messages in thread
From: Mason @ 2015-05-14  9:46 UTC
  To: Punit Agrawal
  Cc: Linux PM, cpufreq, Zhang Rui, Eduardo Valentin, Andrew Lunn,
	Amit Kachhap

Punit Agrawal wrote:

> Mason wrote:
> 
>> Is there, perhaps, a better place to discuss these issues?
>> (IRC, web forum, other mailing list, Stack Overflow, ...)
> 
> There is a ##thermal channel on freenode that might be a good place
> to discuss linux thermal framework related queries.

Thanks Punit, I will drop by.

Regards.



* Re: Using a temperature sensor with 1-bit output for CPU throttling
  2015-04-29 16:36   ` Javi Merino
@ 2015-07-21  9:10     ` Mason
  2015-07-21 11:49       ` Mason
  2015-07-23  9:19       ` Mason
  0 siblings, 2 replies; 10+ messages in thread
From: Mason @ 2015-07-21  9:10 UTC
  To: Javi Merino
  Cc: Linux PM, cpufreq, Zhang Rui, Eduardo Valentin, Amit Daniel,
	Andrew Lunn, Radhesh Fadnis

Hello everyone,

I've made some progress (in that the system at least prints *something*)

On 29/04/2015 18:36, Javi Merino wrote:

> Your temperature sensor would be the input to the thermal zone
> device.  You register it with thermal_zone_device_register().  See
> Documentation/thermal/sysfs-api.txt .

My .bind() callback is:

static int tango_bind(struct thermal_zone_device *tz, struct thermal_cooling_device *cdev)
{
  return thermal_zone_bind_cooling_device(tz, 0, cdev, 4, 1);
}

I'm not sure I completely understand the last two parameters
(upper and lower).

    upper: the Maximum cooling state for this trip point.
           THERMAL_NO_LIMIT means no upper limit,
           and the cooling device can be in max_state.

    lower: the Minimum cooling state can be used for this trip point.
           THERMAL_NO_LIMIT means no lower limit,
           and the cooling device can be in cooling state 0.


My cpufreq driver exposes 5 frequencies, in this order:
F, F/2, F/3, F/5, F/9

So cooling state = 1 means 'F' is forbidden, right?
Thus the on-demand cpufreq governor is free to pick among
{F/2, F/3, F/5, F/9} ?

And cooling state = 4 means only F/9 is allowed?
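
Spelled out as code, the reading above would be the following. This is a
sketch under two assumptions: that cpu_cooling numbers its states from
the highest frequency downwards (state 0 = no cap), and that while the
trip point is active the governor keeps the state inside the
[lower, upper] range passed at bind time. The frequency values are the
ones from my first message; max_freq_for_state and clamp_state are
illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Frequency table from the first message, highest first, in MHz. */
static const unsigned int freq_mhz[] = { 1000, 500, 333, 200, 100 };

#define N_FREQS (sizeof(freq_mhz) / sizeof(freq_mhz[0]))

/* Cooling state s caps the CPU at the s-th table entry; 0 means no cap. */
unsigned int max_freq_for_state(unsigned long state)
{
	assert(state < N_FREQS);
	return freq_mhz[state];
}

/* Keep a requested state inside the [lower, upper] range given at bind time. */
unsigned long clamp_state(unsigned long requested, unsigned long lower,
			  unsigned long upper)
{
	assert(lower <= upper);
	if (requested < lower)
		return lower;
	if (requested > upper)
		return upper;
	return requested;
}
```

Under those assumptions, binding with upper = 4 and lower = 1 means that
a tripped zone never allows 1000 MHz (state 1 caps at 500 MHz) and can go
all the way down to 100 MHz (state 4).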

> Your thermal sensor
> doesn't actually report temperature but the thermal framework expects
> a temperature, so your thermal zone's get_temp() function should
> report a fake temperature.  For example, if you've configured your
> sensor for 50C, then you could report 45C if the sensor reads as 0 and
> 50C if the sensor reads as 1.  It's a hack, but it should work.  Bear
> in mind that get_temp() should report in millicelsius.

Actually, I can get a rough estimate of the current temperature
by querying the sensor multiple times.

> Because you want on/off behavior, the bangbang governor is the
> simplest to use and should do the work.  You can choose it as the default
> in your kernel configuration or you can choose it by passing it as
> part of the tzp that you pass to thermal_zone_device_register()
> 
> Put a trip point at the temperature you've set up your sensor to and
> bind the cpu cooling device to it.

I don't think I want on/off behavior. I want CPU throttling when the
temperature rises above a user-defined value.

I'm stuck with kernel 3.14, so I don't have all the governors to
choose from. I picked "step-wise".

I don't understand this behavior from the governor:

[   27.557494] thermal thermal_zone0: last_temperature=0, current_temperature=51000
[   27.583626] thermal thermal_zone0: Trip0[type=1,temp=70000]:trend=1,throttle=0
[   27.590961] thermal cooling_device0: cur_state=0
[   27.595672] thermal cooling_device0: old_target=-1, target=-1
[   27.601473] thermal cooling_device0: zone0->target=4294967295
[   27.607263] thermal cooling_device0: set to state 0

[   40.643340] thermal thermal_zone0: last_temperature=51000, current_temperature=46000
[   40.669930] thermal thermal_zone0: Trip0[type=1,temp=70000]:trend=2,throttle=0
[   40.677217] thermal cooling_device0: cur_state=0
[   40.681873] thermal cooling_device0: old_target=-1, target=4
[   40.687579] thermal cooling_device0: zone0->target=4
[   40.692669] thermal cooling_device0: set to state 4

The first temperature read = 51°C (below the 70°C trip point).
Next read = 46°C (still below the trip point) and even though
the governor claims throttle=0, it sets the cooling state to 4
(so minimal frequency if I understand correctly).

What am I missing?

I've attached my work-in-progress driver for reference.

Regards.


[-- Attachment #2: temperature.c --]
[-- Type: text/x-csrc, Size: 1981 bytes --]

#include <linux/module.h>
#include <linux/io.h>		// readl_relaxed, writel_relaxed
#include <linux/cpu_cooling.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Tango CPU throttling");

/*** CPU TEMPERATURE SENSOR ***/
#define SENSOR_ADDR 0x920100
static void __iomem *sensor_base;

/* Parenthesized so the macros expand safely inside larger expressions. */
#define TEMPSI_CMD	(sensor_base + 0)
#define TEMPSI_RES	(sensor_base + 4)
#define TEMPSI_CFG	(sensor_base + 8)

static const u8 temperature[] = {
	46, 51, 55, 60, 64, 69, 74, 79, 83, 88, 93, 97, 101, 106, 110, 115, 120, 124, 129, 133, 137,
};

static int tango_get_temp(struct thermal_zone_device *tz, unsigned long *res)
{
	int i;

	for (i = 20; i < 40; ++i)
	{
		writel_relaxed(i << 8 | 2, TEMPSI_CMD);
		while ((readl_relaxed(TEMPSI_CMD) & 0x80) == 0);
		if (readl_relaxed(TEMPSI_RES) == 0) break;
	}

	*res = temperature[i-20] * 1000;
	return 0;
}

static int tango_bind(struct thermal_zone_device *tz, struct thermal_cooling_device *cdev)
{
	return thermal_zone_bind_cooling_device(tz, 0, cdev, 4, 1);
}

static int tango_get_trip_type(struct thermal_zone_device *tz, int idx, enum thermal_trip_type *res)
{
	if (idx != 0) return -EINVAL;
	*res = THERMAL_TRIP_PASSIVE;
	return 0;
}

static int tango_get_trip_temp(struct thermal_zone_device *tz, int idx, unsigned long *res)
{
	if (idx != 0) return -EINVAL;
	*res = 70000;
	return 0;
}

static struct thermal_zone_device_ops ops = {
	.bind		= tango_bind,
	.get_temp	= tango_get_temp,
	.get_mode = 0,
	.get_trip_type	= tango_get_trip_type,
	.get_trip_temp	= tango_get_trip_temp,
};

static struct thermal_cooling_device *cdev;
static struct thermal_zone_device *tzdev;

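static int __init ts_init(void)
{
	int err;

	sensor_base = ioremap(SENSOR_ADDR, 16);
	if (!sensor_base)
		return -ENOMEM;
	writel_relaxed( 1, TEMPSI_CMD);
	writel_relaxed(50, TEMPSI_CFG);

	cdev = cpufreq_cooling_register(cpu_present_mask);
	if (IS_ERR(cdev)) {
		err = PTR_ERR(cdev);
		goto unmap;
	}
	tzdev = thermal_zone_device_register("tango_tz", 1, 0, NULL, &ops, NULL, 5000, 13000);
	if (IS_ERR(tzdev)) {
		err = PTR_ERR(tzdev);
		goto unregister_cdev;
	}
	return 0;

unregister_cdev:
	cpufreq_cooling_unregister(cdev);
unmap:
	iounmap(sensor_base);
	return err;
}

/* Undo ts_init() in reverse order so the module can be unloaded safely. */
static void __exit ts_cleanup(void)
{
	thermal_zone_device_unregister(tzdev);
	cpufreq_cooling_unregister(cdev);
	iounmap(sensor_base);
}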

module_init(ts_init);
module_exit(ts_cleanup);


* Re: Using a temperature sensor with 1-bit output for CPU throttling
  2015-07-21  9:10     ` Mason
@ 2015-07-21 11:49       ` Mason
  2015-07-23  9:19       ` Mason
  1 sibling, 0 replies; 10+ messages in thread
From: Mason @ 2015-07-21 11:49 UTC
  To: Javi Merino
  Cc: Linux PM, cpufreq, Zhang Rui, Eduardo Valentin, Amit Daniel,
	Lukasz Majewski, Andrew Lunn

On 21/07/2015 11:10, Mason wrote:

> I don't understand this behavior from the governor:
> 
> [   27.557494] thermal thermal_zone0: last_temperature=0, current_temperature=51000
> [   27.583626] thermal thermal_zone0: Trip0[type=1,temp=70000]:trend=1,throttle=0
> [   27.590961] thermal cooling_device0: cur_state=0
> [   27.595672] thermal cooling_device0: old_target=-1, target=-1
> [   27.601473] thermal cooling_device0: zone0->target=4294967295
> [   27.607263] thermal cooling_device0: set to state 0
> 
> [   40.643340] thermal thermal_zone0: last_temperature=51000, current_temperature=46000
> [   40.669930] thermal thermal_zone0: Trip0[type=1,temp=70000]:trend=2,throttle=0
> [   40.677217] thermal cooling_device0: cur_state=0
> [   40.681873] thermal cooling_device0: old_target=-1, target=4
> [   40.687579] thermal cooling_device0: zone0->target=4
> [   40.692669] thermal cooling_device0: set to state 4
> 
> The first temperature read = 51°C (below the 70°C trip point).
> Next read = 46°C (still below the trip point) and even though
> the governor claims throttle=0, it sets the cooling state to 4
> (so minimal frequency if I understand correctly).

Never mind. This is due to a bug that was fixed almost a year ago.
(commit 26bb0e9a1a938ec98ee07aa76533f1a711fba706)
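
The jump to state 4 with throttle=0 is consistent with an unsigned
wraparound: the 4294967295 in the log is (unsigned)-1 on a 32-bit system.
The user-space sketch below only illustrates that mechanism. It is not
the kernel code and not a summary of the commit. The function name, the
lower = 1 binding, and the two comparison variants are assumptions made
for the illustration.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for a "no target" marker, i.e. (unsigned long)-1. Assumption. */
#define NO_TARGET ULONG_MAX

/*
 * Hypothetical model of one "temperature is dropping" step: release one
 * cooling state and clamp the result to upper. With guard_underflow == 0
 * the decrement is skipped only when cur_state is exactly equal to lower.
 */
static unsigned long next_state_dropping(unsigned long cur_state,
					 unsigned long lower,
					 unsigned long upper,
					 int guard_underflow)
{
	unsigned long next;

	assert(lower <= upper);

	if (guard_underflow ? cur_state <= lower : cur_state == lower)
		return NO_TARGET;	/* nothing left to release */

	next = cur_state - 1;		/* wraps to ULONG_MAX when cur_state == 0 */
	if (next > upper)
		next = upper;		/* wrapped value is clamped to the maximum state */
	return next;
}
```

With cur_state = 0, lower = 1 and upper = 4, the unguarded variant
returns 4 (maximum throttling while the temperature is falling), and the
guarded variant returns NO_TARGET.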

I've requested the fix be back-ported to linux-3.14.y
http://thread.gmane.org/gmane.linux.kernel.stable/143070

Regards.


* Re: Using a temperature sensor with 1-bit output for CPU throttling
  2015-07-21  9:10     ` Mason
  2015-07-21 11:49       ` Mason
@ 2015-07-23  9:19       ` Mason
  2015-07-23 12:51         ` Mason
  1 sibling, 1 reply; 10+ messages in thread
From: Mason @ 2015-07-23  9:19 UTC (permalink / raw)
  To: Linux PM, cpufreq
  Cc: Javi Merino, Zhang Rui, Eduardo Valentin, Amit Daniel,
	Andrew Lunn, Lukasz Majewski

[-- Attachment #1: Type: text/plain, Size: 883 bytes --]

On 21/07/2015 11:10, Mason wrote:

> I don't think I want on/off behavior. I want CPU throttling when
> the temperature rises above a user-defined value.
> 
> I'm stuck with kernel 3.14, so I don't have all the governors to
> choose from. I picked "step-wise".

I think the attached driver does what I want. (If anyone wants to
comment on the code, I'd be delighted to hear their suggestions!
Especially if there's a better way to do something.)

When the CPU temperature exceeds a user-defined threshold (default
120 °C), the maximum frequency F is disabled for all cores (they share
a single clock domain), and the system runs at F/2 at most until the
temperature dips below the threshold.
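
The intended policy can be sketched in user space as follows. This is a
minimal model, not the driver: the function names are hypothetical, it
assumes only two operating points (F and F/2), and it uses a strict
"greater than" comparison to match the wording above (the governor's
exact comparison may differ).

```c
#include <assert.h>

/* Cooling state for a single passive trip: 0 = no limit, 1 = max frequency disabled. */
static unsigned int cooling_state(unsigned long temp_mc, unsigned long threshold_mc)
{
	return temp_mc > threshold_mc ? 1 : 0;
}

/* Highest allowed frequency in kHz, given the maximum F (max_khz) and the state. */
static unsigned long freq_cap_khz(unsigned long max_khz, unsigned int state)
{
	assert(state <= 1);	/* only states 0 and 1 exist in this model */
	return state ? max_khz / 2 : max_khz;
}
```

Temperatures are in millidegrees Celsius, as in the thermal framework,
so the default threshold is 120000.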

Does the cpu_cooling driver written by Amit Daniel also support
disabling/off-lining entire cores in multi-core systems, not just
throttling the frequency of the cores?

Regards.


[-- Attachment #2: tango_thermal.c --]
[-- Type: text/x-csrc, Size: 2623 bytes --]

#include <linux/module.h>
#include <linux/io.h>		// readl_relaxed, writel_relaxed
#include <linux/cpu_cooling.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Sigma Designs");
MODULE_DESCRIPTION("Tango CPU throttling");

/*** CPU TEMPERATURE SENSOR ***/
#define SENSOR_ADDR 0x920100
static struct thermal_cooling_device *tcdev;
static struct thermal_zone_device *tzdev;
static void __iomem *sensor_base;
static unsigned int threshold;

#define TEMPSI_CMD	(sensor_base + 0)
#define TEMPSI_RES	(sensor_base + 4)
#define TEMPSI_CFG	(sensor_base + 8)
#define SENSOR_IDLE	(readl_relaxed(TEMPSI_CMD) & 0x80)
#define IDX_OFFSET	18

static const u8 temperature[] = {
	 37,  41,  46,  51,  55,  60,  64,  69,
	 74,  79,  83,  88,  93,  97, 101, 106,
	110, 115, 120, 124, 129, 133, 137, 142,
};

typedef struct thermal_zone_device TZD;

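static int tango_get_temp(TZD *tz, unsigned long *res)
{
	int i;

	/* Try each threshold in turn until TEMPSI_RES reads 0. */
	for (i = IDX_OFFSET; i < 40; ++i) {
		writel_relaxed(i << 8 | 2, TEMPSI_CMD);
		while (!SENSOR_IDLE)
			;	/* busy-wait until the sensor is idle again */
		if (readl_relaxed(TEMPSI_RES) == 0)
			break;
	}

	*res = temperature[i - IDX_OFFSET] * 1000;	/* millidegrees Celsius */
	return 0;
}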

static int tango_bind(TZD *tz, struct thermal_cooling_device *cdev)
{
	/*
	 * Disable max frequency when CPU temperature exceeds trip point
	 * by setting upper and lower cooling states to 1
	 */
	return thermal_zone_bind_cooling_device(tz, 0, cdev, 1, 1);
}

static int tango_unbind(TZD *tz, struct thermal_cooling_device *cdev)
{
	return thermal_zone_unbind_cooling_device(tz, 0, cdev);
}

static int tango_get_trip_type(TZD *tz, int idx, enum thermal_trip_type *res)
{
	*res = THERMAL_TRIP_PASSIVE;
	return 0;
}

static int tango_get_trip_temp(TZD *tz, int idx, unsigned long *res)
{
	*res = threshold;
	return 0;
}

static int tango_set_trip_temp(TZD *tz, int idx, unsigned long res)
{
	threshold = res;
	return 0;
}

static struct thermal_zone_device_ops ops = {
	.bind		= tango_bind,
	.unbind		= tango_unbind,
	.get_temp	= tango_get_temp,
	.get_trip_type	= tango_get_trip_type,
	.get_trip_temp	= tango_get_trip_temp,
	.set_trip_temp	= tango_set_trip_temp,
};

static int __init ts_init(void)
{
	threshold = 120000; // millidegrees Celsius
	sensor_base = ioremap(SENSOR_ADDR, 16);
	if (!sensor_base)
		return -ENOMEM;
	writel_relaxed( 1, TEMPSI_CMD);
	writel_relaxed(50, TEMPSI_CFG);
	tcdev = cpufreq_cooling_register(cpu_present_mask);
	tzdev = thermal_zone_device_register("tango_thermal", 1, 1, NULL, &ops, NULL, 1000, 2000);
	return 0;
}

static void __exit ts_cleanup(void)
{
	thermal_zone_device_unregister(tzdev);
	cpufreq_cooling_unregister(tcdev);
	writel_relaxed(0, TEMPSI_CFG);
	writel_relaxed(0, TEMPSI_CMD);
	iounmap(sensor_base);
}

module_init(ts_init);
module_exit(ts_cleanup);
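
The threshold scan in tango_get_temp can be modeled in user space to
check the index arithmetic without hardware. This is only a sketch: the
comparator callback, fake_above, and scan_temp_mc are hypothetical names,
and the register accesses are replaced by a plain function call. The
table and the 18..40 index range are copied from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define IDX_OFFSET 18
#define IDX_LIMIT  40

/* Same table as the driver: reported temperature (deg C) per scan index. */
static const unsigned char temperature[] = {
	 37,  41,  46,  51,  55,  60,  64,  69,
	 74,  79,  83,  88,  93,  97, 101, 106,
	110, 115, 120, 124, 129, 133, 137, 142,
};

/* Hypothetical comparator: nonzero if the die is above threshold idx. */
typedef int (*above_fn)(int idx, void *ctx);

/* Scan upward until the comparator reports "not above"; result in millidegrees. */
static unsigned long scan_temp_mc(above_fn above, void *ctx)
{
	int i;

	for (i = IDX_OFFSET; i < IDX_LIMIT; ++i)
		if (!above(i, ctx))
			break;

	/* i is in [IDX_OFFSET, IDX_LIMIT], so the table index stays in bounds. */
	assert((size_t)(i - IDX_OFFSET) < sizeof(temperature) / sizeof(temperature[0]));
	return temperature[i - IDX_OFFSET] * 1000UL;
}

/* Fake sensor: ctx points at the first threshold index that is not exceeded. */
static int fake_above(int idx, void *ctx)
{
	return idx < *(int *)ctx;
}
```

If the fake sensor stops at index 21, the scan reports 51000, which is
one of the readings seen earlier in the thread. If no threshold stops
the scan, the loop ends at index 40 and the result saturates at 137000.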

* Re: Using a temperature sensor with 1-bit output for CPU throttling
  2015-07-23  9:19       ` Mason
@ 2015-07-23 12:51         ` Mason
  0 siblings, 0 replies; 10+ messages in thread
From: Mason @ 2015-07-23 12:51 UTC (permalink / raw)
  To: Linux PM, cpufreq
  Cc: Javi Merino, Zhang Rui, Eduardo Valentin, Amit Daniel,
	Andrew Lunn, Lukasz Majewski, Zoran Markovic

On 23/07/2015 11:19, Mason wrote:

> Does the cpu_cooling driver written by Amit Daniel also support
> disabling/off-lining entire cores in multi-core systems, not just
> throttling the frequency of the cores?

According to Javi Merino, the answer is no.

He also pointed out the latest attempt to merge such a feature:

[RFC PATCH] thermal: add generic cpu hotplug cooling device
http://thread.gmane.org/gmane.linux.documentation/15658

Regards.


Thread overview: 10+ messages
2015-04-28 11:27 Using a temperature sensor with 1-bit output for CPU throttling Mason
2015-04-29 13:47 ` Mason
2015-04-29 16:36   ` Javi Merino
2015-07-21  9:10     ` Mason
2015-07-21 11:49       ` Mason
2015-07-23  9:19       ` Mason
2015-07-23 12:51         ` Mason
2015-05-13  8:02   ` Mason
2015-05-14  9:25     ` Punit Agrawal
2015-05-14  9:46       ` Mason