* thermald for OpenBMC
@ 2017-04-17 20:21 Patrick Venture
  2017-04-18  2:31 ` Patrick Williams
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick Venture @ 2017-04-17 20:21 UTC (permalink / raw)
  To: openbmc

I'm working on a thermal control loop that'll operate within the openbmc
framework(s) and wanted to provide a somewhat high level overview for
thoughts.

The general design is to have a daemon that reads fans and temperatures
from dbus (reaching out to phosphor-hwmon) as well as being able to receive
temperatures and other sensor information over an OEM IPMI command.

The system will support zones defined (yes, probably in YAML).  A zone will
have at least one exclusion fan, and at least one thermal sensor.  The
thermal sensor can be shared.  There will be defaults provided in this
configuration to act as fallbacks.
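
A minimal sketch of the zone rules described above, written in Python only for illustration. The field names, the `default_pwm` fallback, and the reading of "exclusion fan" as "a fan owned by exactly one zone" are all assumptions, not part of any agreed configuration format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Zone:
    """One independently managed thermal zone (field names are illustrative)."""
    name: str
    fans: List[str]
    sensors: List[str]
    default_pwm: int = 100  # hypothetical fallback duty cycle, in percent


def validate_zones(zones: List[Zone]) -> None:
    """Raise ValueError if the zone set breaks the rules sketched above."""
    fan_owner: Dict[str, str] = {}
    for zone in zones:
        if not zone.fans:
            raise ValueError(f"zone {zone.name!r} needs at least one fan")
        if not zone.sensors:
            raise ValueError(f"zone {zone.name!r} needs at least one thermal sensor")
        for fan in zone.fans:
            # Assumption: a fan belongs to exactly one zone.
            if fan in fan_owner:
                raise ValueError(
                    f"fan {fan!r} is in both {fan_owner[fan]!r} and {zone.name!r}")
            fan_owner[fan] = zone.name
        # Sensors are deliberately not checked for ownership: they may be shared.
```

Two zones that share the same ambient sensor but own different fans would pass `validate_zones`; two zones listing the same fan would not.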

The thermal loop will be margin based and attempt to drive the fans to
maintain the temperature within operating temperature of the zones.  Each
zone will be independently managed.

Because not all thermal sensors can necessarily be read by the BMC, we
need a method of getting that information from the host.  From a previous
project, we have the notion of sending thermal margins for slow and quick
(heat change) devices to a controller.

Regards,
Patrick


* Re: thermald for OpenBMC
  2017-04-17 20:21 thermald for OpenBMC Patrick Venture
@ 2017-04-18  2:31 ` Patrick Williams
  2017-04-18  3:20   ` Patrick Venture
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick Williams @ 2017-04-18  2:31 UTC (permalink / raw)
  To: Patrick Venture; +Cc: openbmc

Patrick,

On Mon, Apr 17, 2017 at 01:21:29PM -0700, Patrick Venture wrote:
> I'm working on a thermal control loop that'll operate within the openbmc
> framework(s) and wanted to provide a somewhat high level overview for
> thoughts.

We should connect you with Matt Spinler (mspinler) and Matt Barth
(msbarth) on IRC.  They have been working on implementing the "IBM fan
control algorithm" but I suspect there is a significant amount of
overlap.  Our intention was that you'd be able to reuse our
implementation and insert a different (low-level detailed) algorithm.

> The general design is to have a daemon that reads fans and temperatures
> from dbus (reaching out to phosphor-hwmon) as well as being able to receive
> temperatures and other sensor information over an OEM IPMI command.

Sounds good.  This is how it is supposed to work.

For the IPMI commands, the expectation would be that either the IPMI
provider or an application fed by the IPMI provider for these OEM
commands would implement the same xyz.openbmc_project.Sensor.Value
interface as the phosphor-hwmon.  This way the thermal algorithm really
doesn't need to know where the data comes from.
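
To illustrate the point about a common interface, here is a rough Python sketch. It does not touch D-Bus at all; `SensorValue` merely stands in for xyz.openbmc_project.Sensor.Value, and the `HwmonSensor` and `HostFedSensor` classes are hypothetical names invented for this example.

```python
from typing import Callable, Dict, Protocol


class SensorValue(Protocol):
    """Stand-in for the xyz.openbmc_project.Sensor.Value interface."""
    def value(self) -> float: ...


class HwmonSensor:
    """Hypothetical sensor backed by a reading the BMC takes itself."""
    def __init__(self, read: Callable[[], float]) -> None:
        self._read = read

    def value(self) -> float:
        return self._read()


class HostFedSensor:
    """Hypothetical sensor whose value is pushed in by an OEM IPMI handler."""
    def __init__(self, initial: float) -> None:
        self._latest = initial

    def update(self, reading: float) -> None:
        self._latest = reading

    def value(self) -> float:
        return self._latest


def hottest(sensors: Dict[str, SensorValue]) -> float:
    """The control loop only calls value(); it never asks about the source."""
    return max(s.value() for s in sensors.values())
```

Because both classes expose the same `value()` method, `hottest` works unchanged whether a reading came from the BMC's own hardware monitoring or from the host.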

> The system will support zones defined (yes, probably in YAML).  A zone will
> have at least one exclusion fan, and at least one thermal sensor.  The
> thermal sensor can be shared.  There will be defaults provided in this
> configuration to act as fallbacks.

There is some code available to define zones via YAML.  Matt Spinler can
point you at it.

> The thermal loop will be margin based and attempt to drive the fans to
> maintain the temperature within operating temperature of the zones.  Each
> zone will be independently managed.

This sounds very similar to what their intended design is as well.  For
a zone there is a lower threshold and an upper threshold.  When the
temperature is above the upper threshold, the fan speed is increased, and
the fan speed is decreased when the temperature is below the lower
threshold.  Again, the Matts can give you details on what the "IBM
fan control algorithm" design is.
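
As a rough illustration of the two-threshold idea (not the actual "IBM fan control algorithm", whose details are not given here), one control step could look like the Python sketch below. The step size and the RPM limits are made-up placeholder values.

```python
def next_fan_speed(temp_c, speed_rpm, lower_c, upper_c,
                   step_rpm=500, min_rpm=2000, max_rpm=10000):
    """Return the next zone fan speed for one temperature reading.

    step_rpm, min_rpm and max_rpm are placeholder values for this sketch.
    """
    if temp_c > upper_c:
        speed_rpm += step_rpm      # too hot: speed the fans up
    elif temp_c < lower_c:
        speed_rpm -= step_rpm      # comfortably cool: slow the fans down
    # Between the thresholds the speed is left alone (a dead band).
    return max(min_rpm, min(max_rpm, speed_rpm))
```

The dead band between the two thresholds is what keeps the speed from changing on every reading.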

> Because not all thermal sensors can necessarily be read by the BMC, we
> need a method of getting that information from the host.  From a previous
> project, we have the notion of sending thermal margins for slow and quick
> (heat change) devices to a controller.

Is this the Host->BMC via IPMI you mentioned earlier or does the BMC
need to actively query the host in some cases?  Hopefully it is always
one direction.

-- 
Patrick Williams


* Re: thermald for OpenBMC
  2017-04-18  2:31 ` Patrick Williams
@ 2017-04-18  3:20   ` Patrick Venture
  2017-05-02 18:07     ` OpenBMC Thermal Design Matthew Barth
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick Venture @ 2017-04-18  3:20 UTC (permalink / raw)
  To: Patrick Williams; +Cc: openbmc

Patrick,

>> I'm working on a thermal control loop that'll operate within the openbmc
>> framework(s) and wanted to provide a somewhat high level overview for
>> thoughts.

> We should connect you with Matt Spinler (mspinler) and Matt Barth
> (msbarth) on IRC.  They have been working on implementing the "IBM fan
> control algorithm" but I suspect there is a significant amount of
> overlap.  Our intention was that you'd be able to reuse our
> implementation and insert a different (low-level detailed) algorithm.

Definitely.  I know there's a Google algorithm we use for thermal control
that's based on proportional–integral–derivative (PID) control.  I'll ping
them on IRC to get a peek at their design, roadmap and timeline.  It's also
possible that, because of our specific data center requirements based on
configurations, it may be more work to plug in a different low-level
algorithm.  But without seeing the design, it's impossible to say.

>> The general design is to have a daemon that reads fans and temperatures
>> from dbus (reaching out to phosphor-hwmon) as well as being able to
>> receive temperatures and other sensor information over an OEM IPMI command.

> Sounds good.  This is how it is supposed to work.

Good.  Yeah.  I'll end up running some performance experiments to make sure
things are handled quickly enough going through dbus for everything, but
I'm sure it will be reasonably quick.

> For the IPMI commands, the expectation would be that either the IPMI
> provider or an application fed by the IPMI provider for these OEM
> commands would implement the same xyz.openbmc_project.Sensor.Value
> interface as the phosphor-hwmon.  This way the thermal algorithm really
> doesn't need to know where the data comes from.

Right.  I just need to verify exactly what information is required.
Discussions today indicated I'd be provided with the temperature margin for
the fastest device and the slowest device (in terms of thermal adjustment)
per zone.  The YAML definition will need to allow for indicating whether a
sensor is available to the BMC or is "outside."

>> The system will support zones defined (yes, probably in YAML).  A zone
>> will have at least one exclusion fan, and at least one thermal sensor.  The
>> thermal sensor can be shared.  There will be defaults provided in this
>> configuration to act as fallbacks.

> There is some code available to define zones via YAML.  Matt Spinler can
> point you at it.

Ok.

>> The thermal loop will be margin based and attempt to drive the fans to
>> maintain the temperature within operating temperature of the zones.  Each
>> zone will be independently managed.

> This sounds very similar to what their intended design is as well.  For
> a zone there is a lower threshold and an upper threshold.  When the
> temperature is above the upper threshold, the fan speed is increased, and
> the fan speed is decreased when the temperature is below the lower
> threshold.  Again, the Matts can give you details on what the "IBM
> fan control algorithm" design is.

That's the basic idea.

>> Because not all thermal sensors can necessarily be read by the BMC, we
>> need a method of getting that information from the host.  From a previous
>> project, we have the notion of sending thermal margins for slow and quick
>> (heat change) devices to a controller.

> Is this the Host->BMC via IPMI you mentioned earlier or does the BMC
> need to actively query the host in some cases?  Hopefully it is always
> one direction.

The plan is for Host->BMC only.  The host just feeds thermal information on
a cycle to the BMC for those sensors out of reach.

I'm very interested in seeing the design doc, or any code that exists, and
especially a timeline.

Regards,
Patrick


* OpenBMC Thermal Design
  2017-04-18  3:20   ` Patrick Venture
@ 2017-05-02 18:07     ` Matthew Barth
  2017-05-02 19:33       ` Patrick Venture
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Barth @ 2017-05-02 18:07 UTC (permalink / raw)
  To: Patrick Venture, Patrick Williams; +Cc: openbmc

[-- Attachment #1: Type: text/plain, Size: 249 bytes --]

Patrick,

As Patrick mentioned, a few of us are currently working on the fan control
infrastructure and wrote up a quick outline to share our thoughts on the
design layout. Let us know if there are areas where you'd like to see more
detail.

Matt

[-- Attachment #2: ThermalDesign.pdf --]
[-- Type: application/pdf, Size: 73786 bytes --]


* Re: OpenBMC Thermal Design
  2017-05-02 18:07     ` OpenBMC Thermal Design Matthew Barth
@ 2017-05-02 19:33       ` Patrick Venture
  2017-05-02 20:46         ` Patrick Venture
  2017-05-02 21:21         ` Matthew Barth
  0 siblings, 2 replies; 9+ messages in thread
From: Patrick Venture @ 2017-05-02 19:33 UTC (permalink / raw)
  To: Matthew Barth; +Cc: Patrick Williams, openbmc

Please elaborate on the format or layout of the control configuration file.
Please elaborate on the mechanism planned to import it into code.  Y'all
often use Python programs to generate C++; will that be the case here?
Please elaborate on how the control program will execute its "pluggable"
algorithm.
Please elaborate on how the fans will be controlled.  Will that be as a
group per system or independently per zone?
  -- What is a zone?

Patrick


* Re: OpenBMC Thermal Design
  2017-05-02 19:33       ` Patrick Venture
@ 2017-05-02 20:46         ` Patrick Venture
  2017-05-02 21:21         ` Matthew Barth
  1 sibling, 0 replies; 9+ messages in thread
From: Patrick Venture @ 2017-05-02 20:46 UTC (permalink / raw)
  To: Matthew Barth; +Cc: Patrick Williams, openbmc

Just to give some details, the present design is as follows:
- A zone is a group of fans controlled independently.
 -- The configuration specifies the inputs to the PID loops.  Each loop
takes an input and a goal, and outputs an RPM to achieve the goal.  (The
loop input can be the margin.)
 -- The PID loops all feed into a maximum function, which then feeds a fan
PID loop that writes the duty cycles and reads the fan tachs back until the
fans are where they need to be.

So we'll be dynamically building a list of PIDs to run and drive the output
for controlling the fans.
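
To make the data flow above concrete, here is a small Python sketch, not the real implementation. The `PID` class is a textbook controller with placeholder gains; `zone_rpm_setpoint` and `fan_duty_step` are hypothetical names, and the sign convention (less margin than the goal means more cooling) is an assumption.

```python
class PID:
    """Textbook discrete PID controller; gains are placeholders."""
    def __init__(self, kp, ki=0.0, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self._integral = 0.0
        self._prev_error = None

    def step(self, error, dt=1.0):
        self._integral += error * dt
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


def zone_rpm_setpoint(thermal_loops, readings, goals):
    """Run every thermal PID for a zone and keep the largest RPM request."""
    requests = []
    for name, pid in thermal_loops.items():
        # Assumption: the input is a margin, so a reading below the goal
        # (less margin than wanted) yields a positive error.
        error = goals[name] - readings[name]
        requests.append(pid.step(error))
    return max(requests)


def fan_duty_step(fan_pid, rpm_setpoint, tach_rpm, duty):
    """One fan-loop iteration: nudge the duty cycle (0-100 %) toward the setpoint."""
    duty += fan_pid.step(rpm_setpoint - tach_rpm)
    return max(0.0, min(100.0, duty))
```

In a real loop `fan_duty_step` would be called repeatedly, each time with a fresh tach reading, until the measured RPM settles at the setpoint.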

Patrick


* Re: OpenBMC Thermal Design
  2017-05-02 19:33       ` Patrick Venture
  2017-05-02 20:46         ` Patrick Venture
@ 2017-05-02 21:21         ` Matthew Barth
  2017-05-02 21:33           ` Patrick Venture
  1 sibling, 1 reply; 9+ messages in thread
From: Matthew Barth @ 2017-05-02 21:21 UTC (permalink / raw)
  To: Patrick Venture; +Cc: Patrick Williams, openbmc

On 05/02/17 2:33 PM, Patrick Venture wrote:
> Please elaborate on the format or layout of the control configuration 
> file.
This will be a YAML file containing the zone and fan definitions with
the associated set of parameters that feed into the control algorithm.
Currently this contains the zone number, the zone initial speed, and the
list of fans (including their inventory path, sensor name, etc.).
> Please elaborate on the mechanism planned to import it into code?  
> Y'all often use python programs to make c++, will that be the case here?
Correct, that will be the case here as well.
> Please elaborate on how the control program will execute its
> "pluggable" algorithm?
Filling out the associated YAML file for the control application defines
how the algorithm will control the fan speeds, based on the values,
sensors listed, delays, etc. that will be supported as parameter inputs
to the algorithm.
> Please elaborate on how the fans will be controlled?  Will that be as 
> a group per system or independently per zone?
This will be configurable in the YAML file, where the fan speeds are set
on the zone and a zone is a group of fans. A zone can contain one or many
fans, so fans can be controlled individually or as a group at the same
speed.
>   -- What is a zone?
A grouping of fans
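
For illustration only, the "Python generates C++" step mentioned above might look roughly like this. The dictionary keys, the `ZoneDefinition` C++ type, and the output layout are all assumptions for this sketch; the input is a plain Python structure standing in for the parsed YAML.

```python
def generate_zone_cpp(zones):
    """Render zone definitions (as if parsed from YAML) into a C++ table.

    The keys "zone", "initial_speed", "fans" and "sensor", and the C++ type
    ZoneDefinition, are hypothetical names used only in this sketch.
    """
    lines = [
        "// Generated file, do not edit.",
        "const std::vector<ZoneDefinition> zones = {",
    ]
    for zone in zones:
        fans = ", ".join('"{}"'.format(fan["sensor"]) for fan in zone["fans"])
        lines.append("    {{{num}, {speed}, {{{fans}}}}},".format(
            num=zone["zone"], speed=zone["initial_speed"], fans=fans))
    lines.append("};")
    return "\n".join(lines) + "\n"
```

Generating the table at build time means the control application needs no YAML parser at run time, which is presumably part of the appeal of this approach.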

* Re: OpenBMC Thermal Design
  2017-05-02 21:21         ` Matthew Barth
@ 2017-05-02 21:33           ` Patrick Venture
  2017-05-03 14:05             ` Matthew Barth
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick Venture @ 2017-05-02 21:33 UTC (permalink / raw)
  To: Matthew Barth; +Cc: Patrick Williams, openbmc

Fantastic!  Thanks for elaborating.

>> Please elaborate on how the control program will execute its
>> "pluggable" algorithm?
> Filling out the associated yaml file for the control application defines
> how the algorithm will control the fan speeds based on the values, sensors
> listed, delays, etc. that will be supported as parameter inputs to the
> algorithm.

What do you mean by "delays?"

Patrick


* Re: OpenBMC Thermal Design
  2017-05-02 21:33           ` Patrick Venture
@ 2017-05-03 14:05             ` Matthew Barth
  0 siblings, 0 replies; 9+ messages in thread
From: Matthew Barth @ 2017-05-03 14:05 UTC (permalink / raw)
  To: Patrick Venture; +Cc: Patrick Williams, openbmc

On 05/02/17 4:33 PM, Patrick Venture wrote:
> Fantastic!  Thanks for elaborating.
>
> >> Please elaborate on how the control program will execute its
> >> "pluggable" algorithm?
> > Filling out the associated yaml file for the control application
> > defines how the algorithm will control the fan speeds based on the
> > values, sensors listed, delays, etc. that will be supported as
> > parameter inputs to the algorithm.
>
> What do you mean by "delays?"
>
No prob.  In our design, each temperature sensor that's an input to the
control algorithm has a defined RPM delta for each degree above or below
a defined "nominal" temperature range. For each round of sensor reads, the
maximum RPM delta based on the reported temps is applied to the speed
target of the zone those sensors are included in. After that occurs, any
further RPM deltas determined from the sensor readings are ignored for a
set amount of time (the delay), unless an RPM delta is larger than the
previous RPM change. If that happens, the difference between the two RPM
deltas is added to the zone's speed target and the delay restarts.

Not sure if that describes it well enough, but for example:
Given a core temp's "nominal" temperature range of 75-78C, if that core
reports a temp of 79C with a defined RPM increase delta of 300 RPM per
degree above, then the fans in the zone containing this core are
increased by 300 RPM. After that increase is requested, no increase
requests of 300 RPM or less are acted on until after the delay interval
has passed. Whereas if another core reports 80C, resulting in a 600 RPM
increase request during that delay interval, then an additional 300 RPM
is added to the previous target and the delay interval restarts. After
the interval expires, all incoming deltas are considered again.

The delay interval is there to help eliminate unnecessary speed change
requests while the fans are going to their target speed. It also
minimizes the possibility of speed oscillations.
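
The delay behaviour described above can be sketched in Python as follows. This is only one reading of the description: it models increase requests alone, and the class name, method names, and the 5000 RPM starting target in the usage note are hypothetical.

```python
class ZoneSpeedTarget:
    """Tracks a zone's RPM target and filters requests during the delay."""
    def __init__(self, target_rpm, delay_s):
        self.target_rpm = target_rpm
        self.delay_s = delay_s
        self._last_delta = 0
        self._window_ends = float("-inf")  # no delay window is active yet

    def request_increase(self, delta_rpm, now_s):
        """Apply an increase request made at time now_s; return the target."""
        if delta_rpm <= 0:
            return self.target_rpm
        if now_s >= self._window_ends:
            applied = delta_rpm                      # window expired: take it all
        elif delta_rpm > self._last_delta:
            applied = delta_rpm - self._last_delta   # only the extra part
        else:
            return self.target_rpm                   # ignored during the delay
        self.target_rpm += applied
        self._last_delta = delta_rpm
        self._window_ends = now_s + self.delay_s     # the delay (re)starts
        return self.target_rpm


def increase_delta(temp_c, nominal_high_c, rpm_per_degree):
    """RPM increase for a reading above the top of the nominal range."""
    return max(0, temp_c - nominal_high_c) * rpm_per_degree
```

With a starting target of 5000 RPM and a 10 second delay, a 79C reading against a 78C limit at 300 RPM per degree raises the target to 5300; a second 79C reading one second later is ignored; an 80C reading after that raises the target to 5600 and restarts the delay.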

Matt
