From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-x234.google.com (mail-pf0-x234.google.com [IPv6:2607:f8b0:400e:c00::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3w6VlD470fzDqHj for ; Tue, 18 Apr 2017 13:20:56 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.b="JWZVQ2Yh"; dkim-atps=neutral Received: by mail-pf0-x234.google.com with SMTP id 194so34915947pfv.3 for ; Mon, 17 Apr 2017 20:20:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=XA2c8571KiH2i1RT5MUyVKHkRLnv1DhQpvx+wkKqULI=; b=JWZVQ2YhlEfE/6Ur1xl4mmgWVEhbLbn59mpZUUANKLKadhzWsPKRPIMliI/N3IxiZ0 Y9kx56PSD+MKLFAQTufCrrLUytMbP8qYMvSivilWc3TP1rnlIUeo5geGBTUPdCga+KHX ws2NkNKzHbbrwFCPGhtZzDWQDjC4+8gDPnKygJ39QOnlLqwx1gNh9A3tJHYL6CX9YSEo h05R6waXkgDNJkTtv6KJHH2DAHjHezrVpUWggJrNzgysNKdhpRZgsCrz63Fz7px8M12N v4kLzDf0oxPZSg/7wBtIhA4V5eLBLN9suAqL+jmU4piMulD3qn51lFSAm66j3fKkAm7n gBOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=XA2c8571KiH2i1RT5MUyVKHkRLnv1DhQpvx+wkKqULI=; b=Tjc75IAgcpG7auYBllKviA+Nbgx2fPhCDnb30CrcFSeoBgEiq4r3DjHIJBF2FSzAN0 964kk1pBPyus0GGivihXdWbNdoqKjd/1HsMJJ2Z7OVcpXHjwdIBnjqHL6lfCoH49kDKy ovMWo+Rywby16MLJnO8VOspUExfdUOYTpNsEW4AEyt5IyKEqeu3YBBRldUDYibnRyzja vGBOW41NZHBxTZjzvvxSw4xRYPy9rR9gTDymXcez/RF+pBTNwBFy8SEn1/jdjLX7Cg+F rhhLEp79Xbak+McQjrUG/5DSLgLRWV6TIuWx4YgwFBZKJVCG4cndU1rImJNHP7NTC8m8 lA5A== X-Gm-Message-State: AN3rC/6+7jPZCl2quGnxTrZCJHbXBkQTHasPAI253nTB9OMh/0FHbn68 zSl+RDCeZkIdpcV1gKdaVdRkvRUnu9C9 X-Received: by 10.98.25.69 with SMTP id 66mr15344989pfz.84.1492485654471; Mon, 17 Apr 2017 20:20:54 -0700 (PDT) 
MIME-Version: 1.0
Received: by 10.100.166.133 with HTTP; Mon, 17 Apr 2017 20:20:34 -0700 (PDT)
In-Reply-To: <20170418023159.GA25774@heinlein.lan>
References: <20170418023159.GA25774@heinlein.lan>
From: Patrick Venture
Date: Mon, 17 Apr 2017 20:20:34 -0700
Message-ID:
Subject: Re: thermald for OpenBMC
To: Patrick Williams
Cc: openbmc@lists.ozlabs.org
Content-Type: multipart/alternative; boundary=94eb2c03becc44c3a6054d686512
X-BeenThere: openbmc@lists.ozlabs.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Development list for OpenBMC
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 18 Apr 2017 03:20:57 -0000

--94eb2c03becc44c3a6054d686512
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Patrick,

>> I'm working on a thermal control loop that'll operate within the openbmc
>> framework(s) and wanted to provide a somewhat high level overview for
>> thoughts.

> We should connect you with Matt Spinler (mspinler) and Matt Barth
> (msbarth) on IRC. They have been working on implementing the "IBM fan
> control algorithm" but I suspect there is a significant amount of
> overlap. Our intention was that you'd be able to reuse our
> implementation and insert a different (low-level detailed) algorithm.

Definitely. I know there's a Google algorithm we use for thermal control
that's based on proportional–integral–derivative (PID) control. I'll ping
them on IRC to get a peek at their design, roadmap and timeline. It's also
possible that, because of our specific data center requirements based on
configurations, it will be more work to plug in a different low-level
algorithm. But without seeing the design, it's impossible to say.

>> The general design is to have a daemon that reads fans and temperatures
>> from dbus (reaching out to phosphor-hwmon) as well as being able to
>> receive temperatures and other sensor information over an OEM IPMI
>> command.

> Sounds good. This is how it is supposed to work.

Good. Yeah. I'll end up running some performance experiments to make sure
things are handled quickly enough going through dbus for everything, but
I'm sure it will be reasonably quick.

> For the IPMI commands, the expectation would be that either the IPMI
> provider or an application fed by the IPMI provider for these OEM
> commands would implement the same xyz.openbmc_project.Sensor.Value
> interface as the phosphor-hwmon. This way the thermal algorithm really
> doesn't need to know where the data comes from.

Right. I just need to verify the exact design of the information required.
Discussions today indicated I'd be provided with the temperature margin for
the fastest device and the slowest (in terms of thermal adjustment) per
zone. The YAML definition will need to allow for indicating whether a
sensor is available to the BMC or is "outside."

>> The system will support zones defined (yes, probably in YAML). A zone
>> will have at least one exclusion fan, and at least one thermal sensor.
>> The thermal sensor can be shared. There will be defaults provided in
>> this configuration to act as fallbacks.

> There is some code available to define zones via YAML. Matt Spinler can
> point you at these.

Ok.

>> The thermal loop will be margin based and attempt to drive the fans to
>> maintain the temperature within operating temperature of the zones. Each
>> zone will be independently managed.

> This sounds very similar to what their intended design is as well. For
> a zone there is a lower-threshold and an upper-threshold. When the
> temperature is above the upper-threshold, the fan speed is increased and
> the fans are decreased when the temperature is below the
> lower-threshold. Again, the Matts can give you details on what the "IBM
> fan control algorithm" design is.

That's the basic idea.

>> Because not all thermal sensors can necessarily be read by the BMC, we
>> need a method of getting that information from the host. From a previous
>> project, we have the notion of sending thermal margins for slow and quick
>> (heat change) devices to a controller.

> Is this the Host->BMC via IPMI you mentioned earlier or does the BMC
> need to actively query the host in some cases? Hopefully it is always
> one direction.

The plan is for Host->BMC only. The host just feeds thermal information on
a cycle to the BMC for those sensors out of reach.

I'm very interested in seeing the design doc, or any code that exists, and
especially a timeline.

Regards,
Patrick

On Mon, Apr 17, 2017 at 7:31 PM, Patrick Williams wrote:

> Patrick,
>
> On Mon, Apr 17, 2017 at 01:21:29PM -0700, Patrick Venture wrote:
> > I'm working on a thermal control loop that'll operate within the openbmc
> > framework(s) and wanted to provide a somewhat high level overview for
> > thoughts.
>
> We should connect you with Matt Spinler (mspinler) and Matt Barth
> (msbarth) on IRC. They have been working on implementing the "IBM fan
> control algorithm" but I suspect there is a significant amount of
> overlap. Our intention was that you'd be able to reuse our
> implementation and insert a different (low-level detailed) algorithm.
>
> > The general design is to have a daemon that reads fans and temperatures
> > from dbus (reaching out to phosphor-hwmon) as well as being able to
> > receive temperatures and other sensor information over an OEM IPMI
> > command.
>
> Sounds good. This is how it is supposed to work.
>
> For the IPMI commands, the expectation would be that either the IPMI
> provider or an application fed by the IPMI provider for these OEM
> commands would implement the same xyz.openbmc_project.Sensor.Value
> interface as the phosphor-hwmon. This way the thermal algorithm really
> doesn't need to know where the data comes from.
>
> > The system will support zones defined (yes, probably in YAML). A zone
> > will have at least one exclusion fan, and at least one thermal sensor.
> > The thermal sensor can be shared. There will be defaults provided in
> > this configuration to act as fallbacks.
>
> There is some code available to define zones via YAML. Matt Spinler can
> point you at these.
>
> > The thermal loop will be margin based and attempt to drive the fans to
> > maintain the temperature within operating temperature of the zones.
> > Each zone will be independently managed.
>
> This sounds very similar to what their intended design is as well. For
> a zone there is a lower-threshold and an upper-threshold. When the
> temperature is above the upper-threshold, the fan speed is increased and
> the fans are decreased when the temperature is below the
> lower-threshold. Again, the Matts can give you details on what the "IBM
> fan control algorithm" design is.
>
> > Because not all thermal sensors can necessarily be read by the BMC, we
> > need a method of getting that information from the host. From a previous
> > project, we have the notion of sending thermal margins for slow and
> > quick (heat change) devices to a controller.
>
> Is this the Host->BMC via IPMI you mentioned earlier or does the BMC
> need to actively query the host in some cases? Hopefully it is always
> one direction.
>
> --
> Patrick Williams
>

--94eb2c03becc44c3a6054d686512
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--94eb2c03becc44c3a6054d686512--
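
As a rough illustration of the two-threshold scheme described in the thread (raise fan speed when a zone's temperature is above the upper threshold, lower it when below the lower threshold, otherwise hold), here is a minimal Python sketch. The class name, step size, speed limits and temperatures are all hypothetical and chosen only for illustration; this is not the "IBM fan control algorithm", the PID-based approach mentioned above, or any existing OpenBMC code.

```python
from dataclasses import dataclass


@dataclass
class ZoneFanControl:
    """Hypothetical per-zone controller with a lower and an upper threshold."""

    lower_c: float        # below this temperature, fans may slow down
    upper_c: float        # above this temperature, fans must speed up
    speed_pct: int = 50   # current fan speed, percent of full speed (assumed unit)
    step_pct: int = 10    # assumed adjustment per control cycle
    min_pct: int = 20     # assumed floor so the fans never stop
    max_pct: int = 100

    def update(self, temp_c: float) -> int:
        """Run one control cycle and return the new fan speed."""
        if temp_c > self.upper_c:
            # Too hot: step the fans up, clamped at the maximum.
            self.speed_pct = min(self.max_pct, self.speed_pct + self.step_pct)
        elif temp_c < self.lower_c:
            # Comfortably cool: step the fans down, clamped at the floor.
            self.speed_pct = max(self.min_pct, self.speed_pct - self.step_pct)
        # Between the thresholds the speed is left unchanged (hysteresis band).
        return self.speed_pct


if __name__ == "__main__":
    zone = ZoneFanControl(lower_c=60.0, upper_c=75.0)
    for reading in (80.0, 70.0, 50.0):
        print(reading, zone.update(reading))
```

Each zone would own one such object, which matches the idea that zones are managed independently. The band between the two thresholds keeps the fans from oscillating when the temperature hovers near a single set point.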