* Fault handling(Threshold exceeds/low) in Fan and NIC sensors
From: Kumar Thangavel @ 2020-11-13 16:30 UTC (permalink / raw)
  To: openbmc
  Cc: Zhikui Ren, Jae Hyun Yoo, Patrick Venture, Ed Tanous,
	Vernon Mauery, Velumani T-ERS, HCLTech, Patrick Williams


Classification: Internal
Hi All,

         We want to cut 12 V power to the hosts/BMC if the fan or NIC sensors cross their threshold levels. This action would be platform specific.

        In dbus-sensors, most sensors already handle threshold checks and raise an error when a threshold is crossed.

         So, we are planning to add a new field in Entity Manager to identify which sensors should trigger this fault handling. We also plan to add a default script in dbus-sensors to handle the fault condition; the machine layer could override this script.

         Could you please share your suggestions on this?

Thanks,
Kumar.



::DISCLAIMER::
________________________________
The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. E-mail transmission is not guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or may contain viruses in transmission. The e mail and its contents (with or without referred errors) shall therefore not attach any liability on the originator or HCL or its affiliates. Views or opinions, if any, presented in this email are solely those of the author and may not necessarily reflect the views or opinions of HCL or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of authorized representative of HCL is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately. Before opening any email and/or attachments, please check them for viruses and other defects.
________________________________


* Re: Fault handling(Threshold exceeds/low) in Fan and NIC sensors
From: Ed Tanous @ 2020-11-13 20:44 UTC (permalink / raw)
  To: Kumar Thangavel
  Cc: Zhikui Ren, Jae Hyun Yoo, Patrick Venture, openbmc,
	Vernon Mauery, Velumani T-ERS, HCLTech, Patrick Williams

On Fri, Nov 13, 2020 at 8:31 AM Kumar Thangavel <thangavel.k@hcl.com> wrote:
>
>          Could you please provide your suggestions on this.

I'm having a little trouble following your email.  dbus-sensors has
the ability to mask thresholds where appropriate, and the platform
specifics of that are already captured in the config file definition.
If there's some new configurable masking needed, we can certainly
add it, but I'd recommend looking at the existing threshold masking
before adding anything new to see if what's there meets your needs.
If you have some concrete things you'd like to see added, I'm happy
to talk in more detail; at this point, though, I have no idea what
you're looking to solve, so you might want to be slightly more
specific, and reference the existing threshold event masking in your
proposed changes.

Cheers,

-Ed


* RE: Fault handling(Threshold exceeds/low) in Fan and NIC sensors
From: Kumar Thangavel @ 2020-11-16 13:05 UTC (permalink / raw)
  To: Ed Tanous
  Cc: Zhikui Ren, Jae Hyun Yoo, Patrick Venture, openbmc,
	Vernon Mauery, Velumani T-ERS,HCLTech, Patrick Williams

Hi Ed,

        In short, our requirement is to take action when a fan fails. That action is platform specific.

        Fan failure:  This is based on the fan sensors. If a fan sensor's tach value drops below 33%, we will consider it a fan failure and take actions to reduce heat production in the system.
                                That is, power off the hosts, NIC, and other power-consuming modules.
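A minimal C++ sketch of the fan-failure rule described above, assuming the 33% figure is measured against a per-fan maximum RPM taken from platform configuration (the message does not say what the percentage is relative to). `FanReading`, `isFanFailed`, `failedFans`, and `kFailurePercent` are hypothetical names for illustration, not part of dbus-sensors.

```cpp
#include <string>
#include <vector>

// Hypothetical fan reading; not a dbus-sensors type.
struct FanReading
{
    std::string name;
    double rpm;    // current tach reading
    double maxRpm; // assumption: platform-configured maximum RPM
};

// Assumption: "below 33%" is relative to maxRpm.
constexpr double kFailurePercent = 33.0;

// Returns true when the tach reading is below kFailurePercent of maxRpm.
// Comparing products avoids dividing by maxRpm.
bool isFanFailed(const FanReading& fan)
{
    if (fan.maxRpm <= 0.0)
    {
        return true; // no usable reference value: treat as failed (a policy choice)
    }
    return fan.rpm * 100.0 < kFailurePercent * fan.maxRpm;
}

// Collects the names of all fans that meet the failure rule.
std::vector<std::string> failedFans(const std::vector<FanReading>& fans)
{
    std::vector<std::string> failed;
    for (const auto& fan : fans)
    {
        if (isFanFailed(fan))
        {
            failed.push_back(fan.name);
        }
    }
    return failed;
}
```

For example, a fan reading 4000 RPM against a 15000 RPM maximum (about 27%) counts as failed, while 9000 RPM (60%) does not. Whatever consumes `failedFans()` would then decide the platform-specific action.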

        dbus-sensors already handles threshold masking. We just want to use that threshold masking to take the platform-specific actions.

        Please let us know if any clarification is needed.

Thanks,
Kumar.

-----Original Message-----
From: Ed Tanous <ed@tanous.net>
Sent: Saturday, November 14, 2020 2:14 AM
To: Kumar Thangavel <thangavel.k@hcl.com>
Cc: openbmc@lists.ozlabs.org; Velumani T-ERS,HCLTech <velumanit@hcl.com>; sdasari@fb.com; Patrick Williams <patrickw3@fb.com>; Patrick Venture <venture@google.com>; Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>; Vernon Mauery <vernon.mauery@linux.intel.com>; Zhikui Ren <zhikui.ren@intel.com>
Subject: Re: Fault handling(Threshold exceeds/low) in Fan and NIC sensors

On Fri, Nov 13, 2020 at 8:31 AM Kumar Thangavel <thangavel.k@hcl.com> wrote:
>
>          Could you please provide your suggestions on this.

I'm having a little trouble following your email.  dbus-sensors has the ability to mask thresholds where appropriate, and the platform specifics of that are already captured in the config file definition.
If there's some new configurable masking needed, we can certainly add it, but I'd recommend looking at the existing threshold masking before adding anything new to see if what's there meets your needs.  If you have some concrete things you'd like to see added, I'm happy to talk in more detail; at this point, though, I have no idea what you're looking to solve, so you might want to be slightly more specific, and reference the existing threshold event masking in your proposed changes.

Cheers,

-Ed

* Re: Fault handling(Threshold exceeds/low) in Fan and NIC sensors
From: Ed Tanous @ 2020-11-16 15:59 UTC (permalink / raw)
  To: Kumar Thangavel
  Cc: Zhikui Ren, Jae Hyun Yoo, Patrick Venture, openbmc,
	Vernon Mauery, Velumani T-ERS, HCLTech, Patrick Williams

On Mon, Nov 16, 2020 at 5:05 AM Kumar Thangavel <thangavel.k@hcl.com> wrote:
>
> Hi Ed,
>
>         In short, our requirement is to take action when a fan fails. That action is platform specific.
>
>         Fan failure:  This is based on the fan sensors. If a fan sensor's tach value drops below 33%, we will consider it a fan failure and take actions to reduce heat production in the system.

dbus-sensors and phosphor-pid-control already have mechanisms for
handling fan failure in these ways.  Take a look at the existing
config files, and they'll guide you on what you need to do next.

>                                 That is, power off the hosts, NIC, and other power-consuming modules.
>
>         dbus-sensors already handles threshold masking. We just want to use that threshold masking to take the platform-specific actions.
>
>         Please let us know if any clarification is needed.
>
> Thanks,
> Kumar.

PS: Please don't top-post.


* RE: Fault handling(Threshold exceeds/low) in Fan and NIC sensors
From: Kumar Thangavel @ 2020-11-17 11:59 UTC (permalink / raw)
  To: Ed Tanous
  Cc: Zhikui Ren, Jae Hyun Yoo, Patrick Venture, openbmc,
	Vernon Mauery, Velumani T-ERS,HCLTech, Patrick Williams

Hi Ed,

        Please find my responses inline below.

Thanks,
Kumar.

-----Original Message-----
From: Ed Tanous <ed@tanous.net>
Sent: Monday, November 16, 2020 9:29 PM
To: Kumar Thangavel <thangavel.k@hcl.com>
Cc: openbmc@lists.ozlabs.org; Velumani T-ERS,HCLTech <velumanit@hcl.com>; sdasari@fb.com; Patrick Williams <patrickw3@fb.com>; Patrick Venture <venture@google.com>; Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>; Vernon Mauery <vernon.mauery@linux.intel.com>; Zhikui Ren <zhikui.ren@intel.com>
Subject: Re: Fault handling(Threshold exceeds/low) in Fan and NIC sensors

On Mon, Nov 16, 2020 at 5:05 AM Kumar Thangavel <thangavel.k@hcl.com> wrote:
>
> Hi Ed,
>
>         In short, our requirement is to take action when a fan fails. That action is platform specific.
>
>         Fan failure:  This is based on the fan sensors. If a fan sensor's tach value drops below 33%, we will consider it a fan failure and take actions to reduce heat production in the system.

dbus-sensors and phosphor-pid-control already have mechanisms for handling fan failure in these ways.  Take a look at the existing config files, and they'll guide you on what you need to do next.

 Kumar:  Are you referring to the checkThresholds function in dbus-sensors?  That function handles the high/low threshold levels.  Please confirm.
                 In that function, we plan to add a call to a service that handles the platform-specific actions.
                 We also plan to add a new field in Entity Manager to identify which sensors should trigger this fault handling.
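To make the proposal concrete, here is a hedged C++ sketch of the kind of hook described above: a per-sensor opt-in registry consulted after a threshold crossing has already been detected. It does not use the real `checkThresholds` signature or any dbus-sensors type; `FaultActionRegistry`, `ThresholdEvent`, `FaultAction`, and the idea that the opt-in list is populated from a new Entity Manager field are assumptions for illustration. In a real daemon the callback would more likely start a platform-provided systemd unit or script than run code in-process.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical severity/direction, loosely mirroring the warning/critical,
// high/low split of OpenBMC threshold interfaces.
enum class Level
{
    Warning,
    Critical
};

enum class Direction
{
    Low,
    High
};

// Hypothetical description of one threshold crossing.
struct ThresholdEvent
{
    std::string sensor;
    Level level;
    Direction direction;
    double value;
};

using FaultAction = std::function<void(const ThresholdEvent&)>;

// Hypothetical registry of sensors that opted in to platform fault handling
// (for example via a new Entity Manager field, as proposed in the thread).
class FaultActionRegistry
{
  public:
    void add(const std::string& sensor, FaultAction action)
    {
        actions[sensor] = std::move(action);
    }

    // Call after a threshold crossing has already been detected.
    // Returns true if a platform action ran for this event.
    bool onThresholdAsserted(const ThresholdEvent& event) const
    {
        if (event.level != Level::Critical)
        {
            return false; // in this sketch, only critical crossings act
        }
        const auto it = actions.find(event.sensor);
        if (it == actions.end())
        {
            return false; // sensor did not opt in
        }
        it->second(event);
        return true;
    }

  private:
    std::map<std::string, FaultAction> actions;
};
```

Keeping the action as an injected callback leaves a default in the daemon while letting the machine layer supply its own, which is roughly the default-plus-override split the original proposal describes.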

>                                 That is, power off the hosts, NIC, and other power-consuming modules.
>
>         dbus-sensors already handles threshold masking. We just want to use that threshold masking to take the platform-specific actions.
>
>         Please let us know if any clarification is needed.
>
> Thanks,
> Kumar.

PS: Please don't top-post.

* Re: Fault handling(Threshold exceeds/low) in Fan and NIC sensors
From: Matt Spinler @ 2020-11-20 17:10 UTC (permalink / raw)
  To: Kumar Thangavel, Ed Tanous
  Cc: Zhikui Ren, Jae Hyun Yoo, Patrick Venture, openbmc,
	Vernon Mauery, Velumani T-ERS, HCLTech, Patrick Williams



On 11/17/2020 5:59 AM, Kumar Thangavel wrote:
> Hi Ed,
>
>          Please find my responses inline below.
>
> Thanks,
> Kumar.
>
> -----Original Message-----
> From: Ed Tanous <ed@tanous.net>
> Sent: Monday, November 16, 2020 9:29 PM
> To: Kumar Thangavel <thangavel.k@hcl.com>
> Cc: openbmc@lists.ozlabs.org; Velumani T-ERS,HCLTech <velumanit@hcl.com>; sdasari@fb.com; Patrick Williams <patrickw3@fb.com>; Patrick Venture <venture@google.com>; Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>; Vernon Mauery <vernon.mauery@linux.intel.com>; Zhikui Ren <zhikui.ren@intel.com>
> Subject: Re: Fault handling(Threshold exceeds/low) in Fan and NIC sensors
>
> On Mon, Nov 16, 2020 at 5:05 AM Kumar Thangavel <thangavel.k@hcl.com> wrote:
>> Hi Ed,
>>
>>          In short, our requirement is to take action when a fan fails. That action is platform specific.
>>
>>          Fan failure:  This is based on the fan sensors. If a fan sensor's tach value drops below 33%, we will consider it a fan failure and take actions to reduce heat production in the system.
> dbus-sensors and phosphor-pid-control already have mechanisms for handling fan failure in these ways.  Take a look at the existing config files, and they'll guide you on what you need to do next.
>
>   Kumar:  Are you referring to the checkThresholds function in dbus-sensors?  That function handles the high/low threshold levels.  Please confirm.
>                   In that function, we plan to add a call to a service that handles the platform-specific actions.
>                   We also plan to add a new field in Entity Manager to identify which sensors should trigger this fault handling.

I have a need to monitor some temperature sensor thresholds and take
various actions, such as creating phosphor-logging event logs and doing
soft and hard shutdowns after various delays.  In fact, not all of the
sensors I need to monitor will be provided by dbus-sensors, but I do
need to use data provided by Entity Manager to tell me things like how
long to delay, etc.
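A minimal C++ sketch of the delayed escalation described above. It models only the decision logic: the time since the threshold asserted selects an event log, a soft shutdown, or a hard shutdown, and deassertion cancels the escalation. `ShutdownPolicy`, `ThresholdMonitor`, `Action`, and the two delay fields are hypothetical stand-ins for values that would be read from Entity Manager; a real daemon would drive this from a timer (for example `boost::asio::steady_timer`) and issue the actual logging and power-control calls itself.

```cpp
#include <chrono>
#include <optional>

// Escalation stages, from least to most severe.
enum class Action
{
    None,
    EventLog,
    SoftShutdown,
    HardShutdown
};

// Hypothetical policy; field names are invented. Assumes softDelay <= hardDelay.
struct ShutdownPolicy
{
    std::chrono::seconds softDelay;
    std::chrono::seconds hardDelay;
};

class ThresholdMonitor
{
  public:
    explicit ThresholdMonitor(ShutdownPolicy p) : policy(p) {}

    // Record the first time the threshold asserted; repeats are ignored.
    void assertAt(std::chrono::seconds now)
    {
        if (!assertedSince)
        {
            assertedSince = now;
        }
    }

    // The reading recovered: cancel any pending escalation.
    void deassert()
    {
        assertedSince.reset();
    }

    // Most severe action warranted at time `now` (seconds on a monotonic clock).
    Action actionAt(std::chrono::seconds now) const
    {
        if (!assertedSince)
        {
            return Action::None;
        }
        const auto elapsed = now - *assertedSince;
        if (elapsed >= policy.hardDelay)
        {
            return Action::HardShutdown;
        }
        if (elapsed >= policy.softDelay)
        {
            return Action::SoftShutdown;
        }
        return Action::EventLog;
    }

  private:
    ShutdownPolicy policy;
    std::optional<std::chrono::seconds> assertedSince;
};
```

With a 30 s soft delay and a 120 s hard delay, a threshold that asserts at t = 10 s yields an event log immediately, a soft shutdown from t = 40 s, and a hard shutdown from t = 130 s, unless `deassert()` is called first.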

I suspect dbus-sensors isn't the appropriate place to put this code,
since this code wouldn't be putting any sensors on D-Bus and won't
necessarily be monitoring sensors provided by that repo.

Does anyone have a good idea of where a daemon like this could go?  If
there's nowhere else, I could put it in phosphor-fan, even though it
isn't fan related, since our platforms will always use the fan-monitor
app provided there, which already does similar things for fan errors.

>
>>                                  That is, power off the hosts, NIC, and other power-consuming modules.
>>
>>          dbus-sensors already handles threshold masking. We just want to use that threshold masking to take the platform-specific actions.
>>
>>          Please let us know if any clarification is needed.
>>
>> Thanks,
>> Kumar.
> PS: Please don't top-post.


