All of lore.kernel.org
 help / color / mirror / Atom feed
* dbus-sensors
@ 2020-04-21 19:35 Matt Spinler
  2020-04-21 21:54 ` dbus-sensors James Feist
  0 siblings, 1 reply; 11+ messages in thread
From: Matt Spinler @ 2020-04-21 19:35 UTC (permalink / raw)
  To: James Feist, OpenBMC Maillist

Hi James,

We're looking into using dbus-sensors(HwmonTemp and PSU) in the future,
but would need to make a few changes to fit our requirements.  Was wondering
what you'd think of the following:

1. Check if a sensor has a _fault sysfs file, and if it does and it
   is nonzero, set the Functional property on the OperationalStatus
   interface to false (and/or maybe 6 below).

2. After the 10 failed reads, instead of just setting the sensor to 0
   also make a D-Bus call to create a phosphor-logging event log and set
   the OperationalStatus sensor to false.

3. After creating this event log, make sure not to do it again until
   main power is cycled.

4. If not already supported (was unsure), be able to find an
   _input file based on a value it has in the corresponding _label file.

5. We have a case where a driver isn't loaded with power off, so somehow
   we still need the sensors to stay on D-Bus when off (and show them
   as not available).

6. Maybe add a new property to Sensor.Value on the validity
   of the value property, for when driver is unloaded or there is an
   error or the sensor reading is otherwise not valid.  We could add
  this to phosphor-hwmon at the same time.
  (I think this was mentioned on the list before).

We would definitely of course work with you on the best way to
accomplish these, and I know #6 needs more discussion on if
this is something we want to do in OpenBMC, though I thought
I remembered an earlier discussion where it was popular.

Thanks,
Matt

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dbus-sensors
  2020-04-21 19:35 dbus-sensors Matt Spinler
@ 2020-04-21 21:54 ` James Feist
  2020-04-22 11:56   ` dbus-sensors Brad Bishop
  2020-04-22 16:19   ` dbus-sensors Matt Spinler
  0 siblings, 2 replies; 11+ messages in thread
From: James Feist @ 2020-04-21 21:54 UTC (permalink / raw)
  To: Matt Spinler, OpenBMC Maillist

On 4/21/2020 12:35 PM, Matt Spinler wrote:
> Hi James,
> 
> We're looking into using dbus-sensors(HwmonTemp and PSU) in the future,
> but would need to make a few changes to fit our requirements.  Was 
> wondering
> what you'd think of the following:
> 
> 1. Check if a sensor has a _fault sysfs file, and if it does and it
>    is nonzero, set the Functional property on the OperationalStatus
>    interface to false (and/or maybe 6 below)
Sounds ok.

> 
> 2. After the 10 failed reads, instead of just setting the sensor to 0
>    also make a D-Bus call to create a phosphor-logging event log and set
>    the OperationalStatus sensor to false.

Sounds ok.

> 
> 3. After creating this event log, make sure not to do it again until
>    main power is cycled.

I'd rather this be until the status goes OK again.

> 
> 4. If not already supported (was unsure), be able to find an
>    _input file based on a value it has in the corresponding _label file.

PSU sensor does this, hwmontemp does it by index.
> 
> 5. We have a case where a driver isn't loaded with power off, so somehow
>    we still need the sensors to stay on D-Bus when off (and show them
>    as not available).

All sensors are on d-bus all the time, its based on the EM config.

> 
> 6. Maybe add a new property to Sensor.Value on the validity
>    of the value property, for when driver is unloaded or there is an
>    error or the sensor reading is otherwise not valid.  We could add
>   this to phosphor-hwmon at the same time.
>   (I think this was mentioned on the list before).

Yes, this is where we've used std::nan, I'm not sure if that made it to 
all sensors as it's not tested very much. I know the fans do this.

> 
> We would definitely of course work with you on the best way to
> accomplish these, and I know #6 needs more discussion on if
> this is something we want to do in OpenBMC, though I thought
> I remembered an earlier discussion where it was popular.
> 
> Thanks,
> Matt

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dbus-sensors
  2020-04-21 21:54 ` dbus-sensors James Feist
@ 2020-04-22 11:56   ` Brad Bishop
  2020-04-22 12:11     ` dbus-sensors Patrick Williams
  2020-04-22 16:19   ` dbus-sensors Matt Spinler
  1 sibling, 1 reply; 11+ messages in thread
From: Brad Bishop @ 2020-04-22 11:56 UTC (permalink / raw)
  To: James Feist; +Cc: Matt Spinler, OpenBMC Maillist

at 5:54 PM, James Feist <james.feist@linux.intel.com> wrote:

> On 4/21/2020 12:35 PM, Matt Spinler wrote:
>> Hi James,
>> We're looking into using dbus-sensors(HwmonTemp and PSU) in the future,
>> but would need to make a few changes to fit our requirements.  Was  
>> wondering
>> what you'd think of the following:
>> 1. Check if a sensor has a _fault sysfs file, and if it does and it
>>    is nonzero, set the Functional property on the OperationalStatus
>>    interface to false (and/or maybe 6 below)
> Sounds ok.
>
>> 2. After the 10 failed reads, instead of just setting the sensor to 0
>>    also make a D-Bus call to create a phosphor-logging event log and set
>>    the OperationalStatus sensor to false.
>
> Sounds ok.
>
>> 3. After creating this event log, make sure not to do it again until
>>    main power is cycled.
>
> I'd rather this be until the status goes OK again.

We have user-experience requirements that the server administrator must be  
“nagged” in this fashion when something is broken like this.  Could the  
behavior be selectable via build switch?  Any other ideas on how to  
accommodate both behaviors?

>
>> 4. If not already supported (was unsure), be able to find an
>>    _input file based on a value it has in the corresponding _label file.
>
> PSU sensor does this, hwmontemp does it by index.

I think the question here was can we change the temp sensor implementation  
to do that also?

>> 5. We have a case where a driver isn't loaded with power off, so somehow
>>    we still need the sensors to stay on D-Bus when off (and show them
>>    as not available).
>
> All sensors are on d-bus all the time, its based on the EM config.
>
>> 6. Maybe add a new property to Sensor.Value on the validity
>>    of the value property, for when driver is unloaded or there is an
>>    error or the sensor reading is otherwise not valid.  We could add
>>   this to phosphor-hwmon at the same time.
>>   (I think this was mentioned on the list before).
>
> Yes, this is where we've used std::nan, I'm not sure if that made it to  
> all sensors as it's not tested very much. I know the fans do this.
>
>> We would definitely of course work with you on the best way to
>> accomplish these, and I know #6 needs more discussion on if
>> this is something we want to do in OpenBMC, though I thought
>> I remembered an earlier discussion where it was popular.
>> Thanks,
>> Matt

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dbus-sensors
  2020-04-22 11:56   ` dbus-sensors Brad Bishop
@ 2020-04-22 12:11     ` Patrick Williams
  2020-04-22 12:24       ` dbus-sensors Brad Bishop
  0 siblings, 1 reply; 11+ messages in thread
From: Patrick Williams @ 2020-04-22 12:11 UTC (permalink / raw)
  To: Brad Bishop; +Cc: James Feist, OpenBMC Maillist, Matt Spinler

[-- Attachment #1: Type: text/plain, Size: 759 bytes --]

On Wed, Apr 22, 2020 at 07:56:14AM -0400, Brad Bishop wrote:
> at 5:54 PM, James Feist <james.feist@linux.intel.com> wrote:
> >> 3. After creating this event log, make sure not to do it again until
> >>    main power is cycled.
> >
> > I'd rather this be until the status goes OK again.
> 
> We have user-experience requirements that the server administrator must be  
> “nagged” in this fashion when something is broken like this.  Could the  
> behavior be selectable via build switch?  Any other ideas on how to  
> accommodate both behaviors?

This sounds like a form of error filtering.  Shouldn't that decision be
done at a much higher level in the stack than down in the entity that
reads the hardware sensor?
-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dbus-sensors
  2020-04-22 12:11     ` dbus-sensors Patrick Williams
@ 2020-04-22 12:24       ` Brad Bishop
  2020-04-22 16:01         ` dbus-sensors Matt Spinler
  0 siblings, 1 reply; 11+ messages in thread
From: Brad Bishop @ 2020-04-22 12:24 UTC (permalink / raw)
  To: Patrick Williams; +Cc: OpenBMC Maillist, Matt Spinler, James Feist

at 8:11 AM, Patrick Williams <patrick@stwcx.xyz> wrote:

> On Wed, Apr 22, 2020 at 07:56:14AM -0400, Brad Bishop wrote:
>> at 5:54 PM, James Feist <james.feist@linux.intel.com> wrote:
>>>> 3. After creating this event log, make sure not to do it again until
>>>>    main power is cycled.
>>>
>>> I'd rather this be until the status goes OK again.
>>
>> We have user-experience requirements that the server administrator must be
>> “nagged” in this fashion when something is broken like this.  Could the
>> behavior be selectable via build switch?  Any other ideas on how to
>> accommodate both behaviors?
>
> This sounds like a form of error filtering.  Shouldn't that decision be
> done at a much higher level in the stack than down in the entity that
> reads the hardware sensor?

Thats an interesting thought.  When the error reporting code sees the error  
for the first time, it could maintain a list of errors that it needs to  
“replay” at different system events, like when the server powers on.

This is certainly more flexible and I like the idea - but one down side  
though is the logging code becomes stateful and the complexity is slightly  
higher.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dbus-sensors
  2020-04-22 12:24       ` dbus-sensors Brad Bishop
@ 2020-04-22 16:01         ` Matt Spinler
  2020-04-22 17:24           ` dbus-sensors James Feist
  0 siblings, 1 reply; 11+ messages in thread
From: Matt Spinler @ 2020-04-22 16:01 UTC (permalink / raw)
  To: Brad Bishop, Patrick Williams; +Cc: OpenBMC Maillist, James Feist



On 4/22/2020 7:24 AM, Brad Bishop wrote:
> at 8:11 AM, Patrick Williams <patrick@stwcx.xyz> wrote:
>
>> On Wed, Apr 22, 2020 at 07:56:14AM -0400, Brad Bishop wrote:
>>> at 5:54 PM, James Feist <james.feist@linux.intel.com> wrote:
>>>>> 3. After creating this event log, make sure not to do it again until
>>>>>    main power is cycled.
>>>>
>>>> I'd rather this be until the status goes OK again.
>>>
>>> We have user-experience requirements that the server administrator 
>>> must be
>>> “nagged” in this fashion when something is broken like this. Could the
>>> behavior be selectable via build switch?  Any other ideas on how to
>>> accommodate both behaviors?
>>
>> This sounds like a form of error filtering.  Shouldn't that decision be
>> done at a much higher level in the stack than down in the entity that
>> reads the hardware sensor?
>
> Thats an interesting thought.  When the error reporting code sees the 
> error for the first time, it could maintain a list of errors that it 
> needs to “replay” at different system events, like when the server 
> powers on.

It isn't really nagging, it's more like error throttling.  At most, only 
log one error per power cycle.
I have to check still, but we may also need to still log the other 
errors, just with a
different severity (for debug purposes).

I kinda like this filtering idea too.  It is flexible and we would only 
have to do it in one place as
opposed to in all the sensor applications we end up using, and could 
also be used to change the
event log severities as mentioned above.  We will have to make sure when 
creating the event log
that it contains enough information to recognize the device that is 
failing so that we can filter
appropriately.


>
> This is certainly more flexible and I like the idea - but one down 
> side though is the logging code becomes stateful and the complexity is 
> slightly higher.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dbus-sensors
  2020-04-21 21:54 ` dbus-sensors James Feist
  2020-04-22 11:56   ` dbus-sensors Brad Bishop
@ 2020-04-22 16:19   ` Matt Spinler
  2020-04-22 17:03     ` dbus-sensors Bills, Jason M
  2020-04-22 17:25     ` dbus-sensors James Feist
  1 sibling, 2 replies; 11+ messages in thread
From: Matt Spinler @ 2020-04-22 16:19 UTC (permalink / raw)
  To: James Feist, OpenBMC Maillist

Great!  Sounds like we should be able to make thing work.
A few comments below.

On 4/21/2020 4:54 PM, James Feist wrote:
> On 4/21/2020 12:35 PM, Matt Spinler wrote:
>> Hi James,
>>
>> We're looking into using dbus-sensors(HwmonTemp and PSU) in the future,
>> but would need to make a few changes to fit our requirements. Was 
>> wondering
>> what you'd think of the following:
>>
>> 1. Check if a sensor has a _fault sysfs file, and if it does and it
>>    is nonzero, set the Functional property on the OperationalStatus
>>    interface to false (and/or maybe 6 below)
> Sounds ok.
>
>>
>> 2. After the 10 failed reads, instead of just setting the sensor to 0
>>    also make a D-Bus call to create a phosphor-logging event log and set
>>    the OperationalStatus sensor to false.
>
> Sounds ok.
>
>>
>> 3. After creating this event log, make sure not to do it again until
>>    main power is cycled.
>
> I'd rather this be until the status goes OK again.

As suggested by Patrick, I agree the throttling can be done elsewhere, so we
would just create the logs as you state here.

>
>>
>> 4. If not already supported (was unsure), be able to find an
>>    _input file based on a value it has in the corresponding _label file.
>
> PSU sensor does this, hwmontemp does it by index.

Would you be OK with us also adding this to PSUSensor?

>>
>> 5. We have a case where a driver isn't loaded with power off, so somehow
>>    we still need the sensors to stay on D-Bus when off (and show them
>>    as not available).
>
> All sensors are on d-bus all the time, its based on the EM config.

Perfect!

>
>>
>> 6. Maybe add a new property to Sensor.Value on the validity
>>    of the value property, for when driver is unloaded or there is an
>>    error or the sensor reading is otherwise not valid.  We could add
>>   this to phosphor-hwmon at the same time.
>>   (I think this was mentioned on the list before).
>
> Yes, this is where we've used std::nan, I'm not sure if that made it 
> to all sensors as it's not tested very much. I know the fans do this.
>
>>
>> We would definitely of course work with you on the best way to
>> accomplish these, and I know #6 needs more discussion on if
>> this is something we want to do in OpenBMC, though I thought
>> I remembered an earlier discussion where it was popular.
>>
>> Thanks,
>> Matt

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dbus-sensors
  2020-04-22 16:19   ` dbus-sensors Matt Spinler
@ 2020-04-22 17:03     ` Bills, Jason M
  2020-04-22 17:25     ` dbus-sensors James Feist
  1 sibling, 0 replies; 11+ messages in thread
From: Bills, Jason M @ 2020-04-22 17:03 UTC (permalink / raw)
  To: openbmc



On 4/22/2020 9:19 AM, Matt Spinler wrote:
> Great!  Sounds like we should be able to make thing work.
> A few comments below.
> 
> On 4/21/2020 4:54 PM, James Feist wrote:
>> On 4/21/2020 12:35 PM, Matt Spinler wrote:
>>> Hi James,
>>>
>>> We're looking into using dbus-sensors(HwmonTemp and PSU) in the future,
>>> but would need to make a few changes to fit our requirements. Was 
>>> wondering
>>> what you'd think of the following:
>>>
>>> 1. Check if a sensor has a _fault sysfs file, and if it does and it
>>>    is nonzero, set the Functional property on the OperationalStatus
>>>    interface to false (and/or maybe 6 below)
>> Sounds ok.
>>
>>>
>>> 2. After the 10 failed reads, instead of just setting the sensor to 0
>>>    also make a D-Bus call to create a phosphor-logging event log and set
>>>    the OperationalStatus sensor to false.
>>
>> Sounds ok.
>>
>>>
>>> 3. After creating this event log, make sure not to do it again until
>>>    main power is cycled.
>>
>> I'd rather this be until the status goes OK again.
> 
> As suggested by Patrick, I agree the throttling can be done elsewhere, 
> so we
> would just create the logs as you state here.
> 
I'm not familiar with the sensors, but for this specific case, would it 
work to base the log on OperationalStatus?  It seems logical to not log 
events for sensors that are not operational, and since it will be set to 
false after the 10 failures, it would stop the log from nagging.

>>
>>>
>>> 4. If not already supported (was unsure), be able to find an
>>>    _input file based on a value it has in the corresponding _label file.
>>
>> PSU sensor does this, hwmontemp does it by index.
> 
> Would you be OK with us also adding this to PSUSensor?
> 
>>>
>>> 5. We have a case where a driver isn't loaded with power off, so somehow
>>>    we still need the sensors to stay on D-Bus when off (and show them
>>>    as not available).
>>
>> All sensors are on d-bus all the time, its based on the EM config.
> 
> Perfect!
> 
>>
>>>
>>> 6. Maybe add a new property to Sensor.Value on the validity
>>>    of the value property, for when driver is unloaded or there is an
>>>    error or the sensor reading is otherwise not valid.  We could add
>>>   this to phosphor-hwmon at the same time.
>>>   (I think this was mentioned on the list before).
>>
>> Yes, this is where we've used std::nan, I'm not sure if that made it 
>> to all sensors as it's not tested very much. I know the fans do this.
>>
>>>
>>> We would definitely of course work with you on the best way to
>>> accomplish these, and I know #6 needs more discussion on if
>>> this is something we want to do in OpenBMC, though I thought
>>> I remembered an earlier discussion where it was popular.
>>>
>>> Thanks,
>>> Matt
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dbus-sensors
  2020-04-22 16:01         ` dbus-sensors Matt Spinler
@ 2020-04-22 17:24           ` James Feist
  0 siblings, 0 replies; 11+ messages in thread
From: James Feist @ 2020-04-22 17:24 UTC (permalink / raw)
  To: Matt Spinler, Brad Bishop, Patrick Williams; +Cc: OpenBMC Maillist

On 4/22/2020 9:01 AM, Matt Spinler wrote:
> 
> 
> On 4/22/2020 7:24 AM, Brad Bishop wrote:
>> at 8:11 AM, Patrick Williams <patrick@stwcx.xyz> wrote:
>>
>>> On Wed, Apr 22, 2020 at 07:56:14AM -0400, Brad Bishop wrote:
>>>> at 5:54 PM, James Feist <james.feist@linux.intel.com> wrote:
>>>>>> 3. After creating this event log, make sure not to do it again until
>>>>>>    main power is cycled.
>>>>>
>>>>> I'd rather this be until the status goes OK again.
>>>>
>>>> We have user-experience requirements that the server administrator 
>>>> must be
>>>> “nagged” in this fashion when something is broken like this. Could the
>>>> behavior be selectable via build switch?  Any other ideas on how to
>>>> accommodate both behaviors?

As long as it's configurable somehow, fine by me.

>>>
>>> This sounds like a form of error filtering.  Shouldn't that decision be
>>> done at a much higher level in the stack than down in the entity that
>>> reads the hardware sensor?
>>
>> Thats an interesting thought.  When the error reporting code sees the 
>> error for the first time, it could maintain a list of errors that it 
>> needs to “replay” at different system events, like when the server 
>> powers on.
> 
> It isn't really nagging, it's more like error throttling.  At most, only 
> log one error per power cycle.
> I have to check still, but we may also need to still log the other 
> errors, just with a
> different severity (for debug purposes).
> 
> I kinda like this filtering idea too.  It is flexible and we would only 
> have to do it in one place as
> opposed to in all the sensor applications we end up using, and could 
> also be used to change the
> event log severities as mentioned above.  We will have to make sure when 
> creating the event log
> that it contains enough information to recognize the device that is 
> failing so that we can filter
> appropriately.
> 
> 
>>
>> This is certainly more flexible and I like the idea - but one down 
>> side though is the logging code becomes stateful and the complexity is 
>> slightly higher.
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dbus-sensors
  2020-04-22 16:19   ` dbus-sensors Matt Spinler
  2020-04-22 17:03     ` dbus-sensors Bills, Jason M
@ 2020-04-22 17:25     ` James Feist
  2020-07-27 12:06       ` dbus-sensors Deng Tyler
  1 sibling, 1 reply; 11+ messages in thread
From: James Feist @ 2020-04-22 17:25 UTC (permalink / raw)
  To: Matt Spinler, OpenBMC Maillist

On 4/22/2020 9:19 AM, Matt Spinler wrote:
>>>
>>> 4. If not already supported (was unsure), be able to find an
>>>    _input file based on a value it has in the corresponding _label file.
>>
>> PSU sensor does this, hwmontemp does it by index.
> 
> Would you be OK with us also adding this to PSUSensor?

PSUSensor already does this? Do you mean hwmontemp sensor? I'd be fine 
with converting the index scheme to a label scheme.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dbus-sensors
  2020-04-22 17:25     ` dbus-sensors James Feist
@ 2020-07-27 12:06       ` Deng Tyler
  0 siblings, 0 replies; 11+ messages in thread
From: Deng Tyler @ 2020-07-27 12:06 UTC (permalink / raw)
  To: James Feist; +Cc: Matt Spinler, OpenBMC Maillist

[-- Attachment #1: Type: text/plain, Size: 1318 bytes --]

Hi Jame:
    I am investigating and porting psusensor to my platform. I have a
question item 5.
>> 5. We have a case where a driver isn't loaded with power off, so somehow
>>    we still need the sensors to stay on D-Bus when off (and show them
>>    as not available).
>
> All sensors are on d-bus all the time, its based on the EM config.

I create 2 psu config, psu1 and psu2, in entityManager and booting my
system with 1 psu. There is only psu1 sensor object created in psusensor
service.
After inserting psu2 and the driver binding hwmon for psu2, psusensor
service don't create psu2 sensor object.
Does it work as design? or I need update dbus-sensor?
My dbus-sensor revision is "c140e2008b038b5fbf07aca0795b8030224eddd4".

Tyler

James Feist <james.feist@linux.intel.com> 於 2020年4月23日 週四 上午1:27寫道:

> On 4/22/2020 9:19 AM, Matt Spinler wrote:
> >>>
> >>> 4. If not already supported (was unsure), be able to find an
> >>>    _input file based on a value it has in the corresponding _label
> file.
> >>
> >> PSU sensor does this, hwmontemp does it by index.
> >
> > Would you be OK with us also adding this to PSUSensor?
>
> PSUSensor already does this? Do you mean hwmontemp sensor? I'd be fine
> with converting the index scheme to a label scheme.
>
>

[-- Attachment #2: Type: text/html, Size: 2129 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-07-27 12:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-21 19:35 dbus-sensors Matt Spinler
2020-04-21 21:54 ` dbus-sensors James Feist
2020-04-22 11:56   ` dbus-sensors Brad Bishop
2020-04-22 12:11     ` dbus-sensors Patrick Williams
2020-04-22 12:24       ` dbus-sensors Brad Bishop
2020-04-22 16:01         ` dbus-sensors Matt Spinler
2020-04-22 17:24           ` dbus-sensors James Feist
2020-04-22 16:19   ` dbus-sensors Matt Spinler
2020-04-22 17:03     ` dbus-sensors Bills, Jason M
2020-04-22 17:25     ` dbus-sensors James Feist
2020-07-27 12:06       ` dbus-sensors Deng Tyler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.