All of lore.kernel.org
 help / color / mirror / Atom feed
* Dealing with a sensor which doesn't have valid reading until host is powered up
@ 2020-08-26  0:49 Alex Qiu
  2020-08-27 21:49 ` Alex Qiu
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Qiu @ 2020-08-26  0:49 UTC (permalink / raw)
  To: James Feist, OpenBMC Maillist; +Cc: Peter Lundgren, Jason Ling, Josh Lehan

[-- Attachment #1: Type: text/plain, Size: 536 bytes --]

Hi James and OpenBMC community,

We have a sensor for HwmonTempSensor which doesn't have a valid reading
until the host is fully booted. Before it's becoming alive and useful, it's
getting disabled in code (
https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266)
because of errors thrown up by the hwmon driver. Ideally, the thermal
control loop should kick the fan to fail safe mode until no more errors are
observed.

Any suggestions on how we should handle this kind of sensor properly?

Thank you!

- Alex Qiu

[-- Attachment #2: Type: text/html, Size: 871 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-08-26  0:49 Dealing with a sensor which doesn't have valid reading until host is powered up Alex Qiu
@ 2020-08-27 21:49 ` Alex Qiu
  2020-08-28 16:38   ` James Feist
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Qiu @ 2020-08-27 21:49 UTC (permalink / raw)
  To: James Feist, OpenBMC Maillist; +Cc: Peter Lundgren, Jason Ling, Josh Lehan

[-- Attachment #1: Type: text/plain, Size: 870 bytes --]

Hi James,

After some debugging, I realized that the code I pointed out earlier wasn't
the root cause. Update is that, the HwmonTempSensor stops updating after
the hwmon driver returns EAGAIN as errno. I'll keep debugging...

- Alex Qiu


On Tue, Aug 25, 2020 at 5:49 PM Alex Qiu <xqiu@google.com> wrote:

> Hi James and OpenBMC community,
>
> We have a sensor for HwmonTempSensor which doesn't have a valid reading
> until the host is fully booted. Before it's becoming alive and useful, it's
> getting disabled in code (
> https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266)
> because of errors thrown up by the hwmon driver. Ideally, the thermal
> control loop should kick the fan to fail safe mode until no more errors are
> observed.
>
> Any suggestions on how we should handle this kind of sensor properly?
>
> Thank you!
>
> - Alex Qiu
>

[-- Attachment #2: Type: text/html, Size: 1581 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-08-27 21:49 ` Alex Qiu
@ 2020-08-28 16:38   ` James Feist
  2020-08-28 16:43     ` Alex Qiu
  0 siblings, 1 reply; 18+ messages in thread
From: James Feist @ 2020-08-28 16:38 UTC (permalink / raw)
  To: Alex Qiu, OpenBMC Maillist; +Cc: Peter Lundgren, Josh Lehan, Jason Ling

On 8/27/2020 2:49 PM, Alex Qiu wrote:
> Hi James,
> 
> After some debugging, I realized that the code I pointed out earlier 
> wasn't the root cause. Update is that, the HwmonTempSensor stops 
> updating after the hwmon driver returns EAGAIN as errno. I'll keep 
> debugging...
> 
> - Alex Qiu
> 
> 
> On Tue, Aug 25, 2020 at 5:49 PM Alex Qiu <xqiu@google.com 
> <mailto:xqiu@google.com>> wrote:
> 
>     Hi James and OpenBMC community,
> 
>     We have a sensor for HwmonTempSensor which doesn't have a valid
>     reading until the host is fully booted. Before it's becoming alive
>     and useful, it's getting disabled in code
>     (https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266)
>     because of errors thrown up by the hwmon driver. Ideally, the
>     thermal control loop should kick the fan to fail safe mode until no
>     more errors are observed.
> 
>     Any suggestions on how we should handle this kind of sensor properly?

For what its worth we use the PowerState property that has options of 
power on or BiosPost to disable scanning when the state is invalid: 
https://github.com/openbmc/dbus-sensors/blob/f27a55c775383a3fb1ac655f3eda785f6845f214/src/HwmonTempMain.cpp#L208


> 
>     Thank you!
> 
>     - Alex Qiu
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-08-28 16:38   ` James Feist
@ 2020-08-28 16:43     ` Alex Qiu
  2020-08-28 17:54       ` James Feist
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Qiu @ 2020-08-28 16:43 UTC (permalink / raw)
  To: James Feist; +Cc: OpenBMC Maillist, Peter Lundgren, Josh Lehan, Jason Ling

[-- Attachment #1: Type: text/plain, Size: 1802 bytes --]

Hi James,

Thx for the reply! So right now, one thing is that the sensor is not
dependent on the power state of the host solely, but also dependent on the
boot progress of the host. And the more serious issue is that returning
EAGAIN from the driver freezes the sensor, which is what I'm debugging
right now. Do we have special treatment on errno returned by the driver?
Thx.

- Alex Qiu


On Fri, Aug 28, 2020 at 9:38 AM James Feist <james.feist@linux.intel.com>
wrote:

> On 8/27/2020 2:49 PM, Alex Qiu wrote:
> > Hi James,
> >
> > After some debugging, I realized that the code I pointed out earlier
> > wasn't the root cause. Update is that, the HwmonTempSensor stops
> > updating after the hwmon driver returns EAGAIN as errno. I'll keep
> > debugging...
> >
> > - Alex Qiu
> >
> >
> > On Tue, Aug 25, 2020 at 5:49 PM Alex Qiu <xqiu@google.com
> > <mailto:xqiu@google.com>> wrote:
> >
> >     Hi James and OpenBMC community,
> >
> >     We have a sensor for HwmonTempSensor which doesn't have a valid
> >     reading until the host is fully booted. Before it's becoming alive
> >     and useful, it's getting disabled in code
> >     (
> https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266
> )
> >     because of errors thrown up by the hwmon driver. Ideally, the
> >     thermal control loop should kick the fan to fail safe mode until no
> >     more errors are observed.
> >
> >     Any suggestions on how we should handle this kind of sensor properly?
>
> For what its worth we use the PowerState property that has options of
> power on or BiosPost to disable scanning when the state is invalid:
>
> https://github.com/openbmc/dbus-sensors/blob/f27a55c775383a3fb1ac655f3eda785f6845f214/src/HwmonTempMain.cpp#L208
>
>
> >
> >     Thank you!
> >
> >     - Alex Qiu
> >
>

[-- Attachment #2: Type: text/html, Size: 2915 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-08-28 16:43     ` Alex Qiu
@ 2020-08-28 17:54       ` James Feist
  2020-08-31 21:32         ` Alex Qiu
  0 siblings, 1 reply; 18+ messages in thread
From: James Feist @ 2020-08-28 17:54 UTC (permalink / raw)
  To: Alex Qiu; +Cc: OpenBMC Maillist, Peter Lundgren, Josh Lehan, Jason Ling

On 8/28/2020 9:43 AM, Alex Qiu wrote:
> Hi James,
> 
> Thx for the reply! So right now, one thing is that the sensor is not 
> dependent on the power state of the host solely, but also dependent on 
> the boot progress of the host.

Would the BiosPost power state not suffice?

> And the more serious issue is that 
> returning EAGAIN from the driver freezes the sensor, which is what I'm 
> debugging right now. Do we have special treatment on errno returned by 
> the driver? Thx.

I ran into a similar issue with the CPUSensor and this was my fix:
https://github.com/openbmc/dbus-sensors/commit/c22b842bfa8cfe798d83f99fa7aa9f142278c21d#diff-ccbe0562fe1d501b4c1c42d967a02ea0

I haven't hit this issue with hwmon sensor though.

> 
> - Alex Qiu
> 
> 
> On Fri, Aug 28, 2020 at 9:38 AM James Feist <james.feist@linux.intel.com 
> <mailto:james.feist@linux.intel.com>> wrote:
> 
>     On 8/27/2020 2:49 PM, Alex Qiu wrote:
>      > Hi James,
>      >
>      > After some debugging, I realized that the code I pointed out earlier
>      > wasn't the root cause. Update is that, the HwmonTempSensor stops
>      > updating after the hwmon driver returns EAGAIN as errno. I'll keep
>      > debugging...
>      >
>      > - Alex Qiu
>      >
>      >
>      > On Tue, Aug 25, 2020 at 5:49 PM Alex Qiu <xqiu@google.com
>     <mailto:xqiu@google.com>
>      > <mailto:xqiu@google.com <mailto:xqiu@google.com>>> wrote:
>      >
>      >     Hi James and OpenBMC community,
>      >
>      >     We have a sensor for HwmonTempSensor which doesn't have a valid
>      >     reading until the host is fully booted. Before it's becoming
>     alive
>      >     and useful, it's getting disabled in code
>      >   
>       (https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266)
>      >     because of errors thrown up by the hwmon driver. Ideally, the
>      >     thermal control loop should kick the fan to fail safe mode
>     until no
>      >     more errors are observed.
>      >
>      >     Any suggestions on how we should handle this kind of sensor
>     properly?
> 
>     For what its worth we use the PowerState property that has options of
>     power on or BiosPost to disable scanning when the state is invalid:
>     https://github.com/openbmc/dbus-sensors/blob/f27a55c775383a3fb1ac655f3eda785f6845f214/src/HwmonTempMain.cpp#L208
> 
> 
>      >
>      >     Thank you!
>      >
>      >     - Alex Qiu
>      >
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-08-28 17:54       ` James Feist
@ 2020-08-31 21:32         ` Alex Qiu
  2020-08-31 22:08           ` Alex Qiu
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Qiu @ 2020-08-31 21:32 UTC (permalink / raw)
  To: James Feist
  Cc: OpenBMC Maillist, Peter Lundgren, Josh Lehan, Jason Ling, Ed Tanous

[-- Attachment #1: Type: text/plain, Size: 3400 bytes --]

Hi James,

I think BiosPist power state might not suffice, because the host needs to
load firmware onto the device in order to enable the sensors at a certain
stage in the OS boot, which is very close to boot completion.

However, we can tolerate the fan being noisy before boot completion, and I
believe the root cause the issue is the HwmonTempSensor freezes once the
control flow hitting boost::asio::async_read_until (
https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempSensor.cpp#L92).
Do you know if this function has something special to do with a file that
can have errno EAGAIN? Based on that, replacing the errno in the driver
with sth other than EAGAIN also seems to be a viable fix.

Thanks!

- Alex Qiu



- Alex Qiu


On Fri, Aug 28, 2020 at 10:54 AM James Feist <james.feist@linux.intel.com>
wrote:

> On 8/28/2020 9:43 AM, Alex Qiu wrote:
> > Hi James,
> >
> > Thx for the reply! So right now, one thing is that the sensor is not
> > dependent on the power state of the host solely, but also dependent on
> > the boot progress of the host.
>
> Would the BiosPost power state not suffice?
>
> > And the more serious issue is that
> > returning EAGAIN from the driver freezes the sensor, which is what I'm
> > debugging right now. Do we have special treatment on errno returned by
> > the driver? Thx.
>
> I ran into a similar issue with the CPUSensor and this was my fix:
>
> https://github.com/openbmc/dbus-sensors/commit/c22b842bfa8cfe798d83f99fa7aa9f142278c21d#diff-ccbe0562fe1d501b4c1c42d967a02ea0
>
> I haven't hit this issue with hwmon sensor though.
>
> >
> > - Alex Qiu
> >
> >
> > On Fri, Aug 28, 2020 at 9:38 AM James Feist <james.feist@linux.intel.com
> > <mailto:james.feist@linux.intel.com>> wrote:
> >
> >     On 8/27/2020 2:49 PM, Alex Qiu wrote:
> >      > Hi James,
> >      >
> >      > After some debugging, I realized that the code I pointed out
> earlier
> >      > wasn't the root cause. Update is that, the HwmonTempSensor stops
> >      > updating after the hwmon driver returns EAGAIN as errno. I'll keep
> >      > debugging...
> >      >
> >      > - Alex Qiu
> >      >
> >      >
> >      > On Tue, Aug 25, 2020 at 5:49 PM Alex Qiu <xqiu@google.com
> >     <mailto:xqiu@google.com>
> >      > <mailto:xqiu@google.com <mailto:xqiu@google.com>>> wrote:
> >      >
> >      >     Hi James and OpenBMC community,
> >      >
> >      >     We have a sensor for HwmonTempSensor which doesn't have a
> valid
> >      >     reading until the host is fully booted. Before it's becoming
> >     alive
> >      >     and useful, it's getting disabled in code
> >      >
> >       (
> https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266
> )
> >      >     because of errors thrown up by the hwmon driver. Ideally, the
> >      >     thermal control loop should kick the fan to fail safe mode
> >     until no
> >      >     more errors are observed.
> >      >
> >      >     Any suggestions on how we should handle this kind of sensor
> >     properly?
> >
> >     For what its worth we use the PowerState property that has options of
> >     power on or BiosPost to disable scanning when the state is invalid:
> >
> https://github.com/openbmc/dbus-sensors/blob/f27a55c775383a3fb1ac655f3eda785f6845f214/src/HwmonTempMain.cpp#L208
> >
> >
> >      >
> >      >     Thank you!
> >      >
> >      >     - Alex Qiu
> >      >
> >
>

[-- Attachment #2: Type: text/html, Size: 5468 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-08-31 21:32         ` Alex Qiu
@ 2020-08-31 22:08           ` Alex Qiu
  2020-08-31 23:54             ` Guenter Roeck
  2020-09-02 17:59             ` James Feist
  0 siblings, 2 replies; 18+ messages in thread
From: Alex Qiu @ 2020-08-31 22:08 UTC (permalink / raw)
  To: James Feist, Guenter Roeck
  Cc: OpenBMC Maillist, Peter Lundgren, Josh Lehan, Jason Ling,
	Ed Tanous, Kun Yi

[-- Attachment #1: Type: text/plain, Size: 4140 bytes --]

Hi James,

I just came through this doc (
https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html).
Looks like that it's a terrible idea for hwmon driver to return EAGAIN for
dbus-sensors. With that, I think the proper fix is also to use other errno
instead in our driver, and this caveat should be probably documented
somewhere.

Hi Guenter,

Is it reasonable for hwmon drivers to return EAGAIN? Is it something that
has special meaning and should be avoided in hwmon drivers?

Thank you!

- Alex Qiu


On Mon, Aug 31, 2020 at 2:32 PM Alex Qiu <xqiu@google.com> wrote:

> Hi James,
>
> I think BiosPist power state might not suffice, because the host needs to
> load firmware onto the device in order to enable the sensors at a certain
> stage in the OS boot, which is very close to boot completion.
>
> However, we can tolerate the fan being noisy before boot completion, and I
> believe the root cause the issue is the HwmonTempSensor freezes once the
> control flow hitting boost::asio::async_read_until (
> https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempSensor.cpp#L92).
> Do you know if this function has something special to do with a file that
> can have errno EAGAIN? Based on that, replacing the errno in the driver
> with sth other than EAGAIN also seems to be a viable fix.
>
> Thanks!
>
> - Alex Qiu
>
>
>
> - Alex Qiu
>
>
> On Fri, Aug 28, 2020 at 10:54 AM James Feist <james.feist@linux.intel.com>
> wrote:
>
>> On 8/28/2020 9:43 AM, Alex Qiu wrote:
>> > Hi James,
>> >
>> > Thx for the reply! So right now, one thing is that the sensor is not
>> > dependent on the power state of the host solely, but also dependent on
>> > the boot progress of the host.
>>
>> Would the BiosPost power state not suffice?
>>
>> > And the more serious issue is that
>> > returning EAGAIN from the driver freezes the sensor, which is what I'm
>> > debugging right now. Do we have special treatment on errno returned by
>> > the driver? Thx.
>>
>> I ran into a similar issue with the CPUSensor and this was my fix:
>>
>> https://github.com/openbmc/dbus-sensors/commit/c22b842bfa8cfe798d83f99fa7aa9f142278c21d#diff-ccbe0562fe1d501b4c1c42d967a02ea0
>>
>> I haven't hit this issue with hwmon sensor though.
>>
>> >
>> > - Alex Qiu
>> >
>> >
>> > On Fri, Aug 28, 2020 at 9:38 AM James Feist <
>> james.feist@linux.intel.com
>> > <mailto:james.feist@linux.intel.com>> wrote:
>> >
>> >     On 8/27/2020 2:49 PM, Alex Qiu wrote:
>> >      > Hi James,
>> >      >
>> >      > After some debugging, I realized that the code I pointed out
>> earlier
>> >      > wasn't the root cause. Update is that, the HwmonTempSensor stops
>> >      > updating after the hwmon driver returns EAGAIN as errno. I'll
>> keep
>> >      > debugging...
>> >      >
>> >      > - Alex Qiu
>> >      >
>> >      >
>> >      > On Tue, Aug 25, 2020 at 5:49 PM Alex Qiu <xqiu@google.com
>> >     <mailto:xqiu@google.com>
>> >      > <mailto:xqiu@google.com <mailto:xqiu@google.com>>> wrote:
>> >      >
>> >      >     Hi James and OpenBMC community,
>> >      >
>> >      >     We have a sensor for HwmonTempSensor which doesn't have a
>> valid
>> >      >     reading until the host is fully booted. Before it's becoming
>> >     alive
>> >      >     and useful, it's getting disabled in code
>> >      >
>> >       (
>> https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266
>> )
>> >      >     because of errors thrown up by the hwmon driver. Ideally, the
>> >      >     thermal control loop should kick the fan to fail safe mode
>> >     until no
>> >      >     more errors are observed.
>> >      >
>> >      >     Any suggestions on how we should handle this kind of sensor
>> >     properly?
>> >
>> >     For what its worth we use the PowerState property that has options
>> of
>> >     power on or BiosPost to disable scanning when the state is invalid:
>> >
>> https://github.com/openbmc/dbus-sensors/blob/f27a55c775383a3fb1ac655f3eda785f6845f214/src/HwmonTempMain.cpp#L208
>> >
>> >
>> >      >
>> >      >     Thank you!
>> >      >
>> >      >     - Alex Qiu
>> >      >
>> >
>>
>

[-- Attachment #2: Type: text/html, Size: 6714 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-08-31 22:08           ` Alex Qiu
@ 2020-08-31 23:54             ` Guenter Roeck
  2020-08-31 23:58               ` Alex Qiu
  2020-09-02 17:59             ` James Feist
  1 sibling, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2020-08-31 23:54 UTC (permalink / raw)
  To: Alex Qiu
  Cc: James Feist, OpenBMC Maillist, Peter Lundgren, Josh Lehan,
	Jason Ling, Ed Tanous, Kun Yi

On Mon, Aug 31, 2020 at 3:09 PM Alex Qiu <xqiu@google.com> wrote:
>
> Hi James,
>
> I just came through this doc (https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html). Looks like that it's a terrible idea for hwmon driver to return EAGAIN for dbus-sensors. With that, I think the proper fix is also to use other errno instead in our driver, and this caveat should be probably documented somewhere.
>
> Hi Guenter,
>
> Is it reasonable for hwmon drivers to return EAGAIN? Is it something that has special meaning and should be avoided in hwmon drivers?
>

Not sure how to relate the link above with -EAGAIN, but ... -EAGAIN
might trigger userspace to try again immediately, which would
potentially be quite bad. We had seen that effect at a previous
company, where it ended up overwhelming userspace. So I am not
entirely in favor of it. How about -ENODATA ? that might make more
sense unless the problem is known to be a short term glitch.

Thanks,
Guenter

> Thank you!
>
> - Alex Qiu
>
>
> On Mon, Aug 31, 2020 at 2:32 PM Alex Qiu <xqiu@google.com> wrote:
>>
>> Hi James,
>>
>> I think BiosPist power state might not suffice, because the host needs to load firmware onto the device in order to enable the sensors at a certain stage in the OS boot, which is very close to boot completion.
>>
>> However, we can tolerate the fan being noisy before boot completion, and I believe the root cause the issue is the HwmonTempSensor freezes once the control flow hitting boost::asio::async_read_until (https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempSensor.cpp#L92). Do you know if this function has something special to do with a file that can have errno EAGAIN? Based on that, replacing the errno in the driver with sth other than EAGAIN also seems to be a viable fix.
>>
>> Thanks!
>>
>> - Alex Qiu
>>
>>
>>
>> - Alex Qiu
>>
>>
>> On Fri, Aug 28, 2020 at 10:54 AM James Feist <james.feist@linux.intel.com> wrote:
>>>
>>> On 8/28/2020 9:43 AM, Alex Qiu wrote:
>>> > Hi James,
>>> >
>>> > Thx for the reply! So right now, one thing is that the sensor is not
>>> > dependent on the power state of the host solely, but also dependent on
>>> > the boot progress of the host.
>>>
>>> Would the BiosPost power state not suffice?
>>>
>>> > And the more serious issue is that
>>> > returning EAGAIN from the driver freezes the sensor, which is what I'm
>>> > debugging right now. Do we have special treatment on errno returned by
>>> > the driver? Thx.
>>>
>>> I ran into a similar issue with the CPUSensor and this was my fix:
>>> https://github.com/openbmc/dbus-sensors/commit/c22b842bfa8cfe798d83f99fa7aa9f142278c21d#diff-ccbe0562fe1d501b4c1c42d967a02ea0
>>>
>>> I haven't hit this issue with hwmon sensor though.
>>>
>>> >
>>> > - Alex Qiu
>>> >
>>> >
>>> > On Fri, Aug 28, 2020 at 9:38 AM James Feist <james.feist@linux.intel.com
>>> > <mailto:james.feist@linux.intel.com>> wrote:
>>> >
>>> >     On 8/27/2020 2:49 PM, Alex Qiu wrote:
>>> >      > Hi James,
>>> >      >
>>> >      > After some debugging, I realized that the code I pointed out earlier
>>> >      > wasn't the root cause. Update is that, the HwmonTempSensor stops
>>> >      > updating after the hwmon driver returns EAGAIN as errno. I'll keep
>>> >      > debugging...
>>> >      >
>>> >      > - Alex Qiu
>>> >      >
>>> >      >
>>> >      > On Tue, Aug 25, 2020 at 5:49 PM Alex Qiu <xqiu@google.com
>>> >     <mailto:xqiu@google.com>
>>> >      > <mailto:xqiu@google.com <mailto:xqiu@google.com>>> wrote:
>>> >      >
>>> >      >     Hi James and OpenBMC community,
>>> >      >
>>> >      >     We have a sensor for HwmonTempSensor which doesn't have a valid
>>> >      >     reading until the host is fully booted. Before it's becoming
>>> >     alive
>>> >      >     and useful, it's getting disabled in code
>>> >      >
>>> >       (https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266)
>>> >      >     because of errors thrown up by the hwmon driver. Ideally, the
>>> >      >     thermal control loop should kick the fan to fail safe mode
>>> >     until no
>>> >      >     more errors are observed.
>>> >      >
>>> >      >     Any suggestions on how we should handle this kind of sensor
>>> >     properly?
>>> >
>>> >     For what its worth we use the PowerState property that has options of
>>> >     power on or BiosPost to disable scanning when the state is invalid:
>>> >     https://github.com/openbmc/dbus-sensors/blob/f27a55c775383a3fb1ac655f3eda785f6845f214/src/HwmonTempMain.cpp#L208
>>> >
>>> >
>>> >      >
>>> >      >     Thank you!
>>> >      >
>>> >      >     - Alex Qiu
>>> >      >
>>> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-08-31 23:54             ` Guenter Roeck
@ 2020-08-31 23:58               ` Alex Qiu
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Qiu @ 2020-08-31 23:58 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: James Feist, OpenBMC Maillist, Peter Lundgren, Josh Lehan,
	Jason Ling, Ed Tanous, Kun Yi

[-- Attachment #1: Type: text/plain, Size: 5259 bytes --]

Hi Guenter,

FYI, the boot_asio call freezes on EAGAIN, even the driver later recovers
to a normal state, which can be verified by reading the hwmon file directly.

I'm switching to ENODATA and ENOMSG. Thanks!

- Alex Qiu


On Mon, Aug 31, 2020 at 4:54 PM Guenter Roeck <groeck@google.com> wrote:

> On Mon, Aug 31, 2020 at 3:09 PM Alex Qiu <xqiu@google.com> wrote:
> >
> > Hi James,
> >
> > I just came through this doc (
> https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html).
> Looks like that it's a terrible idea for hwmon driver to return EAGAIN for
> dbus-sensors. With that, I think the proper fix is also to use other errno
> instead in our driver, and this caveat should be probably documented
> somewhere.
> >
> > Hi Guenter,
> >
> > Is it reasonable for hwmon drivers to return EAGAIN? Is it something
> that has special meaning and should be avoided in hwmon drivers?
> >
>
> Not sure how to relate the link above with -EAGAIN, but ... -EAGAIN
> might trigger userspace to try again immediately, which would
> potentially be quite bad. We had seen that effect at a previous
> company, where it ended up overwhelming userspace. So I am not
> entirely in favor of it. How about -ENODATA ? that might make more
> sense unless the problem is known to be a short term glitch.
>
> Thanks,
> Guenter
>
> > Thank you!
> >
> > - Alex Qiu
> >
> >
> > On Mon, Aug 31, 2020 at 2:32 PM Alex Qiu <xqiu@google.com> wrote:
> >>
> >> Hi James,
> >>
> >> I think BiosPist power state might not suffice, because the host needs
> to load firmware onto the device in order to enable the sensors at a
> certain stage in the OS boot, which is very close to boot completion.
> >>
> >> However, we can tolerate the fan being noisy before boot completion,
> and I believe the root cause the issue is the HwmonTempSensor freezes once
> the control flow hitting boost::asio::async_read_until (
> https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempSensor.cpp#L92).
> Do you know if this function has something special to do with a file that
> can have errno EAGAIN? Based on that, replacing the errno in the driver
> with sth other than EAGAIN also seems to be a viable fix.
> >>
> >> Thanks!
> >>
> >> - Alex Qiu
> >>
> >>
> >>
> >> - Alex Qiu
> >>
> >>
> >> On Fri, Aug 28, 2020 at 10:54 AM James Feist <
> james.feist@linux.intel.com> wrote:
> >>>
> >>> On 8/28/2020 9:43 AM, Alex Qiu wrote:
> >>> > Hi James,
> >>> >
> >>> > Thx for the reply! So right now, one thing is that the sensor is not
> >>> > dependent on the power state of the host solely, but also dependent
> on
> >>> > the boot progress of the host.
> >>>
> >>> Would the BiosPost power state not suffice?
> >>>
> >>> > And the more serious issue is that
> >>> > returning EAGAIN from the driver freezes the sensor, which is what
> I'm
> >>> > debugging right now. Do we have special treatment on errno returned
> by
> >>> > the driver? Thx.
> >>>
> >>> I ran into a similar issue with the CPUSensor and this was my fix:
> >>>
> https://github.com/openbmc/dbus-sensors/commit/c22b842bfa8cfe798d83f99fa7aa9f142278c21d#diff-ccbe0562fe1d501b4c1c42d967a02ea0
> >>>
> >>> I haven't hit this issue with hwmon sensor though.
> >>>
> >>> >
> >>> > - Alex Qiu
> >>> >
> >>> >
> >>> > On Fri, Aug 28, 2020 at 9:38 AM James Feist <
> james.feist@linux.intel.com
> >>> > <mailto:james.feist@linux.intel.com>> wrote:
> >>> >
> >>> >     On 8/27/2020 2:49 PM, Alex Qiu wrote:
> >>> >      > Hi James,
> >>> >      >
> >>> >      > After some debugging, I realized that the code I pointed out
> earlier
> >>> >      > wasn't the root cause. Update is that, the HwmonTempSensor
> stops
> >>> >      > updating after the hwmon driver returns EAGAIN as errno. I'll
> keep
> >>> >      > debugging...
> >>> >      >
> >>> >      > - Alex Qiu
> >>> >      >
> >>> >      >
> >>> >      > On Tue, Aug 25, 2020 at 5:49 PM Alex Qiu <xqiu@google.com
> >>> >     <mailto:xqiu@google.com>
> >>> >      > <mailto:xqiu@google.com <mailto:xqiu@google.com>>> wrote:
> >>> >      >
> >>> >      >     Hi James and OpenBMC community,
> >>> >      >
> >>> >      >     We have a sensor for HwmonTempSensor which doesn't have a
> valid
> >>> >      >     reading until the host is fully booted. Before it's
> becoming
> >>> >     alive
> >>> >      >     and useful, it's getting disabled in code
> >>> >      >
> >>> >       (
> https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266
> )
> >>> >      >     because of errors thrown up by the hwmon driver. Ideally,
> the
> >>> >      >     thermal control loop should kick the fan to fail safe mode
> >>> >     until no
> >>> >      >     more errors are observed.
> >>> >      >
> >>> >      >     Any suggestions on how we should handle this kind of
> sensor
> >>> >     properly?
> >>> >
> >>> >     For what its worth we use the PowerState property that has
> options of
> >>> >     power on or BiosPost to disable scanning when the state is
> invalid:
> >>> >
> https://github.com/openbmc/dbus-sensors/blob/f27a55c775383a3fb1ac655f3eda785f6845f214/src/HwmonTempMain.cpp#L208
> >>> >
> >>> >
> >>> >      >
> >>> >      >     Thank you!
> >>> >      >
> >>> >      >     - Alex Qiu
> >>> >      >
> >>> >
>

[-- Attachment #2: Type: text/html, Size: 8503 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-08-31 22:08           ` Alex Qiu
  2020-08-31 23:54             ` Guenter Roeck
@ 2020-09-02 17:59             ` James Feist
  2020-09-02 20:31               ` Alex Qiu
  2020-09-03 15:38               ` Patrick Williams
  1 sibling, 2 replies; 18+ messages in thread
From: James Feist @ 2020-09-02 17:59 UTC (permalink / raw)
  To: Alex Qiu, Guenter Roeck
  Cc: OpenBMC Maillist, Peter Lundgren, Josh Lehan, Jason Ling,
	Ed Tanous, Kun Yi

On 8/31/2020 3:08 PM, Alex Qiu wrote:
> Hi James,
> 
> I just came through this doc 
> (https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html). 
> Looks like that it's a terrible idea for hwmon driver to return EAGAIN 
> for dbus-sensors. With that, I think the proper fix is also to use other 
> errno instead in our driver, and this caveat should be probably 
> documented somewhere.
> 

Hi Alex,

I hit something similar with peci where timeouts was causing the scan 
loop to hang. I remembered that back when we were developing ipmbbridge 
we hit something similar with i2c, and the work around was to use the 
tcp socket instead 
https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/reference/ip__tcp/socket.html 
as it could correctly handle the errors. This worked for me for the 
CpuSensor and is a easy to implement change that might be worth trying 
for other sensors 
https://gerrit.openbmc-project.xyz/c/openbmc/dbus-sensors/+/36181.

Thanks

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-09-02 17:59             ` James Feist
@ 2020-09-02 20:31               ` Alex Qiu
  2020-09-02 22:07                 ` Alex Qiu
  2020-09-03 15:38               ` Patrick Williams
  1 sibling, 1 reply; 18+ messages in thread
From: Alex Qiu @ 2020-09-02 20:31 UTC (permalink / raw)
  To: James Feist
  Cc: Guenter Roeck, OpenBMC Maillist, Peter Lundgren, Josh Lehan,
	Jason Ling, Ed Tanous, Kun Yi

[-- Attachment #1: Type: text/plain, Size: 1446 bytes --]

Hi James,

I think Ed has mentioned the same thing in an internal discussion. I'll
give it a try.

Anyhow, as discussed with Guenter, EAGAIN still may not be a good use of
our case here, as it's not something that a busy loop should wait for. I've
also changed the driver to return ENODATA.

Thanks!

- Alex Qiu


On Wed, Sep 2, 2020 at 10:59 AM James Feist <james.feist@linux.intel.com>
wrote:

> On 8/31/2020 3:08 PM, Alex Qiu wrote:
> > Hi James,
> >
> > I just came through this doc
> > (
> https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html).
>
> > Looks like that it's a terrible idea for hwmon driver to return EAGAIN
> > for dbus-sensors. With that, I think the proper fix is also to use other
> > errno instead in our driver, and this caveat should be probably
> > documented somewhere.
> >
>
> Hi Alex,
>
> I hit something similar with peci where timeouts was causing the scan
> loop to hang. I remembered that back when we were developing ipmbbridge
> we hit something similar with i2c, and the work around was to use the
> tcp socket instead
>
> https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/reference/ip__tcp/socket.html
> as it could correctly handle the errors. This worked for me for the
> CpuSensor and is a easy to implement change that might be worth trying
> for other sensors
> https://gerrit.openbmc-project.xyz/c/openbmc/dbus-sensors/+/36181.
>
> Thanks
>
> James
>

[-- Attachment #2: Type: text/html, Size: 2472 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-09-02 20:31               ` Alex Qiu
@ 2020-09-02 22:07                 ` Alex Qiu
  2020-09-03 15:31                   ` Guenter Roeck
  2020-09-03 17:20                   ` James Feist
  0 siblings, 2 replies; 18+ messages in thread
From: Alex Qiu @ 2020-09-02 22:07 UTC (permalink / raw)
  To: James Feist
  Cc: Guenter Roeck, OpenBMC Maillist, Peter Lundgren, Josh Lehan,
	Jason Ling, Ed Tanous, Kun Yi

[-- Attachment #1: Type: text/plain, Size: 1734 bytes --]

Hi James,

I just tried, now I have no reading at all... I saw you marked your pull
request to work in progress. I'll wait for further updates...

- Alex Qiu


On Wed, Sep 2, 2020 at 1:31 PM Alex Qiu <xqiu@google.com> wrote:

> Hi James,
>
> I think Ed has mentioned the same thing in an internal discussion. I'll
> give it a try.
>
> Anyhow, as discussed with Guenter, EAGAIN still may not be a good use of
> our case here, as it's not something that a busy loop should wait for. I've
> also changed the driver to return ENODATA.
>
> Thanks!
>
> - Alex Qiu
>
>
> On Wed, Sep 2, 2020 at 10:59 AM James Feist <james.feist@linux.intel.com>
> wrote:
>
>> On 8/31/2020 3:08 PM, Alex Qiu wrote:
>> > Hi James,
>> >
>> > I just came through this doc
>> > (
>> https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html).
>>
>> > Looks like that it's a terrible idea for hwmon driver to return EAGAIN
>> > for dbus-sensors. With that, I think the proper fix is also to use
>> other
>> > errno instead in our driver, and this caveat should be probably
>> > documented somewhere.
>> >
>>
>> Hi Alex,
>>
>> I hit something similar with peci where timeouts was causing the scan
>> loop to hang. I remembered that back when we were developing ipmbbridge
>> we hit something similar with i2c, and the work around was to use the
>> tcp socket instead
>>
>> https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/reference/ip__tcp/socket.html
>> as it could correctly handle the errors. This worked for me for the
>> CpuSensor and is a easy to implement change that might be worth trying
>> for other sensors
>> https://gerrit.openbmc-project.xyz/c/openbmc/dbus-sensors/+/36181.
>>
>> Thanks
>>
>> James
>>
>

[-- Attachment #2: Type: text/html, Size: 3102 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-09-02 22:07                 ` Alex Qiu
@ 2020-09-03 15:31                   ` Guenter Roeck
  2020-09-03 16:03                     ` Alex Qiu
  2020-09-03 17:20                   ` James Feist
  1 sibling, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2020-09-03 15:31 UTC (permalink / raw)
  To: Alex Qiu
  Cc: James Feist, OpenBMC Maillist, Peter Lundgren, Josh Lehan,
	Jason Ling, Ed Tanous, Kun Yi

On Wed, Sep 2, 2020 at 3:07 PM Alex Qiu <xqiu@google.com> wrote:
>
> Hi James,
>
> I just tried, now I have no reading at all... I saw you marked your pull request to work in progress. I'll wait for further updates...
>
I may be missing something, but isn't that the idea ?

Thanks,
Guenter

> - Alex Qiu
>
>
> On Wed, Sep 2, 2020 at 1:31 PM Alex Qiu <xqiu@google.com> wrote:
>>
>> Hi James,
>>
>> I think Ed has mentioned the same thing in an internal discussion. I'll give it a try.
>>
>> Anyhow, as discussed with Guenter, EAGAIN still may not be a good use of our case here, as it's not something that a busy loop should wait for. I've also changed the driver to return ENODATA.
>>
>> Thanks!
>>
>> - Alex Qiu
>>
>>
>> On Wed, Sep 2, 2020 at 10:59 AM James Feist <james.feist@linux.intel.com> wrote:
>>>
>>> On 8/31/2020 3:08 PM, Alex Qiu wrote:
>>> > Hi James,
>>> >
>>> > I just came through this doc
>>> > (https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html).
>>> > Looks like that it's a terrible idea for hwmon driver to return EAGAIN
>>> > for dbus-sensors. With that, I think the proper fix is also to use other
>>> > errno instead in our driver, and this caveat should be probably
>>> > documented somewhere.
>>> >
>>>
>>> Hi Alex,
>>>
>>> I hit something similar with peci where timeouts was causing the scan
>>> loop to hang. I remembered that back when we were developing ipmbbridge
>>> we hit something similar with i2c, and the work around was to use the
>>> tcp socket instead
>>> https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/reference/ip__tcp/socket.html
>>> as it could correctly handle the errors. This worked for me for the
>>> CpuSensor and is a easy to implement change that might be worth trying
>>> for other sensors
>>> https://gerrit.openbmc-project.xyz/c/openbmc/dbus-sensors/+/36181.
>>>
>>> Thanks
>>>
>>> James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-09-02 17:59             ` James Feist
  2020-09-02 20:31               ` Alex Qiu
@ 2020-09-03 15:38               ` Patrick Williams
  2020-09-03 15:43                 ` Ed Tanous
  1 sibling, 1 reply; 18+ messages in thread
From: Patrick Williams @ 2020-09-03 15:38 UTC (permalink / raw)
  To: James Feist
  Cc: Alex Qiu, Guenter Roeck, Peter Lundgren, OpenBMC Maillist,
	Ed Tanous, Josh Lehan, Jason Ling

[-- Attachment #1: Type: text/plain, Size: 1518 bytes --]

On Wed, Sep 02, 2020 at 10:59:12AM -0700, James Feist wrote:
> On 8/31/2020 3:08 PM, Alex Qiu wrote:
> > Hi James,
> > 
> > I just came through this doc (https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html).
> > Looks like that it's a terrible idea for hwmon driver to return EAGAIN
> > for dbus-sensors. With that, I think the proper fix is also to use other
> > errno instead in our driver, and this caveat should be probably
> > documented somewhere.
> > 
> 
> Hi Alex,
> 
> I hit something similar with peci where timeouts was causing the scan loop
> to hang. I remembered that back when we were developing ipmbbridge we hit
> something similar with i2c, and the work around was to use the tcp socket
> instead https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/reference/ip__tcp/socket.html
> as it could correctly handle the errors. This worked for me for the
> CpuSensor and is a easy to implement change that might be worth trying for
> other sensors
> https://gerrit.openbmc-project.xyz/c/openbmc/dbus-sensors/+/36181.
> 
> Thanks
> 
> James

I might have missed this but what are we going to do about the
Sensor.Value in the case when the data isn't yet available?  Is the
sensor only going to exist when the data is available?  Do we need to
define a specific value to indicate that a Sensor.Value isn't available?
Now that we've moved to a `double` for Sensor.Value it seems like we
could use NaN.

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-09-03 15:38               ` Patrick Williams
@ 2020-09-03 15:43                 ` Ed Tanous
  2020-09-03 15:54                   ` Patrick Williams
  0 siblings, 1 reply; 18+ messages in thread
From: Ed Tanous @ 2020-09-03 15:43 UTC (permalink / raw)
  To: Patrick Williams
  Cc: James Feist, Alex Qiu, Guenter Roeck, Peter Lundgren,
	OpenBMC Maillist, Josh Lehan, Jason Ling

On Thu, Sep 3, 2020 at 8:38 AM Patrick Williams <patrick@stwcx.xyz> wrote:
>
> On Wed, Sep 02, 2020 at 10:59:12AM -0700, James Feist wrote:
> > On 8/31/2020 3:08 PM, Alex Qiu wrote:
> > > Hi James,
> > >
> > > I just came through this doc (https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html).
> > > Looks like that it's a terrible idea for hwmon driver to return EAGAIN
> > > for dbus-sensors. With that, I think the proper fix is also to use other
> > > errno instead in our driver, and this caveat should be probably
> > > documented somewhere.
> > >
> >
> > Hi Alex,
> >
> > I hit something similar with peci where timeouts was causing the scan loop
> > to hang. I remembered that back when we were developing ipmbbridge we hit
> > something similar with i2c, and the work around was to use the tcp socket
> > instead https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/reference/ip__tcp/socket.html
> > as it could correctly handle the errors. This worked for me for the
> > CpuSensor and is a easy to implement change that might be worth trying for
> > other sensors
> > https://gerrit.openbmc-project.xyz/c/openbmc/dbus-sensors/+/36181.
> >
> > Thanks
> >
> > James
>
> I might have missed this but what are we going to do about the
> Sensor.Value in the case when the data isn't yet available?  Is the
> sensor only going to exist when the data is available?  Do we need to
> define a specific value to indicate that a Sensor.Value isn't available?
> Now that we've moved to a `double` for Sensor.Value it seems like we
> could use NaN.

CPUSensor code that does exactly what you describe :)

https://github.com/openbmc/dbus-sensors/blob/105a19754f003956def5934612b1de86225a4bc1/src/CPUSensor.cpp#L180

dbus-sensors has done exactly this basically since its creation.  I
wonder how we'd get that more formalized in phosphor-dbus-interfaces.

>
> --
> Patrick Williams

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-09-03 15:43                 ` Ed Tanous
@ 2020-09-03 15:54                   ` Patrick Williams
  0 siblings, 0 replies; 18+ messages in thread
From: Patrick Williams @ 2020-09-03 15:54 UTC (permalink / raw)
  To: Ed Tanous
  Cc: James Feist, Alex Qiu, Guenter Roeck, Peter Lundgren,
	OpenBMC Maillist, Josh Lehan, Jason Ling

[-- Attachment #1: Type: text/plain, Size: 1280 bytes --]

On Thu, Sep 03, 2020 at 08:43:16AM -0700, Ed Tanous wrote:
> On Thu, Sep 3, 2020 at 8:38 AM Patrick Williams <patrick@stwcx.xyz> wrote:
> >
> > I might have missed this but what are we going to do about the
> > Sensor.Value in the case when the data isn't yet available?  Is the
> > sensor only going to exist when the data is available?  Do we need to
> > define a specific value to indicate that a Sensor.Value isn't available?
> > Now that we've moved to a `double` for Sensor.Value it seems like we
> > could use NaN.
> 
> CPUSensor code that does exactly what you describe :)
> 
> https://github.com/openbmc/dbus-sensors/blob/105a19754f003956def5934612b1de86225a4bc1/src/CPUSensor.cpp#L180
> 
> dbus-sensors has done exactly this basically since its creation.  I
> wonder how we'd get that more formalized in phosphor-dbus-interfaces.
> 

First step would be a comment here:

https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Sensor/Value.interface.yaml#L23

There isn't currently a way in the YAML to define specific values as
meaningful, other than enums.  Certainly something that could be
improved.  It seems like we also have fixed values like object-path segments
that are in demand.

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-09-03 15:31                   ` Guenter Roeck
@ 2020-09-03 16:03                     ` Alex Qiu
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Qiu @ 2020-09-03 16:03 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: James Feist, OpenBMC Maillist, Peter Lundgren, Josh Lehan,
	Jason Ling, Ed Tanous, Kun Yi

[-- Attachment #1: Type: text/plain, Size: 2278 bytes --]

Hi Guenter,

For that, this was referring to the OpenBMC application. What I meant is
that even normal sensors stopped working... . Sry for the confusion...

- Alex Qiu


On Thu, Sep 3, 2020 at 8:31 AM Guenter Roeck <groeck@google.com> wrote:

> On Wed, Sep 2, 2020 at 3:07 PM Alex Qiu <xqiu@google.com> wrote:
> >
> > Hi James,
> >
> > I just tried, now I have no reading at all... I saw you marked your pull
> request to work in progress. I'll wait for further updates...
> >
> I may be missing something, but isn't that the idea ?
>
> Thanks,
> Guenter
>
> > - Alex Qiu
> >
> >
> > On Wed, Sep 2, 2020 at 1:31 PM Alex Qiu <xqiu@google.com> wrote:
> >>
> >> Hi James,
> >>
> >> I think Ed has mentioned the same thing in an internal discussion. I'll
> give it a try.
> >>
> >> Anyhow, as discussed with Guenter, EAGAIN still may not be a good use
> of our case here, as it's not something that a busy loop should wait for.
> I've also changed the driver to return ENODATA.
> >>
> >> Thanks!
> >>
> >> - Alex Qiu
> >>
> >>
> >> On Wed, Sep 2, 2020 at 10:59 AM James Feist <
> james.feist@linux.intel.com> wrote:
> >>>
> >>> On 8/31/2020 3:08 PM, Alex Qiu wrote:
> >>> > Hi James,
> >>> >
> >>> > I just came through this doc
> >>> > (
> https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html
> ).
> >>> > Looks like that it's a terrible idea for hwmon driver to return
> EAGAIN
> >>> > for dbus-sensors. With that, I think the proper fix is also to use
> other
> >>> > errno instead in our driver, and this caveat should be probably
> >>> > documented somewhere.
> >>> >
> >>>
> >>> Hi Alex,
> >>>
> >>> I hit something similar with peci where timeouts was causing the scan
> >>> loop to hang. I remembered that back when we were developing ipmbbridge
> >>> we hit something similar with i2c, and the work around was to use the
> >>> tcp socket instead
> >>>
> https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/reference/ip__tcp/socket.html
> >>> as it could correctly handle the errors. This worked for me for the
> >>> CpuSensor and is a easy to implement change that might be worth trying
> >>> for other sensors
> >>> https://gerrit.openbmc-project.xyz/c/openbmc/dbus-sensors/+/36181.
> >>>
> >>> Thanks
> >>>
> >>> James
>

[-- Attachment #2: Type: text/html, Size: 3843 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dealing with a sensor which doesn't have valid reading until host is powered up
  2020-09-02 22:07                 ` Alex Qiu
  2020-09-03 15:31                   ` Guenter Roeck
@ 2020-09-03 17:20                   ` James Feist
  1 sibling, 0 replies; 18+ messages in thread
From: James Feist @ 2020-09-03 17:20 UTC (permalink / raw)
  To: Alex Qiu; +Cc: Peter Lundgren, OpenBMC Maillist

On 9/2/2020 3:07 PM, Alex Qiu wrote:
> Hi James,
> 
> I just tried, now I have no reading at all... I saw you marked your pull 
> request to work in progress. I'll wait for further updates...
> 
> - Alex Qiu

Hi Alex,

Yes I ran into the same issue, I was originally using the tcp socket, 
then calling read() directly, and that 'worked' and may work for your 
use case, but mine the driver timeout still slowed down the sensor to an 
non-functional state. I pushed a patch to CPU sensor to make it fail 
faster, because in the case peci is down, you have bigger problems then 
sensors responding slow. It sounds like we aren't the first ones to hit 
this problem 
https://lists.ozlabs.org/pipermail/openbmc/2017-May/007557.html.

I'm thinking the only 'full proof' ways to fix this sort of things are:

1. Have some way to check state that you're good to read.
2. Handle reading in threads separate from d-bus.
3. Maybe avoid using the sysfs handles as they don't seem to respond 
correctly to epoll as they always respond ready to read.

-James

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-09-03 17:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-26  0:49 Dealing with a sensor which doesn't have valid reading until host is powered up Alex Qiu
2020-08-27 21:49 ` Alex Qiu
2020-08-28 16:38   ` James Feist
2020-08-28 16:43     ` Alex Qiu
2020-08-28 17:54       ` James Feist
2020-08-31 21:32         ` Alex Qiu
2020-08-31 22:08           ` Alex Qiu
2020-08-31 23:54             ` Guenter Roeck
2020-08-31 23:58               ` Alex Qiu
2020-09-02 17:59             ` James Feist
2020-09-02 20:31               ` Alex Qiu
2020-09-02 22:07                 ` Alex Qiu
2020-09-03 15:31                   ` Guenter Roeck
2020-09-03 16:03                     ` Alex Qiu
2020-09-03 17:20                   ` James Feist
2020-09-03 15:38               ` Patrick Williams
2020-09-03 15:43                 ` Ed Tanous
2020-09-03 15:54                   ` Patrick Williams

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.