On Thu, Jan 28, 2021 at 09:23:08AM +0000, Vaittinen, Matti wrote:
> On Wed, 2021-01-27 at 16:32 +0000, Mark Brown wrote:

> > Note that the events the API currently has are expected to be for the
> > actual error conditions, not for the warning ones - indicating that the
> > voltage is out of regulation for example.

> I am unsure how to interpret this. What are the criteria for an issue being
> an error/warning? When I was talking about warning I meant that the
> issue which is detected is unexpected and abnormal (error?) - but might
> still be recoverable (warning?). I understand the regulator framework
> must not signal same events for different purposes - but I don't really
> know what the current events are used for - I am grateful for any
> guidance!

What the majority of hardware raises interrupts for is situations where things
have already gone out of spec and there are actual problems with the
output - for example with current limiting there's often an actual
limiter in there so the regulator simply won't supply any more current
than is configured.  With a warning everything is still working fine but
getting close to not doing so.
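
To make the distinction concrete, here is a rough user-space model, not
kernel code.  All names in it (supply_limits, classify, delivered_ua) and
the idea of a separate warning threshold are assumptions for illustration
only; it just shows a hard limiter clamping the output versus a warning
level that is crossed while the output is still fine.

```c
#include <assert.h>

/* Hypothetical states for a modelled supply; not regulator framework events. */
enum supply_state { SUPPLY_OK, SUPPLY_WARNING, SUPPLY_ERROR };

/* Hypothetical limits, in microamps. */
struct supply_limits {
	int warn_ua;	/* still in spec, but getting close to the limit */
	int limit_ua;	/* hardware limiter: no more current than this */
};

/* The limiter simply refuses to supply more than the configured limit. */
int delivered_ua(const struct supply_limits *l, int demand_ua)
{
	return demand_ua > l->limit_ua ? l->limit_ua : demand_ua;
}

/* Error: the load wants more than the limiter allows, so the output is
 * already compromised.  Warning: everything still works, but only just. */
enum supply_state classify(const struct supply_limits *l, int demand_ua)
{
	assert(l->warn_ua <= l->limit_ua);

	if (demand_ua > l->limit_ua)
		return SUPPLY_ERROR;
	if (demand_ua >= l->warn_ua)
		return SUPPLY_WARNING;
	return SUPPLY_OK;
}
```

In this model only SUPPLY_ERROR corresponds to what most hardware actually
interrupts on.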

> > Well, if these things are kicking in the hardware is in serious trouble
> > anyway so it's unclear what the system would be likely to do in
> > software, and also unclear how safe it is to rely on software to be able
> > to take that action given that it let things get into such a bad state
> > in the first place.

> Actually, bear with me but I am unsure why we have these notifications
> if we don't expect SW to be able to do anything? Wouldn't the panic
> print be all that is needed then? I think that setups which have dual

You'll notice that there aren't any actual users of this stuff in tree
at the minute - people don't generally put much effort into software
recovery as they're not expecting to be anywhere near limiting in normal
operation.  What I'd expect people to do where they do implement
handling is something like shutting down all other supplies on the
device, possibly also trying to shut down the system as a whole.  That
is, things that are more about preventing physical damage than about the
normal operation of the system.
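
As a sketch of the kind of handling described above, again as a user-space
model rather than anything in tree: the types and the handle_supply_fault()
helper are hypothetical, and whether to also request a system shutdown is
left as a policy decision for the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one supply on a device. */
struct supply {
	const char *name;
	bool enabled;
};

/* Hypothetical model of a device's power state. */
struct device_power {
	struct supply *supplies;
	size_t n_supplies;
	bool shutdown_requested;
};

/* On a fault from supply 'faulty', turn off every other supply on the
 * device to limit the chance of physical damage, and optionally ask for
 * the whole system to be shut down.  The faulty supply itself is left
 * untouched in this model. */
void handle_supply_fault(struct device_power *dp, size_t faulty,
			 bool shutdown_system)
{
	size_t i;

	assert(faulty < dp->n_supplies);

	for (i = 0; i < dp->n_supplies; i++) {
		if (i != faulty)
			dp->supplies[i].enabled = false;
	}

	if (shutdown_system)
		dp->shutdown_requested = true;
}
```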

For thermal issues systems generally try to apply software limits well
before an individual component starts flagging things up with an
interrupt; the limits that devices have are generally super high and
often there'll be issues at a system level (e.g. a case getting unusably
hot) earlier and it can take a while for responses to have an impact.

> limits (one for initiating potential SW recovery - the other for HW to
> force protection) actually make sense. So does implementing notifiers
> / error statuses for events where SW recovery is potentially helpful.
> But whether the existing event notifications / error flags are correct
> for these is something I can't decide :) Here I ask guidance from Mark &
> others who know what the idea behind existing error-flags/events is.

It's not that we shouldn't implement support for warnings, it's that
they're not the common case for hardware and so won't line up with
behaviour for other users.