From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Mark Brown <broonie@kernel.org>
Cc: "Oleksij Rempel" <o.rempel@pengutronix.de>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Ulf Hansson" <ulf.hansson@linaro.org>,
	kernel@pengutronix.de, linux-kernel@vger.kernel.org,
	linux-mmc@vger.kernel.org, linux-pm@vger.kernel.org,
	"Søren Andersen" <san@skov.dk>
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support
Date: Sat, 25 Nov 2023 19:58:12 +0000
Message-ID: <2023112504-cathedral-pulmonary-83ce@gregkh>
In-Reply-To: <ZWIWBhBN8AmK7tAJ@finisterre.sirena.org.uk>

On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:
> > On Sat, Nov 25, 2023 at 10:30:42AM +0000, Mark Brown wrote:
> > > On Sat, Nov 25, 2023 at 09:09:01AM +0000, Greg Kroah-Hartman wrote:
> 
> > > > So hardware is attempting to rely on software in order to prevent the
> > > > destruction of that same hardware?  Surely hardware designers aren't
> > > > that crazy, right?  (rhetorical question, I know...)
> 
> > > Surely software people aren't going to make no effort to integrate with
> > > the notification features that the hardware engineers have so helpfully
> > > provided us with?
> 
> > That would be great, but I don't see that here, do you?  All I see is
> > the shutdown sequence changing because someone wants it to go "faster"
> > with the threat of hardware breaking if we don't meet that "faster"
> > number, yet no knowledge or guarantee that this number can ever be known
> > or happen.
> 
> The idea was to have somewhere to send notifications when the hardware
> starts reporting things like power supplies starting to fail.  We do
> have those from hardware, we just don't do anything terribly useful
> with them yet.

Ok, but that's not what I recall this patchset doing, or did I miss
something?  All I saw was a "reorder the shutdown sequence" set of
changes.  Or at least that's all I remember at this point in time,
sorry, it's been a few days, but at least that lines up with what the
Subject line says above :)

> TBH it does seem reasonable that there will be systems that can usefully
> detect these issues but don't have a detailed characterisation of
> exactly how long you've got before things expire, it's also likely that
> the actual bound is going to be highly variable depending on what the
> system is up to at the point of detection.  It's quite likely that we'd
> only get a worst case bound so it's also likely that we'd have more time
> in practice than in spec.  I'd expect characterisation that does happen
> to be very system specific at this point, I don't think we can rely on
> getting that information.  I'd certainly expect that we have vastly more
> systems that can usefully detect issues than systems where we have firm
> numbers.

Sure, that all sounds good, but again, I don't think that's what is
happening here.

> > > > > Same problem was seen not only in automotive devices, but also in
> > > > > industrial or agricultural. In other words, it is important enough to bring
> > > > > some kind of solution mainline.
> 
> > > > But you are not providing a real solution here, only a "I am going to
> > > > attempt to shut down a specific type of device before the others, there
> > > > are no time or ordering guarantees here, so good luck!" solution.
> 
> > > I'm not sure there are great solutions here, the system integrators are
> > > constrained by what the application-appropriate silicon that's on
> > > the market is capable of, the silicon is constrained by the area costs of
> > > dealing with corner cases for system robustness and how much of the
> > > market cares about fixing these issues and software is constrained by
> > > what hardware ends up being built.  Everyone's just got to try their
> > > best with the reality they're confronted with, hopefully what's possible
> > > will improve with time.

Note, if you attempt to mitigate broken hardware with software fixes,
hardware will never get unbroken as it never needs to change.  Push back
on this, it's the only real way forward here.  I know it's not always
possible, but the number of times I have heard hardware engineers say
"but no one ever told us that was broken/impossible/whatever, we just
assumed software could handle it" is uncountable.

> > Agreed, but I don't think this patch is going to actually work properly
> > over time as there are no time values involved :)
> 
> This seems to be more into the area of mitigation than firm solution, I
> suspect users will be pleased if they can make a noticeable dent in the
> number of failures they're seeing.

Mitigation is good, but this patch series is just a hack by doing "throw
this device type at the front of the shutdown list because we have
hardware that crashes a lot" :)

> > > > And again, how are you going to prevent the in-fighting of all device
> > > > types to be "first" in the list?
> 
> > > It doesn't seem like the most complex integration challenge we've ever
> > > had to deal with TBH.
> 
> > True, but we all know how this grows and thinking about how to handle it
> > now is key for this to be acceptable.
> 
> It feels like if we're concerned about mitigating physical damage during
> the process of power failure that's a very limited set of devices - the
> storage case where we're in the middle of writing to flash or whatever
> is the most obvious case.

Then why isn't userspace handling this?  This is a policy decision that
it needs to take to properly know what hardware needs to be shut down,
and what needs to happen in order to do that (i.e. flush, unmount,
etc.?)  And userspace today should be able to say, "power down this
device now!" for any device in the system based on the sysfs device
tree, or at the very least, force it to a specific power state.  So why
not handle this policy there?
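
[Editorial illustration, not part of the original message.]  A minimal
sketch of what such a userspace policy step might look like, assuming
that flushing dirty data and then detaching the device from its driver
through the sysfs "unbind" attribute is an acceptable stand-in for
"power down this device now".  The helper names (unbind_path,
quiesce_device) are hypothetical, and the bus, driver and device names
in the usage note are only examples; which devices to act on, and in
what order, is exactly the policy a real system would have to supply.

```python
import os
from pathlib import Path


def unbind_path(bus: str, driver: str, sysfs_root: str = "/sys") -> Path:
    """Return the sysfs 'unbind' attribute of a driver on the given bus."""
    return Path(sysfs_root) / "bus" / bus / "drivers" / driver / "unbind"


def quiesce_device(bus: str, driver: str, device: str,
                   sysfs_root: str = "/sys") -> Path:
    """Flush dirty data, then detach `device` from `driver`.

    Assumption: unbinding approximates "power this device down now".
    A real policy might unmount filesystems or force a power state instead.
    """
    os.sync()  # flush filesystem buffers before touching the device
    path = unbind_path(bus, driver, sysfs_root)
    # Writing the device name to 'unbind' asks the driver core to detach it.
    with open(path, "w") as attr:
        attr.write(device)
    return path
```

On a real system this would be called as root, for example
quiesce_device("mmc", "mmcblk", "mmc0:0001"), from whatever daemon
receives the power-fail notification.  The sysfs_root parameter exists
only so the logic can be exercised against a fake directory tree.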

thanks,

greg k-h
