linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Loehle <christian.loehle@arm.com>
To: Oleksij Rempel <o.rempel@pengutronix.de>,
	Mark Brown <broonie@kernel.org>
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Ulf Hansson" <ulf.hansson@linaro.org>,
	kernel@pengutronix.de, linux-kernel@vger.kernel.org,
	linux-mmc@vger.kernel.org, linux-pm@vger.kernel.org,
	"Søren Andersen" <san@skov.dk>
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support
Date: Mon, 27 Nov 2023 11:27:31 +0000	[thread overview]
Message-ID: <8ffb32c8-907c-4266-b8be-c7309418b9f0@arm.com> (raw)
In-Reply-To: <20231126193125.GB877872@pengutronix.de>

On 26/11/2023 19:31, Oleksij Rempel wrote:
> On Sun, Nov 26, 2023 at 10:14:45AM +0000, Mark Brown wrote:
>> On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote:
>>> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
>>>> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:
>>
>>>>> That would be great, but I don't see that here, do you?  All I see is
>>>>> the shutdown sequence changing because someone wants it to go "faster"
>>>>> with the threat of hardware breaking if we don't meet that "faster"
>>>>> number, yet no knowledge or guarantee that this number can ever be known
>>>>> or happen.
>>
>>>> The idea was to have somewhere to send notifications when the hardware
>>>> starts reporting things like power supplies starting to fail.  We do
>>>> have those from hardware, we just don't do anything terribly useful
>>>> with them yet.
>>
>>> Ok, but that's not what I recall this patchset doing, or did I missing
>>> something?  All I saw was a "reorder the shutdown sequence" set of
>>> changes.  Or at least that's all I remember at this point in time,
>>> sorry, it's been a few days, but at least that lines up with what the
>>> Subject line says above :)
>>
>> That's not in the series, a bunch of it is merged in some form (eg, see
>> hw_protection_shutdown()) and more of it would need to be built on top
>> if this were merged.
> 
> The current kernel has enough infrastructure to manage essential functions
> related to hardware protection:
> - The Device Tree specifies the source of interrupts for detecting
>   under-voltage events. It also details critical system regulators and some
>   of specification of backup power supplied by the board.
> - Various frameworks within the kernel can identify critical hardware
>   conditions like over-temperature and under-voltage. Upon detection, these
>   frameworks invoke the hw_protection_shutdown() function.
> 
>>>>> Agreed, but I don't think this patch is going to actually work properly
>>>>> over time as there is no time values involved :)
> 
> If we're to implement a deadline for each shutdown call (as the requirement for
> "time values" suggests?), then prioritization becomes essential. Without
> establishing a shutdown order, the inclusion of time values might not be
> effectively utilized.  Am I overlooking anything in this regard?
> 
>>>> This seems to be more into the area of mitigation than firm solution, I
>>>> suspect users will be pleased if they can make a noticable dent in the
>>>> number of failures they're seeing.
>>
>>> Mitigation is good, but this patch series is just a hack by doing "throw
>>> this device type at the front of the shutdown list because we have
>>> hardware that crashes a lot" :)
> 
> The root of the issue seems to be the choice of primary storage device.
> 
> All storage technologies - HDD, SSD, eMMC, NAND - are vulnerable to power
> loss. The only foolproof safeguard is a backup power source, but this
> introduces its own set of challenges:

I disagree and would say that any storage device sold as "industrial" should
guarantee power-fail safety. Plus, you mentioned data loss isn't even your concern,
but the storage device fails/bricks.
> 
> 1. Batteries: While they provide a backup, they come with limitations like a
> finite number of charge cycles, sensitivity to temperature (a significant
> concern in industrial and automotive environments), higher costs, and
> increased device size. For most embedded applications, a UPS isn't a viable
> solution.
> 
> 2. Capacitors: A potential alternative, but they cannot offer prolonged
> backup time. Increasing the number of capacitors to extend backup time leads
> to additional issues:
>    - Increased costs and space requirements on the PCB.
>    - The need to manage partially charged capacitors during power failures.
>    - The requirement for a power supply capable of rapid charging.
>    - The risk of not reaching a safe state before the backup energy
>      depletes.
>    - In specific environments, like explosive atmospheres, storing large
>      amounts of energy can be hazardous.

And also just practically, ensuring a safe power down could be in the order
of a second, so it would be quite a capacitor.

> 
> Given these considerations, it's crucial to understand that such design choices
> aren't merely "hacks". They represent a balance between different types of
> trade-offs.
> 
>>>> It feels like if we're concerned about mitigating physical damage during
>>>> the process of power failure that's a very limited set of devices - the
>>>> storage case where we're in the middle of writing to flash or whatever
>>>> is the most obvious case.
>>
>>> Then why isn't userspace handling this?  This is a policy decision that
>>> it needs to take to properly know what hardware needs to be shut down,
>>> and what needs to happen in order to do that (i.e. flush, unmount,
>>> etc.?)  And userspace today should be able to say, "power down this
>>> device now!" for any device in the system based on the sysfs device
>>> tree, or at the very least, force it to a specific power state.  So why
>>> not handle this policy there?
>>
>> Given the tight timelines it does seem reasonable to have some of this
>> in the kernel - the specific decisions about how to handle these events
>> can always be controlled from userspace (eg, with a sysfs file like we
>> do for autosuspend delay times which seem to be in a similar ballpark).
> 
> Upon investigating the feasibility of a user space solution for eMMC
> power control, I've concluded that it's likely not possible. The primary
> issue is that most board designs don't include reset signaling for
> eMMCs. Additionally, the eMMC power rail is usually linked to the
> system's main power controller. While powering off is doable, cleanly
> powering it back on isn’t feasible. This is especially problematic when
> the rootfs is located on the eMMC, as power cycling the storage device
> could lead to system instability.
> 
> Therefore, any user space method to power off eMMC wouldn't be reliable
> or safe, as there's no way to ensure it can be turned back on without
> risking the integrity of the system. The design rationale is clear:
> avoiding the risks associated with powering off the primary storage
> device.
> 
> Considering these constraints, the only practical implementation I see
> is integrating this functionality into the system's shutdown sequence.
> This approach ensures a controlled environment for powering off the
> eMMC, avoiding potential issues.

You don't need the RST signal, in fact even if you had it it would be
the wrong thing to do. (Implementation is vendor-specific but RST
assumes that eMMCs' VCC and VCCQ are left untouched.)
You can try turning off eMMC cache completely and/or sending power down
notification on 'emergency shutdown', but since power-loss/fail behavior
is vendor-specific asking the storage device vendor how to ensure a safe
power-down.
Anyway the proper eMMC power-down methods are up to a second in timeouts,
so infeasible for your requirements from what I can see.

BR,
Christian

  reply	other threads:[~2023-11-27 11:27 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-24 14:53 [PATCH v1 0/3] introduce priority-based shutdown support Oleksij Rempel
2023-11-24 14:53 ` [PATCH v1 1/3] driver core: move core part of device_shutdown() to a separate function Oleksij Rempel
2023-11-24 15:07   ` Greg Kroah-Hartman
2023-11-24 20:04   ` kernel test robot
2023-11-24 14:53 ` [PATCH v1 2/3] driver core: introduce prioritized device shutdown sequence Oleksij Rempel
2023-11-24 15:10   ` Greg Kroah-Hartman
2023-11-24 14:53 ` [PATCH v1 3/3] mmc: core: increase shutdown priority for MMC devices Oleksij Rempel
2023-11-24 15:05 ` [PATCH v1 0/3] introduce priority-based shutdown support Greg Kroah-Hartman
2023-11-24 15:21   ` Mark Brown
2023-11-24 15:27     ` Greg Kroah-Hartman
2023-11-24 15:49       ` Mark Brown
2023-11-24 15:56         ` Greg Kroah-Hartman
2023-11-24 16:32           ` Oleksij Rempel
2023-11-24 17:26             ` Greg Kroah-Hartman
2023-11-24 18:57               ` Oleksij Rempel
2023-11-25  6:51                 ` Greg Kroah-Hartman
2023-11-25  8:50                   ` Oleksij Rempel
2023-11-25  9:09                     ` Greg Kroah-Hartman
2023-11-25 10:30                       ` Mark Brown
2023-11-25 14:35                         ` Greg Kroah-Hartman
2023-11-25 15:43                           ` Mark Brown
2023-11-25 19:58                             ` Greg Kroah-Hartman
2023-11-26 10:14                               ` Mark Brown
2023-11-26 19:31                                 ` Oleksij Rempel
2023-11-27 11:27                                   ` Christian Loehle [this message]
2023-11-27 11:44                                     ` Oleksij Rempel
2023-11-27 11:57                                       ` Christian Loehle
2023-11-26 19:42                                 ` Ferry Toth
2023-11-27 14:09                                   ` Mark Brown
2023-11-27 10:13                     ` Christian Loehle
2023-11-27 11:36                       ` Oleksij Rempel
2023-11-30 21:59                         ` Francesco Dolcini
2023-11-27 12:54               ` Matti Vaittinen
2023-11-27 13:08                 ` Greg Kroah-Hartman
2023-11-27 14:24                   ` Mark Brown
2023-11-27 14:49                   ` Matti Vaittinen
2023-11-27 16:23                     ` Mark Brown
2023-11-30  9:57 ` Ulf Hansson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8ffb32c8-907c-4266-b8be-c7309418b9f0@arm.com \
    --to=christian.loehle@arm.com \
    --cc=broonie@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=kernel@pengutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mmc@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=o.rempel@pengutronix.de \
    --cc=rafael@kernel.org \
    --cc=san@skov.dk \
    --cc=ulf.hansson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).