Hi Samuel, On Tue, Mar 23, 2021 at 11:44:50PM -0500, Samuel Holland wrote: > On 3/22/21 8:56 PM, Andre Przywara wrote: > >> I'm sending this patch as an RFC because it raises questions about how > >> we handle firmware versioning. How far back does (or should) our support > >> for old TF-A and Crust versions go? > >> > >> cpuidle has a problem that without working firmware support, CPUs will > >> enter idle states and be unable to wake up. As a result, the system will > >> hang at some point during boot, usually before getting to userspace. > >> > >> For over a year[0], TF-A has exposed the PSCI CPU_SUSPEND function when > >> a SCPI implementation is present[1]. Implementing CPU_SUSPEND is > >> required for implementing SYSTEM_SUSPEND[2], even if CPU_SUSPEND is not > >> itself used for anything. > >> > >> However, there was no code to actually wake up a CPU once it called the > >> CPU_SUSPEND function, because I could not find the register providing > >> the necessary information. The fact that CPU_SUSPEND was broken affected > >> nobody, because nothing ever called it -- there were no idle states in > >> the DTS. In hindsight, what I should have done was always return failure > >> from sunxi_validate_power_state(), but that ship has long sailed. > >> > >> I finally found the elusive register and implemented the wakeup code > >> earlier this month[3]. So now, CPU_SUSPEND actually works, if all of > >> your firmware is up to date, and cpuidle works if you add the states in > >> your device tree. > >> > >> Unfortunately, there is currently nothing verifying that compatibility. > >> So you can get into four possible scenarios: > >> 1) No idle states in DTS, any firmware => Linux works, with baseline > >> power consumption. > >> 2) Idle states added to DTS, no Crust/SCPI => Linux works, but every > >> attempt to enter an idle state is rejected because CPU_SUSPEND is > >> not hooked up. So power consumption increases by a sizable amount. > >> 3) Idle states added to DTS, "old" Crust/SCPI (before [3]) => Linux > >> fails to boot, because CPUs never return from idle states. > >> 4) Idle states added to DTS, "new" Crust/SCPI (after [3]) => Linux > >> works, with improved power consumption compared to the baseline. > >> > >> Obviously, we want to prevent scenario 3 if possible. > > > > So I think the core of the problem is that the DT describes some > > firmware feature, but we have the DT bundled with the kernel, not the > > firmware. > > I would say the core problem is that the firmware lies about supporting > PSCI CPU_SUSPEND. Linux shouldn't be calling CPU_SUSPEND if the firmware > declares it as unavailable, regardless of what is in the DTS. I would say we have two core problems :) > (Technically, per the PSCI standard, CPU_SUSPEND is a mandatory > function, but a quick survey of the TF-A platforms shows it is far from > universally implemented.) U-boot also implements CPU_SUSPEND but will return -1 if you don't have an implementation. I guess that at the moment we shouldn't be reporting it as supported if we don't But I also agree with Andre here, we shouldn't list cpuidles capabilities in the DTS if we don't always have them either. > > So is there any way we can detect an older crust version in U-Boot, > > then remove any potential idle states from the DT? > > Let's assume that we are using a functioning SoC (H3) or the secure fuse > is blown (A64) and therefore U-Boot cannot access SRAM A2. I can think > of three ways it can learn about crust: > > a) PSCI_FEATURES (e.g. is CPU_SUSPEND supported) > b) Metadata in the FIT image > c) Custom SMCs > > TF-A has some additional methods available: > > d) The SCPI-reported firmware version > e) The magic number at the beginning of the firmware binary > > > Granted, this requires recent U-Boot as well, but at least we could try > > to mitigate the worst case a bit? > > If we're okay with modifying firmware to solve this problem, then I > propose the following solution: > > 1) Version bump crust or change its magic number. > 2) Modify TF-A to only report CPU_SUSPEND as available if it detects the > new crust version. This would involve conditionally setting > sunxi_scpi_psci_ops.validate_power_state, and updating psci_setup.c > to also check for .validate_power_state when setting psci_caps. > 3) Modify the Linux PSCI client to respect PSCI_FEATURES when setting > psci_ops.cpu_suspend. cpuidle-psci checks for this function before > setting up idle states. > 4) Finally, after some time, add the idle states to the DTS. > > In fact, this solution solves both scenarios 2 and 3, because it also > takes care of the native PM implementation, which doesn't implement > CPU_SUSPEND at all. > > Does that sound workable? If we can add the DT node at boot in crust (or in TF-A), then we don't really need PSCI_FEATURES, and it would magically work once the firmware is updated. It looks like a solid plan otherwise Maxime