I'll let Julien ack (and commit) the patch. On Wed, 29 Jun 2022, Dmytro Semenets wrote: > Hi Stefano and Julien, > What is the conclusion about this patch? > > сб, 25 июн. 2022 г. в 17:45, Julien Grall : > > > > Hi Stefano, > > > > On 24/06/2022 22:31, Stefano Stabellini wrote: > > > On Fri, 24 Jun 2022, Julien Grall wrote: > > >> On 23/06/2022 23:07, Stefano Stabellini wrote: > > >>> On Thu, 23 Jun 2022, dmitry.semenets@gmail.com wrote: > > >>>> From: Dmytro Semenets > > >>> So wouldn't it be better to remove the panic from the implementation of > > >>> call_psci_cpu_off? > > >> > > >> I have asked to keep the panic() in call_psci_cpu_off(). If you remove the > > >> panic() then we will hide the fact that the CPU was not properly turned off > > >> and will consume more energy than expected. > > >> > > >> The WFI loop is fine when shutting down or rebooting because we know this will > > >> only happen for a short period of time. > > > > > > Yeah, I don't think we should hide that CPU_OFF failed. I think we > > > should print a warning. However, given that we know CPU_OFF can > > > reasonably fail in normal conditions returning DENIED when a Trusted OS > > > is present, then I think we should not panic. > > > > We know how to detect this condition (see section 5.9 in DEN0022D.b). So > > I would argue we should fix it properly rather than removing the panic(). > > > > > > > > If there was a way to distinguish a failure because a Trusted OS is > > > present (the "normal" failure) from other failures, I would suggest to: > > > - print a warning if failed due to a Trusted OS being present > > > - panic in other cases > > > > > > Unfortunately it looks like in all cases the return code is DENIED :-( > > I am confused. Per the spec, the only reason CPU_OFF can return DENIED > > is because the Trusted OS is resident. So what other cases are you > > talking about? > > > > > > > > > > > Given that, I would not panic and only print a warning in all cases. Or > > > we could ASSERT which at least goes away in !DEBUG builds. > > > > ASSERT() is definitely not way to deal with external input. I could > > possibly accept a WARN(), but see above. > > > > >>> The reason I am saying this is that stop_cpu is called in a number of > > >>> places beyond halt_this_cpu and as far as I can tell any of them could > > >>> trigger the panic. (I admit they are unlikely places but still.) > > >> > > >> This is one of the example where the CPU will not be stopped for a short > > >> period of time. We should deal with them differently (i.e. migrating the > > >> trusted OS or else) so we give all the chance for the CPU to be fully powered. > > >> > > >> IMHO, this is a different issue and hence why I didn't ask Dmitry to solve it. > > > > > > I see your point now. I was seeing the two things as one. > > > > > > I think it is true that the WFI loop is likely to work. Also it is true > > > that from a power perspective it makes no different on power down or > > > reboot. From that point of view this patch is OK. > > > > > > But even on shut-down/reboot, why not do that as a fallback in case > > > CPU_OFF didn't work? It is going to work most of the times anyway, why > > > change the default for the few cases that doesn't work? > > > > Because we would not be consistent how we would turn off a CPU on a > > system supporting PSCI. I would prefer to use the same method everywhere > > so it is easier to reason about. > > > > I am also not sure how you define "most of the time". Yes it is possible > > that the boards we aware of will not have this issue, but how about the > > one we don't know? > > > > > > > > Given that this patch would work, I don't want to insist on this and let > > > you decide. > > > > > > > > > But even if we don't want to remove the panic as part of this patch, I > > > think we should remove the panic in a separate patch anyway, at least > > > until someone investigates and thinks of a strategy how to migrate the > > > TrustedOS as you suggested. > > If we accept this patch, then we remove the immediate pain. The other > > uses of stop_cpu() are in: > > 1) idle_loop(), this is reachable when turning off a CPU after boot > > (not supported on Arm) > > 2) start_secondary(), this is only used during boot (CPU > > hot-unplug is not supported) > > > > Even if it would be possible to trigger the panic() in 2), I am not > > aware of an immediate issue there. So I think it would be the wrong > > approach to remove the panic() first and then investigate. > > > > The advantage of the panic() is it will remind us that some needs to be > > fixed. With a warning (or WARN()) people will tend to ignore it. > > > > Cheers, > > > > -- > > Julien Grall >