I'll let Julien ack (and commit) the patch.


On Wed, 29 Jun 2022, Dmytro Semenets wrote:
> Hi Stefano and Julien,
> What is the conclusion about this patch?
> 
> сб, 25 июн. 2022 г. в 17:45, Julien Grall <julien@xen.org>:
> >
> > Hi Stefano,
> >
> > On 24/06/2022 22:31, Stefano Stabellini wrote:
> > > On Fri, 24 Jun 2022, Julien Grall wrote:
> > >> On 23/06/2022 23:07, Stefano Stabellini wrote:
> > >>> On Thu, 23 Jun 2022, dmitry.semenets@gmail.com wrote:
> > >>>> From: Dmytro Semenets <dmytro_semenets@epam.com>
> > >>> So wouldn't it be better to remove the panic from the implementation of
> > >>> call_psci_cpu_off?
> > >>
> > >> I have asked to keep the panic() in call_psci_cpu_off(). If you remove the
> > >> panic() then we will hide the fact that the CPU was not properly turned off
> > >> and will consume more energy than expected.
> > >>
> > >> The WFI loop is fine when shutting down or rebooting because we know this will
> > >> only happen for a short period of time.
> > >
> > > Yeah, I don't think we should hide that CPU_OFF failed. I think we
> > > should print a warning. However, given that we know CPU_OFF can
> > > reasonably fail in normal conditions returning DENIED when a Trusted OS
> > > is present, then I think we should not panic.
> >
> > We know how to detect this condition (see section 5.9 in DEN0022D.b). So
> > I would argue we should fix it properly rather than removing the panic().
> >
> > >
> > > If there was a way to distinguish a failure because a Trusted OS is
> > > present (the "normal" failure) from other failures, I would suggest to:
> > > - print a warning if failed due to a Trusted OS being present
> > > - panic in other cases
> > >
> > > Unfortunately it looks like in all cases the return code is DENIED :-(
> > I am confused. Per the spec, the only reason CPU_OFF can return DENIED
> > is because the Trusted OS is resident. So what other cases are you
> > talking about?
> >
> > >
> > >
> > > Given that, I would not panic and only print a warning in all cases. Or
> > > we could ASSERT which at least goes away in !DEBUG builds.
> >
> > ASSERT() is definitely not way to deal with external input. I could
> > possibly accept a WARN(), but see above.
> >
> > >>> The reason I am saying this is that stop_cpu is called in a number of
> > >>> places beyond halt_this_cpu and as far as I can tell any of them could
> > >>> trigger the panic. (I admit they are unlikely places but still.)
> > >>
> > >> This is one of the example where the CPU will not be stopped for a short
> > >> period of time. We should deal with them differently (i.e. migrating the
> > >> trusted OS or else) so we give all the chance for the CPU to be fully powered.
> > >>
> > >> IMHO, this is a different issue and hence why I didn't ask Dmitry to solve it.
> > >
> > > I see your point now. I was seeing the two things as one.
> > >
> > > I think it is true that the WFI loop is likely to work. Also it is true
> > > that from a power perspective it makes no different on power down or
> > > reboot.  From that point of view this patch is OK.
> > >
> > > But even on shut-down/reboot, why not do that as a fallback in case
> > > CPU_OFF didn't work? It is going to work most of the times anyway, why
> > > change the default for the few cases that doesn't work?
> >
> > Because we would not be consistent how we would turn off a CPU on a
> > system supporting PSCI. I would prefer to use the same method everywhere
> > so it is easier to reason about.
> >
> > I am also not sure how you define "most of the time". Yes it is possible
> > that the boards we aware of will not have this issue, but how about the
> > one we don't know?
> >
> > >
> > > Given that this patch would work, I don't want to insist on this and let
> > > you decide.
> > >
> > >
> > > But even if we don't want to remove the panic as part of this patch, I
> > > think we should remove the panic in a separate patch anyway, at least
> > > until someone investigates and thinks of a strategy how to migrate the
> > > TrustedOS as you suggested.
> > If we accept this patch, then we remove the immediate pain. The other
> > uses of stop_cpu() are in:
> >         1) idle_loop(), this is reachable when turning off a CPU after boot
> > (not supported on Arm)
> >          2) start_secondary(), this is only used during boot (CPU
> > hot-unplug is not supported)
> >
> > Even if it would be possible to trigger the panic() in 2), I am not
> > aware of an immediate issue there. So I think it would be the wrong
> > approach to remove the panic() first and then investigate.
> >
> > The advantage of the panic() is it will remind us that some needs to be
> > fixed. With a warning (or WARN()) people will tend to ignore it.
> >
> > Cheers,
> >
> > --
> > Julien Grall
>