On Fri, 2018-05-11 at 12:41 +0200, Mirela Simonovic wrote: > Hi Dario, > Hi, > On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli > wrote: > > I may very well be missing or misunderstanding something, but I > > continue to think that the problem here is that CPU_STARTING can't, > > right now, fail, while you need it to be able to. > > > > If that is the case, giving different priorities to the notifier, > > is > > not a solution. > > > Let me try to clarify. The assumption is that the starting CPU can > fail. Additional requirement set by Julien is that panic or BUG_ON is > not acceptable. > So, currently, in Xen, CPU bringup can't fail at the CPU_STARTING phase. This is the point of the BUG_ON(). It is you that need it to change this, and make it possible for CPU bringup to fail during CPU_STARTING. This is fine, but require changes, which, IMO, are not limited to removing the BUG_ON() or trading it with something else. > There are 2 ways to deal with this scenario: > > 1) Ignore and report the error, and let the erroneous CPU become > online. > This cannot be done without changing the logic in either scheduler or > notify_cpu_starting(), or I least I don't see how would that be done. > In previous email when I said "escalating this to who knows where" I > did not refer to error escalation but the escalation of the scope of > this series. > How can that be an option? If CPU bringup failed, how is it possible to let it go online? To me, it's not that "we can't let a CPU for which bringup failed continue without changing the scheduler or notify_cpu_starting()". It's rather "we must not a CPU for which bringup faile continue. Period.". Which is to say, you need to change things (in common code!) in such a way that CPU bringup can fail during the CPU_STARTING phase. > 2) Stop the erroneous CPU. > Taking this approach requires setting the priority for the notifier. > No! Stop the CPU if the bringup fails duting the CPU_STARTING phase does not require setting the priorities of the CPU_STARTING notifier. It requires to changing things (in common code) in such a way that CPU bringup can fail during the CPU_STARTING phase. (Did I say that already? :-) ) I understand that, if you set the priority, your series work. But that does not mean that it is a proper solution to the problem. It means that it is an hack that...well... makes your series work. :-) > The key point is that notify_cpu_starting() and scheduler do not have > to be changed. If errata notifier has higher priority than > scheduler's > notifier in the case of an error the CPU will not return into > notify_cpu_starting() and it will never execute BUG_ON because it > will > be stopped. The rest of the system will continue to function without > that CPU. > Right. But now we have and architecture (ARM) for which CPU bringup can fail at the CPU_STARTING phase. And yet, looking at common code, we see that the CPU_STARTING notifier has a BUG_ON() if it ever fails. How would people looking at this in 2 years time from now make sense of this? No, I don't think there really is a sensible workaround here. If you need the CPU_STARTING phase of CPU bringup to be able to fail, you need to make all the changes required to make that happen properly. Which does involve changing common code, and which should therefore be discussed with the other arches and core hypervisor maintainers. Regards, Dario -- <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/