On Fri, 2019-08-02 at 14:49 +0100, Julien Grall wrote:
> /!\/!\/!\
>
> I am not a scheduler expert so my view may be wrong. Dario feel free
> to correct me :).
>
> /!\/!\/!\
>
:-)

> On 02/08/2019 14:07, Andrii Anisov wrote:
> > On 02.08.19 12:15, Julien Grall wrote:
> >
> > But the time spent by the hypervisor to handle interrupts and
> > update the hardware state is not requested by the guest itself. It
> > is a virtualization overhead. And the overhead heavily depends on
> > the system configuration (e.g., how many guests are running).
>
> While the context switch cost will depend on your system
> configuration, the HW state synchronization on entry to the
> hypervisor and exit from the hypervisor will always be there. This is
> the case even if you have only one guest running, or are partitioning
> your system.
>
This might be a good way of thinking about this problem:

The overhead/hypervisor time that is always there, even if you are
running only one guest with static and strict resource partitioning
(e.g., hypercalls, IRQs), you want charged to the guest.

The overhead/hypervisor time coming from operations that you do only
because you have multiple guests and/or non-static partitioning (e.g.,
scheduling, load balancing), you don't want charged to any specific
guest.

Note that we're talking, in both cases, of "hypervisor time".

> There are some issues with accounting some of the work done on exit
> to the hypervisor time. Let's take the example of the P2M: this task
> is deferred work from a system register emulation, because we need
> preemption.
>
> The task can be long running (several hundred milliseconds). A
> scheduler may only take into account the guest time and consider that
> the vCPU does not need to be unscheduled. You are at risk that a vCPU
> will hog a pCPU and delay any other vCPU. This is not ideal even for
> an RT task.
>
Yes, this is indeed an example of what I was also describing in my
other email.
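To make the distinction concrete, here is a rough sketch (all names
invented, not actual Xen code) of the kind of per-vCPU accounting I
have in mind: hypervisor work triggered by the guest (hypercalls,
deferred P2M work) is charged to the vCPU's budget, while system-wide
work (scheduling, load balancing) is kept in a separate, uncharged
bucket. A budget check that also looks at the charged hypervisor time
would catch the long-running-P2M hog case Julien describes:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU time buckets, in nanoseconds. */
struct vcpu_acct {
    uint64_t guest_ns;        /* time spent running guest code          */
    uint64_t hyp_charged_ns;  /* hypervisor work done on guest's behalf */
    uint64_t hyp_system_ns;   /* scheduling/balancing: charged to no one */
};

/* Charge a chunk of hypervisor time to the right bucket. */
static inline void charge_hyp(struct vcpu_acct *v, uint64_t ns,
                              bool guest_triggered)
{
    if (guest_triggered)
        v->hyp_charged_ns += ns;  /* e.g., hypercall, deferred P2M work */
    else
        v->hyp_system_ns += ns;   /* e.g., load balancing               */
}

/* A check based on guest_ns alone would miss the hog case; adding
 * hyp_charged_ns makes long deferred work trigger preemption too. */
static inline bool over_budget(const struct vcpu_acct *v,
                               uint64_t budget_ns)
{
    return v->guest_ns + v->hyp_charged_ns > budget_ns;
}
```

With this, 300ms of deferred P2M work against a 10ms budget makes
`over_budget()` return true even if the vCPU barely ran any guest
code, while load-balancing time never pushes any particular vCPU over
its budget.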
> Other work done on exit (e.g., syncing the vGIC state to HW) will be
> less of a concern, wherever it is accounted, because it cannot
> possibly hog a pCPU.
>
Indeed. But it'd still be good to charge the proper entity for it, if
possible. :-)

> I understand you want to get the virtualization overhead. It feels to
> me that this needs to be a different category (i.e., neither
> hypervisor time, nor guest time).
>
IMO, what we need to do is separate the concept of guest/hypervisor
time from the question of whether we account/charge someone for it or
not (and, if yes, whom).

E.g., hypercalls are hypervisor time, and (in most cases) you want to
charge the guest making the hypercalls for them. OTOH, running QEMU
(e.g., in dom0) is guest time, and you want to charge the guest for
which QEMU is acting as a DM (not dom0).

Of course, some parts of this (e.g., the QEMU-running-in-dom0 one) are
going to be very difficult, if possible at all, to implement. But
still, this would be the idea, IMO.

> > Our target is XEN in safety critical systems. So I chose a more
> > deterministic (from my point of view) approach.
>
> See above: I believe you are building a secure system while
> accounting some of the guest work to the hypervisor.
>
Yep, I do agree with Julien here. Doing the accounting right, you get
both a more robust and a more fair system.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<> (Raistlin Majere)