[Adding George, plus other x86, ARM and core-Xen people]

Hi Andrii,

First of all, thanks a lot for this series! The problem you mention is a long-standing one, and I'm glad we're finally starting to properly look into it.

I already have one comment: I think I can see where this comes from, but I don't think 'XEN scheduling hardening' is what we're doing in this series... I'd go for something like "xen: sched: improve idle and vcpu time accounting precision", or something like that.

On Fri, 2019-07-26 at 13:37 +0300, Andrii Anisov wrote:
> One of the scheduling problems is a misleading CPU idle time concept.
> Now for the CPU idle time, it is taken an idle vcpu run time. But idle
> vcpu run time includes IRQ processing, softirqs processing, tasklets
> processing, etc. Those tasks are not actual idle and they accounting
> may mislead CPU freq governors who rely on the CPU idle time.
>
Indeed! And I agree this is quite bad.

> The other problem is that pure hypervisor tasks execution time is
> charged from the guest vcpu budget.
>
Yep, equally bad.

> For example, IRQ and softirq processing time are charged from the
> current vcpu budget, which is likely the guest vcpu. This is quite
> unfair and may break scheduling reliability.
> It is proposed to charge guest vcpus for the guest actual run time and
> time to serve guest's hypercalls and access to emulated iomem. All the
> rest is calculated as the hypervisor run time (IRQ and softirq
> processing, branch prediction hardening, etc.)
>
Right.

> While the series is the early RFC, several points are still untouched:
> - Now the time elapsed from the last rescheduling is not fully charged
>   from the current vcpu budget. Are there any changes needed in the
>   existing scheduling algorithms?
>
I'll think about it but, off the top of my head, I don't see how this can be a problem. Scheduling algorithms (should!) base their logic and their calculations on actual vcpus' runtime, not much on idle vcpus' one.

> - How to avoid the absolute top priority of tasklets (what is obeyed by
>   all schedulers so far). Should idle vcpu be scheduled as the normal
>   guest vcpus (through queues, priorities, etc)?
>
Now, this is something to think about, and we should try to understand whether anything would break if we go for it.

I mean, I see why you'd want to do that, but tasklets and softirqs have worked the way they do in Xen ever since they were introduced, I believe. Therefore, even if no subsystem explicitly relies on the current behavior (which should be verified), I think we are at high risk of breaking things if we change it.

That's not to say it would not be a good change, or that it is impossible... It's, rather, just to raise some awareness. :-)

> - Idle vcpu naming is quite misleading. It is a kind of system
>   (hypervisor) task which is responsible for some hypervisor work.
>   Should it be renamed/reconsidered?
>
Well, that's a design question, even for this very series, isn't it?

I mean, I see two ways of achieving proper idle time accounting:
1) you leave things as they are (i.e., idle does not only do idling, it also does all these other things), but you make sure you don't count the time they take as idle time;
2) you move all these activities out of idle, into some other context, and you let idle just do the idling. At that point, the time accounted to idle will only be actual idle time, as the time it took Xen to do all the other things is now accounted to the new execution context which is running them.

So, which path does this series take (I believe 1), and which path do you (and others) believe is better?
(And, yes, discussing this is why I've added, apart from George, some other x86, ARM, and core-Xen people.)

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------