xen-devel.lists.xenproject.org archive mirror
From: George Dunlap <George.Dunlap@citrix.com>
To: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Dario Faggioli" <dfaggioli@suse.com>,
	Meng Xu <mengxu@cis.upenn.edu>, Ian Jackson <iwj@xenproject.org>,
	Jan Beulich <jbeulich@suse.com>, Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>
Subject: Re: [RFC PATCH 00/10] Preemption in hypervisor (ARM only)
Date: Mon, 1 Mar 2021 14:39:26 +0000	[thread overview]
Message-ID: <F8E5E0F1-0E6C-4659-95E3-01CA52CFCC3F@citrix.com> (raw)
In-Reply-To: <87k0qx9gw0.fsf@epam.com>



> On Feb 24, 2021, at 11:37 PM, Volodymyr Babchuk <volodymyr_babchuk@epam.com> wrote:
> 
> 
>> Hypervisor/virt properties are different to both a kernel-only-RTOS, and
>> regular userspace.  This was why I gave you some specific extra scenarios
>> to do latency testing with, so you could make a fair comparison of
>> "extra overhead caused by Xen" separate from "overhead due to
>> fundamental design constraints of using virt".
> 
> I can't see any fundamental constraints there. I see how virtualization
> architecture can influence context switch time: how many actions you
> need to switch from one vCPU to another. I have in mind low-level things
> there: reprogramming the MMU to use another set of tables, reprogramming your
> interrupt controller, timer, etc. Of course, you can't get latency lower
> than context switch time. This is the only fundamental constraint I can
> see.

Well, suppose you have two domains, A and B, both of which control hardware with hard real-time requirements.

And suppose that A has just started handling a latency-sensitive interrupt when a latency-sensitive interrupt comes in for B.  You might well preempt A and let B run for a full timeslice, causing A’s interrupt handler to be delayed by a significant amount.

Preventing that sort of thing would be a much trickier problem to get right.

>> If you want timely interrupt handling, you either need to partition your
>> workloads by the long-running-ness of their hypercalls, or not have
>> long-running hypercalls.
> 
> ... or do long-running tasks asynchronously. I believe that for most
> domctls and sysctls there is no need to hold the calling vCPU in hypervisor
> mode at all.
> 
>> I remain unconvinced that preemption is a sensible fix to the problem
>> you're trying to solve.
> 
> Well, this is the purpose of this little experiment. I want to discuss
> different approaches and to estimate the amount of effort required. By the
> way, from the x86 point of view, how hard is it to switch vCPU context while it is
> running in hypervisor mode?

I’m not necessarily opposed to introducing preemption, but the more we dig into this, the more complex things begin to look.  The idea of introducing an async framework to deal with long-running hypercalls is a huge engineering and design effort, not just for Xen, but for all future callers of the interface.

The claim in the cover letter was that “[s]ome hypercalls can not be preempted at all”.  I looked at the reference, and it looks like you’re referring to this:

"I brooded over ways to make [alloc_domheap_pages()] preemptible. But it is a) located deep in call chain and b) used not only by hypercalls. So I can't see an easy way to make it preemptible."

Let’s assume for the sake of argument that preventing delays due to alloc_domheap_pages() would require significant rearchitecting of the code.  And let’s even assume that there are 2-3 other such knotty issues making for unacceptably long hypercalls.  Will identifying and tracking down those issues really be more effort than introducing preemption, introducing async operations, and all the other things we’ve been talking about?

One thing that might be interesting is to add some sort of metrics (disabled in Kconfig by default); e.g.:

1. On entry to a hypercall, take a timestamp

2. On every hypercall_preempt() call, take another timestamp, see how much time has passed without a preempt, and reset the timestamp; also do the same check on exit from the hypercall

We could start by gathering stats and figuring out which hypercalls go the longest without preemption, as a way to guide the optimization effort.  Then, as we get that number down, we could add ASSERT()s that the time is never longer than a certain amount, and add runs like that to osstest to make sure no regressions are introduced.
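
To make the idea concrete, here is a minimal sketch of what such instrumentation could look like.  Everything below is illustrative rather than existing Xen code: the struct, field names, and the hcall_latency_*() helpers are made up for the example; only s_time_t and NOW() are real.

    /* Hypothetical per-vCPU bookkeeping; names are illustrative only. */
    struct hcall_latency {
        s_time_t last_checkpoint;  /* last entry / preempt-check timestamp */
        s_time_t worst_gap;        /* longest observed gap without a preempt */
    };

    /* Call on entry to a hypercall. */
    static inline void hcall_latency_enter(struct hcall_latency *l)
    {
        l->last_checkpoint = NOW();
    }

    /* Call from every preemption check, and again on hypercall exit. */
    static inline void hcall_latency_checkpoint(struct hcall_latency *l)
    {
        s_time_t now = NOW();
        s_time_t gap = now - l->last_checkpoint;

        if ( gap > l->worst_gap )
            l->worst_gap = gap;    /* feed into stats, later an ASSERT() */

        l->last_checkpoint = now;
    }

With something like that behind a Kconfig option, the worst-case gap per hypercall would give exactly the ranking needed to guide the optimization effort.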

I agree that hypercall continuations are complex; and you’re right that, because a continuation may never actually be re-executed, they limit where preemption can happen.  But making the entire hypervisor preemption-friendly is also quite complex in its own way; it’s not immediately obvious to me from this thread that hypervisor preemption is the less complex option.
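
For reference, the continuation pattern under discussion roughly has the shape below.  This is a simplified sketch: hypercall_preempt_check() and hypercall_create_continuation() are the real helpers, but the hypercall number, arguments, and the work-loop helpers are placeholders for illustration.

    /* Simplified shape of a preemptible hypercall handler. */
    long do_example_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
    {
        long rc = 0;

        while ( more_work_to_do(cmd) )          /* placeholder */
        {
            process_one_chunk(cmd, arg);        /* placeholder */

            if ( hypercall_preempt_check() )
            {
                /*
                 * Arrange for the guest to re-issue the hypercall with
                 * updated arguments.  Note the handler may never actually
                 * be re-entered, which is what limits where a handler can
                 * safely stop.
                 */
                rc = hypercall_create_continuation(
                    __HYPERVISOR_example_op, "ih", cmd, arg);
                break;
            }
        }

        return rc;
    }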

 -George


Thread overview: 28+ messages
2021-02-23  2:34 [RFC PATCH 00/10] Preemption in hypervisor (ARM only) Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 01/10] sched: core: save IRQ state during locking Volodymyr Babchuk
2021-02-23  8:52   ` Jürgen Groß
2021-02-23 11:15     ` Volodymyr Babchuk
2021-02-24 18:29   ` Andrew Cooper
2021-02-23  2:34 ` [RFC PATCH 03/10] sched: credit2: " Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 02/10] sched: rt: " Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 04/10] preempt: use atomic_t to for preempt_count Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 05/10] preempt: add try_preempt() function Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 07/10] sched: core: remove ASSERT_NOT_IN_ATOMIC and disable preemption[!] Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 06/10] arm: setup: disable preemption during startup Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 08/10] arm: context_switch: allow to run with IRQs already disabled Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 10/10] [HACK] alloc pages: enable preemption early Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 09/10] arm: traps: try to preempt before leaving IRQ handler Volodymyr Babchuk
2021-02-23  9:02 ` [RFC PATCH 00/10] Preemption in hypervisor (ARM only) Julien Grall
2021-02-23 12:06   ` Volodymyr Babchuk
2021-02-24 10:08     ` Julien Grall
2021-02-24 20:57       ` Volodymyr Babchuk
2021-02-24 22:31         ` Julien Grall
2021-02-24 23:58           ` Volodymyr Babchuk
2021-02-25  0:39             ` Andrew Cooper
2021-02-25 12:51               ` Volodymyr Babchuk
2021-03-05  9:31                 ` Volodymyr Babchuk
2021-02-24 18:07 ` Andrew Cooper
2021-02-24 23:37   ` Volodymyr Babchuk
2021-03-01 14:39     ` George Dunlap [this message]
     [not found] <161405394665.5977.17427402181939884734@c667a6b167f6>
2021-02-23 20:29 ` Stefano Stabellini
2021-02-24  0:19   ` Volodymyr Babchuk
