All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: George Dunlap <george.dunlap@citrix.com>,
	Dario Faggioli <dfaggioli@suse.com>,
	Meng Xu <mengxu@cis.upenn.edu>, Ian Jackson <iwj@xenproject.org>,
	Jan Beulich <jbeulich@suse.com>, Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>
Subject: Re: [RFC PATCH 00/10] Preemption in hypervisor (ARM only)
Date: Wed, 24 Feb 2021 18:07:25 +0000	[thread overview]
Message-ID: <25034a7a-83ed-0848-8d23-67ed9d02c61c@citrix.com> (raw)
In-Reply-To: <20210223023428.757694-1-volodymyr_babchuk@epam.com>

On 23/02/2021 02:34, Volodymyr Babchuk wrote:
> Hello community,
>
> Subject of this cover letter is quite self-explanatory. This patch
> series implements PoC for preemption in hypervisor mode.
>
> This is the sort of follow-up to recent discussion about latency
> ([1]).
>
> Motivation
> ==========
>
> It is well known that Xen is not preemptable. On other words, it is
> impossible to switch vCPU contexts while running in hypervisor
> mode. Only one place where scheduling decision can be made and one
> vCPU can be replaced with another is the exit path from the hypervisor
> mode. The one exception are Idle vCPUs, which never leaves the
> hypervisor mode for obvious reasons.
>
> This leads to a number of problems. This list is not comprehensive. It
> lists only things that I or my colleagues encountered personally.
>
> Long-running hypercalls. Due to nature of some hypercalls they can
> execute for arbitrary long time. Mostly those are calls that deal with
> long list of similar actions, like memory pages processing. To deal
> with this issue Xen employs most horrific technique called "hypercall
> continuation". When code that handles hypercall decides that it should
> be preempted, it basically updates the hypercall parameters, and moves
> guest PC one instruction back. This causes guest to re-execute the
> hypercall with altered parameters, which will allow hypervisor to
> continue hypercall execution later. This approach itself have obvious
> problems: code that executes hypercall is responsible for preemption,
> preemption checks are infrequent (because they are costly by
> themselves), hypercall execution state is stored in guest-controlled
> area, we rely on guest's good will to continue the hypercall. All this
> imposes restrictions on which hypercalls can be preempted, when they
> can be preempted and how to write hypercall handlers. Also, it
> requires very accurate coding and already led to at least one
> vulnerability - XSA-318. Some hypercalls can not be preempted at all,
> like the one mentioned in [1].
>
> Absence of hypervisor threads/vCPUs. Hypervisor owns only idle vCPUs,
> which are supposed to run when the system is idle. If hypervisor needs
> to execute own tasks that are required to run right now, it have no
> other way than to execute them on current vCPU. But scheduler does not
> know that hypervisor executes hypervisor task and accounts spent time
> to a domain. This can lead to domain starvation.
>
> Also, absence of hypervisor threads leads to absence of high-level
> synchronization primitives like mutexes, conditional variables,
> completions, etc. This leads to two problems: we need to use spinlocks
> everywhere and we have problems when porting device drivers from linux
> kernel.

You cannot reenter a guest, even to deliver interrupts, if pre-empted at
an arbitrary point in a hypercall.  State needs unwinding suitably.

Xen's non-preemptible-ness is designed to specifically force you to not
implement long-running hypercalls which would interfere with timely
interrupt handling in the general case.

Hypervisor/virt properties are different to both a kernel-only-RTOS, and
regular usespace.  This was why I gave you some specific extra scenarios
to do latency testing with, so you could make a fair comparison of
"extra overhead caused by Xen" separate from "overhead due to
fundamental design constraints of using virt".


Preemption like this will make some benchmarks look better, but it also
introduces the ability to create fundamental problems, like preventing
any interrupt delivery into a VM for seconds of wallclock time while
each vcpu happens to be in a long-running hypercall.

If you want timely interrupt handling, you either need to partition your
workloads by the long-running-ness of their hypercalls, or not have
long-running hypercalls.

I remain unconvinced that preemption is an sensible fix to the problem
you're trying to solve.

~Andrew


  parent reply	other threads:[~2021-02-24 18:08 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-23  2:34 [RFC PATCH 00/10] Preemption in hypervisor (ARM only) Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 01/10] sched: core: save IRQ state during locking Volodymyr Babchuk
2021-02-23  8:52   ` Jürgen Groß
2021-02-23 11:15     ` Volodymyr Babchuk
2021-02-24 18:29   ` Andrew Cooper
2021-02-23  2:34 ` [RFC PATCH 03/10] sched: credit2: " Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 02/10] sched: rt: " Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 04/10] preempt: use atomic_t to for preempt_count Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 05/10] preempt: add try_preempt() function Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 07/10] sched: core: remove ASSERT_NOT_IN_ATOMIC and disable preemption[!] Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 06/10] arm: setup: disable preemption during startup Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 08/10] arm: context_switch: allow to run with IRQs already disabled Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 10/10] [HACK] alloc pages: enable preemption early Volodymyr Babchuk
2021-02-23  2:34 ` [RFC PATCH 09/10] arm: traps: try to preempt before leaving IRQ handler Volodymyr Babchuk
2021-02-23  9:02 ` [RFC PATCH 00/10] Preemption in hypervisor (ARM only) Julien Grall
2021-02-23 12:06   ` Volodymyr Babchuk
2021-02-24 10:08     ` Julien Grall
2021-02-24 20:57       ` Volodymyr Babchuk
2021-02-24 22:31         ` Julien Grall
2021-02-24 23:58           ` Volodymyr Babchuk
2021-02-25  0:39             ` Andrew Cooper
2021-02-25 12:51               ` Volodymyr Babchuk
2021-03-05  9:31                 ` Volodymyr Babchuk
2021-02-24 18:07 ` Andrew Cooper [this message]
2021-02-24 23:37   ` Volodymyr Babchuk
2021-03-01 14:39     ` George Dunlap
     [not found] <161405394665.5977.17427402181939884734@c667a6b167f6>
2021-02-23 20:29 ` Stefano Stabellini
2021-02-24  0:19   ` Volodymyr Babchuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=25034a7a-83ed-0848-8d23-67ed9d02c61c@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=dfaggioli@suse.com \
    --cc=george.dunlap@citrix.com \
    --cc=iwj@xenproject.org \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=mengxu@cis.upenn.edu \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.