From: Julien Grall <julien.grall.oss@gmail.com>
To: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	 George Dunlap <george.dunlap@citrix.com>,
	Dario Faggioli <dfaggioli@suse.com>,
	 Meng Xu <mengxu@cis.upenn.edu>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	 Ian Jackson <iwj@xenproject.org>,
	Jan Beulich <jbeulich@suse.com>,
	 Stefano Stabellini <sstabellini@kernel.org>,
	Wei Liu <wl@xen.org>
Subject: Re: [RFC PATCH 00/10] Preemption in hypervisor (ARM only)
Date: Wed, 24 Feb 2021 22:31:26 +0000
Message-ID: <CAJ=z9a0v37rc_B7xVdQECAYd52PJ0UajGzvX1DYP56Q2RXQ2Tw@mail.gmail.com>
In-Reply-To: <87o8g99oal.fsf@epam.com>

On Wed, 24 Feb 2021 at 20:58, Volodymyr Babchuk
<Volodymyr_Babchuk@epam.com> wrote:
>
>
> Hi Julien,
>
> Julien Grall writes:
>
> > On 23/02/2021 12:06, Volodymyr Babchuk wrote:
> >> Hi Julien,
> >
> > Hi Volodymyr,
> >
> >> Julien Grall writes:
> >>> On 23/02/2021 02:34, Volodymyr Babchuk wrote:
> >>> ... just rescheduling the vCPU. It will also give the guest the
> >>> opportunity to handle interrupts.
> >>>
> >>> If you don't return to the guest, then you risk an RCU sched stall
> >>> on that vCPU (some hypercalls can take a really long time).
> >> Ah yes, you are right. I only wish that the hypervisor saved the
> >> context of the hypercall on its side...
> >> I have the example of OP-TEE before my eyes. They have a special
> >> return code, "task was interrupted", and they use a separate call,
> >> "continue execution of interrupted task", which takes an opaque
> >> context handle as a parameter. With this approach the state of an
> >> interrupted call never leaks to the rest of the system.
> >
> > Feel free to suggest a new approach for the hypercalls.
> >
>
> I believe I suggested it right above. There are some corner cases that
> should be addressed, of course.

If we wanted a clean break, then possibly yes. But I meant an approach
that doesn't break all the existing users and doesn't put Xen at risk.

I don't believe your approach fulfills that.
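
To make the contrast concrete, here is a rough guest-side sketch of the
model you describe. The names (hyp_call, hyp_continue, HYP_INTERRUPTED,
HYPOP_do_work) are purely illustrative, not the real OP-TEE or Xen ABI.
Today's Xen continuations are transparent instead: the hypervisor rewinds
the guest PC so the same hypercall instruction is re-executed with
updated arguments.

    /* Hypothetical "opaque handle" flow -- illustrative names only. */
    long issue_work(void *arg)
    {
        uint64_t handle;
        long rc = hyp_call(HYPOP_do_work, arg, &handle);

        while ( rc == HYP_INTERRUPTED )   /* "task was interrupted" */
            rc = hyp_continue(handle);    /* state stays inside Xen */

        return rc;
    }

Existing guests know nothing about HYP_INTERRUPTED or hyp_continue(),
which is part of why I don't see how to retrofit this without breaking
them.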

>
> >>>
> >>>> This approach itself has obvious
> >>>> problems: the code that executes a hypercall is responsible for
> >>>> preemption, preemption checks are infrequent (because they are costly
> >>>> by themselves), hypercall execution state is stored in a
> >>>> guest-controlled area, and we rely on the guest's good will to
> >>>> continue the hypercall.
> >>>
> >>> Why is it a problem to rely on the guest's good will? The hypercalls
> >>> should be preempted at a boundary that is safe to continue from.
> >> Yes, and it imposes restrictions on how to write a hypercall
> >> handler.
> >> In other words, there are many more places in hypercall handler code
> >> where it can be preempted than where a hypercall continuation can be
> >> used. For example, you can preempt a hypercall that holds a mutex, but
> >> of course you can't create a continuation point in such a place.
> >
> > I disagree: you can create a continuation point in such a place,
> > although it will be more complex because you have to make sure you
> > break the work at a restartable point.
>
> Maybe there is some misunderstanding. You can't create a hypercall
> continuation point in a place where you are holding a mutex, because
> there is absolutely no guarantee that the guest will restart the
> hypercall.

I don't think we are disagreeing here. My point is that you should rarely
need to hold a mutex for a long period, so you can break your work into
smaller chunks. In that case, you can use a hypercall continuation.
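
For reference, the pattern already used in the tree looks roughly like
this (a simplified sketch: process_one_item() and do_foo_op() are
placeholders, and the argument encoding only mimics what do_memory_op()
does):

    static long do_foo_op(struct domain *d, unsigned long cmd,
                          XEN_GUEST_HANDLE_PARAM(void) arg,
                          unsigned long start, unsigned long nr_items)
    {
        unsigned long i;

        for ( i = start; i < nr_items; i++ )
        {
            /* One bounded, restartable unit of work (placeholder). */
            process_one_item(d, i);

            /* Softirqs pending or events to deliver on this pCPU? */
            if ( i + 1 < nr_items && hypercall_preempt_check() )
                /*
                 * Pack the resume index into cmd and make the guest
                 * re-issue the hypercall starting from item i + 1.
                 */
                return hypercall_create_continuation(
                    __HYPERVISOR_memory_op, "lh",
                    cmd | ((i + 1) << MEMOP_EXTENT_SHIFT), arg);
        }

        return 0;
    }

Any lock would of course have to be dropped before creating the
continuation, which is why each chunk needs to be self-contained.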

>
> But you can preempt a vCPU while it holds a mutex, because Xen owns the
> scheduler and it can guarantee that the vCPU will eventually be
> scheduled again to continue the work and release the mutex.

The problem is the "eventually". If you are accounting the time spent
in the hypervisor to vCPU A, then there is a possibility that it has
exhausted its time slice. In that case, vCPU A may be sleeping for a
while with the mutex held.

If another vCPU B needs the mutex, it will either have to wait,
potentially for a long time, or we need to force vCPU A to run on
borrowed time.
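
In code, the problematic shape would be something like this. This is
purely illustrative: Xen has no sleeping mutex today, so xen_mutex_*
stands for whatever primitive the series would introduce, and
d->big_lock is equally made up.

    /* Illustrative only: xen_mutex_* and d->big_lock are hypothetical. */
    static void slow_op(struct domain *d)
    {
        xen_mutex_lock(&d->big_lock);

        /*
         * A preemption point in here can put vCPU A to sleep because its
         * credits ran out, while it still owns d->big_lock...
         */
        do_long_running_work(d);          /* placeholder */

        /*
         * ...so any vCPU B that calls xen_mutex_lock(&d->big_lock) now
         * waits until the scheduler decides to run vCPU A again.
         */
        xen_mutex_unlock(&d->big_lock);
    }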

>
> > I would also like to point out that preemption also has some
> > drawbacks. With RT in mind, you have to deal with priority inversion
> > (e.g. a lower-priority vCPU holds a mutex that is required by a
> > higher-priority one).
>
> Of course. This is not as simple as "just call the scheduler when we
> want to".

Your e-mail made it sound like it was easy to add preemption to Xen. ;)

>
> > Outside of RT, you have to be careful where mutexes are held. In your
> > earlier answer, you suggested holding a mutex for the memory
> > allocation. If you do that, then it means domain A can block
> > allocation for domain B while it holds the mutex.
>
> As long as we do not exit to EL1 with a mutex held, domain A can't
> block anything. Of course, we have to deal with priority inversion, but
> a malicious domain can't cause a DoS.

It is not really a priority inversion problem outside of RT, because
all the tasks have the same priority. It is more of a time accounting
problem, because each vCPU may have a different number of credits.

> >>> I am really surprised that these are the only changes necessary in
> >>> Xen. For a first approach, we may want to be conservative about where
> >>> preemption happens, as I am not convinced that all the places are
> >>> safe to preempt.
> >>>
> >> Well, I can't say that I ran extensive tests, but I played with this
> >> for some time and it seemed quite stable. Of course, I had some
> >> problems with RTDS...
> >> As I see it, Xen already supports SMP, so all places where races are
> >> possible should already be covered by spinlocks or taken into account
> >> by some other means.
> > That's correct for shared resources. I am more worried about any
> > hypercalls that are expected to run more or less continuously (i.e.
> > not taking interrupts into account) on the same pCPU.
>
> Are there many such hypercalls? They can disable preemption if they
> really need to run on the same pCPU. As I understand it, they should be
> relatively fast, because they can't create continuations anyway.

Well, I have never tried to make Xen preemptible... My comment is based
on the fact that the use of preempt_{enable, disable}() was mostly done
on a best-effort basis.

The usual suspects are anything using this_cpu() or interacting with
the per-CPU HW registers (see the sketch after the list below).

From a quick look, here are a few things (I only looked at Arm):
  * map_domain_page(), in particular on arm32, because it uses
per-CPU page-tables
  * guest_atomics_*, as this uses this_cpu()
  * virt_to_mfn(), in particular the failure path
  * Incorrect (or missing) use of RCU locking (hopefully Juergen's
recent work on RCU mitigates the risk)
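
As a sketch of the this_cpu() hazard: foo_count is a made-up per-CPU
variable, and the preempt_disable()/preempt_enable() pair is what your
series would have to make meaningful (today it only feeds assertions).

    static DEFINE_PER_CPU(unsigned int, foo_count);

    /*
     * Without the preempt_disable()/preempt_enable() pair, a preemption
     * between reading the per-CPU address and writing through it could
     * reschedule the vCPU on another pCPU, so the update would land on
     * the wrong CPU's counter or race with that CPU's own update.
     */
    preempt_disable();
    this_cpu(foo_count)++;
    preempt_enable();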

I can provide guidance, but you will have to go through the code and
check what's happening.

Cheers,

