From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Julien Grall <julien.grall.oss@gmail.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	George Dunlap <george.dunlap@citrix.com>,
	Dario Faggioli <dfaggioli@suse.com>,
	Meng Xu <mengxu@cis.upenn.edu>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <iwj@xenproject.org>, Jan Beulich <jbeulich@suse.com>,
	Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>
Subject: Re: [RFC PATCH 00/10] Preemption in hypervisor (ARM only)
Date: Wed, 24 Feb 2021 23:58:54 +0000	[thread overview]
Message-ID: <87eeh59fwi.fsf@epam.com> (raw)
In-Reply-To: <CAJ=z9a0v37rc_B7xVdQECAYd52PJ0UajGzvX1DYP56Q2RXQ2Tw@mail.gmail.com>



Julien Grall writes:

> On Wed, 24 Feb 2021 at 20:58, Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com> wrote:
>>
>>
>> Hi Julien,
>>
>> Julien Grall writes:
>>
>>> On 23/02/2021 12:06, Volodymyr Babchuk wrote:
>>>> Hi Julien,
>>>
>>> Hi Volodymyr,
>>>
>>>> Julien Grall writes:
>>>>> On 23/02/2021 02:34, Volodymyr Babchuk wrote:
>>>>> ... just rescheduling the vCPU. It will also give the opportunity for
>>>>> the guest to handle interrupts.
>>>>>
>>>>> If you don't return to the guest, then you risk getting an RCU sched
>>>>> stall on that vCPU (some hypercalls can take really, really long).
>>>> Ah yes, you are right. I only wish that the hypervisor saved the context
>>>> of the hypercall on its side...
>>>> I have the example of OP-TEE before my eyes. They have a special return
>>>> code "task was interrupted" and a separate call "continue execution of
>>>> interrupted task", which takes an opaque context handle as a parameter.
>>>> With this approach, the state of an interrupted call never leaks to the
>>>> rest of the system.
>>>
>>> Feel free to suggest a new approach for the hypercalls.
>>>
>>
>> I believe I suggested it right above. There are some corner cases that
>> should be addressed, of course.
>
> If we wanted a clean break, then possibly yes.  But I meant one that doesn't
> break all the existing users and doesn't put Xen at risk.
>
> I don't believe your approach fulfills it.

Of course, we can't touch any hypercalls that are part of the stable
ABI. But if I got this right, domctls and sysctls are not stable, so one
can change their behavior quite drastically in major releases.
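
To illustrate what I mean, here is a very rough sketch of such an
"interrupted call" interface (all names and the error code are made up;
this is neither the real OP-TEE ABI nor an existing Xen hypercall):

#include <stdint.h>

/* Hypothetical status code and wrappers -- purely illustrative. */
#define HYPERCALL_ERR_INTERRUPTED  (-100)

long hypercall_long_op(unsigned long arg, uint64_t *ctx_handle);
long hypercall_continue_op(uint64_t ctx_handle);

/*
 * Guest side: keep resuming until the hypervisor reports completion.
 * All of the interrupted call's state stays behind ctx_handle, inside
 * the hypervisor, so nothing leaks into guest-controlled memory.
 */
static long do_long_op(unsigned long arg)
{
    uint64_t ctx = 0;
    long rc = hypercall_long_op(arg, &ctx);

    while ( rc == HYPERCALL_ERR_INTERRUPTED )
        rc = hypercall_continue_op(ctx);

    return rc;
}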

>>
>>>>>
>>>>>> This approach itself has obvious
>>>>>> problems: the code that executes the hypercall is responsible for
>>>>>> preemption, preemption checks are infrequent (because they are costly
>>>>>> by themselves), the hypercall execution state is stored in a
>>>>>> guest-controlled area, and we rely on the guest's good will to
>>>>>> continue the hypercall.
>>>>>
>>>>> Why is it a problem to rely on the guest's good will? The hypercalls
>>>>> should be preempted at a boundary that is safe to continue.
>>>> Yes, and it imposes restrictions on how to write a hypercall handler.
>>>> In other words, there are many more places in hypercall handler code
>>>> where it can be preempted than where a hypercall continuation can be
>>>> used. For example, you can preempt a hypercall that holds a mutex, but
>>>> of course you can't create a continuation point in such a place.
>>>
>>> I disagree, you can create a continuation point in such a place. Although
>>> it will be more complex, because you have to make sure you break the
>>> work at a restartable point.
>>
>> Maybe there is some misunderstanding. You can't create a hypercall
>> continuation point in a place where you are holding a mutex, because
>> there is absolutely no guarantee that the guest will restart the
>> hypercall.
>
> I don't think we are disagreeing here. My point is you should rarely
> need to hold a mutex for a long period, so you could break your work
> into smaller chunks. In which case, you can use hypercall continuation.

Let's put it this way: generally you can hold a mutex much longer than
you can hold a spinlock, and nothing catastrophic will happen if you are
preempted while holding a mutex. Better to avoid this, of course.
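
For comparison, the continuation pattern looks roughly like this
(simplified sketch: hypercall_preempt_check() and
hypercall_create_continuation() are the real Xen helpers, but the op
number, the lock and the loop body here are made up, and the format
string is approximate):

/*
 * Break the work into chunks; only create a continuation at a point
 * where no lock is held and progress is fully captured in the arguments.
 */
static long example_op(struct domain *d, unsigned long start,
                       unsigned long end)
{
    unsigned long i;

    for ( i = start; i < end; i++ )
    {
        spin_lock(&d->example_lock);        /* made-up lock */
        /* ... process item i ... */
        spin_unlock(&d->example_lock);

        /* Restart point: nothing held, state reduced to 'i' and 'end'. */
        if ( i + 1 < end && hypercall_preempt_check() )
            return hypercall_create_continuation(__HYPERVISOR_example_op,
                                                 "ll", i + 1, end);
    }

    return 0;
}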

>
>>
>> But you can preempt a vCPU while holding a mutex, because Xen owns the
>> scheduler and it can guarantee that the vCPU will eventually be scheduled
>> again to continue the work and release the mutex.
>
> The problem is the "eventually". If you are accounting the time spent
> in the hypervisor to vCPU A, then there is a possibility that it
> has exhausted its time slice. In which case, your vCPU A may be
> sleeping for a while with a mutex held.
>
> If another vCPU B needs the mutex, it will either have to wait
> potentially for a long time or we need to force vCPU A to run on
> borrowed time.

Yes, of course.

>>
>>> I would also like to point out that preemption also has some drawbacks.
>>> With RT in mind, you have to deal with priority inversion (e.g. a
>>> lower-priority vCPU holds a mutex that is required by a higher-priority
>>> one).
>>
>> Of course. This is not as simple as "just call the scheduler when we want
>> to".
>
> Your e-mail made it sound like it was easy to add preemption in
> Xen. ;)

I'm sorry for that :)
Actually, there is a lot of work to do. It appears to me that "current"
needs to be reworked, preempt_enable/disable needs to be reworked, and
per-cpu variables also need to be reworked. And this is just to ensure
consistency of the already existing code.

And I am not mentioning x86 support there...
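
Just to show the direction I mean for preempt_enable/disable, a rough
illustrative sketch (not the actual code from the series): a per-pCPU
counter, and a preemption point that only reschedules when the counter
is zero.

DEFINE_PER_CPU(atomic_t, preempt_count);

void preempt_disable(void)
{
    atomic_inc(&this_cpu(preempt_count));
    barrier();
}

void preempt_enable(void)
{
    barrier();
    atomic_dec(&this_cpu(preempt_count));
}

/* Checked at candidate preemption points, e.g. on the IRQ return path. */
bool preemption_allowed(void)
{
    return atomic_read(&this_cpu(preempt_count)) == 0;
}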

>>> Outside of RT, you have to be careful where mutexes are held. In your
>>> earlier answer, you suggested holding a mutex for the memory
>>> allocation. If you do that, then it means a domain A can block
>>> allocation for domain B as it holds the mutex.
>>
>> As long as we do not exit to EL1 with a mutex held, domain A can't
>> block anything. Of course, we have to deal with priority inversion, but
>> a malicious domain can't cause a DoS.
>
> It is not really a priority inversion problem outside of RT because
> all the tasks will have the same priority. It is more a time
> accounting problem because each vCPU may have a different number of
> credits.

Speaking of that, RTDS does not use the concept of priority. And neither,
of course, does ARINC653.


>>>>> I am really surprised that this is the only change necessary in
>>>>> Xen. For a first approach, we may want to be conservative about when
>>>>> the preemption happens, as I am not convinced that all the places are
>>>>> safe to preempt.
>>>>>
>>>> Well, I can't say that I ran extensive tests, but I played with this for
>>>> some time and it seemed quite stable. Of course, I had some problems
>>>> with RTDS...
>>>> As I see it, Xen already supports SMP, so all places where races are
>>>> possible should already be covered by spinlocks or taken into account by
>>>> some other means.
>>> That's correct for shared resources. I am more worried about any
>>> hypercalls that are expected to run more or less continuously (e.g. not
>>> taking interrupts into account) on the same pCPU.
>>
>> Are there many such hypercalls? They can disable preemption if they
>> really need to run on the same pCPU. As I understand it, they should be
>> relatively fast, because they can't create continuations anyway.
>
> Well, I never tried to make Xen preemptible... My comment is based on
> the fact that the use of preempt_{enable, disable}() was mostly done on a
> best-effort basis.
>
> The usual suspects are anything using this_cpu() or interacting with
> the per-CPU HW registers.
>
> From a quick look here a few things (only looked at Arm):
>   * map_domain_page() in particular on arm32 because this is using
> per-CPU page-tables
>   * guest_atomics_* as this uses this_cpu()
>   * virt_to_mfn() in particular the failure path
>   * Incorrect (or missing) use of RCU locking. (Hopefully Juergen's
> recent work on RCU mitigates the risk)
>
> I can provide guidance, but you will have to go through the code and
> check what's happening.

Thank you for the list. Of course, I need to go through all the code. I
already had a bunch of problems with per_cpu variables...
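
For example, the per_cpu pattern that breaks looks roughly like this
(hypothetical code, not a real Xen call site; "my_counter" is made up,
while this_cpu() and preempt_disable/enable are the existing helpers):

DEFINE_PER_CPU(unsigned int, my_counter);

void broken_example(void)
{
    unsigned int *cnt = &this_cpu(my_counter);  /* resolved on, say, pCPU0 */

    /* If we are preempted and migrated here, we resume on another pCPU
     * but still hold a pointer into pCPU0's per-cpu area. */
    (*cnt)++;                                   /* updates the wrong pCPU */
}

void fixed_example(void)
{
    preempt_disable();
    this_cpu(my_counter)++;     /* the pCPU cannot change in this window */
    preempt_enable();
}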

-- 
Volodymyr Babchuk at EPAM
