From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <stefano.stabellini@xilinx.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Julien Grall <jgrall@amazon.com>,
	Dario Faggioli <dario.faggioli@suse.com>,
	"Bertrand.Marquis@arm.com" <Bertrand.Marquis@arm.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>
Subject: Re: IRQ latency measurements in hypervisor
Date: Fri, 15 Jan 2021 15:45:43 +0000	[thread overview]
Message-ID: <87wnwe2ogp.fsf@epam.com> (raw)
In-Reply-To: <f31c9cca-0275-eaef-5fcd-c8484d4b5da0@xen.org>


Hi Julien,

Julien Grall writes:

> Hi Volodymyr, Stefano,
>
> On 14/01/2021 23:33, Stefano Stabellini wrote:
>> + Bertrand, Andrew (see comment on alloc_heap_pages())
>
> Long-running hypercalls are usually considered security issues.
>
> In this case, only the control domain can issue large memory
> allocations (2GB at a time). A guest would only be able to allocate
> 2MB at a time, so from the numbers below, it would only take 1ms max.
>
> So I think we are fine here. Next time you find a large loop, please
> provide an explanation of why it is not a security issue (e.g. it
> cannot be used by guests), or send an email to the Security Team if in
> doubt.

Sure. In this case I took into account that only the control domain can
issue this call, I just didn't state this explicitly. Next time I will.

>>> ARMv8 platform, namely the Renesas R-Car H3 SoC on a Salvator board.
>
> Which core is it?

Cortex-A57

[...]

>>> 2. RTDS scheduler. With console disabled, things like "hexdump -v
>>>     /dev/zero" didn't affect the latency so badly, but anyway,
>>>     sometimes I got ~600us spikes. This is not a surprise, because of
>>>     the default RTDS configuration. I changed the period for DomU from
>>>     the default 10ms to 100us and things got better: with Dom0 burning
>>>     CPU I rarely get a max latency of ~30us, with a mean latency of
>>>     ~9us and a deviation of ~0.5us. On the other hand, when I tried to
>>>     set the period to 30us, max latency rose to ~60us.
> On a related topic, I am not entirely sure that all the hypercalls
> would be able to fit in the 100us slice. In particular, the ones which
> touch the P2M and do memory allocation.

I agree with you. In my experiments I didn't find a case with a
long-running hypercall (apart from the mentioned populate_physmap), but
of course there should be cases with such calls.

>
>> This is very interesting too. Did you get any spikes with the period
>> set to 100us? It would be fantastic if there were none.
>> 
>>> 3. Huge latency spike during domain creation. I conducted some
>>>     additional tests, including use of PV drivers, but this didn't
>>>     affect the latency in my "real time" domain. But an attempt to
>>>     create another domain with a relatively large memory size of 2GB
>>>     led to a huge spike in latency. Debugging led to this call path:
>>>
>>>     XENMEM_populate_physmap -> populate_physmap() ->
>>>     alloc_domheap_pages() -> alloc_heap_pages()-> huge
>>>     "for ( i = 0; i < (1 << order); i++ )" loop.
>
> There are two for loops in alloc_heap_pages() using this syntax. Which
> one are you referring to?

I did some tracing with Lauterbach. It pointed to the first loop, and
especially to the flush_page_to_ram() call, if I remember correctly.

>>>
>>>     This loop handles the struct page* for every one of the 262144
>>>     pages that were allocated by calling populate_physmap().
>
> Looking at the domain creation code, 2GB will be split into two extents
> of 1GB. This means there will be at least a preemption point between
> the allocation of the two extents.

Yes. 1GB is exactly 262144 4KB pages.
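
To illustrate where the time goes, the per-page part of that first loop
has roughly the following shape (a simplified sketch from memory, not a
verbatim copy of the Xen code; init_one_page() is a made-up name for the
per-page bookkeeping):

    for ( i = 0; i < (1 << order); i++ )
    {
        /* Made-up helper: per-page state setup (owner, count_info, ...). */
        init_one_page(&pg[i]);

        /*
         * On Arm the freshly allocated page is flushed to RAM so the new
         * owner cannot observe stale cache lines. This is the call my
         * trace points at, and it is done once per 4KB page.
         */
        flush_page_to_ram(mfn_x(page_to_mfn(&pg[i])),
                          !(memflags & MEMF_no_icache_flush));
    }

So the cost is linear in (1 << order), i.e. 262144 iterations for a 1GB
extent.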

[...]

>>> I managed to overcome issue #3 by commenting out all calls to
>>> populate_one_size() except the populate_one_size(PFN_4K_SHIFT) one in
>>> xg_dom_arm.c. This lengthened domain construction, but my "RT" domain
>>> didn't experience such big latency issues. Apparently all other
>>> hypercalls which are used during domain creation are either fast or
>>> preemptible. No doubt my hack leads to page table inflation and an
>>> overall performance drop.
>> I think we need to follow this up and fix this. Maybe just by adding
>> a hypercall continuation to the loop.
>
> When I read "hypercall continuation", I understand that we will return
> to the guest context so it can process interrupts and potentially
> switch to another task.
>
> This means that the guest could issue a second populate_physmap() from
> the vCPU. Therefore any restart information should be part of the 
> hypercall parameters. So far, I don't see how this would be possible.
>
> Even if we overcome that part, this can easily be abused by a guest, as
> the memory is not yet accounted to the domain. Imagine a guest that
> never requests the continuation of the populate_physmap(). So we would
> need to block the vCPU until the allocation is finished.

Moreover, most of alloc_heap_pages() sits under a spinlock, so the first
step would be to split this function into smaller atomic parts.
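
For reference, the existing preemption pattern at the memory_op level
looks roughly like this (a sketch from memory, not a patch;
allocate_one_extent() is a made-up name for the per-extent work):

    for ( i = start_extent; i < nr_extents; i++ )
    {
        if ( i != start_extent && hypercall_preempt_check() )
            /* Progress is encoded in the hypercall arguments themselves. */
            return hypercall_create_continuation(
                __HYPERVISOR_memory_op, "lh",
                op | (i << MEMOP_EXTENT_SHIFT), arg);

        allocate_one_extent(i);   /* made-up: allocate and map one extent */
    }

Doing the same inside alloc_heap_pages() is harder precisely because
there is no equivalent "progress" parameter for a partially allocated
extent, and the heap lock is held across the loop.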

> I think the first step is that we need to figure out which part of the
> allocation is slow (see my question above). From there, we can figure
> out if there is a way to reduce the impact.

I'll do more tracing and will come back with more accurate numbers. But
as far as I can see, any loop over 262144 pages will take some time...
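
Just to put a rough number on it, scaling your "2MB in at most ~1ms"
estimate linearly (a back-of-envelope calculation only, assuming a
constant per-page cost):

    /* Back-of-envelope: 2MB = 512 4KB pages in ~1ms, scaled to 1GB. */
    #include <stdio.h>

    int main(void)
    {
        const double per_page_us = 1000.0 / 512.0;             /* ~2us per page */
        const unsigned long pages_per_gib = (1UL << 30) >> 12; /* 262144 pages */

        printf("~%.0f ms per 1GB extent\n",
               per_page_us * pages_per_gib / 1000.0);          /* ~512 ms */
        return 0;
    }

That is roughly half a second per 1GB extent if nothing in the loop is
preemptible.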

-- 
Volodymyr Babchuk at EPAM


Thread overview: 21+ messages
2021-01-12 23:48 IRQ latency measurements in hypervisor Volodymyr Babchuk
2021-01-14 23:33 ` Stefano Stabellini
2021-01-15 11:42   ` Julien Grall
2021-01-15 15:45     ` Volodymyr Babchuk [this message]
2021-01-15 17:13       ` Julien Grall
2021-01-15 23:41         ` Stefano Stabellini
2021-01-16 12:59           ` Andrew Cooper
2021-01-20 23:09           ` Volodymyr Babchuk
2021-01-20 23:03         ` Volodymyr Babchuk
2021-01-21  0:52           ` Stefano Stabellini
2021-01-21 21:01           ` Julien Grall
2021-01-15 15:27   ` Volodymyr Babchuk
2021-01-15 23:17     ` Stefano Stabellini
2021-01-16 12:47       ` Julien Grall
2021-01-21  0:49       ` Volodymyr Babchuk
2021-01-21  0:59         ` Stefano Stabellini
2021-01-18 16:40   ` Dario Faggioli
2021-01-21  1:20     ` Volodymyr Babchuk
2021-01-21  8:39       ` Dario Faggioli
2021-01-16 14:40 ` Andrew Cooper
2021-01-21  2:39   ` Volodymyr Babchuk
