From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <stefano.stabellini@xilinx.com>,
"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
Julien Grall <jgrall@amazon.com>,
Dario Faggioli <dario.faggioli@suse.com>,
"Bertrand.Marquis@arm.com" <Bertrand.Marquis@arm.com>,
"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>
Subject: Re: IRQ latency measurements in hypervisor
Date: Fri, 15 Jan 2021 15:45:43 +0000 [thread overview]
Message-ID: <87wnwe2ogp.fsf@epam.com> (raw)
In-Reply-To: <f31c9cca-0275-eaef-5fcd-c8484d4b5da0@xen.org>
Hi Julien,
Julien Grall writes:
> Hi Volodymyr, Stefano,
>
> On 14/01/2021 23:33, Stefano Stabellini wrote:
>> + Bertrand, Andrew (see comment on alloc_heap_pages())
>
> Long running hypercalls are usually considered security issues.
>
> In this case, only the control domain can issue large memory
> allocations (2GB at a time). A guest would only be able to allocate 2MB
> at a time, so from the numbers below, it would only take 1ms max.
>
> So I think we are fine here. Next time you find a large loop, please
> provide an explanation of why it is not a security issue (e.g. cannot
> be used by guests) or send an email to the Security Team if in doubt.
Sure. In this case I took into account that only the control domain can
issue this call, I just didn't state this explicitly. Next time I will.
>>> ARMv8 platform. Namely, a Renesas R-Car H3 SoC on a Salvator board.
>
> Which core is it?
Cortex-A57
[...]
>>> 2. RTDS scheduler. With console disabled, things like "hexdump -v
>>> /dev/zero" didn't affect the latency so badly, but anyway,
>>> sometimes I got ~600us spikes. This is not a surprise, because of
>>> the default RTDS configuration. I changed the period for DomU from
>>> the default 10ms to 100us and things got better: with Dom0 burning
>>> CPU I rarely get a max latency of about ~30us, with a mean latency
>>> of ~9us and a deviation of ~0.5us. On the other hand, when I tried
>>> to set the period to 30us, max latency rose up to ~60us.
> In a related topic, I am not entirely sure that all the hypercalls
> would be able to fit in the 100us slice. In particular, the ones which
> touch the P2M and do memory allocation.
I agree with you. In my experiments I didn't find a case of a
long-running hypercall (apart from the mentioned populate_physmap), but
of course there must be cases with such calls.
>
>> This is very interesting too. Did you get any spikes with the period
>> set to 100us? It would be fantastic if there were none.
>>
>>> 3. Huge latency spike during domain creation. I conducted some
>>> additional tests, including use of PV drivers, but this didn't
>>> affect the latency in my "real time" domain. But an attempt to
>>> create another domain with a relatively large memory size of 2GB
>>> led to a huge spike in latency. Debugging led to this call path:
>>>
>>> XENMEM_populate_physmap -> populate_physmap() ->
>>> alloc_domheap_pages() -> alloc_heap_pages()-> huge
>>> "for ( i = 0; i < (1 << order); i++ )" loop.
>
> There are two for loops in alloc_heap_pages() using this syntax. Which
> one are you referring to?
I did some tracing with a Lauterbach debugger. It pointed to the first
loop, and especially to the flush_page_to_ram() call, if I remember
correctly.
>>>
>>> This loop handles the struct page* for every one of the 262144
>>> pages that were allocated by calling populate_physmap().
>
> Looking at the domain creation code, 2GB will be split in two extents
> of 1GB. This means, there will be at least a preemption point between
> the allocation of the two extents.
Yes. 1GB is exactly 262144 4KB pages.
[...]
>>> I managed to overcome issue #3 by commenting out all calls to
>>> populate_one_size() except the populate_one_size(PFN_4K_SHIFT) one
>>> in xg_dom_arm.c. This lengthened domain construction, but my "RT"
>>> domain didn't experience such big latency issues. Apparently all
>>> other hypercalls which are used during domain creation are either
>>> fast or preemptible. No doubt my hack leads to page table inflation
>>> and an overall performance drop.
>> I think we need to follow this up and fix this. Maybe just by adding
>> a hypercall continuation to the loop.
>
> When I read "hypercall continuation", I understand that we will return
> to the guest context so it can process interrupts and potentially
> switch to another task.
>
> This means that the guest could issue a second populate_physmap() from
> the vCPU. Therefore any restart information should be part of the
> hypercall parameters. So far, I don't see how this would be possible.
>
> Even if we overcome that part, this can be easily abused by a guest as
> the memory is not yet accounted to the domain. Imagine a guest that
> never requests the continuation of the populate_physmap(). So we would
> need to block the vCPU until the allocation is finished.
Moreover, most of alloc_heap_pages() sits under a spinlock, so the
first step would be to split this function into smaller atomic parts.
> I think the first step is we need to figure out which part of the
> allocation is slow (see my question above). From there, we can figure
> out if there is a way to reduce the impact.
I'll do more tracing and will return with more accurate numbers. But as
far as I can see, any loop over 262144 pages will take some time...
--
Volodymyr Babchuk at EPAM