Re: IRQ latency measurements in hypervisor

From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien@xen.org>,
	Stefano Stabellini <stefano.stabellini@xilinx.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Julien Grall <jgrall@amazon.com>,
	Dario Faggioli <dario.faggioli@suse.com>,
	"Bertrand.Marquis@arm.com" <Bertrand.Marquis@arm.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>
Subject: Re: IRQ latency measurements in hypervisor
Date: Wed, 20 Jan 2021 23:09:59 +0000
Message-ID: <87czxz2ojc.fsf@epam.com>
In-Reply-To: <alpine.DEB.2.21.2101151459280.31265@sstabellini-ThinkPad-T480s>

Hi Stefano,

Stefano Stabellini writes:

> On Fri, 15 Jan 2021, Julien Grall wrote:
>> On 15/01/2021 15:45, Volodymyr Babchuk wrote:
>> > 
>> > Hi Julien,
>> > 
>> > Julien Grall writes:
>> > 
>> > > Hi Volodymyr, Stefano,
>> > > 
>> > > On 14/01/2021 23:33, Stefano Stabellini wrote:
>> > > > + Bertrand, Andrew (see comment on alloc_heap_pages())
>> > > 
>> > > Long running hypercalls are usually considered security issues.
>> > > 
>> > > In this case, only the control domain can issue large memory
>> > > allocation (2GB at a time). A guest would only be able to allocate 2MB
>> > > at a time, so from the numbers below, it would only take 1ms max.
>> > > 
>> > > So I think we are fine here. Next time you find a large loop, please
>> > > provide an explanation of why it is not a security issue (e.g. cannot
>> > > be used by guests) or send an email to the Security Team if in doubt.
>> > 
>> > Sure. In this case I took into account that only the control domain can
>> > issue this call; I just didn't state this explicitly. Next time I will
>> > do so.
>> 
>> I am afraid that's not correct. The guest can request to populate a region.
>> This is used for instance in the ballooning case.
>> 
>> The main difference is a non-privileged guest will not be able to do
>> allocation larger than 2MB.
>> 
>> [...]
>> 
>> > > > This is very interesting too. Did you get any spikes with the period
>> > > > set to 100us? It would be fantastic if there were none.
>> > > > 
>> > > > > 3. Huge latency spike during domain creation. I conducted some
>> > > > >      additional tests, including use of PV drivers, but this didn't
>> > > > >      affect the latency in my "real time" domain. But an attempt to
>> > > > >      create another domain with a relatively large memory size of 2GB
>> > > > >      led to a huge spike in latency. Debugging led to this call path:
>> > > > > 
>> > > > >      XENMEM_populate_physmap -> populate_physmap() ->
>> > > > >      alloc_domheap_pages() -> alloc_heap_pages() -> huge
>> > > > >      "for ( i = 0; i < (1 << order); i++ )" loop.
>> > > 
>> > > There are two for loops in alloc_heap_pages() using this syntax. Which
>> > > one are you referring to?
>> > 
>> > I did some tracing with Lauterbach. It pointed to the first loop and
>> > especially to the flush_page_to_ram() call, if I remember correctly.
>> 
>> Thanks, I am not entirely surprised because we are cleaning and invalidating
>> the region line by line and across all the CPUs.
>> 
>> If we are assuming a 128-byte cacheline, we will need to issue 32 cache
>> instructions per page. This is going to involve quite a bit of traffic on
>> the system.
>
> I think Julien is most likely right. It would be good to verify this
> with an experiment. For instance, you could remove the
> flush_page_to_ram() call for one test and see if you see any latency
> problems.

Yes, I did exactly this and shared results in my reply to Julien.
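
For a sense of scale, the cache-line arithmetic from Julien's estimate
quoted above can be written out as below. This is only a sketch: it
assumes 4 KiB pages and the 128-byte cache line from that estimate, and
the macro and function names are made up for illustration, they are not
Xen code.

```c
#include <stdint.h>

/* Assumptions: 4 KiB pages, 128-byte cache lines (from the quoted estimate). */
#define DEMO_PAGE_SIZE_BYTES  4096u
#define DEMO_CACHELINE_BYTES  128u

/* Cache maintenance operations needed to clean one page line by line. */
static uint64_t cache_ops_per_page(void)
{
    return DEMO_PAGE_SIZE_BYTES / DEMO_CACHELINE_BYTES;
}

/* Operations needed for an allocation of the given size in bytes. */
static uint64_t cache_ops_for_bytes(uint64_t bytes)
{
    uint64_t pages = bytes / DEMO_PAGE_SIZE_BYTES;

    return pages * cache_ops_per_page();
}
```

Under these assumptions a 2MB request covers 512 pages and a 2GB
request covers 524288 pages, so the per-page count of 32 is multiplied
by 1024 when going from the guest limit to the control domain case.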

>> One possibility would be to defer the cache flush when the domain is created
>> and use the hypercall XEN_DOMCTL_cacheflush to issue the flush.
>> 
>> Note that XEN_DOMCTL_cacheflush would need some modification to be
>> preemptible. But at least, it will work on a GFN which is easier to track.
>  
> This looks like a solid suggestion. XEN_DOMCTL_cacheflush is already
> used by the toolstack in a few places. 
>
> I am also wondering if we can get away with fewer flush_page_to_ram()
> calls from alloc_heap_pages() for memory allocations done at boot time
> soon after global boot memory scrubbing.

This is doable if you are trying to optimize boot time. On the other
hand, it is an extra check in an already quite complex function.
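
To make the "extra check" concrete, here is a rough sketch of what such
a test could look like inside the per-page loop. Everything in it is
hypothetical: struct demo_page, the flushed_since_scrub flag and
pages_needing_flush() are invented for illustration and do not exist in
Xen, and the sketch assumes that boot scrubbing leaves the cache clean
for the page, which would have to be verified.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-page state; not Xen's struct page_info. */
struct demo_page {
    /* Assumed: set when boot-time scrubbing also cleaned the cache. */
    bool flushed_since_scrub;
};

/*
 * Walk an allocation of 2^order pages and count how many would still
 * need a cache flush. Pages marked clean by the (assumed) boot scrub
 * are skipped once, and the mark is cleared because the page is being
 * handed out.
 */
static size_t pages_needing_flush(struct demo_page *pg, unsigned int order)
{
    size_t needed = 0;

    for (size_t i = 0; i < ((size_t)1 << order); i++) {
        if (pg[i].flushed_since_scrub) {
            pg[i].flushed_since_scrub = false;
            continue;
        }
        needed++; /* real code would issue the flush here */
    }

    return needed;
}
```

Even in this reduced form, the loop gains a branch and a piece of
per-page state that has to be kept consistent, which is the added
complexity I mean.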

BTW, I briefly looked at Xen boot time and saw that Dom0 construction takes
a considerable amount of time.

-- 
Volodymyr Babchuk at EPAM