From: Stefano Stabellini <stefano.stabellini@xilinx.com>
To: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Julien Grall" <jgrall@amazon.com>,
	Stefano Stabellini <stefano.stabellini@xilinx.com>,
	Dario Faggioli <dario.faggioli@suse.com>,
	<Bertrand.Marquis@arm.com>, <andrew.cooper3@citrix.com>
Subject: Re: IRQ latency measurements in hypervisor
Date: Thu, 14 Jan 2021 15:33:24 -0800
Message-ID: <alpine.DEB.2.21.2101141515230.31265@sstabellini-ThinkPad-T480s>
In-Reply-To: <87pn294szv.fsf@epam.com>

+ Bertrand, Andrew (see comment on alloc_heap_pages())

Really great work, Volodymyr! Some comments below.


On Tue, 12 Jan 2021, Volodymyr Babchuk wrote:
> Hello community,
> 
> Recently I was experimenting with context switching latencies in
> Xen. I want to share my findings and discuss some issues I have found.
> 
> My target was to ensure that no more than 30us passes between interrupt
> generation and handling in a guest VM. Main goal was to determine if
> Xen is ready for soft RT and to identify main issues that prevent
> this.
> 
> TLDR: I was able to achieve mean 15us latency with tuned RTDS
> scheduler, disabled serial console and some hacks to the Xen
> toolstack. This was done on a setup with 1 PCPU.
> 
> Test setup.
> 
> ARMv8 platform. Namely Renesas Rcar H3 SoC on Salvator board.
> 
> To accurately determine latency, I employed one of the timer counter
> units (TMUs) available on the SoC. This is a 32-bit timer with
> auto-reload that can generate an interrupt on underflow. I fed it with
> a 33.275MHz clock, which gave me a resolution of about 30ns. I
> programmed the timer to generate an interrupt every 10ms. My ISR then
> read the current timer value and determined how much time had passed
> since the last underflow. This gave me the time interval between IRQ
> generation and ISR invocation. Those values were collected, and every
> 10 seconds statistics were calculated. Here is an example of output
> from my Linux driver:

It looks like a solid approach to collect results, similar to the one we
used for the cache coloring work. Just make sure to collect very many
results.

A few questions: did you use a single physical CPU? Are you using
RTDS and scheduling 2 vCPUs on 1 pCPU? Is dom0 idle or busy? I take it
the results were measured in domU?
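
To make the measurement arithmetic concrete, here is a minimal sketch of how one batch of samples could be reduced to the mean/stddev pairs shown in the logs. It is not the actual driver: it assumes a down-counting timer that reloads with 332800 ticks (one 10ms period) and the 30ns-per-tick scaling that the log output uses, and all function names are made up for illustration.

```python
import statistics

TICK_NS = 30            # assumed: the 30 ns per tick scaling used in the log output
RELOAD_TICKS = 332800   # assumed: ticks in one 10 ms period


def elapsed_ticks(counter_value, reload=RELOAD_TICKS):
    """Ticks since the last underflow of a down-counting auto-reload timer."""
    return reload - counter_value


def to_ns(ticks, tick_ns=TICK_NS):
    """Scale a raw tick count to nanoseconds."""
    return ticks * tick_ns


def summarize(samples_ticks):
    """Return (mean, stddev) of one batch, truncated to whole ticks."""
    mean = statistics.mean(samples_ticks)
    stddev = statistics.pstdev(samples_ticks)
    return int(mean), int(stddev)
```

With these assumptions a raw reading of 44 ticks scales to 1320 ns, which matches the baremetal Linux lines below.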


> [   83.873162] rt_eval_tmu e6fc0000.tmu: Mean: 44 (1320 ns) stddev: 8 (240 ns)
> [   94.136632] rt_eval_tmu e6fc0000.tmu: Mean: 44 (1320 ns) stddev: 8 (240 ns)
> [  104.400098] rt_eval_tmu e6fc0000.tmu: Mean: 50 (1500 ns) stddev: 129 (3870 ns)
> [  114.663828] rt_eval_tmu e6fc0000.tmu: Mean: 44 (1320 ns) stddev: 8 (240 ns)
> [  124.927296] rt_eval_tmu e6fc0000.tmu: Mean: 56 (1680 ns) stddev: 183 (5490 ns)
> 
> That was baremetal Linux. And here is Dom0:
> 
> [  237.431003] rt_eval_tmu e6fc0000.tmu: Mean: 306 (9180 ns) stddev: 25 (750 ns)
> [  247.694506] rt_eval_tmu e6fc0000.tmu: Mean: 302 (9060 ns) stddev: 17 (510 ns)
> 
> The driver outputs both the raw timer value (e.g. 44) and the same
> value scaled to nanoseconds (e.g. 1320 ns). As you can see, the
> baremetal setup is much faster. But experiments showed that Linux does
> not provide consistent values, even when running in baremetal mode.
> You can see sporadic spikes in the "stddev" values.

So baremetal Linux IRQ latency is 1320-1680ns and Linux IRQ latency on
Xen is 9060-9180ns. I am not surprised that the Linux results are
inconsistent, but I have a couple of observations:

- 9us is high for Linux
If the system is idle, the latency should be lower, around 2-3us. I
imagine you are actually running some sort of interference from dom0? Or
using RTDS and descheduling vCPUs?

- the stddev of 3870ns is high for baremetal
In the baremetal case the stddev should be minimal if the system is
idle.


> So my next step was to use a proper RT OS to do the measurements. I
> chose Zephyr. My PR that adds Xen support to Zephyr can be found at
> [1]. Support for RCAR Gen3 is not upstreamed, but is present on my
> GitHub ([2]). At [3] you can find the source code for the application
> that does the latency measurements. It behaves exactly like my Linux
> driver, but provides a bit more information:
> 
>  *** Booting Zephyr OS build zephyr-v2.4.0-2750-g0f2c858a39fc  ***
> RT Eval app
> 
> Counter freq is 33280000 Hz. Period is 30 ns
> Set alarm in 0 sec (332800 ticks)
> Mean: 600 (18000 ns) stddev: 3737 (112110 ns) above thr: 0% [265 (7950 ns) - 66955 (2008650 ns)] global [265 (7950 ns) 66955 (2008650 ns)]
> Mean: 388 (11640 ns) stddev: 2059 (61770 ns) above thr: 0% [266 (7980 ns) - 58830 (1764900 ns)] global [265 (7950 ns) 66955 (2008650 ns)]
> Mean: 358 (10740 ns) stddev: 1796 (53880 ns) above thr: 0% [265 (7950 ns) - 57780 (1733400 ns)] global [265 (7950 ns) 66955 (2008650 ns)]
> ...
> 
> So there you can see: mean time, standard deviation, percentage of
> interrupts that were handled above the 30us threshold, minimum and
> maximum latency values for the current 10s run, and the global minimum
> and maximum.
> 
> Zephyr running as baremetal showed very stable results (this is an
> older build, so no extended statistics there):
> 
> ## Starting application at 0x480803C0 ...
>  *** Booting Zephyr OS build zephyr-v2.4.0-1137-g5803ee1e8183  ***
> RT Eval app
> 
> Counter freq is 33280000 Hz. Period is 30 ns
> Mean: 31 (930 ns) stddev: 0 (0 ns)
> Mean: 31 (930 ns) stddev: 0 (0 ns)
> Mean: 31 (930 ns) stddev: 0 (0 ns)
> Mean: 31 (930 ns) stddev: 0 (0 ns)
> Mean: 31 (930 ns) stddev: 0 (0 ns)
> Mean: 31 (930 ns) stddev: 0 (0 ns)
> ...
> 
> As Zephyr provided stable readouts with no jitter, I used it to do all
> subsequent measurements.

I am a bit confused here. Looking at the numbers above the stddev is
112110 ns in the first instance. That is pretty high. Am I looking at
the wrong numbers?


 
> IMPORTANT! All subsequent tests were conducted with only 1 CPU core
> enabled. My goal was to ensure that the system can react to an
> external interrupt in a timely manner even under load.

All right. FYI I have no frame of reference for 2 vCPUs on 1 pCPU, all
my tests were done with 1 vCPU <-> 1 pCPU and the null scheduler.

 
> Test results and latency sources
> 
> As you can see, baremetal OS provides latency of ~0.9us without any
> deviations. The same code running as DomU on idle system shows mean
> latency of 12us and deviation of about 10us. Range of latencies in a
> 10s batch can vary from 8us to 25us. This fits into required 30us
> threshold, so no big issue there.
> 
> But this worsens under certain conditions.
> 
> 1. Serial console. RCAR serial driver (scif) works in synchronous
>    mode, so any heavy serial console output leads to higher
>    latency. Tests show mean latency of 1000us and deviation of 1332
>    us. 54% of interrupts are handled outside of the 30us period. Real
>    values may be even higher, because in some cases the timer would do
>    a full cycle and instead of say 11ms the ISR would read 11 % 10 = 1ms
>    latency. I tried to enable asynchronous mode for the serial
>    console. This made things better, but it led to gaps in output, so
>    I just turned the serial console off completely.

That's very interesting. I wonder if other serial drivers would cause
similar issues, e.g. PL011.
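
The wrap-around caveat in the quoted text is worth spelling out, because it means the serial console numbers are a lower bound. A small sketch, assuming the ISR derives latency only from the current counter value of a timer with a 10ms period (the helper names are hypothetical):

```python
PERIOD_US = 10_000  # timer period from the test setup: 10 ms


def observed_latency_us(true_latency_us, period_us=PERIOD_US):
    """Latency the ISR would compute if it only reads the current counter."""
    return true_latency_us % period_us


def missed_periods(true_latency_us, period_us=PERIOD_US):
    """Number of full timer periods the measurement silently drops."""
    return true_latency_us // period_us
```

Under that assumption a real 11ms delay is reported as 1ms, exactly the 11 % 10 case described above. Counting interrupts on the measuring side and comparing against wall-clock time would be one way to detect the dropped periods.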


> 2. RTDS scheduler. With the console disabled, things like "hexdump -v
>    /dev/zero" didn't affect the latency so badly, but still,
>    sometimes I got ~600us spikes. This is not a surprise, because of
>    the default RTDS configuration. I changed the period for DomU from
>    the default 10ms to 100us and things got better: with Dom0 burning
>    CPU I rarely get a max latency of about ~30us, with mean latency of
>    ~9us and deviation of ~0.5us. On the other hand, when I tried to set
>    the period to 30us, max latency rose to ~60us.

This is very interesting too. Did you get any spikes with the period
set to 100us? It would be fantastic if there were none.


> 3. Huge latency spike during domain creation. I conducted some
>    additional tests, including use of PV drivers, but this didn't
>    affect the latency in my "real time" domain. But an attempt to
>    create another domain with a relatively large memory size of 2GB led
>    to a huge spike in latency. Debugging led to this call path:
> 
>    XENMEM_populate_physmap -> populate_physmap() ->
>    alloc_domheap_pages() -> alloc_heap_pages()-> huge
>    "for ( i = 0; i < (1 << order); i++ )" loop.
> 
>    This loop handles struct page* for every one of the 262144 pages that
>    were allocated by calling populate_physmap(). This function is not
>    preemptible and it takes about a second to complete on my
>    setup. Needless to say, it completely screws up any responsiveness
>    of the hypervisor.
> 
> I managed to overcome issue #3 by commenting out all calls to
> populate_one_size() except the populate_one_size(PFN_4K_SHIFT) in
> xg_dom_arm.c. This lengthened domain construction, but my "RT" domain
> didn't experience such big latency issues. Apparently all other
> hypercalls which are used during domain creation are either fast or
> preemptible. No doubt my hack leads to page table inflation and an
> overall performance drop.

I think we need to follow this up and fix this. Maybe just by adding
a hypercall continuation to the loop.
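
To illustrate what I mean by a continuation, here is a toy model of the idea in Python. It is not Xen code and says nothing about where such a check could safely live in the allocator: the loop stops when a preemption check fires and returns how far it got, and the caller re-invokes it from that point. The function names and the check interval of 512 pages are assumptions made up for this sketch.

```python
def process_pages(start, total, preempt_pending):
    """Handle pages [start, total) until done or until preemption is requested.

    Returns the index of the first unprocessed page; the caller re-invokes
    with that index, which is the essence of a continuation.
    """
    i = start
    while i < total:
        # ... per-page work would go here ...
        i += 1
        if i < total and preempt_pending(i):
            break
    return i


def run_to_completion(total, check_every=512):
    """Drive process_pages, yielding every `check_every` pages (assumed value)."""
    restarts = 0
    done = 0
    while done < total:
        done = process_pages(done, total, lambda i: i % check_every == 0)
        if done < total:
            restarts += 1
    return restarts
```

For the 262144-page case in the quoted call path, this model splits the one long loop into 512 short runs, with a chance to handle pending interrupts between each of them.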



> Conclusions.
> 
> I wanted to:
> 
> 1. Share my approach to IRQ latency measurements.
> 
> 2. Share results of said measurements.

Thank you very much. I asked a couple of questions above to better
understand your setup.


> 3. Ask for other potential scenarios, which can affect IRQ latency so I
>    can test them on my setup.

I'll follow up once I understand the above a bit better.


> 4. Discuss the issue with alloc_domheap_pages(). I brooded over ways
>    to make it preemptible. But it is a) located deep in call chain and
>    b) used not only by hypercalls. So I can't see an easy way to make
>    it preemptible. The best solution would be of course to make the
>    whole hypervisor preemptible. I even heard about some work long
>    time ago...
> 
> 5. Show that under certain conditions Xen can be used in a soft RT
>    system. Of course, additional tests are needed, but with some minor
>    tweaks it can provide ~15us IRQ latency on a typical workload.
> 
> [1] https://github.com/zephyrproject-rtos/zephyr/pull/30164
> [2] https://github.com/lorc/zephyr/tree/renesas_salvator_x
> [3] https://github.com/lorc/zephyr/blob/rt_eval/samples/rt_eval/src/main.c


