All of lore.kernel.org
 help / color / mirror / Atom feed
From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stefano Stabellini <stefano.stabellini@xilinx.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Julien Grall <jgrall@amazon.com>,
	Dario Faggioli <dario.faggioli@suse.com>,
	"Bertrand.Marquis@arm.com" <Bertrand.Marquis@arm.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>
Subject: Re: IRQ latency measurements in hypervisor
Date: Thu, 21 Jan 2021 00:49:23 +0000	[thread overview]
Message-ID: <871ref2jxp.fsf@epam.com> (raw)
In-Reply-To: <alpine.DEB.2.21.2101151511010.31265@sstabellini-ThinkPad-T480s>



Stefano Stabellini writes:

> On Fri, 15 Jan 2021, Volodymyr Babchuk wrote:
[...]
>> >> [   83.873162] rt_eval_tmu e6fc0000.tmu: Mean: 44 (1320 ns) stddev: 8 (240 ns)
>> >> [   94.136632] rt_eval_tmu e6fc0000.tmu: Mean: 44 (1320 ns) stddev: 8 (240 ns)
>> >> [  104.400098] rt_eval_tmu e6fc0000.tmu: Mean: 50 (1500 ns) stddev: 129 (3870 ns)
>> >> [  114.663828] rt_eval_tmu e6fc0000.tmu: Mean: 44 (1320 ns) stddev: 8 (240 ns)
>> >> [  124.927296] rt_eval_tmu e6fc0000.tmu: Mean: 56 (1680 ns) stddev: 183 (5490 ns)
>> >> 
>> >> This is the baremetal Linux. And there is Dom0:
>> >> 
>> >> [  237.431003] rt_eval_tmu e6fc0000.tmu: Mean: 306 (9180 ns) stddev: 25 (750 ns)
>> >> [  247.694506] rt_eval_tmu e6fc0000.tmu: Mean: 302 (9060 ns) stddev: 17 (510 ns)
>> >> 
>> >> Driver outputs both the raw timer value (eg. 4) and the same value
>> >> scaled to nanoseconds (eg. 1320 ns). As you can see baremetal setup is
>> >> much faster. But experiments showed that Linux does not provide
>> >> consistent values, even when running in baremetal mode. You can see
>> >> sporadic spikes in "std dev" values.
>> >
>> > So baremetal IRQ latency is 1320-1680ns and Linux IRQ latency is
>> > 9060-9180ns. I am not surprised that Linux results are inconsistent but
>> > I have a couple of observations:
>> >
>> > - 9us is high for Linux
>> > If the system is idle, the latency should be lower, around 2-3us. I
>> > imagine you are actually running some sort of interference from dom0? Or
>> > using RTDS and descheduling vCPUs?
>> 
>> 9us was in idle state. Interestingly enough, I got latency if 3us while
>> Dom0 was doing some CPU-intensive tasks. So, under load latency is lower
>> than in idle state. I didn't investigated this, so I can't tell you what
>> causes this behavior.
>
> Did you use vwfi=native? You should definitely be able to see ~3us
> without interference and with 1vCPU <-> 1pCPU

With vwfi=native things in Dom0 got better, but it is still not 3us:

[   41.563904] rt_eval_tmu e6fc0000.tmu: Mean: 173 (5190 ns) stddev: 15 (450 ns) min: 125 (3750 ns) max: 276102 (8283060 ns)
[   51.820403] rt_eval_tmu e6fc0000.tmu: Mean: 172 (5160 ns) stddev: 8 (240 ns) min: 125 (3750 ns) max: 276102 (8283060 ns)
[   62.076898] rt_eval_tmu e6fc0000.tmu: Mean: 172 (5160 ns) stddev: 9 (270 ns) min: 125 (3750 ns) max: 276102 (8283060 ns)
[   72.333377] rt_eval_tmu e6fc0000.tmu: Mean: 171 (5130 ns) stddev: 9 (270 ns) min: 125 (3750 ns) max: 276102 (8283060 ns)
[   82.589852] rt_eval_tmu e6fc0000.tmu: Mean: 172 (5160 ns) stddev: 10 (300 ns) min: 125 (3750 ns) max: 276102 (8283060 ns)

And DomU (as Julien suggested) is extremely unhappy:

Mean: 282 (8460 ns) stddev: 412 (12360 ns) above thr: 4% [109 (3270 ns) - 2245 (67350 ns)] global [108 (3240 ns) 74261 (2227830 ns)]
Mean: 958 (28740 ns) stddev: 860 (25800 ns) above thr: 50% [120 (3600 ns) - 2253 (67590 ns)] global [108 (3240 ns) 74261 (2227830 ns)]
Mean: 713 (21390 ns) stddev: 533 (15990 ns) above thr: 37% [114 (3420 ns) - 2186 (65580 ns)] global [108 (3240 ns) 74261 (2227830 ns)]

>
>> > - the stddev of 3870ns is high for baremetal
>> > In the baremetal case the stddev should be minimal if the system is
>> > idle.
>> 
>> This is what I expected too. But nevertheless there was spikes. I didn't
>> investigated this as well, so I can only speculate there. My rootfs is
>> on NFS, so maybe network driver caused this spikes.
>
> Yeah, maybe it would be best to reduce the sources of possible spikes
> and get rid of NFS.

I agree there, but I am working remotely, so NFS is easier. I'll try
with eMMC later.

>
>> >
>> >
>> >> So my next step was to use proper RT OS to do the measurements. I
>> >> chose Zephyr. My PR that adds Xen support to Zephyr can be found at
>> >> [1]. Support for RCAR Gen3 is not upstreamed, but is present on my
>> >> GitHub([2]). At [3] you can find the source code for application that
>> >> does the latency measurements. It behaves exactly as my linux driver,
>> >> but provides a bit more information:
>> >> 
>> >>  *** Booting Zephyr OS build zephyr-v2.4.0-2750-g0f2c858a39fc  ***
>> >> RT Eval app
>> >> 
>> >> Counter freq is 33280000 Hz. Period is 30 ns
>> >> Set alarm in 0 sec (332800 ticks)
>> >> Mean: 600 (18000 ns) stddev: 3737 (112110 ns) above thr: 0% [265 (7950 ns) - 66955 (2008650 ns)] global [265 (7950 ns) 66955 (2008650 ns)]
>> >> Mean: 388 (11640 ns) stddev: 2059 (61770 ns) above thr: 0% [266 (7980 ns) - 58830 (1764900 ns)] global [265 (7950 ns) 66955 (2008650 ns)]
>> >> Mean: 358 (10740 ns) stddev: 1796 (53880 ns) above thr: 0% [265 (7950 ns) - 57780 (1733400 ns)] global [265 (7950 ns) 66955 (2008650 ns)]
>> 
>> This is Zephyr running as DomU.
>> 
>> >> ...
>> >> 
>> >> So there you can see: mean time, standard deviation, % of interrupts
>> >> that was processed above 30us threshold, minimum and maximum latency
>> >> values for the current 10s run, global minimum and maximum.
>> >> 
>> >> Zephyr running as baremetal showed very stable results (this is an
>> >> older build, so no extended statistics there):
>> >> 
>> >> ## Starting application at 0x480803C0 ...
>> >>  *** Booting Zephyr OS build zephyr-v2.4.0-1137-g5803ee1e8183  ***
>> >> RT Eval app
>> >> 
>> >> Counter freq is 33280000 Hz. Period is 30 ns
>> >> Mean: 31 (930 ns) stddev: 0 (0 ns)
>> >> Mean: 31 (930 ns) stddev: 0 (0 ns)
>> >> Mean: 31 (930 ns) stddev: 0 (0 ns)
>> >> Mean: 31 (930 ns) stddev: 0 (0 ns)
>> >> Mean: 31 (930 ns) stddev: 0 (0 ns)
>> >> Mean: 31 (930 ns) stddev: 0 (0 ns)
>> 
>> And this is Zephyr is running as baremetal.
>> 
>> >> ...
>> >> 
>> >> As Zephyr provided stable readouts with no jitter, I used it to do all
>> >> subsequent measurements.
>> >
>> > I am a bit confused here. Looking at the numbers above the stddev is
>> > 112110 ns in the first instance. That is pretty high. Am I looking at
>> > the wrong numbers?
>> 
>> I added some clarification above. As for 112110ns in the very first instance
>> - I always ignored the first instance, assuming that things need to
>> settle after domain being created.
>> 
>> But your comment is actually correct: what exacelt should "settle"?
>> Domain is already created. All should run smoothly. So, this is worth
>> investigating.
>
> It is fair to ignore the first 2 measurements.
>
> However, the numbers above have high stddev even at the 3rd measurement:
> 53us is high and above the target 30us. I take you didn't apply yet the 
> alloc_heap_pages and the serial workarounds?

Yes, this is the very first try. Without workarounds and proper
scheduler configuration.

With all fixes it is much better:

Mean: 307 (9210 ns) stddev: 4 (120 ns) above thr: 0% [265 (7950 ns) - 376 (11280 ns)] global [265 (7950 ns) 376 (11280 ns)]
Mean: 306 (9180 ns) stddev: 3 (90 ns) above thr: 0% [263 (7890 ns) - 360 (10800 ns)] global [263 (7890 ns) 376 (11280 ns)]
Mean: 306 (9180 ns) stddev: 3 (90 ns) above thr: 0% [298 (8940 ns) - 373 (11190 ns)] global [263 (7890 ns) 376 (11280 ns)]
Mean: 307 (9210 ns) stddev: 17 (510 ns) above thr: 0% [265 (7950 ns) - 856 (25680 ns)] global [263 (7890 ns) 1251 (37530 ns)]
Mean: 304 (9120 ns) stddev: 7 (210 ns) above thr: 0% [286 (8580 ns) - 486 (14580 ns)] global [263 (7890 ns) 1251 (37530 ns)]
Mean: 307 (9210 ns) stddev: 47 (1410 ns) above thr: 0% [260 (7800 ns) - 1299 (38970 ns)] global [260 (7800 ns) 1299 (38970 ns)]
Mean: 291 (8730 ns) stddev: 41 (1230 ns) above thr: 0% [250 (7500 ns) - 1324 (39720 ns)] global [250 (7500 ns) 1324 (39720 ns)]

>
>> >> IMPORTANT! All subsequent tests was conducted with only 1 CPU core
>> >> enabled. My goal was to ensure that system can timely react to an
>> >> external interrupt even under load.
>> >
>> > All right. FYI I have no frame of reference for 2 vCPUs on 1 pCPUs, all
>> > my tests were done with 1vCPU <-> 1pCPU and the null scheduler.
>> 
>> As I said, I had no issues with 1vCPU <-> 1pCPU setup, so I quickly
>> moved to cases which had issues.
>> 
>> >> Test results and latency sources
>> >> 
>> >> As you can see, baremetal OS provides latency of ~0.9us without any
>> >> deviations. The same code running as DomU on idle system shows mean
>> >> latency of 12us and deviation of about 10us. Range of latencies in a
>> >> 10s batch can vary from 8us to 25us. This fits into required 30us
>> >> threshold, so no big issue there.
>> >> 
>> >> But this worsens under certain conditions.
>> >> 
>> >> 1. Serial console. RCAR serial driver (scif) works in synchronous
>> >>    mode, so any heavy serial console output leads to higher
>> >>    latency. Tests shows mean latency of 1000us and deviation of 1332
>> >>    us. 54% of interrupts are handled outside of 30us period. Real
>> >>    values may be even higher, because in some cases timer would do
>> >>    full cycle and instead of say 11ms ISR would read 11 % 10 = 1ms
>> >>    latency. I tried to enable asynchronous mode for the serial
>> >>    console. This made things better, but it lead to gaps in output, so
>> >>    I just turned the serial console off completely.
>> >
>> > That's very interesting. I wonder if other serial drivers would cause
>> > similar issues, e.g. PL011.
>> 
>> They should. This behavior is programmed in serial.c. Driver can enable
>> async mode calling serial_async_transmit(). As I can see, only ns16550
>> driver does this.
>> 
>> Maybe you didn't saw problems there because you had more pCPU enabled
>> and Xen used some other pCPU to do UART operations.
>
> Good to know. A workaround would be to let Dom0/DomU have direct access
> to the UART, not giving any UART to Xen at all.

Yes. Or use SSH. I chose the second way.

[...]

-- 
Volodymyr Babchuk at EPAM

  parent reply	other threads:[~2021-01-21  0:50 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-12 23:48 IRQ latency measurements in hypervisor Volodymyr Babchuk
2021-01-14 23:33 ` Stefano Stabellini
2021-01-15 11:42   ` Julien Grall
2021-01-15 15:45     ` Volodymyr Babchuk
2021-01-15 17:13       ` Julien Grall
2021-01-15 23:41         ` Stefano Stabellini
2021-01-16 12:59           ` Andrew Cooper
2021-01-20 23:09           ` Volodymyr Babchuk
2021-01-20 23:03         ` Volodymyr Babchuk
2021-01-21  0:52           ` Stefano Stabellini
2021-01-21 21:01           ` Julien Grall
2021-01-15 15:27   ` Volodymyr Babchuk
2021-01-15 23:17     ` Stefano Stabellini
2021-01-16 12:47       ` Julien Grall
2021-01-21  0:49       ` Volodymyr Babchuk [this message]
2021-01-21  0:59         ` Stefano Stabellini
2021-01-18 16:40   ` Dario Faggioli
2021-01-21  1:20     ` Volodymyr Babchuk
2021-01-21  8:39       ` Dario Faggioli
2021-01-16 14:40 ` Andrew Cooper
2021-01-21  2:39   ` Volodymyr Babchuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871ref2jxp.fsf@epam.com \
    --to=volodymyr_babchuk@epam.com \
    --cc=Bertrand.Marquis@arm.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dario.faggioli@suse.com \
    --cc=jgrall@amazon.com \
    --cc=sstabellini@kernel.org \
    --cc=stefano.stabellini@xilinx.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.