linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/xen: Add "xen_timer_slop" command line option
@ 2019-03-22 18:29 thibodux
  2019-03-22 22:10 ` Boris Ostrovsky
  2019-04-24 18:47 ` Boris Ostrovsky
  0 siblings, 2 replies; 22+ messages in thread
From: thibodux @ 2019-03-22 18:29 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: boris.ostrovsky, oleksandr_andrushchenko, tglx, jgross, thibodux,
	ryan.thibodeaux

From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>

Add a new command-line option "xen_timer_slop=<INT>" that sets the
minimum delta of virtual Xen timers. This commit does not change the
default timer slop value for virtual Xen timers.

Lowering the timer slop value should improve the accuracy of virtual
timers (e.g., better process dispatch latency), but it will likely
increase the number of virtual timer interrupts (relative to the
original slop setting).

The original timer slop value has not changed since the introduction
of the Xen-aware Linux kernel code. This commit provides users an
opportunity to tune timer performance given the refinements to
hardware and the Xen event channel processing. It also mirrors
a feature in the Xen hypervisor - the "timer_slop" Xen command line
option.

Signed-off-by: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>
---
 Documentation/admin-guide/kernel-parameters.txt |  7 +++++++
 arch/x86/xen/time.c                             | 18 ++++++++++++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 858b6c0..fb58c84 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5129,6 +5129,13 @@
 			with /sys/devices/system/xen_memory/xen_memory0/scrub_pages.
 			Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT.
 
+	xen_timer_slop=	[X86-64,XEN]
+			Set the timer slop (in nanoseconds) for the virtual Xen
+			timers (default is 100000). This adjusts the minimum
+			delta of virtualized Xen timers, where lower values
+			improve timer resolution at the expense of processing
+			more timer interrupts.
+
 	xirc2ps_cs=	[NET,PCMCIA]
 			Format:
 			<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 6e29794..0393968 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -212,7 +212,7 @@ static int xen_timerop_set_next_event(unsigned long delta,
 	return 0;
 }
 
-static const struct clock_event_device xen_timerop_clockevent = {
+static struct clock_event_device xen_timerop_clockevent __ro_after_init = {
 	.name			= "xen",
 	.features		= CLOCK_EVT_FEAT_ONESHOT,
 
@@ -273,7 +273,7 @@ static int xen_vcpuop_set_next_event(unsigned long delta,
 	return ret;
 }
 
-static const struct clock_event_device xen_vcpuop_clockevent = {
+static struct clock_event_device xen_vcpuop_clockevent __ro_after_init = {
 	.name = "xen",
 	.features = CLOCK_EVT_FEAT_ONESHOT,
 
@@ -570,3 +570,17 @@ void __init xen_hvm_init_time_ops(void)
 	x86_platform.set_wallclock = xen_set_wallclock;
 }
 #endif
+
+/* Kernel parameter to specify Xen timer slop */
+static int __init parse_xen_timer_slop(char *ptr)
+{
+	unsigned long slop = memparse(ptr, NULL);
+
+	xen_timerop_clockevent.min_delta_ns = slop;
+	xen_timerop_clockevent.min_delta_ticks = slop;
+	xen_vcpuop_clockevent.min_delta_ns = slop;
+	xen_vcpuop_clockevent.min_delta_ticks = slop;
+
+	return 0;
+}
+early_param("xen_timer_slop", parse_xen_timer_slop);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-22 18:29 [PATCH] x86/xen: Add "xen_timer_slop" command line option thibodux
@ 2019-03-22 22:10 ` Boris Ostrovsky
  2019-03-23  2:58   ` Dario Faggioli
  2019-03-23 12:00   ` Ryan Thibodeaux
  2019-04-24 18:47 ` Boris Ostrovsky
  1 sibling, 2 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2019-03-22 22:10 UTC (permalink / raw)
  To: thibodux, xen-devel, linux-kernel
  Cc: oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

On 3/22/19 2:29 PM, thibodux@gmail.com wrote:
> From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>
>
> Add a new command-line option "xen_timer_slop=<INT>" that sets the
> minimum delta of virtual Xen timers. This commit does not change the
> default timer slop value for virtual Xen timers.
>
> Lowering the timer slop value should improve the accuracy of virtual
> timers (e.g., better process dispatch latency), but it will likely
> increase the number of virtual timer interrupts (relative to the
> original slop setting).
>
> The original timer slop value has not changed since the introduction
> of the Xen-aware Linux kernel code. This commit provides users an
> opportunity to tune timer performance given the refinements to
> hardware and the Xen event channel processing. It also mirrors
> a feature in the Xen hypervisor - the "timer_slop" Xen command line
> option.

Is there any data that shows effects of using this new parameter?

-boris


>
> Signed-off-by: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  7 +++++++
>  arch/x86/xen/time.c                             | 18 ++++++++++++++++--
>  2 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 858b6c0..fb58c84 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5129,6 +5129,13 @@
>  			with /sys/devices/system/xen_memory/xen_memory0/scrub_pages.
>  			Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT.
>  
> +	xen_timer_slop=	[X86-64,XEN]
> +			Set the timer slop (in nanoseconds) for the virtual Xen
> +			timers (default is 100000). This adjusts the minimum
> +			delta of virtualized Xen timers, where lower values
> +			improve timer resolution at the expense of processing
> +			more timer interrupts.
> +
>  	xirc2ps_cs=	[NET,PCMCIA]
>  			Format:
>  			<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index 6e29794..0393968 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -212,7 +212,7 @@ static int xen_timerop_set_next_event(unsigned long delta,
>  	return 0;
>  }
>  
> -static const struct clock_event_device xen_timerop_clockevent = {
> +static struct clock_event_device xen_timerop_clockevent __ro_after_init = {
>  	.name			= "xen",
>  	.features		= CLOCK_EVT_FEAT_ONESHOT,
>  
> @@ -273,7 +273,7 @@ static int xen_vcpuop_set_next_event(unsigned long delta,
>  	return ret;
>  }
>  
> -static const struct clock_event_device xen_vcpuop_clockevent = {
> +static struct clock_event_device xen_vcpuop_clockevent __ro_after_init = {
>  	.name = "xen",
>  	.features = CLOCK_EVT_FEAT_ONESHOT,
>  
> @@ -570,3 +570,17 @@ void __init xen_hvm_init_time_ops(void)
>  	x86_platform.set_wallclock = xen_set_wallclock;
>  }
>  #endif
> +
> +/* Kernel parameter to specify Xen timer slop */
> +static int __init parse_xen_timer_slop(char *ptr)
> +{
> +	unsigned long slop = memparse(ptr, NULL);
> +
> +	xen_timerop_clockevent.min_delta_ns = slop;
> +	xen_timerop_clockevent.min_delta_ticks = slop;
> +	xen_vcpuop_clockevent.min_delta_ns = slop;
> +	xen_vcpuop_clockevent.min_delta_ticks = slop;
> +
> +	return 0;
> +}
> +early_param("xen_timer_slop", parse_xen_timer_slop);


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-22 22:10 ` Boris Ostrovsky
@ 2019-03-23  2:58   ` Dario Faggioli
  2019-03-23 10:41     ` luca abeni
  2019-03-23 12:00   ` Ryan Thibodeaux
  1 sibling, 1 reply; 22+ messages in thread
From: Dario Faggioli @ 2019-03-23  2:58 UTC (permalink / raw)
  To: Boris Ostrovsky, thibodux, xen-devel, linux-kernel
  Cc: oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux, luca abeni

[-- Attachment #1: Type: text/plain, Size: 1386 bytes --]

On Fri, 2019-03-22 at 18:10 -0400, Boris Ostrovsky wrote:
> On 3/22/19 2:29 PM, thibodux@gmail.com wrote:
> > From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>
> > 
> > The original timer slop value has not changed since the
> > introduction
> > of the Xen-aware Linux kernel code. This commit provides users an
> > opportunity to tune timer performance given the refinements to
> > hardware and the Xen event channel processing. It also mirrors
> > a feature in the Xen hypervisor - the "timer_slop" Xen command line
> > option.
> 
> Is there any data that shows effects of using this new parameter?
> 
Yes, I've done some research and experiments on this. I did it together
with a friend, which I'm Cc-ing, as I'm not sure we're ready/capable to
share the results, yet (Luca?).

What I think I can anticipate is that having such a high value for
timer slop in the kernel, for the Xen clockevent device is (together
with the also quite high default value of timer_slop in Xen itself)
responsible for really high vcpu wakeup latencies.

Lowering those two values, reduces such latencies dramatically.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-23  2:58   ` Dario Faggioli
@ 2019-03-23 10:41     ` luca abeni
  2019-03-25 12:05       ` luca abeni
  0 siblings, 1 reply; 22+ messages in thread
From: luca abeni @ 2019-03-23 10:41 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Boris Ostrovsky, thibodux, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

Hi all,

On Sat, 23 Mar 2019 03:58:16 +0100
Dario Faggioli <dfaggioli@suse.com> wrote:

> On Fri, 2019-03-22 at 18:10 -0400, Boris Ostrovsky wrote:
> > On 3/22/19 2:29 PM, thibodux@gmail.com wrote:  
> > > From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>
> > > 
> > > The original timer slop value has not changed since the
> > > introduction
> > > of the Xen-aware Linux kernel code. This commit provides users an
> > > opportunity to tune timer performance given the refinements to
> > > hardware and the Xen event channel processing. It also mirrors
> > > a feature in the Xen hypervisor - the "timer_slop" Xen command
> > > line option.  
> > 
> > Is there any data that shows effects of using this new parameter?
> >   
> Yes, I've done some research and experiments on this. I did it
> together with a friend, which I'm Cc-ing, as I'm not sure we're
> ready/capable to share the results, yet (Luca?).

I think we can easily share the experimental data (cyclictest output
and plots).

Moreover, we can share the scripts and tools for running the
experiments (so, everyone can easily reproduce the numbers by simply
typing "make" and waiting for some time :)


I'll try to package the results and the scripts/tools this evening, and
I'll send them.



				Luca


> 
> What I think I can anticipate is that having such a high value for
> timer slop in the kernel, for the Xen clockevent device is (together
> with the also quite high default value of timer_slop in Xen itself)
> responsible for really high vcpu wakeup latencies.
> 
> Lowering those two values, reduces such latencies dramatically.
> 
> Regards,
> Dario


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-22 22:10 ` Boris Ostrovsky
  2019-03-23  2:58   ` Dario Faggioli
@ 2019-03-23 12:00   ` Ryan Thibodeaux
  2019-03-24 18:07     ` Boris Ostrovsky
  1 sibling, 1 reply; 22+ messages in thread
From: Ryan Thibodeaux @ 2019-03-23 12:00 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross,
	ryan.thibodeaux

On Fri, Mar 22, 2019 at 06:10:16PM -0400, Boris Ostrovsky wrote:
> On 3/22/19 2:29 PM, thibodux@gmail.com wrote:
> > From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>
> >
> > Add a new command-line option "xen_timer_slop=<INT>" that sets the
> > minimum delta of virtual Xen timers. This commit does not change the
> > default timer slop value for virtual Xen timers.
> >
> > Lowering the timer slop value should improve the accuracy of virtual
> > timers (e.g., better process dispatch latency), but it will likely
> > increase the number of virtual timer interrupts (relative to the
> > original slop setting).
> >
> > The original timer slop value has not changed since the introduction
> > of the Xen-aware Linux kernel code. This commit provides users an
> > opportunity to tune timer performance given the refinements to
> > hardware and the Xen event channel processing. It also mirrors
> > a feature in the Xen hypervisor - the "timer_slop" Xen command line
> > option.
> 
> Is there any data that shows effects of using this new parameter?
> 
> -boris
> 

For our own testing using "cyclictest" from the rt-tests project,
lowering the timer slop helped produce the best test runs, especially
in terms of maximum process dispatch latency (PDL).

Here is the output from one such test that ran overnight. The Xen
timer slop in this case was 10000 or 10 microseconds.

...
[root@slop1 ~]# cset shield -c 3
[root@slop1 ~]# echo ; date ; echo ; \
./rt-tests-1.3/cyclictest -p95 -a3 -t1 -m; echo ; date

Thu Mar 14 19:45:36 UTC 2019

# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 0.00 0.02 0.00 1/91 4260
T: 0 ( 3212) P:95 I:1000 C:57077313 Min: 27 Act: 44 Avg: 43 Max: 145
^C
Fri Mar 15 11:36:53 UTC 2019
...

This test system was configured to use a TSC clocksource, disabled
C states, and lowered the timer slop. I am not claiming the timer
slop change was solely responsible for the best results. In other
testing with the default timer slop setting of 100000 (100
microseconds), the average PDL would run slightly higher, but the
spikes were much higher and more in number, often near the 1000s
and happening multiple times per 10 minutes of testing.

- Ryan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-23 12:00   ` Ryan Thibodeaux
@ 2019-03-24 18:07     ` Boris Ostrovsky
  2019-03-25 10:36       ` Dario Faggioli
  0 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2019-03-24 18:07 UTC (permalink / raw)
  To: Ryan Thibodeaux
  Cc: xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross,
	ryan.thibodeaux

On Sat, Mar 23, 2019 at 08:00:52AM -0400, Ryan Thibodeaux wrote:
> On Fri, Mar 22, 2019 at 06:10:16PM -0400, Boris Ostrovsky wrote:
> > On 3/22/19 2:29 PM, thibodux@gmail.com wrote:
> > > From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>
> > >
> > > Add a new command-line option "xen_timer_slop=<INT>" that sets the
> > > minimum delta of virtual Xen timers. This commit does not change the
> > > default timer slop value for virtual Xen timers.
> > >
> > > Lowering the timer slop value should improve the accuracy of virtual
> > > timers (e.g., better process dispatch latency), but it will likely
> > > increase the number of virtual timer interrupts (relative to the
> > > original slop setting).
> > >
> > > The original timer slop value has not changed since the introduction
> > > of the Xen-aware Linux kernel code. This commit provides users an
> > > opportunity to tune timer performance given the refinements to
> > > hardware and the Xen event channel processing. It also mirrors
> > > a feature in the Xen hypervisor - the "timer_slop" Xen command line
> > > option.
> > 
> > Is there any data that shows effects of using this new parameter?
> > 
> > -boris
> > 
> 
> For our own testing using "cyclictest" from the rt-tests project,
> lowering the timer slop helped produce the best test runs, especially
> in terms of maximum process dispatch latency (PDL).
> 
> Here is the output from one such test that ran overnight. The Xen
> timer slop in this case was 10000 or 10 microseconds.
> 
> ...
> [root@slop1 ~]# cset shield -c 3
> [root@slop1 ~]# echo ; date ; echo ; \
> ./rt-tests-1.3/cyclictest -p95 -a3 -t1 -m; echo ; date
> 
> Thu Mar 14 19:45:36 UTC 2019
> 
> # /dev/cpu_dma_latency set to 0us
> policy: fifo: loadavg: 0.00 0.02 0.00 1/91 4260
> T: 0 ( 3212) P:95 I:1000 C:57077313 Min: 27 Act: 44 Avg: 43 Max: 145
> ^C
> Fri Mar 15 11:36:53 UTC 2019
> ...
> 
> This test system was configured to use a TSC clocksource, disabled
> C states, and lowered the timer slop. I am not claiming the timer
> slop change was solely responsible for the best results.

How can we then be sure that the proposed change will indeed provide
some sort of benefit?

Were there any other changes between your tests to think that slop
time modification may not be responsible for better results?

-boris


> In other
> testing with the default timer slop setting of 100000 (100
> microseconds), the average PDL would run slightly higher, but the
> spikes were much higher and more in number, often near the 1000s
> and happening multiple times per 10 minutes of testing.
> 
> - Ryan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-24 18:07     ` Boris Ostrovsky
@ 2019-03-25 10:36       ` Dario Faggioli
  0 siblings, 0 replies; 22+ messages in thread
From: Dario Faggioli @ 2019-03-25 10:36 UTC (permalink / raw)
  To: Boris Ostrovsky, Ryan Thibodeaux
  Cc: xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross,
	ryan.thibodeaux, luca.abeni

[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]

On Sun, 2019-03-24 at 14:07 -0400, Boris Ostrovsky wrote:
> On Sat, Mar 23, 2019 at 08:00:52AM -0400, Ryan Thibodeaux wrote:
> > This test system was configured to use a TSC clocksource, disabled
> > C states, and lowered the timer slop. I am not claiming the timer
> > slop change was solely responsible for the best results.
> 
> How can we then be sure that the proposed change will indeed provide
> some sort of benefit?
> 
> Were there any other changes between your tests to think that slop
> time modification may not be responsible for better results?
> 
FWIW, in mine and Luca's experiments, changing lowering both timer_slop
in Xen and TIMER_SLOP in Linux, improved latency dramatically, without
any other change.

We also tried _only_ playing with the Xen tunable (as there's a Xen
boot parameter for it already) but that wasn't enough. It was only when
we also tuned TIMER_SLOP in Linux's Xen clockevent device that we got
decent numbers (i.e., comparable to KVM ones).

Reason why we had not share these results yet was that we were still
"polishing" them, and because we also found a couple of other issues,
and we were trying to understand them better, before sending anything
out. But those other issues were --although still about achieving low
latencies-- orthogonal from this, and lowering the default slop is
absolute prerequisite for even talking about having a reasonable vcpu
response time.

A patch like this one, was something we were thinking to submit ourself
sooner or later (backed up by our results). Personally, in addition to
making the value tunable, which I think is a good thing, I also would
lower the default.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-23 10:41     ` luca abeni
@ 2019-03-25 12:05       ` luca abeni
  2019-03-25 13:43         ` Boris Ostrovsky
  0 siblings, 1 reply; 22+ messages in thread
From: luca abeni @ 2019-03-25 12:05 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Boris Ostrovsky, thibodux, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

Hi all,

On Sat, 23 Mar 2019 11:41:51 +0100
luca abeni <luca.abeni@santannapisa.it> wrote:
[...]
> > > Is there any data that shows effects of using this new parameter?
> > >     
> > Yes, I've done some research and experiments on this. I did it
> > together with a friend, which I'm Cc-ing, as I'm not sure we're
> > ready/capable to share the results, yet (Luca?).  
> 
> I think we can easily share the experimental data (cyclictest output
> and plots).
> 
> Moreover, we can share the scripts and tools for running the
> experiments (so, everyone can easily reproduce the numbers by simply
> typing "make" and waiting for some time :)
> 
> 
> I'll try to package the results and the scripts/tools this evening,
> and I'll send them.

Sorry for the delay. I put some quick results here:
http://retis.santannapisa.it/luca/XenTimers/
(there also is a link to the scripts to be used for reproducing the
results). The latencies have been measured by running cyclictest in the
guest (see the scripts for details).

The picture shows the latencies measured with an unpatched guest kernel
and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary small
value :).
All the experiments have been performed booting the hypervisor with a
small timer_slop (the hypervisor's one) value. So, they show that
decreasing the hypervisor's timer_slop is not enough to measure low
latencies with cyclictest.


				Luca

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-25 12:05       ` luca abeni
@ 2019-03-25 13:43         ` Boris Ostrovsky
  2019-03-25 14:07           ` luca abeni
                             ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2019-03-25 13:43 UTC (permalink / raw)
  To: luca abeni, Dario Faggioli
  Cc: thibodux, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx,
	jgross, ryan.thibodeaux

On 3/25/19 8:05 AM, luca abeni wrote:
> Hi all,
>
> On Sat, 23 Mar 2019 11:41:51 +0100
> luca abeni <luca.abeni@santannapisa.it> wrote:
> [...]
>>>> Is there any data that shows effects of using this new parameter?
>>>>     
>>> Yes, I've done some research and experiments on this. I did it
>>> together with a friend, which I'm Cc-ing, as I'm not sure we're
>>> ready/capable to share the results, yet (Luca?).  
>> I think we can easily share the experimental data (cyclictest output
>> and plots).
>>
>> Moreover, we can share the scripts and tools for running the
>> experiments (so, everyone can easily reproduce the numbers by simply
>> typing "make" and waiting for some time :)
>>
>>
>> I'll try to package the results and the scripts/tools this evening,
>> and I'll send them.
> Sorry for the delay. I put some quick results here:
> http://retis.santannapisa.it/luca/XenTimers/
> (there also is a link to the scripts to be used for reproducing the
> results). The latencies have been measured by running cyclictest in the
> guest (see the scripts for details).
>
> The picture shows the latencies measured with an unpatched guest kernel
> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary small
> value :).
> All the experiments have been performed booting the hypervisor with a
> small timer_slop (the hypervisor's one) value. So, they show that
> decreasing the hypervisor's timer_slop is not enough to measure low
> latencies with cyclictest.



I have a couple of questions:
* Does it make sense to make this a tunable for other clockevent devices
as well?
* This patch adjusts min value. Could max value (ever) need a similar
adjustment?

-boris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-25 13:43         ` Boris Ostrovsky
@ 2019-03-25 14:07           ` luca abeni
  2019-03-25 14:11           ` Ryan Thibodeaux
  2019-03-26  9:13           ` Dario Faggioli
  2 siblings, 0 replies; 22+ messages in thread
From: luca abeni @ 2019-03-25 14:07 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Dario Faggioli, thibodux, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

Hi,

On Mon, 25 Mar 2019 09:43:20 -0400
Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
[...]
> > http://retis.santannapisa.it/luca/XenTimers/
> > (there also is a link to the scripts to be used for reproducing the
> > results). The latencies have been measured by running cyclictest in
> > the guest (see the scripts for details).
> >
> > The picture shows the latencies measured with an unpatched guest
> > kernel and with a guest kernel having TIMER_SLOP set to 1000
> > (arbitrary small value :).
> > All the experiments have been performed booting the hypervisor with
> > a small timer_slop (the hypervisor's one) value. So, they show that
> > decreasing the hypervisor's timer_slop is not enough to measure low
> > latencies with cyclictest.  
> 
> 
> 
> I have a couple of questions:
> * Does it make sense to make this a tunable for other clockevent
> devices as well?
> * This patch adjusts min value. Could max value (ever) need a similar
> adjustment?

Sorry, I do not know much about clockevent devices, so I have no answers
to these questions...

What I can say is that when I repeated the cyclictest experiments on
VMs using a different clockevent device (lapic) I did not measure large
latencies.
So, I guess the "lapic" clockevent device already defaults to a smaller
min value (not sure about other clockevent devices, I do not know how
to test them).



				Luca

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-25 13:43         ` Boris Ostrovsky
  2019-03-25 14:07           ` luca abeni
@ 2019-03-25 14:11           ` Ryan Thibodeaux
  2019-03-25 17:36             ` Ryan Thibodeaux
  2019-03-25 18:31             ` Boris Ostrovsky
  2019-03-26  9:13           ` Dario Faggioli
  2 siblings, 2 replies; 22+ messages in thread
From: Ryan Thibodeaux @ 2019-03-25 14:11 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: luca abeni, Dario Faggioli, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

On Mon, Mar 25, 2019 at 09:43:20AM -0400, Boris Ostrovsky wrote:
> On 3/25/19 8:05 AM, luca abeni wrote:
> > Hi all,
> >
> > On Sat, 23 Mar 2019 11:41:51 +0100
> > luca abeni <luca.abeni@santannapisa.it> wrote:
> > [...]
> >>>> Is there any data that shows effects of using this new parameter?
> >>>>     
> >>> Yes, I've done some research and experiments on this. I did it
> >>> together with a friend, which I'm Cc-ing, as I'm not sure we're
> >>> ready/capable to share the results, yet (Luca?).  
> >> I think we can easily share the experimental data (cyclictest output
> >> and plots).
> >>
> >> Moreover, we can share the scripts and tools for running the
> >> experiments (so, everyone can easily reproduce the numbers by simply
> >> typing "make" and waiting for some time :)
> >>
> >>
> >> I'll try to package the results and the scripts/tools this evening,
> >> and I'll send them.
> > Sorry for the delay. I put some quick results here:
> > http://retis.santannapisa.it/luca/XenTimers/
> > (there also is a link to the scripts to be used for reproducing the
> > results). The latencies have been measured by running cyclictest in the
> > guest (see the scripts for details).
> >
> > The picture shows the latencies measured with an unpatched guest kernel
> > and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary small
> > value :).
> > All the experiments have been performed booting the hypervisor with a
> > small timer_slop (the hypervisor's one) value. So, they show that
> > decreasing the hypervisor's timer_slop is not enough to measure low
> > latencies with cyclictest.
> 
> 
> 
> I have a couple of questions:
> * Does it make sense to make this a tunable for other clockevent devices
> as well?

I gather that would be on a case-by-case basis for very specific 
ones.

For many timers in the kernel, the minimums are determined by the
actual hardware  backing the timer, and the minimum can be
adjusted by the clockevent code. This case is special since it 
is entirely a software-based implementation in the kernel, where 
the actual timer implementation is in the Xen hypervisor.

> * This patch adjusts min value. Could max value (ever) need a similar
> adjustment?

I cannot think of such a case where that would be helpful, but I
cannot rule that out or speak as an authority.

- Ryan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-25 14:11           ` Ryan Thibodeaux
@ 2019-03-25 17:36             ` Ryan Thibodeaux
  2019-03-25 18:31             ` Boris Ostrovsky
  1 sibling, 0 replies; 22+ messages in thread
From: Ryan Thibodeaux @ 2019-03-25 17:36 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: luca abeni, Dario Faggioli, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

On Mon, Mar 25, 2019 at 10:11:38AM -0400, Ryan Thibodeaux wrote:
> > > [...]
> > >>>> Is there any data that shows effects of using this new parameter?
> > >>>> 

Continuing with the experimental data conversation (thanks to Luca and
Dario for being so generous), I am providing more results from quick
tests this morning.

I ran the same sequence of tests four times with the same hardware,
hypervisor, and Linux guest setup. Only changes between runs was 
adjusting the slop settings in Xen and Linux. This was on a 
build of Xen 4.10 and a Linux guest running the current Xen 
tip.git kernel with my patch.

For each sequence, I ran two variations of cyclictest on an isolated
processor. The first test used an interval of 50 microseconds and
second test used an interval of 1000 microseconds, passing "-i50"
and "-10000" arguments to cyclictest respectively.

The variations of the sequences are as follows:
#1 - default slops:  Xen@50000, Linux@100000
#2 - lowering Linux: Xen@50000, Linux@5000
#3 - lowering Xen:   Xen@5000,  Linux@100000
#4 - lowering both:  Xen@5000,  Linux@5000

The cleaned up test output is below. Only showing the total
stats for each run and the number of spikes / samples that went 
over 100 microseconds. I do not record each sample value like
Luca and Dario, because I want to eliminate as many variables as
possible, e.g., eliminating overhead of writing out raw results.

Looking at the results, you can see that only lowering the Linux
slop (with my proposed patch) does reduce the overall PDL stats for
the shorter interval, but it especially lowers the spikes, in both
cases. Even in test #3 where the Xen slop was lowered, the spikes 
are a problem at the default Linux slop.

Reiterating what Luca and Dario said, lowering both slops is the 
way to consisten results for both interval configurations.

Note: even better stats can likely be achieved with more tuning
and using the RT patchset. These results were just focusing on
a non-specialized configuration.

...
##############################

# Timer Slop: Xen (default, 50000) | Guest (default, 100000)

# Cyclictest Interval (-i50)
Min: 62
Avg: 127
Max: 212
Spikes (over 100): 3892034

# Cyclictest Interval (-i1000)
Min: 24
Avg: 45
Max: 156
Spikes (over 100): 27


##############################

# Timer Slop: Xen (default, 50000) | Guest (5000)

# Cyclictest Interval (-i50)
Min: 25
Avg: 78
Max: 230
Spikes (over 100): 274549

# Cyclictest Interval (-i1000)
Min: 37
Avg: 45
Max: 82
Spikes (over 100): 0


##############################

# Timer Slop: Xen (5000) | Guest (default, 100000)

# Cyclictest Interval (-i50)
Min: 61
Avg: 126
Max: 226
Spikes (over 100): 3877860

# Cyclictest Interval (-i1000)
Min: 37
Avg: 45
Max: 74
Spikes (over 100): 0


##############################

# Timer Slop: Xen (5000) | Guest (5000)

# Cyclictest Interval (-i50)
Min: 13
Avg: 30
Max: 150
Spikes (over 100): 120

# Cyclictest Interval (-i1000)
Min: 37
Avg: 45
Max: 97
Spikes (over 100): 0
...

- Ryan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-25 14:11           ` Ryan Thibodeaux
  2019-03-25 17:36             ` Ryan Thibodeaux
@ 2019-03-25 18:31             ` Boris Ostrovsky
  1 sibling, 0 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2019-03-25 18:31 UTC (permalink / raw)
  To: Ryan Thibodeaux
  Cc: luca abeni, Dario Faggioli, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

On 3/25/19 10:11 AM, Ryan Thibodeaux wrote:
> On Mon, Mar 25, 2019 at 09:43:20AM -0400, Boris Ostrovsky wrote:
>> On 3/25/19 8:05 AM, luca abeni wrote:
>>> Hi all,
>>>
>>> On Sat, 23 Mar 2019 11:41:51 +0100
>>> luca abeni <luca.abeni@santannapisa.it> wrote:
>>> [...]
>>>>>> Is there any data that shows effects of using this new parameter?
>>>>>>     
>>>>> Yes, I've done some research and experiments on this. I did it
>>>>> together with a friend, which I'm Cc-ing, as I'm not sure we're
>>>>> ready/capable to share the results, yet (Luca?).  
>>>> I think we can easily share the experimental data (cyclictest output
>>>> and plots).
>>>>
>>>> Moreover, we can share the scripts and tools for running the
>>>> experiments (so, everyone can easily reproduce the numbers by simply
>>>> typing "make" and waiting for some time :)
>>>>
>>>>
>>>> I'll try to package the results and the scripts/tools this evening,
>>>> and I'll send them.
>>> Sorry for the delay. I put some quick results here:
>>> http://retis.santannapisa.it/luca/XenTimers/
>>> (there also is a link to the scripts to be used for reproducing the
>>> results). The latencies have been measured by running cyclictest in the
>>> guest (see the scripts for details).
>>>
>>> The picture shows the latencies measured with an unpatched guest kernel
>>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary small
>>> value :).
>>> All the experiments have been performed booting the hypervisor with a
>>> small timer_slop (the hypervisor's one) value. So, they show that
>>> decreasing the hypervisor's timer_slop is not enough to measure low
>>> latencies with cyclictest.
>>
>>
>> I have a couple of questions:
>> * Does it make sense to make this a tunable for other clockevent devices
>> as well?
> I gather that would be on a case-by-case basis for very specific 
> ones.
>
> For many timers in the kernel, the minimums are determined by the
> actual hardware  backing the timer, and the minimum can be
> adjusted by the clockevent code. This case is special since it 
> is entirely a software-based implementation in the kernel, where 
> the actual timer implementation is in the Xen hypervisor.
>
>> * This patch adjusts min value. Could max value (ever) need a similar
>> adjustment?
> I cannot think of such a case where that would be helpful, but I
> cannot rule that out or speak as an authority.


I am asking mostly because you are introducing new interface and I don't
want it to change in the future. I suppose if later we decide to add
control for the max value we could just expand your current proposal to
xen_timer_slop=[min],[max] and keep it to be back-compatible.

For the patch:

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-25 13:43         ` Boris Ostrovsky
  2019-03-25 14:07           ` luca abeni
  2019-03-25 14:11           ` Ryan Thibodeaux
@ 2019-03-26  9:13           ` Dario Faggioli
  2019-03-26 11:12             ` luca abeni
  2019-03-26 23:21             ` Boris Ostrovsky
  2 siblings, 2 replies; 22+ messages in thread
From: Dario Faggioli @ 2019-03-26  9:13 UTC (permalink / raw)
  To: Boris Ostrovsky, luca abeni
  Cc: thibodux, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx,
	jgross, ryan.thibodeaux

[-- Attachment #1: Type: text/plain, Size: 3280 bytes --]

On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote:
> On 3/25/19 8:05 AM, luca abeni wrote:
> > 
> > The picture shows the latencies measured with an unpatched guest
> > kernel
> > and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary
> > small
> > value :).
> > All the experiments have been performed booting the hypervisor with
> > a
> > small timer_slop (the hypervisor's one) value. So, they show that
> > decreasing the hypervisor's timer_slop is not enough to measure low
> > latencies with cyclictest.
> 
> I have a couple of questions:
> * Does it make sense to make this a tunable for other clockevent
> devices
> as well?
>
So, AFAIUI, the thing is as follows. In clockevents_program_event(), we
keep the delta between now and the next timer event within
dev->max_delta_ns and dev->min_delta_ns:

  delta = min(delta, (int64_t) dev->max_delta_ns);
  delta = max(delta, (int64_t) dev->min_delta_ns);

For Xen (well, for the Xen clock) we have:

  .max_delta_ns = 0xffffffff,
  .min_delta_ns = TIMER_SLOP,

which means a guest can't ask for a timer to fire earlier than 100us
ahead, which is a bit too coarse, especially on contemporary hardware.

For "lapic_deadline" (which was what was in use in KVM guests, in our
experiments) we have:

  lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
  lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent);

Which means max is 0x7FFFFF device ticks, and min is 0xF.
clockevent_delta2ns() does the conversion from ticks to ns, basing on
the results of the APIC calibration process. It calls cev_delta2ns()
which does some scaling, shifting, divs, etc, and, at the very end,
this:

  /* Deltas less than 1usec are pointless noise */
  return clc > 1000 ? clc : 1000;

So, as Ryan is also saying, the actual minimum, in this case, depends
on hardware, with a sanity check of "never below 1us" (which is quite
smaller than 100us!)

Of course, the actual granularity depends on hardware in the Xen case
as well, but that is handled in Xen itself. And we have mechanisms in
place in there to avoid timer interrupt storms (like, ahem, the Xen's
'timer_slop' boot parameter... :-P)

And this is basically why I was also thinking we can/should lower the
default value of TIMER_SLOP, here in the Xen clock implementation in
Linux.

> * This patch adjusts min value. Could max value (ever) need a similar
> adjustment?
> 
Well, for Xen, it's already 0xffffffff. I don't see use cases when one
would want a smaller max. Wanting an higher max *might* be of some
interest,  e.g., for power management, if the first timer event is 1min
ahead, and you don't want to be woken up every (if my math is right) 4
secs.

But we'd have to see if that actually works, not to mention that 4 secs
is already large enough, IMHO, that it's unlikely we'll be really
sleeping for that much time without having to wake up for one reason or
another. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-26  9:13           ` Dario Faggioli
@ 2019-03-26 11:12             ` luca abeni
  2019-03-26 11:41               ` Ryan Thibodeaux
  2019-03-26 23:21             ` Boris Ostrovsky
  1 sibling, 1 reply; 22+ messages in thread
From: luca abeni @ 2019-03-26 11:12 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Boris Ostrovsky, thibodux, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

Hi all,

On Tue, 26 Mar 2019 10:13:32 +0100
Dario Faggioli <dfaggioli@suse.com> wrote:

> On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote:
> > On 3/25/19 8:05 AM, luca abeni wrote:  
> > > 
> > > The picture shows the latencies measured with an unpatched guest
> > > kernel
> > > and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary
> > > small
> > > value :).
> > > All the experiments have been performed booting the hypervisor
> > > with a
> > > small timer_slop (the hypervisor's one) value. So, they show that
> > > decreasing the hypervisor's timer_slop is not enough to measure
> > > low latencies with cyclictest.  
> > 
> > I have a couple of questions:
> > * Does it make sense to make this a tunable for other clockevent
> > devices
> > as well?
> >  
> So, AFAIUI, the thing is as follows. In clockevents_program_event(),
> we keep the delta between now and the next timer event within
> dev->max_delta_ns and dev->min_delta_ns:
> 
>   delta = min(delta, (int64_t) dev->max_delta_ns);
>   delta = max(delta, (int64_t) dev->min_delta_ns);
> 
> For Xen (well, for the Xen clock) we have:
> 
>   .max_delta_ns = 0xffffffff,
>   .min_delta_ns = TIMER_SLOP,
> 
> which means a guest can't ask for a timer to fire earlier than 100us
[...]

I know this is not fully related with the current discussion, but in
these days I had a look at the code again, and...
The comment for TIMER_SLOP in arch/x86/xen/time.c says:
	/* Xen may fire a timer up to this many ns early */

Isn't the comment wrong? shouldn't it be "...many ns late" instead of
"early"?



			Thanks,
				Luca

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-26 11:12             ` luca abeni
@ 2019-03-26 11:41               ` Ryan Thibodeaux
  0 siblings, 0 replies; 22+ messages in thread
From: Ryan Thibodeaux @ 2019-03-26 11:41 UTC (permalink / raw)
  To: luca abeni
  Cc: Dario Faggioli, Boris Ostrovsky, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

On Tue, Mar 26, 2019 at 12:12:56PM +0100, luca abeni wrote:
> Hi all,
> 
> On Tue, 26 Mar 2019 10:13:32 +0100
> Dario Faggioli <dfaggioli@suse.com> wrote:
> 
> > On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote:
> > > On 3/25/19 8:05 AM, luca abeni wrote:  
> > > > 
> > > > The picture shows the latencies measured with an unpatched guest
> > > > kernel
> > > > and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary
> > > > small
> > > > value :).
> > > > All the experiments have been performed booting the hypervisor
> > > > with a
> > > > small timer_slop (the hypervisor's one) value. So, they show that
> > > > decreasing the hypervisor's timer_slop is not enough to measure
> > > > low latencies with cyclictest.  
> > > 
> > > I have a couple of questions:
> > > * Does it make sense to make this a tunable for other clockevent
> > > devices
> > > as well?
> > >  
> > So, AFAIUI, the thing is as follows. In clockevents_program_event(),
> > we keep the delta between now and the next timer event within
> > dev->max_delta_ns and dev->min_delta_ns:
> > 
> >   delta = min(delta, (int64_t) dev->max_delta_ns);
> >   delta = max(delta, (int64_t) dev->min_delta_ns);
> > 
> > For Xen (well, for the Xen clock) we have:
> > 
> >   .max_delta_ns = 0xffffffff,
> >   .min_delta_ns = TIMER_SLOP,
> > 
> > which means a guest can't ask for a timer to fire earlier than 100us
> [...]
> 
> I know this is not fully related with the current discussion, but in
> these days I had a look at the code again, and...
> The comment for TIMER_SLOP in arch/x86/xen/time.c says:
> 	/* Xen may fire a timer up to this many ns early */
> 
> Isn't the comment wrong? shouldn't it be "...many ns late" instead of
> "early"?

I would say is something else entirely. 

If you look at "clockevents_program_event()" in kernel/time/clockevents.c,
you see that the min_delta_ns value sets the limit or granulariy for the
clock's sleep time.

Basically, it is the minimum amount of sleep one can set for the next
event for the clock in question.

- Ryan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-26  9:13           ` Dario Faggioli
  2019-03-26 11:12             ` luca abeni
@ 2019-03-26 23:21             ` Boris Ostrovsky
  2019-03-27 10:00               ` Ryan Thibodeaux
  1 sibling, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2019-03-26 23:21 UTC (permalink / raw)
  To: Dario Faggioli, luca abeni
  Cc: thibodux, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx,
	jgross, ryan.thibodeaux


[-- Attachment #1.1: Type: text/plain, Size: 2744 bytes --]

On 3/26/19 5:13 AM, Dario Faggioli wrote:
> On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote:
>> On 3/25/19 8:05 AM, luca abeni wrote:
>>> The picture shows the latencies measured with an unpatched guest
>>> kernel
>>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary
>>> small
>>> value :).
>>> All the experiments have been performed booting the hypervisor with
>>> a
>>> small timer_slop (the hypervisor's one) value. So, they show that
>>> decreasing the hypervisor's timer_slop is not enough to measure low
>>> latencies with cyclictest.
>> I have a couple of questions:
>> * Does it make sense to make this a tunable for other clockevent
>> devices
>> as well?
>>
> So, AFAIUI, the thing is as follows. In clockevents_program_event(), we
> keep the delta between now and the next timer event within
> dev->max_delta_ns and dev->min_delta_ns:
>
>   delta = min(delta, (int64_t) dev->max_delta_ns);
>   delta = max(delta, (int64_t) dev->min_delta_ns);
>
> For Xen (well, for the Xen clock) we have:
>
>   .max_delta_ns = 0xffffffff,
>   .min_delta_ns = TIMER_SLOP,
>
> which means a guest can't ask for a timer to fire earlier than 100us
> ahead, which is a bit too coarse, especially on contemporary hardware.
>
> For "lapic_deadline" (which was what was in use in KVM guests, in our
> experiments) we have:
>
>   lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
>   lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent);
>
> Which means max is 0x7FFFFF device ticks, and min is 0xF.
> clockevent_delta2ns() does the conversion from ticks to ns, basing on
> the results of the APIC calibration process. It calls cev_delta2ns()
> which does some scaling, shifting, divs, etc, and, at the very end,
> this:
>
>   /* Deltas less than 1usec are pointless noise */
>   return clc > 1000 ? clc : 1000;
>
> So, as Ryan is also saying, the actual minimum, in this case, depends
> on hardware, with a sanity check of "never below 1us" (which is quite
> smaller than 100us!)
>
> Of course, the actual granularity depends on hardware in the Xen case
> as well, but that is handled in Xen itself. And we have mechanisms in
> place in there to avoid timer interrupt storms (like, ahem, the Xen's
> 'timer_slop' boot parameter... :-P)
>
> And this is basically why I was also thinking we can/should lower the
> default value of TIMER_SLOP, here in the Xen clock implementation in
> Linux.

What do you think would be a sane value? 10us? Should we then still keep
this patch?

My concern would be that if we change the current value and it turns out
to be very wrong we'd then have no recourse.


-boris


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-26 23:21             ` Boris Ostrovsky
@ 2019-03-27 10:00               ` Ryan Thibodeaux
  2019-03-27 14:46                 ` Boris Ostrovsky
  0 siblings, 1 reply; 22+ messages in thread
From: Ryan Thibodeaux @ 2019-03-27 10:00 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Dario Faggioli, luca abeni, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

On Tue, Mar 26, 2019 at 07:21:31PM -0400, Boris Ostrovsky wrote:
> On 3/26/19 5:13 AM, Dario Faggioli wrote:
> > On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote:
> >> On 3/25/19 8:05 AM, luca abeni wrote:
> >>> The picture shows the latencies measured with an unpatched guest
> >>> kernel
> >>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary
> >>> small
> >>> value :).
> >>> All the experiments have been performed booting the hypervisor with
> >>> a
> >>> small timer_slop (the hypervisor's one) value. So, they show that
> >>> decreasing the hypervisor's timer_slop is not enough to measure low
> >>> latencies with cyclictest.
> >> I have a couple of questions:
> >> * Does it make sense to make this a tunable for other clockevent
> >> devices
> >> as well?
> >>
> > So, AFAIUI, the thing is as follows. In clockevents_program_event(), we
> > keep the delta between now and the next timer event within
> > dev->max_delta_ns and dev->min_delta_ns:
> >
> >   delta = min(delta, (int64_t) dev->max_delta_ns);
> >   delta = max(delta, (int64_t) dev->min_delta_ns);
> >
> > For Xen (well, for the Xen clock) we have:
> >
> >   .max_delta_ns = 0xffffffff,
> >   .min_delta_ns = TIMER_SLOP,
> >
> > which means a guest can't ask for a timer to fire earlier than 100us
> > ahead, which is a bit too coarse, especially on contemporary hardware.
> >
> > For "lapic_deadline" (which was what was in use in KVM guests, in our
> > experiments) we have:
> >
> >   lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
> >   lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent);
> >
> > Which means max is 0x7FFFFF device ticks, and min is 0xF.
> > clockevent_delta2ns() does the conversion from ticks to ns, basing on
> > the results of the APIC calibration process. It calls cev_delta2ns()
> > which does some scaling, shifting, divs, etc, and, at the very end,
> > this:
> >
> >   /* Deltas less than 1usec are pointless noise */
> >   return clc > 1000 ? clc : 1000;
> >
> > So, as Ryan is also saying, the actual minimum, in this case, depends
> > on hardware, with a sanity check of "never below 1us" (which is quite
> > smaller than 100us!)
> >
> > Of course, the actual granularity depends on hardware in the Xen case
> > as well, but that is handled in Xen itself. And we have mechanisms in
> > place in there to avoid timer interrupt storms (like, ahem, the Xen's
> > 'timer_slop' boot parameter... :-P)
> >
> > And this is basically why I was also thinking we can/should lower the
> > default value of TIMER_SLOP, here in the Xen clock implementation in
> > Linux.
> 
> What do you think would be a sane value? 10us? Should we then still keep
> this patch?
> 
> My concern would be that if we change the current value and it turns out
> to be very wrong we'd then have no recourse.
> 
> 
> -boris
> 

Speaking out of turn but as a participant in this thread, I would not
assume to change the default value for all cases without significant
testing by the community, touching a variety of configurations.

It feels like changing the default has a non-trivial amount of 
unknowns that would need to be addressed.

Not surprisingly, I am biased to the approach of my patch which
does not change the default but offers flexibility to all.

- Ryan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-27 10:00               ` Ryan Thibodeaux
@ 2019-03-27 14:46                 ` Boris Ostrovsky
  2019-03-27 14:59                   ` Ryan Thibodeaux
  2019-03-27 15:19                   ` Dario Faggioli
  0 siblings, 2 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2019-03-27 14:46 UTC (permalink / raw)
  To: Ryan Thibodeaux
  Cc: Dario Faggioli, luca abeni, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

On 3/27/19 6:00 AM, Ryan Thibodeaux wrote:
> On Tue, Mar 26, 2019 at 07:21:31PM -0400, Boris Ostrovsky wrote:
>> On 3/26/19 5:13 AM, Dario Faggioli wrote:
>>> On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote:
>>>> On 3/25/19 8:05 AM, luca abeni wrote:
>>>>> The picture shows the latencies measured with an unpatched guest
>>>>> kernel
>>>>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary
>>>>> small
>>>>> value :).
>>>>> All the experiments have been performed booting the hypervisor with
>>>>> a
>>>>> small timer_slop (the hypervisor's one) value. So, they show that
>>>>> decreasing the hypervisor's timer_slop is not enough to measure low
>>>>> latencies with cyclictest.
>>>> I have a couple of questions:
>>>> * Does it make sense to make this a tunable for other clockevent
>>>> devices
>>>> as well?
>>>>
>>> So, AFAIUI, the thing is as follows. In clockevents_program_event(), we
>>> keep the delta between now and the next timer event within
>>> dev->max_delta_ns and dev->min_delta_ns:
>>>
>>>   delta = min(delta, (int64_t) dev->max_delta_ns);
>>>   delta = max(delta, (int64_t) dev->min_delta_ns);
>>>
>>> For Xen (well, for the Xen clock) we have:
>>>
>>>   .max_delta_ns = 0xffffffff,
>>>   .min_delta_ns = TIMER_SLOP,
>>>
>>> which means a guest can't ask for a timer to fire earlier than 100us
>>> ahead, which is a bit too coarse, especially on contemporary hardware.
>>>
>>> For "lapic_deadline" (which was what was in use in KVM guests, in our
>>> experiments) we have:
>>>
>>>   lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
>>>   lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent);
>>>
>>> Which means max is 0x7FFFFF device ticks, and min is 0xF.
>>> clockevent_delta2ns() does the conversion from ticks to ns, basing on
>>> the results of the APIC calibration process. It calls cev_delta2ns()
>>> which does some scaling, shifting, divs, etc, and, at the very end,
>>> this:
>>>
>>>   /* Deltas less than 1usec are pointless noise */
>>>   return clc > 1000 ? clc : 1000;
>>>
>>> So, as Ryan is also saying, the actual minimum, in this case, depends
>>> on hardware, with a sanity check of "never below 1us" (which is quite
>>> smaller than 100us!)
>>>
>>> Of course, the actual granularity depends on hardware in the Xen case
>>> as well, but that is handled in Xen itself. And we have mechanisms in
>>> place in there to avoid timer interrupt storms (like, ahem, the Xen's
>>> 'timer_slop' boot parameter... :-P)
>>>
>>> And this is basically why I was also thinking we can/should lower the
>>> default value of TIMER_SLOP, here in the Xen clock implementation in
>>> Linux.
>> What do you think would be a sane value? 10us? Should we then still keep
>> this patch?
>>
>> My concern would be that if we change the current value and it turns out
>> to be very wrong we'd then have no recourse.
>>
>>
>> -boris
>>
> Speaking out of turn but as a participant in this thread, I would not
> assume to change the default value for all cases without significant
> testing by the community, touching a variety of configurations.
>
> It feels like changing the default has a non-trivial amount of 
> unknowns that would need to be addressed.
>
> Not surprisingly, I am biased to the approach of my patch which
> does not change the default but offers flexibility to all.


If we are to change the default it would be good to at least collect
some data on distribution of delta values in
clockevents_program_event(). But as I said, I'd keep the patch.

Also, as far as the comment describing TIMER_SLOP, I agree that it is
rather misleading.

I can replace it with /* Minimum amount of time until next clock event
fires */, I  can do it while committing so no need to resend.

-boris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-27 14:46                 ` Boris Ostrovsky
@ 2019-03-27 14:59                   ` Ryan Thibodeaux
  2019-03-27 15:19                   ` Dario Faggioli
  1 sibling, 0 replies; 22+ messages in thread
From: Ryan Thibodeaux @ 2019-03-27 14:59 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Dario Faggioli, luca abeni, xen-devel, linux-kernel,
	oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

On Wed, Mar 27, 2019 at 10:46:21AM -0400, Boris Ostrovsky wrote:
> On 3/27/19 6:00 AM, Ryan Thibodeaux wrote:
> > On Tue, Mar 26, 2019 at 07:21:31PM -0400, Boris Ostrovsky wrote:
> >> On 3/26/19 5:13 AM, Dario Faggioli wrote:
> >>> On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote:
> >>>> On 3/25/19 8:05 AM, luca abeni wrote:
> >>>>> The picture shows the latencies measured with an unpatched guest
> >>>>> kernel
> >>>>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary
> >>>>> small
> >>>>> value :).
> >>>>> All the experiments have been performed booting the hypervisor with
> >>>>> a
> >>>>> small timer_slop (the hypervisor's one) value. So, they show that
> >>>>> decreasing the hypervisor's timer_slop is not enough to measure low
> >>>>> latencies with cyclictest.
> >>>> I have a couple of questions:
> >>>> * Does it make sense to make this a tunable for other clockevent
> >>>> devices
> >>>> as well?
> >>>>
> >>> So, AFAIUI, the thing is as follows. In clockevents_program_event(), we
> >>> keep the delta between now and the next timer event within
> >>> dev->max_delta_ns and dev->min_delta_ns:
> >>>
> >>>   delta = min(delta, (int64_t) dev->max_delta_ns);
> >>>   delta = max(delta, (int64_t) dev->min_delta_ns);
> >>>
> >>> For Xen (well, for the Xen clock) we have:
> >>>
> >>>   .max_delta_ns = 0xffffffff,
> >>>   .min_delta_ns = TIMER_SLOP,
> >>>
> >>> which means a guest can't ask for a timer to fire earlier than 100us
> >>> ahead, which is a bit too coarse, especially on contemporary hardware.
> >>>
> >>> For "lapic_deadline" (which was what was in use in KVM guests, in our
> >>> experiments) we have:
> >>>
> >>>   lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent);
> >>>   lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent);
> >>>
> >>> Which means max is 0x7FFFFF device ticks, and min is 0xF.
> >>> clockevent_delta2ns() does the conversion from ticks to ns, basing on
> >>> the results of the APIC calibration process. It calls cev_delta2ns()
> >>> which does some scaling, shifting, divs, etc, and, at the very end,
> >>> this:
> >>>
> >>>   /* Deltas less than 1usec are pointless noise */
> >>>   return clc > 1000 ? clc : 1000;
> >>>
> >>> So, as Ryan is also saying, the actual minimum, in this case, depends
> >>> on hardware, with a sanity check of "never below 1us" (which is quite
> >>> smaller than 100us!)
> >>>
> >>> Of course, the actual granularity depends on hardware in the Xen case
> >>> as well, but that is handled in Xen itself. And we have mechanisms in
> >>> place in there to avoid timer interrupt storms (like, ahem, the Xen's
> >>> 'timer_slop' boot parameter... :-P)
> >>>
> >>> And this is basically why I was also thinking we can/should lower the
> >>> default value of TIMER_SLOP, here in the Xen clock implementation in
> >>> Linux.
> >> What do you think would be a sane value? 10us? Should we then still keep
> >> this patch?
> >>
> >> My concern would be that if we change the current value and it turns out
> >> to be very wrong we'd then have no recourse.
> >>
> >>
> >> -boris
> >>
> > Speaking out of turn but as a participant in this thread, I would not
> > assume to change the default value for all cases without significant
> > testing by the community, touching a variety of configurations.
> >
> > It feels like changing the default has a non-trivial amount of 
> > unknowns that would need to be addressed.
> >
> > Not surprisingly, I am biased to the approach of my patch which
> > does not change the default but offers flexibility to all.
> 
> 
> If we are to change the default it would be good to at least collect
> some data on distribution of delta values in
> clockevents_program_event(). But as I said, I'd keep the patch.
> 
> Also, as far as the comment describing TIMER_SLOP, I agree that it is
> rather misleading.
> 
> I can replace it with /* Minimum amount of time until next clock event
> fires */, I  can do it while committing so no need to resend.
> 
> -boris

I like that. Thanks Boris!

- Ryan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-27 14:46                 ` Boris Ostrovsky
  2019-03-27 14:59                   ` Ryan Thibodeaux
@ 2019-03-27 15:19                   ` Dario Faggioli
  1 sibling, 0 replies; 22+ messages in thread
From: Dario Faggioli @ 2019-03-27 15:19 UTC (permalink / raw)
  To: Boris Ostrovsky, Ryan Thibodeaux
  Cc: luca abeni, xen-devel, linux-kernel, oleksandr_andrushchenko,
	tglx, jgross, ryan.thibodeaux

On Wed, 2019-03-27 at 10:46 -0400, Boris Ostrovsky wrote:
> On 3/27/19 6:00 AM, Ryan Thibodeaux wrote:
> > On Tue, Mar 26, 2019 at 07:21:31PM -0400, Boris Ostrovsky wrote:
> > > On 3/26/19 5:13 AM, Dario Faggioli wrote:
> > > > 
> > > > And this is basically why I was also thinking we can/should
> > > > lower the
> > > > default value of TIMER_SLOP, here in the Xen clock
> > > > implementation in
> > > > Linux.
> > > What do you think would be a sane value? 10us? Should we then
> > > still keep
> > > this patch?
> > > 
> > > My concern would be that if we change the current value and it
> > > turns out
> > > to be very wrong we'd then have no recourse.
> > > 
> > Speaking out of turn but as a participant in this thread, I would
> > not
> > assume to change the default value for all cases without
> > significant
> > testing by the community, touching a variety of configurations.
> > 
> If we are to change the default it would be good to at least collect
> some data on distribution of delta values in
> clockevents_program_event(). But as I said, I'd keep the patch.
> 
I would definitely take/keep this patch. Choosing a more sane (IMO)
default and making things flexible and configurable are not mutually
exclusive things. :-)

I think that having this set to 100us stands in the way of a lot of
people wanting to do time sensitive stuff in Xen VMs. I'd at least
halve that to 50us, but 10us is even better.

But sure we can do this at a later point. And even at that point, a
patch like this is valuable, because there might be people that might,
probably after some testing of their own setup, want to lower it even
further.

> Also, as far as the comment describing TIMER_SLOP, I agree that it is
> rather misleading.
> 
> I can replace it with /* Minimum amount of time until next clock
> event
> fires */, I  can do it while committing so no need to resend.
> 
Yeah, much better, yes.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option
  2019-03-22 18:29 [PATCH] x86/xen: Add "xen_timer_slop" command line option thibodux
  2019-03-22 22:10 ` Boris Ostrovsky
@ 2019-04-24 18:47 ` Boris Ostrovsky
  1 sibling, 0 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2019-04-24 18:47 UTC (permalink / raw)
  To: thibodux, xen-devel, linux-kernel
  Cc: oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux

On 3/22/19 2:29 PM, thibodux@gmail.com wrote:
> From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>
>
> Add a new command-line option "xen_timer_slop=<INT>" that sets the
> minimum delta of virtual Xen timers. This commit does not change the
> default timer slop value for virtual Xen timers.
>
> Lowering the timer slop value should improve the accuracy of virtual
> timers (e.g., better process dispatch latency), but it will likely
> increase the number of virtual timer interrupts (relative to the
> original slop setting).
>
> The original timer slop value has not changed since the introduction
> of the Xen-aware Linux kernel code. This commit provides users an
> opportunity to tune timer performance given the refinements to
> hardware and the Xen event channel processing. It also mirrors
> a feature in the Xen hypervisor - the "timer_slop" Xen command line
> option.
>
> Signed-off-by: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>


Applied to for-linus-5.2

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-04-24 18:48 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-22 18:29 [PATCH] x86/xen: Add "xen_timer_slop" command line option thibodux
2019-03-22 22:10 ` Boris Ostrovsky
2019-03-23  2:58   ` Dario Faggioli
2019-03-23 10:41     ` luca abeni
2019-03-25 12:05       ` luca abeni
2019-03-25 13:43         ` Boris Ostrovsky
2019-03-25 14:07           ` luca abeni
2019-03-25 14:11           ` Ryan Thibodeaux
2019-03-25 17:36             ` Ryan Thibodeaux
2019-03-25 18:31             ` Boris Ostrovsky
2019-03-26  9:13           ` Dario Faggioli
2019-03-26 11:12             ` luca abeni
2019-03-26 11:41               ` Ryan Thibodeaux
2019-03-26 23:21             ` Boris Ostrovsky
2019-03-27 10:00               ` Ryan Thibodeaux
2019-03-27 14:46                 ` Boris Ostrovsky
2019-03-27 14:59                   ` Ryan Thibodeaux
2019-03-27 15:19                   ` Dario Faggioli
2019-03-23 12:00   ` Ryan Thibodeaux
2019-03-24 18:07     ` Boris Ostrovsky
2019-03-25 10:36       ` Dario Faggioli
2019-04-24 18:47 ` Boris Ostrovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).