kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v2] KVM: LAPIC: Periodically revaluate to get conservative lapic_timer_advance_ns
Date: Tue, 27 Aug 2019 19:47:17 +0200	[thread overview]
Message-ID: <20190827174717.GB65641@flask> (raw)
In-Reply-To: <1565841817-25982-1-git-send-email-wanpengli@tencent.com>

2019-08-15 12:03+0800, Wanpeng Li:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Even if for realtime CPUs, cache line bounces, frequency scaling, presence 
> of higher-priority RT tasks, etc can still cause different response. These 
> interferences should be considered and periodically revaluate whether 
> or not the lapic_timer_advance_ns value is the best, do nothing if it is,
> otherwise recaluate again. Set lapic_timer_advance_ns to the minimal 
> conservative value from all the estimated values.

IIUC, this patch is looking for the minimal timer_advance_ns because it
provides the best throughput:
When every code path ran as fast as possible and we don't have to wait
for the timer to expire, but still arrive exactly at the point when it
would have expired.
We always arrive late late if something delayed the execution, which
means higher latencies, but RT shouldn't be using an adaptive algorithm
anyway, so that is not an issue.

The computed conservative timer_advance_ns will converge to the minimal
measured timer_advance_ns as time progresses, because it can only go
down and will do so repeatedly by small steps as even one smaller
measurement sufficiently close is enough to decrease it.

With that in mind, wouldn't the following patch (completely untested)
give you about the same numbers?

The idea is that if we have to wait, we are wasting time and therefore
decrease timer_advance_ns to eliminate the time spent waiting.

The first run is special and just sets timer_advance_ns to the latency
we measured, regardless of what it is -- it deals with the possibility
that the default was too low.

This algorithm is also likely prone to turbo boost making few runs
faster than what is then achieved during a more common workload, but
we'd need to have a sliding window or some other more sophisticated
algorithm in order to deal with that.

I also like Paolo's idea of a smoothing -- if we use a moving average
based on advance_expire_delta, we wouldn't even have to convert it into
ns unless it reached a given threshold, which could make decently fast
to be run every time.

Something like

   moving_average = (apic->lapic_timer.moving_average * (weight - 1) + advance_expire_delta) / weight

   if (moving_average > threshold)
      recompute timer_advance_ns

   apic->lapic_timer.moving_average = moving_average

where weight would be a of 2 to make the operation fast.

This kind of moving average gives less value to old inputs and the
weight allows us to control the reaction speed of the approximation.
(A small number like 4 or 8 seems about right.)

I don't have any information on the latency, though.
Do you think that the added overhead isn't worth the smoothing?

Thanks.

---8<---
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e904ff06a83d..d7f2af2eb3ce 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1491,23 +1491,20 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
 	if (advance_expire_delta < 0) {
 		ns = -advance_expire_delta * 1000000ULL;
 		do_div(ns, vcpu->arch.virtual_tsc_khz);
-		timer_advance_ns -= min((u32)ns,
-			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
+		timer_advance_ns -= (u32)ns;
 	} else {
 	/* too late */
+		/* This branch can only be taken on the initial calibration. */
+		if (apic->lapic_timer.timer_advance_adjust_done)
+			pr_err_once("kvm: broken expectation in lapic timer_advance_ns");
+
 		ns = advance_expire_delta * 1000000ULL;
 		do_div(ns, vcpu->arch.virtual_tsc_khz);
-		timer_advance_ns += min((u32)ns,
-			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
+		timer_advance_ns += (u32)ns;
 	}
 
-	if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
-		apic->lapic_timer.timer_advance_adjust_done = true;
-	if (unlikely(timer_advance_ns > 5000)) {
-		timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
-		apic->lapic_timer.timer_advance_adjust_done = false;
-	}
 	apic->lapic_timer.timer_advance_ns = timer_advance_ns;
+	apic->lapic_timer.timer_advance_adjust_done = true;
 }
 
 static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
@@ -1526,7 +1523,7 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	if (guest_tsc < tsc_deadline)
 		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
 
-	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
+	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done) || guest_tsc < tsc_deadline)
 		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
 }
 

  parent reply	other threads:[~2019-08-27 17:47 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-15  4:03 [PATCH v2] KVM: LAPIC: Periodically revaluate to get conservative lapic_timer_advance_ns Wanpeng Li
2019-08-22  2:37 ` Wanpeng Li
2019-08-27 17:47 ` Radim Krčmář [this message]
2019-08-28  8:09   ` Wanpeng Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190827174717.GB65641@flask \
    --to=rkrcmar@redhat.com \
    --cc=kernellwp@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).