linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dongli Zhang <dongli.zhang@oracle.com>
To: x86@kernel.org, virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org, pv-drivers@vmware.com,
	xen-devel@lists.xenproject.org, linux-hyperv@vger.kernel.org
Cc: jgross@suse.com, akaher@vmware.com, amakhalov@vmware.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, pbonzini@redhat.com,
	wanpengli@tencent.com, vkuznets@redhat.com, peterz@infradead.org,
	seanjc@google.com, dwmw2@infradead.org, joe.jin@oracle.com,
	boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org
Subject: [PATCH RFC 1/1] x86/paravirt: introduce param to disable pv sched_clock
Date: Wed, 18 Oct 2023 15:11:23 -0700	[thread overview]
Message-ID: <20231018221123.136403-1-dongli.zhang@oracle.com> (raw)

As mentioned in the linux kernel development document, "sched_clock() is
used for scheduling and timestamping". While there is a default native
implementation, many paravirtualizations have their own implementations.

About KVM, it uses kvm_sched_clock_read() and there is no way to only
disable KVM's sched_clock. The "no-kvmclock" may disable all
paravirtualized kvmclock features.

 94 static inline void kvm_sched_clock_init(bool stable)
 95 {
 96         if (!stable)
 97                 clear_sched_clock_stable();
 98         kvm_sched_clock_offset = kvm_clock_read();
 99         paravirt_set_sched_clock(kvm_sched_clock_read);
100
101         pr_info("kvm-clock: using sched offset of %llu cycles",
102                 kvm_sched_clock_offset);
103
104         BUILD_BUG_ON(sizeof(kvm_sched_clock_offset) >
105                 sizeof(((struct pvclock_vcpu_time_info *)NULL)->system_time));
106 }

There is known issue that kvmclock may drift during vCPU hotplug [1].
Although a temporary fix is available [2], we may need a way to disable pv
sched_clock. Nowadays, the TSC is more stable and has less performance
overhead than kvmclock.

This is to propose to introduce a global param to disable pv sched_clock
for all paravirtualizations.

Please suggest and comment if other options are better:

1. Global param (this RFC patch).

2. The kvmclock specific param (e.g., "no-vmw-sched-clock" in vmware).

Indeed I like the 2nd method.

3. Enforce native sched_clock only when TSC is invariant (hyper-v method).

4. Remove and cleanup pv sched_clock, and always use pv_sched_clock() for
all (suggested by Peter Zijlstra in [3]). Some paravirtualizations may
want to keep the pv sched_clock.

To introduce a param may be easier to backport to old kernel version.

References:
[1] https://lore.kernel.org/all/20230926230649.67852-1-dongli.zhang@oracle.com/
[2] https://lore.kernel.org/all/20231018195638.1898375-1-seanjc@google.com/
[3] https://lore.kernel.org/all/20231002211651.GA3774@noisy.programming.kicks-ass.net/

Thank you very much for the suggestion!

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 arch/x86/include/asm/paravirt.h |  2 +-
 arch/x86/kernel/kvmclock.c      | 12 +++++++-----
 arch/x86/kernel/paravirt.c      | 18 +++++++++++++++++-
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 6c8ff12140ae..f36edf608b6b 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -24,7 +24,7 @@ u64 dummy_sched_clock(void);
 DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
 DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);
 
-void paravirt_set_sched_clock(u64 (*func)(void));
+int paravirt_set_sched_clock(u64 (*func)(void));
 
 static __always_inline u64 paravirt_sched_clock(void)
 {
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index fb8f52149be9..0b8bf5677d44 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -93,13 +93,15 @@ static noinstr u64 kvm_sched_clock_read(void)
 
 static inline void kvm_sched_clock_init(bool stable)
 {
-	if (!stable)
-		clear_sched_clock_stable();
 	kvm_sched_clock_offset = kvm_clock_read();
-	paravirt_set_sched_clock(kvm_sched_clock_read);
 
-	pr_info("kvm-clock: using sched offset of %llu cycles",
-		kvm_sched_clock_offset);
+	if (!paravirt_set_sched_clock(kvm_sched_clock_read)) {
+		if (!stable)
+			clear_sched_clock_stable();
+
+		pr_info("kvm-clock: using sched offset of %llu cycles",
+			kvm_sched_clock_offset);
+	}
 
 	BUILD_BUG_ON(sizeof(kvm_sched_clock_offset) >
 		sizeof(((struct pvclock_vcpu_time_info *)NULL)->system_time));
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 97f1436c1a20..2cfef94317b0 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -118,9 +118,25 @@ static u64 native_steal_clock(int cpu)
 DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
 DEFINE_STATIC_CALL(pv_sched_clock, native_sched_clock);
 
-void paravirt_set_sched_clock(u64 (*func)(void))
+static bool no_pv_sched_clock;
+
+static int __init parse_no_pv_sched_clock(char *arg)
+{
+	no_pv_sched_clock = true;
+	return 0;
+}
+early_param("no_pv_sched_clock", parse_no_pv_sched_clock);
+
+int paravirt_set_sched_clock(u64 (*func)(void))
 {
+	if (no_pv_sched_clock) {
+		pr_info("sched_clock: not configurable\n");
+		return -EPERM;
+	}
+
 	static_call_update(pv_sched_clock, func);
+
+	return 0;
 }
 
 /* These are in entry.S */
-- 
2.34.1


             reply	other threads:[~2023-10-18 22:17 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-18 22:11 Dongli Zhang [this message]
2023-10-19 11:55 ` [PATCH RFC 1/1] x86/paravirt: introduce param to disable pv sched_clock Vitaly Kuznetsov
2023-10-19 15:40   ` Sean Christopherson
2023-10-19 15:47     ` David Woodhouse
2023-10-19 16:46       ` Sean Christopherson
2023-10-19 23:11     ` Dongli Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231018221123.136403-1-dongli.zhang@oracle.com \
    --to=dongli.zhang@oracle.com \
    --cc=akaher@vmware.com \
    --cc=amakhalov@vmware.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=joe.jin@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pv-drivers@vmware.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).