All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATH 6.4.y] KVM: x86/mmu: Add "never" option to allow sticky disabling of nx_huge_pages
@ 2023-09-05 14:54 Luiz Capitulino
  2023-09-06  0:03 ` Sean Christopherson
  0 siblings, 1 reply; 2+ messages in thread
From: Luiz Capitulino @ 2023-09-05 14:54 UTC (permalink / raw)
  To: stable, seanjc
  Cc: lcapitulino, Li RongQing, Yong He, Robert Hoo, Kai Huang,
	Luiz Capitulino

From: Sean Christopherson <seanjc@google.com>

Commit 0b210faf337314e4bc88e796218bc70c72a51209 upstream.

Add a "never" option to the nx_huge_pages module param to allow userspace
to do a one-way hard disabling of the mitigation, and don't create the
per-VM recovery threads when the mitigation is hard disabled.  Letting
userspace pinky swear that userspace doesn't want to enable NX mitigation
(without reloading KVM) allows certain use cases to avoid the latency
problems associated with spawning a kthread for each VM.

E.g. in FaaS use cases, the guest kernel is trusted and the host may
create 100+ VMs per logical CPU, which can result in 100ms+ latencies when
a burst of VMs is created.

Reported-by: Li RongQing <lirongqing@baidu.com>
Closes: https://lore.kernel.org/all/1679555884-32544-1-git-send-email-lirongqing@baidu.com
Cc: Yong He <zhuangel570@gmail.com>
Cc: Robert Hoo <robert.hoo.linux@gmail.com>
Cc: Kai Huang <kai.huang@intel.com>
Reviewed-by: Robert Hoo <robert.hoo.linux@gmail.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Luiz Capitulino <luizcap@amazon.com>
Reviewed-by: Li RongQing <lirongqing@baidu.com>
Link: https://lore.kernel.org/r/20230602005859.784190-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
---
 arch/x86/kvm/mmu/mmu.c | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

I submitted this backport for 6.1.y[1] but we agreed that having it for 6.4.y
is desirable to allow upgrade path.

Tests performed:

 * Confirmed KVM_CREATE_VM latency goes down to less than 1ms
 * Quickly booted a simple guest with kvmtool

[1] https://lore.kernel.org/stable/cover.1693593288.git.luizcap@amazon.com/

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6eaa3d6994ae..11c050f40d82 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -58,6 +58,8 @@
 
 extern bool itlb_multihit_kvm_mitigation;
 
+static bool nx_hugepage_mitigation_hard_disabled;
+
 int __read_mostly nx_huge_pages = -1;
 static uint __read_mostly nx_huge_pages_recovery_period_ms;
 #ifdef CONFIG_PREEMPT_RT
@@ -67,12 +69,13 @@ static uint __read_mostly nx_huge_pages_recovery_ratio = 0;
 static uint __read_mostly nx_huge_pages_recovery_ratio = 60;
 #endif
 
+static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp);
 static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
 static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel_param *kp);
 
 static const struct kernel_param_ops nx_huge_pages_ops = {
 	.set = set_nx_huge_pages,
-	.get = param_get_bool,
+	.get = get_nx_huge_pages,
 };
 
 static const struct kernel_param_ops nx_huge_pages_recovery_param_ops = {
@@ -6844,6 +6847,14 @@ static void mmu_destroy_caches(void)
 	kmem_cache_destroy(mmu_page_header_cache);
 }
 
+static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp)
+{
+	if (nx_hugepage_mitigation_hard_disabled)
+		return sprintf(buffer, "never\n");
+
+	return param_get_bool(buffer, kp);
+}
+
 static bool get_nx_auto_mode(void)
 {
 	/* Return true when CPU has the bug, and mitigations are ON */
@@ -6860,15 +6871,29 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
 	bool old_val = nx_huge_pages;
 	bool new_val;
 
+	if (nx_hugepage_mitigation_hard_disabled)
+		return -EPERM;
+
 	/* In "auto" mode deploy workaround only if CPU has the bug. */
-	if (sysfs_streq(val, "off"))
+	if (sysfs_streq(val, "off")) {
 		new_val = 0;
-	else if (sysfs_streq(val, "force"))
+	} else if (sysfs_streq(val, "force")) {
 		new_val = 1;
-	else if (sysfs_streq(val, "auto"))
+	} else if (sysfs_streq(val, "auto")) {
 		new_val = get_nx_auto_mode();
-	else if (kstrtobool(val, &new_val) < 0)
+	} else if (sysfs_streq(val, "never")) {
+		new_val = 0;
+
+		mutex_lock(&kvm_lock);
+		if (!list_empty(&vm_list)) {
+			mutex_unlock(&kvm_lock);
+			return -EBUSY;
+		}
+		nx_hugepage_mitigation_hard_disabled = true;
+		mutex_unlock(&kvm_lock);
+	} else if (kstrtobool(val, &new_val) < 0) {
 		return -EINVAL;
+	}
 
 	__set_nx_huge_pages(new_val);
 
@@ -7006,6 +7031,9 @@ static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
 	uint old_period, new_period;
 	int err;
 
+	if (nx_hugepage_mitigation_hard_disabled)
+		return -EPERM;
+
 	was_recovery_enabled = calc_nx_huge_pages_recovery_period(&old_period);
 
 	err = param_set_uint(val, kp);
@@ -7164,6 +7192,9 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
 {
 	int err;
 
+	if (nx_hugepage_mitigation_hard_disabled)
+		return 0;
+
 	err = kvm_vm_create_worker_thread(kvm, kvm_nx_huge_page_recovery_worker, 0,
 					  "kvm-nx-lpage-recovery",
 					  &kvm->arch.nx_huge_page_recovery_thread);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATH 6.4.y] KVM: x86/mmu: Add "never" option to allow sticky disabling of nx_huge_pages
  2023-09-05 14:54 [PATH 6.4.y] KVM: x86/mmu: Add "never" option to allow sticky disabling of nx_huge_pages Luiz Capitulino
@ 2023-09-06  0:03 ` Sean Christopherson
  0 siblings, 0 replies; 2+ messages in thread
From: Sean Christopherson @ 2023-09-06  0:03 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: stable, lcapitulino, Li RongQing, Yong He, Robert Hoo, Kai Huang

On Tue, Sep 05, 2023, Luiz Capitulino wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Commit 0b210faf337314e4bc88e796218bc70c72a51209 upstream.
> 
> Add a "never" option to the nx_huge_pages module param to allow userspace
> to do a one-way hard disabling of the mitigation, and don't create the
> per-VM recovery threads when the mitigation is hard disabled.  Letting
> userspace pinky swear that userspace doesn't want to enable NX mitigation
> (without reloading KVM) allows certain use cases to avoid the latency
> problems associated with spawning a kthread for each VM.
> 
> E.g. in FaaS use cases, the guest kernel is trusted and the host may
> create 100+ VMs per logical CPU, which can result in 100ms+ latencies when
> a burst of VMs is created.
> 
> Reported-by: Li RongQing <lirongqing@baidu.com>
> Closes: https://lore.kernel.org/all/1679555884-32544-1-git-send-email-lirongqing@baidu.com
> Cc: Yong He <zhuangel570@gmail.com>
> Cc: Robert Hoo <robert.hoo.linux@gmail.com>
> Cc: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Robert Hoo <robert.hoo.linux@gmail.com>
> Acked-by: Kai Huang <kai.huang@intel.com>
> Tested-by: Luiz Capitulino <luizcap@amazon.com>
> Reviewed-by: Li RongQing <lirongqing@baidu.com>
> Link: https://lore.kernel.org/r/20230602005859.784190-1-seanjc@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 41 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> I submitted this backport for 6.1.y[1] but we agreed that having it for 6.4.y
> is desirable to allow upgrade path.

Heh, I would have personally just let 6.4 suffer, but since you went through the
effort:

Acked-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-09-06  0:03 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-05 14:54 [PATH 6.4.y] KVM: x86/mmu: Add "never" option to allow sticky disabling of nx_huge_pages Luiz Capitulino
2023-09-06  0:03 ` Sean Christopherson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.