From mboxrd@z Thu Jan  1 00:00:00 1970
From: Wanpeng Li <kernellwp@gmail.com>
X-Google-Original-From: Wanpeng Li <wanpeng.li@hotmail.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
        Radim Krčmář <rkrcmar@redhat.com>,
        Wanpeng Li <wanpeng.li@hotmail.com>
Subject: [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush
Date: Wed,  8 Nov 2017 18:02:13 -0800
Message-Id: <1510192934-5369-3-git-send-email-wanpeng.li@hotmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1510192934-5369-1-git-send-email-wanpeng.li@hotmail.com>
References: <1510192934-5369-1-git-send-email-wanpeng.li@hotmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Wanpeng Li <wanpeng.li@hotmail.com>

Remote flushing APIs do a busy wait, which is fine in a bare-metal
scenario. But within a guest, the target vCPUs might have been
preempted or blocked. In that case, the initiating vCPU would end up
busy-waiting for a long time.

This patch set implements paravirt TLB flushing that does not wait for
vCPUs that are sleeping (preempted). Instead, all the sleeping vCPUs
flush their TLB on the next guest entry.
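
The deferral logic can be modelled outside the kernel. What follows is
a userspace C11 sketch, not the kernel code. preempted[],
need_flush_ipi() and flush_on_enter() are hypothetical names, and the
byte array stands in for the per-CPU steal_time.preempted field. The
host side is also an assumption and is not part of this patch: it
clears the byte with an atomic exchange on guest entry and flushes if
KVM_VCPU_SHOULD_FLUSH was set. Two details matter. First, the flag is
already a mask, (1 << 1), so it must not be shifted again. Second, a
compare-and-exchange only succeeds when the byte still holds the value
that was read. The kernel's cmpxchg() returns the previous value, so
success there means the return value equals state.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirroring the kvm_para.h definitions in this patch. */
#define VCPU_PREEMPTED    (1u << 0) /* KVM_VCPU_PREEMPTED */
#define VCPU_SHOULD_FLUSH (1u << 1) /* KVM_VCPU_SHOULD_FLUSH */

#define NR_VCPUS 4

/* Hypothetical stand-in for each vCPU's steal_time.preempted byte. */
static _Atomic uint8_t preempted[NR_VCPUS];

/*
 * Initiator side: returns true if an IPI must be sent to @cpu, or
 * false if the flush was deferred to that vCPU's next guest entry.
 */
static bool need_flush_ipi(int cpu)
{
	uint8_t state = atomic_load(&preempted[cpu]);

	if (!(state & VCPU_PREEMPTED))
		return true; /* running: flush it with an IPI */

	/* Succeeds only if the byte still equals the value read above. */
	if (atomic_compare_exchange_strong(&preempted[cpu], &state,
					   state | VCPU_SHOULD_FLUSH))
		return false; /* deferred: the vCPU flushes on entry */

	return true; /* state changed under us: fall back to an IPI */
}

/*
 * Assumed host side on guest entry: clear the byte and report
 * whether a deferred flush was requested.
 */
static bool flush_on_enter(int cpu)
{
	uint8_t old = atomic_exchange(&preempted[cpu], 0);

	return old & VCPU_SHOULD_FLUSH;
}
```

Suppose only vCPU 1 is marked preempted. Then need_flush_ipi() returns
true for vCPUs 0, 2 and 3 and false for vCPU 1. A later
flush_on_enter(1) returns true and clears the byte.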

The best results are achieved when the host is overcommitted by running
multiple vCPUs on each pCPU. In this case, PV TLB flush avoids touching
vCPUs that are not scheduled and avoids the wait on the initiating CPU.

Tested on a Haswell i7 desktop with 4 cores (2 HT each), so 8 pCPUs,
running ebizzy in one Linux guest.

ebizzy -M
            vanilla   optimized     boost
 8 vCPUs      10152       10083    -0.68%
16 vCPUs       1224        4866   +297.5%
24 vCPUs       1109        3871   +249.1%
32 vCPUs       1025        3375   +229.3%
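
The boost column is the relative change of the optimized score over
the vanilla score. The following small C sketch reproduces that
arithmetic from the numbers in the table above. boost_pct(), struct row
and print_boosts() are illustrative names only.

```c
#include <stddef.h>
#include <stdio.h>

/* Relative change of @optimized over @vanilla, in percent. */
static double boost_pct(double vanilla, double optimized)
{
	return (optimized - vanilla) / vanilla * 100.0;
}

/* ebizzy -M scores as reported in the table. */
struct row {
	int vcpus;
	double vanilla;
	double optimized;
};

static const struct row rows[] = {
	{  8, 10152, 10083 },
	{ 16,  1224,  4866 },
	{ 24,  1109,  3871 },
	{ 32,  1025,  3375 },
};

/* Print one line per configuration, e.g. "16 vCPUs  +297.5%". */
static void print_boosts(void)
{
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%2d vCPUs  %+.1f%%\n", rows[i].vcpus,
		       boost_pct(rows[i].vanilla, rows[i].optimized));
}
```

Calling print_boosts() prints one line per row. Rounded to one decimal
place, the values match the boost column; the 8 vCPU case prints -0.7%.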

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/kvm.c                | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index ff23ce9..189e354 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -52,6 +52,7 @@ struct kvm_steal_time {
 
 #define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
 #define KVM_VCPU_PREEMPTED          (1 << 0)
+#define KVM_VCPU_SHOULD_FLUSH       (1 << 1)
 
 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1b1b641..2e2f3ae 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
 	update_intr_gate(X86_TRAP_PF, async_page_fault);
 }
 
+static void kvm_flush_tlb_others(const struct cpumask *cpumask,
+			const struct flush_tlb_info *info)
+{
+	u8 state;
+	int cpu;
+	struct kvm_steal_time *src;
+	cpumask_t flushmask;
+
+
+	cpumask_copy(&flushmask, cpumask);
+	/*
+	 * Send flush IPIs only to running vCPUs; for preempted vCPUs,
+	 * set KVM_VCPU_SHOULD_FLUSH so they flush on the next guest entry.
+	 */
+	for_each_cpu(cpu, cpumask) {
+		src = &per_cpu(steal_time, cpu);
+		state = READ_ONCE(src->preempted);
+		if (state & KVM_VCPU_PREEMPTED) {
+			if (cmpxchg(&src->preempted, state,
+				    state | KVM_VCPU_SHOULD_FLUSH) == state)
+				cpumask_clear_cpu(cpu, &flushmask);
+		}
+	}
+
+	native_flush_tlb_others(&flushmask, info);
+}
+
 void __init kvm_guest_init(void)
 {
 	int i;
@@ -484,6 +511,8 @@ void __init kvm_guest_init(void)
 		pv_time_ops.steal_clock = kvm_steal_clock;
 	}
 
+	pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
+
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
-- 
2.7.4