From: Wanpeng Li <kernellwp@gmail.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
"Radim Krčmář" <rkrcmar@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Thomas Gleixner" <tglx@linutronix.de>
Subject: [PATCH] KVM: Disable wake-affine vCPU process to mitigate lock holder preemption
Date: Tue, 30 Jul 2019 17:33:55 +0800
Message-ID: <1564479235-25074-1-git-send-email-wanpengli@tencent.com>
From: Wanpeng Li <wanpengli@tencent.com>
Wake-affine is a scheduler feature that tries to keep related processes
running close to each other, mostly so that they benefit from cache hits.
When a waker wakes up a wakee, the scheduler has to select a CPU for the
wakee to run on. The wake-affine heuristic may select the CPU the waker is
currently running on instead of the prev CPU, on which the wakee last ran.
However, in an over-subscribed virtualization scenario with multiple VMs, it
increases the probability of vCPU stacking, which means that sibling vCPUs
from the same VM are stacked on one pCPU. I tested three 80-vCPU VMs running
on one 80-pCPU Skylake server (PLE is supported); the ebizzy score can
increase by 17% after disabling wake-affine for the vCPU processes.
When qemu or another vCPU injects a virtual interrupt into the guest by
waking up a sleeping vCPU, scheduler wake-affine increases the probability of
stacking vCPUs/qemu on the same pCPU. vCPU stacking can greatly increase lock
synchronization latency in a virtualized environment. This patch disables
wake-affine for vCPU processes to mitigate lock holder preemption.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
include/linux/sched.h | 1 +
kernel/sched/fair.c | 3 +++
virt/kvm/kvm_main.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8dc1811..3dd33d8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1468,6 +1468,7 @@ extern struct pid *cad_pid;
#define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */
#define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */
#define PF_MEMALLOC_NOCMA 0x10000000 /* All allocation request will have _GFP_MOVABLE cleared */
+#define PF_NO_WAKE_AFFINE 0x20000000 /* This thread should not be wake affine */
#define PF_FREEZER_SKIP 0x40000000 /* Freezer should not count it as freezable */
#define PF_SUSPEND_TASK 0x80000000 /* This thread called freeze_processes() and should not be frozen */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 036be95..18eb1fa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5428,6 +5428,9 @@ static int wake_wide(struct task_struct *p)
 	unsigned int slave = p->wakee_flips;
 	int factor = this_cpu_read(sd_llc_size);
 
+	if (unlikely(p->flags & PF_NO_WAKE_AFFINE))
+		return 1;
+
 	if (master < slave)
 		swap(master, slave);
 	if (slave < factor || master < slave * factor)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 887f3b0..b9f75c3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2680,6 +2680,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 
 	mutex_unlock(&kvm->lock);
 	kvm_arch_vcpu_postcreate(vcpu);
+	current->flags |= PF_NO_WAKE_AFFINE;
 	return r;
 
 unlock_vcpu_destroy:
--
2.7.4