From: "Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)" <hui-zhi.zhao@hp.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Gleb Natapov" <gleb@kernel.org>,
	"Raghavendra KT" <raghavendra.kt@linux.vnet.ibm.com>,
	"Mitchell, Lisa (MCLinux in Fort Collins)" <lisa.mitchell@hp.com>
Cc: "Vinod, Chegu" <chegu_vinod@hp.com>
Subject: RE: [PATCH 0/9] Dynamic Pause Loop Exiting window.
Date: Thu, 21 Aug 2014 06:48:46 +0000
Message-ID: <DB5C686A0A7EE44A895494A4E25D21FC1C91EB7D@G5W2731.americas.hpqcorp.net>
In-Reply-To: <1408480536-8240-1-git-send-email-rkrcmar@redhat.com>

This patch series has been tested by Lisa and me, and the test was successful.
We created 4 VM guests and rebooted them every 10 minutes for about 12 hours,
and the issue is gone with the patches applied.
Please add Lisa and me to the "Tested-by:" list.

Tested-by: Mitchell, Lisa <lisa.mitchell@hp.com>
Tested-by: Zhao, Hui Zhi <hui-zhi.zhao@hp.com>

Regards,
Steven Zhao

-----Original Message-----
From: Radim Krčmář [mailto:rkrcmar@redhat.com]
Sent: Wednesday, August 20, 2014 4:35 AM
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org; Paolo Bonzini; Gleb Natapov; Raghavendra KT; Vinod, Chegu; Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)
Subject: [PATCH 0/9] Dynamic Pause Loop Exiting window.

PLE does not scale in its current form. When increasing VCPU count
above 150, one can hit soft lockups because of runqueue lock contention.
(Which says a lot about performance.)

The main reason is that kvm_ple_loop cycles through all VCPUs.
Replacing it with a scalable solution would be ideal, but it has already
been well optimized for various workloads, so this series tries to
alleviate a different major problem while minimizing the chance of
regressions: we have too many useless PLE exits.

Just increasing the PLE window would help in some cases, but it still
spirals out of control. By increasing the window after every PLE exit,
we can limit the number of useless ones, so we don't reach the state
where CPUs spend 99% of the time waiting for a lock.

HP confirmed that this series avoids soft lockups and TSC sync errors on
large guests.

---
Design notes and questions:

An alternative to the first two patches could be a new notifier.

All values are made changeable because the defaults weren't selected after
weeks of benchmarking -- we can get improved performance by hardcoding
if someone is willing to do it.
(Or by presuming that no one is ever going to.)

Then, we can quite safely drop the overflow checks: they are impossible to
hit with small increases and I don't think that anyone wants large ones.

Also, I'd argue against the last patch: it should be done in userspace,
but I'm not sure about Linux's policy.

Radim Krčmář (9):
  KVM: add kvm_arch_sched_in
  KVM: x86: introduce sched_in to kvm_x86_ops
  KVM: VMX: make PLE window per-vcpu
  KVM: VMX: dynamise PLE window
  KVM: VMX: clamp PLE window
  KVM: trace kvm_ple_window grow/shrink
  KVM: VMX: abstract ple_window modifiers
  KVM: VMX: runtime knobs for dynamic PLE window
  KVM: VMX: automatic PLE window maximum

 arch/arm/kvm/arm.c              |  4 ++
 arch/mips/kvm/mips.c            |  4 ++
 arch/powerpc/kvm/powerpc.c      |  4 ++
 arch/s390/kvm/kvm-s390.c        |  4 ++
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/svm.c              |  6 +++
 arch/x86/kvm/trace.h            | 29 +++++++++++++
 arch/x86/kvm/vmx.c              | 93 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |  6 +++
 include/linux/kvm_host.h        |  2 +
 virt/kvm/kvm_main.c             |  2 +
 11 files changed, 153 insertions(+), 3 deletions(-)

-- 
2.0.4
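The grow-and-clamp idea from the cover letter can be illustrated with a small userspace sketch. This is not the code from the series: the struct, the function names and the parameter values are hypothetical, and it assumes that the window is multiplied on a PLE exit, divided again when the VCPU is scheduled back in (which is what the sched_in hook in the patch titles appears to be for), and clamped to a fixed range in both directions.

```c
#include <stdint.h>

/* Hypothetical tunables; names and semantics are illustrative only. */
struct ple_params {
	uint32_t min;    /* lower clamp for the window */
	uint32_t max;    /* upper clamp for the window */
	uint32_t grow;   /* factor applied after a PLE exit */
	uint32_t shrink; /* divisor applied on sched-in, 0 = reset to min */
};

/* Clamp v into [lo, hi]; assumes lo <= hi. */
static uint32_t clamp_window(uint64_t v, uint32_t lo, uint32_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return (uint32_t)v;
}

/* After a PLE exit: widen the window so further exits become rarer. */
uint32_t ple_window_grow(uint32_t window, const struct ple_params *p)
{
	/* The 64-bit product cannot overflow before it is clamped. */
	uint64_t grown = (uint64_t)window * p->grow;

	return clamp_window(grown, p->min, p->max);
}

/* On sched-in: narrow the window again, never below the minimum. */
uint32_t ple_window_shrink(uint32_t window, const struct ple_params *p)
{
	if (p->shrink == 0)
		return p->min;

	return clamp_window(window / p->shrink, p->min, p->max);
}
```

With min = 4096, max = 65536 and both modifiers set to 2, a window of 4096 grows to 8192 after one exit and stays at 65536 once it reaches the maximum. Doing the multiplication in 64 bits is one way to make an overflow check unnecessary in this sketch; the cover letter instead argues that overflow cannot happen with small increases.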
Thread overview: 29+ messages
2014-08-19 20:35 [PATCH 0/9] Dynamic Pause Loop Exiting window Radim Krčmář
2014-08-19 20:35 ` [PATCH 1/9] KVM: add kvm_arch_sched_in Radim Krčmář
2014-08-20  7:47 ` Christian Borntraeger
2014-08-20 12:56 ` Radim Krčmář
2014-08-19 20:35 ` [PATCH 2/9] KVM: x86: introduce sched_in to kvm_x86_ops Radim Krčmář
2014-08-19 20:35 ` [PATCH 3/9] KVM: VMX: make PLE window per-vcpu Radim Krčmář
2014-08-20  7:13 ` Paolo Bonzini
2014-08-20 12:26 ` Radim Krčmář
2014-08-19 20:35 ` [PATCH 4/9] KVM: VMX: dynamise PLE window Radim Krčmář
2014-08-19 20:35 ` [PATCH 5/9] KVM: VMX: clamp " Radim Krčmář
2014-08-20  7:18 ` Paolo Bonzini
2014-08-20 12:46 ` Radim Krčmář
2014-08-19 20:35 ` [PATCH 6/9] KVM: trace kvm_ple_window grow/shrink Radim Krčmář
2014-08-19 20:35 ` [PATCH 7/9] KVM: VMX: abstract ple_window modifiers Radim Krčmář
2014-08-20  7:02 ` Paolo Bonzini
2014-08-20 12:25 ` Radim Krčmář
2014-08-19 20:35 ` [PATCH 8/9] KVM: VMX: runtime knobs for dynamic PLE window Radim Krčmář
2014-08-19 20:35 ` [PATCH 9/9] KVM: VMX: automatic PLE window maximum Radim Krčmář
2014-08-20  7:16 ` Paolo Bonzini
2014-08-20  7:18 ` Paolo Bonzini
2014-08-20 12:41 ` Radim Krčmář
2014-08-20 13:15 ` Paolo Bonzini
2014-08-20 15:31 ` Radim Krčmář
2014-08-20 15:34 ` Paolo Bonzini
2014-08-20 16:01 ` Radim Krčmář
2014-08-20 16:03 ` Paolo Bonzini
2014-08-20 16:26 ` Radim Krčmář
2014-08-21  6:48 ` Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC) [this message]
2014-08-21  6:48 ` [PATCH 0/9] Dynamic Pause Loop Exiting window Zhao, Hui-Zhi (Steven, HPservers-Core-OE-PSC)