From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751576AbdHPEFJ (ORCPT ); Wed, 16 Aug 2017 00:05:09 -0400
Received: from mx1.redhat.com ([209.132.183.28]:43360 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751212AbdHPEFG (ORCPT ); Wed, 16 Aug 2017 00:05:06 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 0C051C047B8F
Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=mst@redhat.com
Date: Wed, 16 Aug 2017 07:04:53 +0300
From: "Michael S. Tsirkin"
To: root
Cc: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, pbonzini@redhat.com, x86@kernel.org, corbet@lwn.net, tony.luck@intel.com, bp@alien8.de, peterz@infradead.org, mchehab@kernel.org, akpm@linux-foundation.org, krzk@kernel.org, jpoimboe@redhat.com, luto@kernel.org, borntraeger@de.ibm.com, thgarnie@google.com, rgerst@gmail.com, minipli@googlemail.com, douly.fnst@cn.fujitsu.com, nicstange@gmail.com, fweisbec@gmail.com, dvlasenk@redhat.com, bristot@redhat.com, yamada.masahiro@socionext.com, mika.westerberg@linux.intel.com, yu.c.chen@intel.com, aaron.lu@intel.com, rostedt@goodmis.org, me@kylehuey.com, len.brown@intel.com, prarit@redhat.com, hidehiro.kawai.ez@hitachi.com, fengtiantian@huawei.com, pmladek@suse.com, jeyu@redhat.com, Larry.Finger@lwfinger.net, zijun_hu@htc.com, luisbg@osg.samsung.com, johannes.berg@intel.com, niklas.soderlund+renesas@ragnatech.se, zlpnobody@gmail.com, adobriyan@gmail.com, fgao@48lvckh6395k16k5.yundunddos.com, ebiederm@xmission.com, subashab@codeaurora.org, arnd@arndb.de, matt@codeblueprint.co.uk, mgorman@techsingularity.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-edac@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 1/2] x86/idle: add halt poll for halt idle
Message-ID: <20170816070305-mutt-send-email-mst@kernel.org>
References: <1498130534-26568-1-git-send-email-root@ip-172-31-39-62.us-west-2.compute.internal> <1498130534-26568-2-git-send-email-root@ip-172-31-39-62.us-west-2.compute.internal>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1498130534-26568-2-git-send-email-root@ip-172-31-39-62.us-west-2.compute.internal>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Wed, 16 Aug 2017 04:05:06 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 22, 2017 at 11:22:13AM +0000, root wrote:
> From: Yang Zhang <yang.zhang.wz@gmail.com>
>
> This patch introduces a new mechanism to poll for a while before
> entering the idle state.
>
> David presented a topic at KVM Forum describing the problem with current
> KVM VMs when running message-passing workloads. There is also some work
> to improve performance in KVM, such as halt polling in KVM. But we still
> have 4 MSR writes and an HLT vmexit when going into halt idle, which
> introduce a lot of latency.
>
> Halt polling in KVM provides the capability to not schedule out the VCPU
> when it is the only task on this pCPU. Unlike that, this patch lets the
> VCPU poll for a while when there is no work inside the VCPU, to eliminate
> heavy vmexits on idle entry and exit. The potential impact is that it
> costs more CPU cycles, since we are polling, and may impact other tasks
> waiting on the same physical CPU in the host.

I wonder whether you considered doing this in an idle driver.
I have a prototype patch combining this with mwait within the guest -
I can post it if you are interested.
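The commit message proposes spinning for a bounded window and checking for new work before falling back to the real idle state. That mechanism can be modeled in plain C.

The model below is a userspace sketch, not the patch's code and not an idle driver. The `now` and `work_pending` callbacks are hypothetical stand-ins for the kernel's `ktime_get()` and `need_resched()`. They are passed in so the loop can be exercised against a simulated clock. `struct sim` is an illustrative harness only. As in the patch's sysctl documentation, a threshold of 0 disables polling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ktime_get() and need_resched(). */
typedef uint64_t (*now_ns_fn)(void *ctx);
typedef bool (*work_pending_fn)(void *ctx);

/*
 * Poll for at most threshold_ns before idling.
 * Returns true if work became pending during the window (the caller can
 * skip the halt). Returns false if polling is disabled (threshold 0) or the
 * window expired; in that case the caller enters the real idle state.
 */
static bool idle_poll(uint64_t threshold_ns, now_ns_fn now,
		      work_pending_fn work_pending, void *ctx)
{
	uint64_t stop;

	if (threshold_ns == 0)
		return false;	/* polling disabled */

	/* Take one timestamp and derive the deadline from it. */
	stop = now(ctx) + threshold_ns;
	do {
		if (work_pending(ctx))
			return true;
	} while (now(ctx) < stop);

	return false;
}

/*
 * Illustrative harness: a simulated clock that advances step_ns on every
 * read, and work that becomes pending once the clock reaches work_at_ns.
 */
struct sim {
	uint64_t t;
	uint64_t step_ns;
	uint64_t work_at_ns;
};

static uint64_t sim_now(void *ctx)
{
	struct sim *s = ctx;

	s->t += s->step_ns;
	return s->t;
}

static bool sim_work_pending(void *ctx)
{
	const struct sim *s = ctx;

	return s->t >= s->work_at_ns;
}
```

Suppose the simulated clock advances 100 ns per read. If no work arrives, `idle_poll(1000, ...)` gives up when the clock reaches 1100 ns. If work becomes pending at 300 ns, it returns true early. The deadline is derived from a single clock read, so the window is measured from the moment polling starts.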
> Here is the data I get when running the contextswitch benchmark
> (https://github.com/tsuna/contextswitch)
>
> before patch:
> 2000000 process context switches in 4822613801ns (2411.3ns/ctxsw)
>
> after patch:
> 2000000 process context switches in 3584098241ns (1792.0ns/ctxsw)
>
> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
> ---
>  Documentation/sysctl/kernel.txt | 10 ++++++++++
>  arch/x86/kernel/process.c       | 21 +++++++++++++++++++++
>  include/linux/kernel.h          |  3 +++
>  kernel/sched/idle.c             |  3 +++
>  kernel/sysctl.c                 |  9 +++++++++
>  5 files changed, 46 insertions(+)
>
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index bac23c1..4e71bfe 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -63,6 +63,7 @@ show up in /proc/sys/kernel:
>  - perf_event_max_stack
>  - perf_event_max_contexts_per_stack
>  - pid_max
> +- poll_threshold_ns        [ X86 only ]
>  - powersave-nap               [ PPC only ]
>  - printk
>  - printk_delay
> @@ -702,6 +703,15 @@ kernel tries to allocate a number starting from this one.
>
>  ==============================================================
>
> +poll_threshold_ns: (X86 only)
> +
> +This parameter used to control the max wait time to poll before going
> +into real idle state. By default, the values is 0 means don't poll.
> +It is recommended to change the value to non-zero if running latency-bound
> +workloads in VM.
> +
> +==============================================================
> +
>  powersave-nap: (PPC only)
>
>  If set, Linux-PPC will use the 'nap' mode of powersaving,
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 0bb8842..6361783 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -39,6 +39,10 @@
>  #include <asm/desc.h>
>  #include <asm/prctl.h>
>
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +unsigned long poll_threshold_ns;
> +#endif
> +
>  /*
>   * per-CPU TSS segments. Threads are completely 'soft' on Linux,
>   * no more per-task TSS's. The TSS size is kept cacheline-aligned
> @@ -313,6 +317,23 @@ static inline void play_dead(void)
>  }
>  #endif
>
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +void arch_cpu_idle_poll(void)
> +{
> +	ktime_t start, cur, stop;
> +
> +	if (poll_threshold_ns) {
> +		start = cur = ktime_get();
> +		stop = ktime_add_ns(ktime_get(), poll_threshold_ns);
> +		do {
> +			if (need_resched())
> +				break;
> +			cur = ktime_get();
> +		} while (ktime_before(cur, stop));
> +	}
> +}
> +#endif
> +
>  void arch_cpu_idle_enter(void)
>  {
>  	tsc_verify_tsc_adjust(false);
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 13bc08a..04cf774 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -460,6 +460,9 @@ extern __scanf(2, 0)
>  extern int sysctl_panic_on_stackoverflow;
>
>  extern bool crash_kexec_post_notifiers;
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +extern unsigned long poll_threshold_ns;
> +#endif
>
>  /*
>   * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 2a25a9e..e789f99 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -74,6 +74,7 @@ static noinline int __cpuidle cpu_idle_poll(void)
>  }
>
>  /* Weak implementations for optional arch specific functions */
> +void __weak arch_cpu_idle_poll(void) { }
>  void __weak arch_cpu_idle_prepare(void) { }
>  void __weak arch_cpu_idle_enter(void) { }
>  void __weak arch_cpu_idle_exit(void) { }
> @@ -219,6 +220,8 @@ static void do_idle(void)
>  	 */
>
>  	__current_set_polling();
> +	arch_cpu_idle_poll();
> +
>  	tick_nohz_idle_enter();
>
>  	while (!need_resched()) {
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 4dfba1a..9174d57 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1203,6 +1203,15 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
>  		.extra2		= &one,
>  	},
>  #endif
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +	{
> +		.procname	= "halt_poll_threshold",
> +		.data		= &poll_threshold_ns,
> +		.maxlen		= sizeof(unsigned long),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
> +#endif
>  	{ }
>  };
>
> -- 
> 1.8.3.1
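The contextswitch figures quoted in the commit message can be cross-checked directly. The per-switch cost is the total elapsed time divided by the 2000000 switches, and the relative improvement is (before - after) / before. The small C sketch below does that arithmetic using only the reported numbers; the macro and function names are illustrative.

```c
/* contextswitch figures as quoted in the commit message. */
#define CTXSW_COUNT      2000000.0
#define BEFORE_TOTAL_NS  4822613801.0
#define AFTER_TOTAL_NS   3584098241.0

/* Average cost of one context switch: total elapsed ns / switch count. */
static double ns_per_ctxsw(double total_ns, double count)
{
	return total_ns / count;
}

/* Fractional reduction in per-switch latency: (before - after) / before. */
static double latency_reduction(double before_ns, double after_ns)
{
	return (before_ns - after_ns) / before_ns;
}
```

The two averages come out to about 2411.31 ns and 1792.05 ns. These match the rounded 2411.3 and 1792.0 ns/ctxsw in the quoted output and correspond to roughly a 25.7% reduction in per-switch latency. They are single-run figures with no variance reported.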