Subject: Re: [PATCH] x86: add cpuidle_kvm driver to allow guest side halt polling
To: Marcelo Tosatti, kvm-devel
Cc: Radim Krčmář, Andrea Arcangeli, "Rafael J. Wysocki", Peter Zijlstra, Wanpeng Li, Konrad Rzeszutek Wilk, "Raslan, KarimAllah", Boris Ostrovsky, Ankur Arora
References: <20190517174857.GA8611@amt.cnet>
From: Paolo Bonzini
Date: Mon, 20 May 2019 13:51:57 +0200
In-Reply-To: <20190517174857.GA8611@amt.cnet>
X-Mailing-List: kvm@vger.kernel.org

On 17/05/19 19:48, Marcelo Tosatti wrote:
>
> The cpuidle_kvm driver allows the guest vcpus to poll for a specified
> amount of time before halting. This provides the following benefits
> to host side polling:
>
> 	1) The POLL flag is set while polling is performed, which allows
> 	   a remote vCPU to avoid sending an IPI (and the associated
> 	   cost of handling the IPI) when performing a wakeup.
>
> 	2) The HLT VM-exit cost can be avoided.
>
> The downside of guest side polling is that polling is performed
> even with other runnable tasks in the host.
>
> Results comparing halt_poll_ns and server/client application
> where a small packet is ping-ponged:
>
> host                                        --> 31.33
> halt_poll_ns=300000 / no guest busy spin    --> 33.40 (93.8%)
> halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73 (95.7%)
>
> For the SAP HANA benchmarks (where idle_spin is a parameter
> of the previous version of the patch, results should be the
> same):
>
> hpns == halt_poll_ns
>
>                            idle_spin=0/   idle_spin=800/   idle_spin=0/
>                            hpns=200000    hpns=0           hpns=800000
> DeleteC06T03 (100 thread)  1.76           1.71 (-3%)       1.78 (+1%)
> InsertC16T02 (100 thread)  2.14           2.07 (-3%)       2.18 (+1.8%)
> DeleteC00T01 (1 thread)    1.34           1.28 (-4.5%)     1.29 (-3.7%)
> UpdateC00T03 (1 thread)    4.72           4.18 (-12%)      4.53 (-5%)

Hi Marcelo,

some quick observations:

1) This is actually not KVM-specific, so the name and placement of the
docs should be adjusted.

2) Regarding KVM-specific code, however, we could add an MSR so that KVM
disables halt_poll_ns for this VM when this is active in the guest?

3) The spin time could use the same adaptive algorithm that KVM uses in
the host.
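
To make 3) a bit more concrete, here is a rough user-space sketch of a grow/shrink heuristic, loosely modelled on how host-side halt polling adapts its window. It is only an illustration: the struct, the function name and all numeric values are made up here and are not taken from the patch or from the KVM sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunables; names and semantics are assumptions for illustration. */
struct poll_tuning {
	uint64_t max_ns;     /* upper bound on the polling window */
	uint64_t start_ns;   /* first non-zero window when growing from 0 */
	unsigned int grow;   /* multiplicative growth factor, 0 disables growth */
	unsigned int shrink; /* divisor used when shrinking, 0 means reset to 0 */
};

/*
 * Return the polling window to use for the next idle period, given the
 * current window (cur_ns) and how long the CPU was actually idle before
 * the wakeup arrived (idle_ns).
 */
static uint64_t adapt_poll_ns(const struct poll_tuning *t,
			      uint64_t cur_ns, uint64_t idle_ns)
{
	uint64_t next;

	assert(t != NULL);

	/* The wakeup arrived inside the window: polling paid off, keep it. */
	if (idle_ns <= cur_ns)
		return cur_ns;

	/* Idle for longer than we would ever poll: polling was wasted. */
	if (idle_ns > t->max_ns)
		return t->shrink ? cur_ns / t->shrink : 0;

	/* The wakeup was missed, but a larger window would have caught it. */
	if (t->grow == 0)
		return cur_ns;

	next = cur_ns ? cur_ns * t->grow : t->start_ns;
	return next > t->max_ns ? t->max_ns : next;
}
```

In a driver, something like this would be called once per idle exit with the measured idle time, and the result would replace the fixed guest_halt_poll_ns window for the next idle entry. A CPU that is mostly idle would then converge to a window of 0 instead of spinning every time.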

Thanks,

Paolo

> ---
>  Documentation/virtual/kvm/guest-halt-polling.txt |   39 ++++++++
>  arch/x86/Kconfig                                 |    9 +
>  arch/x86/kernel/Makefile                         |    1
>  arch/x86/kernel/cpuidle_kvm.c                    |  105 +++++++++++++++++++++++
>  arch/x86/kernel/process.c                        |    2
>  5 files changed, 155 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.git/arch/x86/Kconfig
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/Kconfig	2019-04-22 13:49:42.858303265 -0300
> +++ linux-2.6.git/arch/x86/Kconfig	2019-05-16 14:18:41.254852745 -0300
> @@ -805,6 +805,15 @@
>  	  underlying device model, the host provides the guest with
>  	  timing infrastructure such as time of day, and system time
>
> +config KVM_CPUIDLE
> +	tristate "KVM cpuidle driver"
> +	depends on KVM_GUEST
> +	default y
> +	help
> +	  This option enables KVM cpuidle driver, which allows to poll
> +	  before halting in the guest (more efficient than polling in the
> +	  host via halt_poll_ns for some scenarios).
> +
>  config PVH
>  	bool "Support for running PVH guests"
>  	---help---
> Index: linux-2.6.git/arch/x86/kernel/Makefile
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/kernel/Makefile	2019-04-22 13:49:42.869303331 -0300
> +++ linux-2.6.git/arch/x86/kernel/Makefile	2019-05-17 12:59:51.673274881 -0300
> @@ -112,6 +112,7 @@
>  obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
>
>  obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvmclock.o
> +obj-$(CONFIG_KVM_CPUIDLE)	+= cpuidle_kvm.o
>  obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
>  obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
>  obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
> Index: linux-2.6.git/arch/x86/kernel/process.c
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/kernel/process.c	2019-04-22 13:49:42.876303374 -0300
> +++ linux-2.6.git/arch/x86/kernel/process.c	2019-05-17 13:19:18.055435117 -0300
> @@ -580,7 +580,7 @@
>  	safe_halt();
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>  }
> -#ifdef CONFIG_APM_MODULE
> +#if defined(CONFIG_APM_MODULE) || defined(CONFIG_KVM_CPUIDLE_MODULE)
>  EXPORT_SYMBOL(default_idle);
>  #endif
>
> Index: linux-2.6.git/arch/x86/kernel/cpuidle_kvm.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.git/arch/x86/kernel/cpuidle_kvm.c	2019-05-17 13:38:02.553941356 -0300
> @@ -0,0 +1,105 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * cpuidle driver for KVM guests.
> + *
> + * Copyright 2019 Red Hat, Inc. and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Authors: Marcelo Tosatti
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +unsigned int guest_halt_poll_ns;
> +module_param(guest_halt_poll_ns, uint, 0644);
> +
> +static int kvm_enter_idle(struct cpuidle_device *dev,
> +			  struct cpuidle_driver *drv, int index)
> +{
> +	int do_halt = 0;
> +
> +	/* No polling */
> +	if (guest_halt_poll_ns == 0) {
> +		if (current_clr_polling_and_test()) {
> +			local_irq_enable();
> +			return index;
> +		}
> +		default_idle();
> +		return index;
> +	}
> +
> +	local_irq_enable();
> +	if (!current_set_polling_and_test()) {
> +		ktime_t now, end_spin;
> +
> +		now = ktime_get();
> +		end_spin = ktime_add_ns(now, guest_halt_poll_ns);
> +
> +		while (!need_resched()) {
> +			cpu_relax();
> +			now = ktime_get();
> +
> +			if (!ktime_before(now, end_spin)) {
> +				do_halt = 1;
> +				break;
> +			}
> +		}
> +	}
> +
> +	if (do_halt) {
> +		/*
> +		 * No events while busy spin window passed,
> +		 * halt.
> +		 */
> +		local_irq_disable();
> +		if (current_clr_polling_and_test()) {
> +			local_irq_enable();
> +			return index;
> +		}
> +		default_idle();
> +	} else {
> +		current_clr_polling();
> +	}
> +
> +	return index;
> +}
> +
> +static struct cpuidle_driver kvm_idle_driver = {
> +	.name = "kvm_idle",
> +	.owner = THIS_MODULE,
> +	.states = {
> +		{ /* entry 0 is for polling */ },
> +		{
> +			.enter			= kvm_enter_idle,
> +			.exit_latency		= 0,
> +			.target_residency	= 0,
> +			.power_usage		= -1,
> +			.name			= "KVM",
> +			.desc			= "KVM idle",
> +		},
> +	},
> +	.safe_state_index = 0,
> +	.state_count = 2,
> +};
> +
> +static int __init kvm_cpuidle_init(void)
> +{
> +	return cpuidle_register(&kvm_idle_driver, NULL);
> +}
> +
> +static void __exit kvm_cpuidle_exit(void)
> +{
> +	cpuidle_unregister(&kvm_idle_driver);
> +}
> +
> +module_init(kvm_cpuidle_init);
> +module_exit(kvm_cpuidle_exit);
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Marcelo Tosatti ");
> +
> Index: linux-2.6.git/Documentation/virtual/kvm/guest-halt-polling.txt
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.git/Documentation/virtual/kvm/guest-halt-polling.txt	2019-05-17 13:36:39.274703710 -0300
> @@ -0,0 +1,39 @@
> +KVM guest halt polling
> +======================
> +
> +The cpuidle_kvm driver allows the guest vcpus to poll for a specified
> +amount of time before halting. This provides the following benefits
> +to host side polling:
> +
> +	1) The POLL flag is set while polling is performed, which allows
> +	   a remote vCPU to avoid sending an IPI (and the associated
> +	   cost of handling the IPI) when performing a wakeup.
> +
> +	2) The HLT VM-exit cost can be avoided.
> +
> +The downside of guest side polling is that polling is performed
> +even with other runnable tasks in the host.
> +
> +Module Parameters
> +=================
> +
> +The cpuidle_kvm module has 1 tuneable module parameter: guest_halt_poll_ns,
> +the amount of time, in nanoseconds, that polling is performed before
> +halting.
> +
> +This module parameter can be set from the debugfs files in:
> +
> +	/sys/module/cpuidle_kvm/parameters/
> +
> +Further Notes
> +=============
> +
> +- Care should be taken when setting the guest_halt_poll_ns parameter as a
> +large value has the potential to drive the cpu usage to 100% on a machine which
> +would be almost entirely idle otherwise.
> +
> +- The effective amount of time that polling is performed is the host poll
> +value (see halt-polling.txt) plus guest_halt_poll_ns. If all guests
> +on a host system support and have properly configured guest_halt_poll_ns,
> +then setting halt_poll_ns to 0 in the host is probably the best choice.
> +
>