From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751983AbdBCSZA (ORCPT ); Fri, 3 Feb 2017 13:25:00 -0500 Received: from mx1.redhat.com ([209.132.183.28]:34074 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751159AbdBCSY4 (ORCPT ); Fri, 3 Feb 2017 13:24:56 -0500 Date: Fri, 3 Feb 2017 16:24:25 -0200 From: Marcelo Tosatti To: Radim Krcmar Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Paolo Bonzini , "Rafael J. Wysocki" , Viresh Kumar Subject: Re: [patch 3/3] KVM: x86: frequency change hypercalls Message-ID: <20170203182423.GB1869@amt.cnet> References: <20170202174755.946578704@redhat.com> <20170202174921.983532379@redhat.com> <20170203174033.GB15128@potion> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170203174033.GB15128@potion> User-Agent: Mutt/1.5.21 (2010-09-15) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Fri, 03 Feb 2017 18:24:57 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 03, 2017 at 06:40:34PM +0100, Radim Krcmar wrote: > 2017-02-02 15:47-0200, Marcelo Tosatti: > > Implement min/max/up/down frequency change > > KVM hypercalls. To be used by DPDK implementation. > > > > Also allow such hypercalls from guest userspace. > > > > Signed-off-by: Marcelo Tosatti > > > > --- > > Index: kvm-pvfreq/arch/x86/kvm/x86.c > > =================================================================== > > --- kvm-pvfreq.orig/arch/x86/kvm/x86.c 2017-02-02 11:17:17.063756725 -0200 > > +++ kvm-pvfreq/arch/x86/kvm/x86.c 2017-02-02 11:17:17.822752510 -0200 > > @@ -6219,10 +6219,58 @@ > > [Here lived copy-paste.] > > > int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) > > { > > unsigned long nr, a0, a1, a2, a3, ret; > > int op_64_bit, r; > > + bool cpl_check; > > > > r = kvm_skip_emulated_instruction(vcpu); > > > > @@ -6246,7 +6294,13 @@ > > a3 &= 0xFFFFFFFF; > > } > > > > - if (kvm_x86_ops->get_cpl(vcpu) != 0) { > > + cpl_check = true; > > + if (nr == KVM_HC_FREQ_UP || nr == KVM_HC_FREQ_DOWN || > > + nr == KVM_HC_FREQ_MIN || nr == KVM_HC_FREQ_MAX) > > + if (vcpu->arch.allow_freq_hypercall == true) > > + cpl_check = false; > > + > > + if (cpl_check == true && kvm_x86_ops->get_cpl(vcpu) != 0) { > > ret = -KVM_EPERM; > > goto out; > > } > > @@ -6262,6 +6316,21 @@ > > case KVM_HC_CLOCK_PAIRING: > > ret = kvm_pv_clock_pairing(vcpu, a0, a1); > > break; > > +#ifdef CONFIG_CPU_FREQ_GOV_USERSPACE > > CONFIG_CPU_FREQ_GOV_USERSPACE should be checked when enabling the > capability. > > > + case KVM_HC_FREQ_UP: > > + ret = kvm_pvfreq_up(vcpu); > > + break; > > + case KVM_HC_FREQ_DOWN: > > + ret = kvm_pvfreq_down(vcpu); > > + break; > > + case KVM_HC_FREQ_MAX: > > + ret = kvm_pvfreq_max(vcpu); > > + break; > > + case KVM_HC_FREQ_MIN: > > + ret = kvm_pvfreq_min(vcpu); > > + break; > > Having 4 hypercalls for this is an overkill. > You can make it one hypercall with an argument. Fine. > And the argument doesn't have to be enum {UP, DOWN, MAX, MIN}, but an > int, which would also allow you to do -2 steps. Are you suggesting to have an integer to signify the number of steps up or down. > A number over the capabilites of stepping would just map to MAX/MIN. Then MAX == any positive value above the number of steps MIN == any negative value below the negative of number of steps Sure. > Avoiding an absolute scale for interface simplifies migration, where the > guest cannot really depend much on this. Except that calling it with > MIN (INT_MIN) will get the minimum and MAX (INT_MAX) the maximum > frequency. Are you suggesting for the hypercall to return the maximum/minimum frequency if called with the highest integer and lowest negative integer respectively? (That same hypercall). Sure. > Plese explictly say in documentation that things like the number of > steps, which the guest can learn by doing MAX and then -1 until the > hypercall fails, is undefined and should not be depended upon. Sure, because it fails over migration. > Userspace might still want know the number of steps to avoid useless > hypercall -- I think we should return a different value when the limit > is reached, not just after the guest wants to go past it. Are you suggesting to return a different value when going from max-1 -> max and min+1 -> min frequencies? Fine. > > +#endif > > + > > default: > > ret = -KVM_ENOSYS; > > break; > > And thinking more about migration, userspace cannot learn the current > frequency (at least MIN/MAX), so the new host will just pick at random, > which will break userspace's expectations that it cannot increase or > decrease the frequency. Is migration left for the future, because DPDK > doesn't migrate anyway? > > Thanks. The new host should start with the highest frequency always. Then the frequency tuning algorithm can reduce frequency afterwards. Migration is a desired feature for DPDK, so it should be supported (thats one reason why virtio-net drivers are used in the guest BTW).