From: Paolo Bonzini <pbonzini@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org, linux-pm@lists.linux-foundation.org,
	Radim Krcmar <rkrcmar@redhat.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>
Subject: Re: [patch 0/3] KVM CPU frequency change hypercalls (resend)
Date: Tue, 14 Mar 2017 17:40:21 +0100
Message-ID: <fa831f8a-e9f7-f431-a1bc-9e9e0dda6d44@redhat.com>
In-Reply-To: <20170302135940.GA19287@amt.cnet>



On 02/03/2017 14:59, Marcelo Tosatti wrote:
> On Thu, Mar 02, 2017 at 11:15:00AM +0100, Paolo Bonzini wrote:
>>  one obvious downside is that any application that you
>> run after DPDK will have its CPU frequency hardcoded to something that
>> is not appropriate.  
> 
> To isolate the CPU where DPDK runs it is already necessary to perform
> special procedures such as changing the cpumask of other tasks, changing
> cpumask of interrupt handlers (to remove the isolated CPU from that
> cpumask), etc. Changing the cpufreq governor to userspace is another
> step of that setup phase.
> 
> On shutdown (or CPU unpin), you can switch back the CPU to the previous
> governor, which can switch the frequency to whatever it finds suitable.

But I thought that one of the reasons to do NFV is to simplify this
setup.  If you now have to do the same thing on virtual machines, things
become more complicated to set up, and I don't think that NFV virtual
machines are _that_ special.

In addition, in the list of setup steps above you forgot "chmod the
sysfs files for cpufreq so that DPDK can access them".  Doing that chmod
is a very explicit act, whereas this patch grants the same access implicitly.
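
To make the bare-metal setup steps concrete, here is a minimal sketch
of the governor switch and restore discussed above.  It assumes the
standard cpufreq sysfs files (scaling_governor, scaling_setspeed) and
that the caller may already write them, i.e. it runs as root or after
the chmod.  The pinned_frequency helper and its root parameter are
hypothetical names for illustration, not part of DPDK or of this patch.

```python
import contextlib
from pathlib import Path

# Standard cpufreq sysfs location; overridable so the sketch can be
# exercised against a scratch directory instead of the real sysfs.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def _policy_file(cpu, name, root=CPUFREQ_ROOT):
    """Path of one cpufreq attribute for the given CPU number."""
    return Path(root) / f"cpu{cpu}" / "cpufreq" / name


@contextlib.contextmanager
def pinned_frequency(cpu, khz, root=CPUFREQ_ROOT):
    """Switch `cpu` to the userspace governor at `khz` (in kHz) and
    put the previous governor back when the block exits."""
    governor = _policy_file(cpu, "scaling_governor", root)
    setspeed = _policy_file(cpu, "scaling_setspeed", root)
    previous = governor.read_text().strip()
    governor.write_text("userspace")
    try:
        setspeed.write_text(str(khz))
        yield previous
    finally:
        # Only reached on a clean exit or an exception; if the process
        # is killed outright, the userspace governor stays in place.
        governor.write_text(previous)
```

Something like "with pinned_frequency(3, 1200000): ..." would wrap the
polling loop.  The restore runs only if the process gets to unwind,
so a setting made this way can also outlive the process that made it.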

By letting virtual machines do the same with a simple hypercall, you're
giving powers to whoever opens /dev/kvm that they didn't have before
(unless the userspace process also had access to sysfs).  Worse, the
effects last beyond the moment /dev/kvm is closed.

So, the question then is how to design the hypervisor so that these NFV
virtual machines can play with cpufreq, but there are no adverse
indefinite effects.  One possibility is to have some kind of per-task
cpufreq.  Another is to do everything in userspace with virtual ACPI
P-states and the userspace governor in the VM.

I was hoping to get more feedback from linux-pm.

>> Here are two possibilities that I could think of:
>>
>> 1) Introduce a mechanism that allows a task to override the governor's
>> choice of CPU frequency.  This could be an ioctl, a prctl, a cgroup-based
>> mechanism or whatever else.  As Marcelo pointed out in the original kvm@
>> thread, the latency and overhead of switching frequencies make it
>> impractical to associate a desired CPU frequency with a task, because
>> multiple tasks could be requesting a given frequency.  One possibility
>> could be to treat the per-task CPU frequency as advisory
> 
> DPDK can't afford to treat the frequency as advisory: failure to set the
> processor frequency when requested means dropped packets (and not
> dropping packets is a requirement).

It can be advisory if you document a proper configuration where it's obeyed.
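
As a rough sketch of what "advisory, but obeyed in a documented
configuration" could mean, the following honors a per-task request
only when the CPU is in the nohz_full set, which is the example
restriction from the quoted proposal.  Both function names and the
policy itself are hypothetical; nothing like this exists in the
kernel as far as this thread shows.

```python
def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "1-3,5" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def effective_khz(cpu, requested_khz, governor_khz, nohz_full_cpus):
    """Hypothetical policy: obey the task's advisory request only on
    nohz_full CPUs, otherwise keep the governor's choice."""
    if requested_khz is not None and cpu in nohz_full_cpus:
        return requested_khz
    return governor_khz
```

With nohz_full CPUs "1-3", a request for 1200000 kHz on CPU 2 would be
obeyed, while the same request on CPU 0 would fall back to whatever
the governor picked.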

Paolo

>>  and only obey
>> it in restricted cases---for example only if nohz_full is in effect.
> 
> From cpufreq documentation:
> 
> "On all other cpufreq implementations, these boundaries still need to
> be set. Then, a "governor" must be selected. Such a "governor" decides
> what speed the processor shall run within the boundaries. One such
> "governor" is the "userspace" governor. This one allows the user - or
> a yet-to-implement userspace program - to decide what specific speed
> the processor shall run at."
> 
> (it seems the cpufreq-hypercall+cpufreq-userspace combination is in 
> accord with what cpufreq-userspace has been designed for).
> 
> Secondly, setting frequencies for multiple tasks is somewhat
> contradictory:
> 
> In the DPDK context, or in any context actually, it makes sense for a
> program to lower processor frequency when it decides the current
> frequency is sufficient to handle the job: that is, lowering the
> frequency will still leave it able to handle the load.
> 
> With multiple applications sharing that processor, the percentage 
> of time given to a certain application also interferes with the
> time it spends handling the job. So the other variable that 
> affects "instructions per second" is timeslice given to the
> task by the scheduler, not only "frequency".
> 
> Having a task request a particular frequency in that case becomes
> ambiguous: you could be asking for "increased timeslice".
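
The ambiguity in the quoted argument can be shown with simple
arithmetic.  This is only a sketch under a simplifying assumption:
the work a task gets done is modeled as clock frequency times its
share of CPU time, ignoring cache effects and frequency switch
latency.  The function name is made up for the example.

```python
def cycles_per_second(freq_hz, cpu_share):
    """Cycles a task actually receives: clock rate times the fraction
    of CPU time the scheduler gives it."""
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be between 0 and 1")
    return freq_hz * cpu_share


alone_slow = cycles_per_second(1.0e9, 1.0)   # whole CPU at 1 GHz
shared_fast = cycles_per_second(2.0e9, 0.5)  # half of a CPU at 2 GHz
print(alone_slow == shared_fast)             # prints True
```

Under that model a task sharing the CPU cannot tell whether it needs
a higher frequency or a larger timeslice, since both change the same
product.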
