From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-pm@lists.linux-foundation.org,
	Radim Krcmar <rkrcmar@redhat.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>
Subject: Re: [patch 0/3] KVM CPU frequency change hypercalls (resend)
Date: Thu, 2 Mar 2017 10:59:43 -0300
Message-ID: <20170302135940.GA19287@amt.cnet>
In-Reply-To: <bb9c6f9e-fff1-a452-0f02-cf8311f0e27f@redhat.com>

On Thu, Mar 02, 2017 at 11:15:00AM +0100, Paolo Bonzini wrote:
> 
> 
> On 01/03/2017 16:04, Marcelo Tosatti wrote:
> > 
> > Paolo: please comment on your objections and what should
> > be done instead. Note the case "multiple vcpus 
> > on a given pcpu" is not part of the usecase in question.
> 
> I would like to understand the intended usecase of cpufreq-userspace.
> 
> My understanding is that you would have a daemon handling a systemwide
> policy; examples are the historical (and now obsolete) users such as
> cpufreqd, cpudyn, powernowd, or cpuspeed.
> 
> The user alternatively can play the role of the daemon by writing to
> sysfs, but I've never seen userspace tasks talking to cpufreq-userspace
> to set their own running frequency.
>
> Apparently DPDK does that, and I would like to know the opinion of the
> linux-pm folks; 

Only the number of in-use RX/TX queue entries lets you set the
processor frequency correctly (for this use case, where the machine
performs nothing but network processing).
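
To make that concrete, here is a minimal sketch in Python (not DPDK's
actual policy, and not part of these patches) of choosing a frequency
from ring occupancy. The function name pick_frequency_khz, the linear
mapping and the frequency table in the usage lines are all assumptions
made for illustration:

```python
def pick_frequency_khz(in_use, ring_size, available_khz):
    """Map RX/TX ring occupancy onto a table of available frequencies.

    Hypothetical policy: the fuller the ring, the higher the frequency.
    """
    if ring_size <= 0 or not available_khz:
        raise ValueError("need a positive ring size and at least one frequency")
    freqs = sorted(available_khz)
    # Clamp occupancy to [0, 1] in case in_use is out of range.
    occupancy = min(max(in_use / ring_size, 0.0), 1.0)
    # Linear mapping of occupancy onto the sorted table; a full ring
    # selects the highest entry.
    index = min(int(occupancy * len(freqs)), len(freqs) - 1)
    return freqs[index]


# Example table (made up); real values would come from the cpufreq driver.
table_khz = [1200000, 1800000, 2400000]
idle = pick_frequency_khz(0, 512, table_khz)
busy = pick_frequency_khz(512, 512, table_khz)
```

A real poll loop would probably add hysteresis so that it does not
flip frequencies on every burst, which matters given the switching
latency discussed in this thread.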

>  one obvious downside is that any application that you
> run after DPDK will have its CPU frequency hardcoded to something that
> is not appropriate.  

To isolate the CPU where DPDK runs it is already necessary to perform
special procedures such as changing the cpumask of other tasks, changing
cpumask of interrupt handlers (to remove the isolated CPU from that
cpumask), etc. Changing the cpufreq governor to userspace is another
step of that setup phase.

On shutdown (or CPU unpin), you can switch the CPU back to the previous
governor, which can then set the frequency to whatever it finds suitable.
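
A rough sketch of that setup/teardown step, assuming the standard
cpufreq sysfs files scaling_governor and scaling_setspeed, and assuming
the driver offers the userspace governor at all. The helper names and
the root parameter (there only so the code can be pointed at a scratch
directory) are hypothetical:

```python
import contextlib
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def _cpufreq_dir(cpu, root=SYSFS_CPU):
    # e.g. /sys/devices/system/cpu/cpu3/cpufreq
    return Path(root) / f"cpu{cpu}" / "cpufreq"


@contextlib.contextmanager
def userspace_governor(cpu, root=SYSFS_CPU):
    """Switch one CPU to the userspace governor, restoring the old one on exit."""
    gov = _cpufreq_dir(cpu, root) / "scaling_governor"
    previous = gov.read_text().strip()
    gov.write_text("userspace")
    try:
        yield previous
    finally:
        # Teardown: hand control back to whatever governor ran before.
        gov.write_text(previous)


def set_speed_khz(cpu, khz, root=SYSFS_CPU):
    """Request a frequency in kHz; only meaningful under the userspace governor."""
    (_cpufreq_dir(cpu, root) / "scaling_setspeed").write_text(str(khz))
```

Writing to the real sysfs files needs root privileges; the context
manager is just one way to make sure the restore happens even if the
application exits on an error.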

> This might be acceptable for DPDK, but it is worse
> for KVM which tries to provide isolation to its vCPU tasks.

Well, in this case you know that the only program executing on the CPU
is the one handling network packets, and therefore you allow that
program to control the frequency.

> Here are two possibilities that I could think of:
> 
> 1) Introduce a mechanism that allows a task to override the governor's
> choice of CPU frequency.  This could be a ioctl, a prctl, a cgroup-based
> mechanism or whatever else.  As Marcelo pointed out in the original kvm@
> thread, the latency and overhead of switching frequencies make it
> impractical to associate a desired CPU frequency with a task, because
> multiple tasks could be requesting a given frequency.  One possibility
> could be to treat the per-task CPU frequency as advisory

DPDK can't afford to have the frequency treated as advisory: failing
to set the processor frequency when requested means dropped packets,
and not dropping packets is a requirement.

>  and only obey
> it in restricted cases---for example only if nohz_full is in effect.

From the cpufreq documentation:

"On all other cpufreq implementations, these boundaries still need to
be set. Then, a "governor" must be selected. Such a "governor" decides
what speed the processor shall run within the boundaries. One such
"governor" is the "userspace" governor. This one allows the user - or
a yet-to-implement userspace program - to decide what specific speed
the processor shall run at."

(So the cpufreq-hypercall + cpufreq-userspace combination seems to be
in accord with what cpufreq-userspace was designed for.)

Secondly, setting frequencies for multiple tasks is somewhat
contradictory:

In the DPDK context, or in any context actually, it makes sense for a
program to lower the processor frequency when it decides a lower
frequency is sufficient to handle the job: that is, the load can still
be handled after the frequency is lowered.

With multiple applications sharing that processor, the share of CPU
time given to an application also affects how long it takes to handle
the job. So "instructions per second" depends not only on "frequency"
but also on the timeslice the scheduler gives the task.

Having a task request a particular frequency in that case becomes
ambiguous: it could just as well be asking for an "increased timeslice".
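
A back-of-the-envelope illustration of that ambiguity, as a sketch
only. It assumes a constant instructions-per-cycle rate and ignores
frequency-switch and context-switch costs; the function name and the
numbers are made up:

```python
def effective_cycles_per_second(freq_hz, cpu_share):
    """Cycles per second a task actually receives on a shared CPU.

    cpu_share is the fraction of CPU time the scheduler gives the
    task, between 0.0 and 1.0.
    """
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be between 0.0 and 1.0")
    return freq_hz * cpu_share


# Two different ways of giving the task the same budget:
shared_fast = effective_cycles_per_second(2.0e9, 0.5)  # 2 GHz, half the CPU
alone_slow = effective_cycles_per_second(1.0e9, 1.0)   # 1 GHz, whole CPU
```

Both calls give the task the same number of cycles per second, so a
request for "more frequency" cannot be told apart from a request for
"more CPU time" once the processor is shared.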
