All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joelaf@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linux PM <linux-pm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Len Brown <lenb@kernel.org>,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Ingo Molnar <mingo@redhat.com>, Juri Lelli <Juri.Lelli@arm.com>,
	Patrick Bellasi <patrick.bellasi@arm.com>
Subject: Re: [PATCH v2 1/2] cpufreq: Make iowait boost a policy option
Date: Sat, 10 Jun 2017 01:08:18 -0700	[thread overview]
Message-ID: <CAJWu+opua5nw5w=hUUNWVWekV_6HBPb1mM3PgR8VN=GcoTwMgg@mail.gmail.com> (raw)
In-Reply-To: <CAJWu+op_QRJWt=L=gGOk15pryD+1pm5xk9q90yt9wD1insdEzA@mail.gmail.com>

Adding Juri and Patrick as well to share any thoughts. Replied to
Peter in the end of this email.

On Wed, May 24, 2017 at 1:17 PM, Joel Fernandes <joelaf@google.com> wrote:
> Hi Peter,
>
> On Mon, May 22, 2017 at 1:21 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> [..]
>>> On Fri, May 19, 2017 at 2:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> > On Thu, May 18, 2017 at 11:23:43PM -0700, Joel Fernandes wrote:
>>> >> Make iowait boost a cpufreq policy option and enable it for intel_pstate
>>> >> cpufreq driver. Governors like schedutil can use it to determine if
>>> >> boosting for tasks that wake up with p->in_iowait set is needed.
>>> >
>>> > Rather than just flat out disabling the option, is there something
>>> > better we can do on ARM?
>>> >
>>> > The reason for the IO-wait boost is to ensure we feed our external
>>> > devices data ASAP, this reduces wait times, increases throughput and
>>> > decreases the duration the devices have to operate.
>>>
>>> Can you help understand how CPU frequency can affect I/O? The ASAP
>>> makes me think of it as a latency thing than a throughput in which
>>> case there should a scheduling priority increase? Also, to me it
>>> sounds more like memory instead of CPU frequency should be boosted
>>> instead so that DMA transfers happen quicker to feed devices data
>>> faster.
>>
>> Suppose your (I/O) device has the task waiting for a completion for 1ms
>> for each request. Further suppose that feeding it the next request takes
>> .1ms at full speed (1 GHz).
>>
>> Then we get, without contending tasks, a cycle of:
>>
>>
>>  R----------R----------                                 (1 GHz)
>>
>>
>> Which comes at 1/11-th utilization, which would then select something
>> like 100 MHz as being sufficient. But then the R part becomes 10x longer
>> and we end up with:
>>
>>
>>  RRRRRRRRRR----------RRRRRRRRRR----------               (100 MHz)
>>
>>
>> And since there's still plenty idle time, and the effective utilization
>> is still the same 1/11th, we'll not ramp up at all and continue in this
>> cycle.
>>
>> Note however that the total time of the cycle increased from 1.1ms
>> to 2ms, for an ~80% decrease in throughput.
>
> Got it, thanks for the explanation.
>
>>> Are you trying to boost the CPU frequency so that a process waiting on
>>> I/O does its next set of processing quickly enough after iowaiting on
>>> the previous I/O transaction, and is ready to feed I/O the next time
>>> sooner?
>>
>> This. So we break the above pattern by boosting the task that wakes from
>> IO-wait. Its utilization will never be enough to cause a significant
>> bump in frequency on its own, as its constantly blocked on the IO
>> device.
>
> It sounds like this problem can happen with any other use-case where
> one task blocks on the other, not just IO. Like a case where 2 tasks
> running on different CPUs block on a mutex, then on either task can
> wait on the other causing their utilization to be low right?
>
>>> The case I'm seeing a lot is a background thread does I/O request and
>>> blocks for short period, and wakes up. All this while the CPU
>>> frequency is low, but that wake up causes a spike in frequency. So
>>> over a period of time, you see these spikes that don't really help
>>> anything.
>>
>> So the background thread is doing some spurious IO but nothing
>> consistent?
>
> Yes, its not a consistent pattern. Its actually a 'kworker' that woke
> up to read/write something related to the video being played by the
> YouTube app and is asynchronous to the app itself. It could be writing
> to the logs or other information. But this definitely not a consistent
> pattern as in the use case you described but intermittent spikes. The
> frequency boosts don't help the actual activity of playing the video
> except increasing power.
>
>>> > I realize max freq/volt might not be the best option for you, but is
>>> > there another spot that would make sense? I can imagine you want to
>>> > return your MMC to low power state ASAP as well.
>>> >
>>> >
>>> > So rather than a disable flag, I would really rather see an IO-wait OPP
>>> > state selector or something.
>>>
>>> We never had this in older kernels and I don't think we ever had an
>>> issue where I/O was slow because of CPU frequency. If a task is busy a
>>> lot, then its load tracking signal should be high and take care of
>>> keeping CPU frequency high right?
>>
>> As per the above, no. If the device completion takes long enough to
>> inject enough idle time, the utilization signal will never be high
>> enough to break out of that pattern.
>>
>>> If PELT is decaying the load
>>> tracking of iowaiting tasks too much, then I think that it should be
>>> fixed there (probably decay an iowaiting task lesser?).
>>
>> For the above to work, we'd have to completely discard IO-wait time on
>> the utilization signal. But that would then give the task u=1, which
>> would be incorrect for placement decisions and wreck EAS.
>
> Not completely discard but cap the decay of the signal during IO wait.
>
>>
>>> Considering
>>> that it makes power worse on newer kernels, it'd probably be best to
>>> disable it in my opinion for those who don't need it.
>>
>> You have yet to convince me you don't need it. Sure Android might not
>> have much IO heavy workloads, but that's not to say nothing on ARM ever
>> does.
>>
>> Also note that if you set the boost OPP to the lowest OPP you
>> effectively do disable it.
>>
>> Looking at the code, it appears we already have this in
>> iowait_boost_max.
>
> Currently it is set to:
>  sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq
>
> Are you proposing to make this a sysfs tunable so we can override what
> the iowait_boost_max value is?
>

Peter I didn't hear back from you. Maybe my comment here did not make
much sense to you? That could be because I was confused what you meant
by iowait_boost_max setting to 0. Currently afaik there isn't an
upstream way of doing this. Were you suggesting making
iowait_boost_max as a tunable and setting it to 0?

Or did you mean to have us carry an out of tree patch that sets it to
0? One of the reason I am pushing this patch is to not have to carry
an out of tree patch that disables it. Looking forward to your reply.

Thanks a lot,
Joel

  reply	other threads:[~2017-06-10  8:08 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-19  6:23 [PATCH v2 0/2] Make iowait_boost optional and default to policy Joel Fernandes
2017-05-19  6:23 ` [PATCH v2 1/2] cpufreq: Make iowait boost a policy option Joel Fernandes
2017-05-19  9:42   ` Peter Zijlstra
2017-05-19 10:21     ` Peter Zijlstra
2017-05-19 17:04     ` Joel Fernandes
2017-05-22  8:21       ` Peter Zijlstra
2017-05-24 20:17         ` Joel Fernandes
2017-06-10  8:08           ` Joel Fernandes [this message]
2017-06-10 13:56             ` Peter Zijlstra
2017-06-11  6:59               ` Joel Fernandes
2017-05-19  6:23 ` [PATCH v2 2/2] sched: Make iowait_boost optional in schedutil Joel Fernandes
2017-05-19  6:50   ` Viresh Kumar
2017-05-19 16:10     ` Joel Fernandes
2017-07-11 19:02       ` Saravana Kannan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJWu+opua5nw5w=hUUNWVWekV_6HBPb1mM3PgR8VN=GcoTwMgg@mail.gmail.com' \
    --to=joelaf@google.com \
    --cc=Juri.Lelli@arm.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=patrick.bellasi@arm.com \
    --cc=peterz@infradead.org \
    --cc=rjw@rjwysocki.net \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.