From: Glauber Costa <glommer@parallels.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: <cgroups@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Paul Turner <pjt@google.com>, Randy Dunlap <rdunlap@xenotime.net>
Subject: Re: [PATCH v5 11/11] sched: introduce cgroup file stat_percpu
Date: Wed, 23 Jan 2013 18:20:13 +0400
Message-ID: <50FFF19D.60007@parallels.com>
In-Reply-To: <20130109124220.ad9f1a54.akpm@linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 581 bytes --]

On 01/10/2013 12:42 AM, Andrew Morton wrote:
> Also, I'm not seeing any changes to Documentation/ in this patchset.
> How do we explain the interface to our users?

There is little point in adding any Documentation, since the cpu cgroup
itself is not documented. I took the liberty of doing this myself so as to
provide a baseline for the upcoming changes. It would be very nice if
you guys could review the file as-is, since it would save me at least one
patchset iteration.

When the contents are settled, I intend to then document the new file
in there.

Thanks.


[-- Attachment #2: cpu.txt --]
[-- Type: text/plain, Size: 3599 bytes --]

CPU Controller
--------------

The CPU controller is responsible for grouping together tasks that will be
viewed by the scheduler as a single unit. The CFS scheduler will first divide
CPU time equally between all entities at the same level, and then proceed by
doing the same at the next level. Basic use cases for that are described in the
main cgroup documentation file, cgroups.txt.
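
The level-by-level division described above can be illustrated with a small
sketch. This is only a model, not scheduler code: it assumes every entity is
always runnable and that all entities carry equal weight (so any weighting such
as cpu.shares is ignored). The function name leaf_shares and the nested-dict
layout are hypothetical, chosen for the illustration.

```python
def leaf_shares(tree, share=1.0):
    """Return {task: fraction of CPU time} for a nested dict of groups.

    A dict value is a group whose children compete with each other;
    None marks a runnable task. Equal weights are assumed throughout.
    """
    result = {}
    if not tree:
        return result  # an empty group consumes nothing in this model
    per_entity = share / len(tree)  # equal split at this level
    for name, child in tree.items():
        if child is None:
            result[name] = per_entity
        else:
            # The group's slice is split again among its own children.
            result.update(leaf_shares(child, per_entity))
    return result


hierarchy = {
    "task_a": None,
    "group_1": {"task_b": None, "task_c": None},
}
print(leaf_shares(hierarchy))
```

In this model task_a and group_1 each receive half of the CPU time, and
group_1's half is then split between task_b and task_c.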

Users of this functionality should be aware that deep hierarchies will of
course impose scheduler overhead, since the scheduler will have to take extra
steps and look up additional data structures to make its final decision.

Through the CPU controller, the scheduler is also able to cap the CPU
utilization of a particular group. This is particularly useful in environments
in which CPU is paid for by the hour, and one values predictability over
performance.
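
As a rough sketch of how such a cap is sized, the quota for a desired number
of CPUs' worth of time follows directly from the period (see the
cpu.cfs_period_us and cpu.cfs_quota_us entries under Files). The helper name
quota_for_cpus is hypothetical; the default period is the 100000us default
stated in this document.

```python
def quota_for_cpus(cpus, period_us=100000):
    """Quota (in microseconds per period) allowing `cpus` CPUs' worth of time.

    The quota is aggregate time over all CPUs, so 2 CPUs means twice the period.
    """
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return int(round(cpus * period_us))


print(quota_for_cpus(0.5))  # half of one CPU
print(quota_for_cpus(2))    # full usage of two CPUs
```

The resulting value is what one would write to cpu.cfs_quota_us for the group.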

CPU Accounting
--------------

The CPU cgroup will also provide additional files under the prefix "cpuacct".
Those files provide accounting statistics and were previously provided by the
separate cpuacct controller. Although the cpuacct controller will still be kept
around for compatibility reasons, its usage is discouraged. If both the CPU and
cpuacct controllers are present in the system, distributors are encouraged to
always mount them together.

Files
-----

The CPU controller exposes the following files to the user:

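 - cpu.shares: The weight of this group relative to its siblings when the CFS
   scheduler divides CPU time among them. This defaults to 1024.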

 - cpu.cfs_period_us: The duration in microseconds of each scheduler period,
   for bandwidth decisions. This defaults to 100000us, or 100ms. Larger periods
   will improve throughput at the expense of latency, since the scheduler will
   be able to sustain a cpu-bound workload for longer. The opposite is true for
   smaller periods. Note that this only affects non-RT tasks that are scheduled
   by the CFS scheduler.

 - cpu.cfs_quota_us: The maximum time in microseconds during each cfs_period_us
   for which the current group will be allowed to run. For instance, if it is
   set to half of cfs_period_us, the cgroup will only be able to run for at
   most 50% of the time. One should note that this represents aggregate time
   over all CPUs in the system. Therefore, in order to allow full usage of two
   CPUs, for instance, one should set this value to twice the value of
   cfs_period_us.

 - cpu.stat: Statistics about the bandwidth controls. No data will be presented
   if cpu.cfs_quota_us is not set. The file presents three numbers:
	nr_periods: how many full periods have elapsed.
	nr_throttled: number of times we exhausted the full allowed bandwidth.
	throttled_time: total time the tasks were not run due to being over quota.

 - cpu.rt_runtime_us and cpu.rt_period_us: These files are the RT-task
   analogues of the CFS files cfs_quota_us and cfs_period_us. One important
   difference, though, is that while the cfs quotas are upper bounds that
   won't necessarily be met, the rt runtimes form a stricter guarantee.
   Therefore, no overlap is allowed. This implies that, given a hierarchy with
   multiple children, the sum of all their rt_runtime_us values may not exceed
   the runtime of the parent. Also, an rt_runtime_us of 0 means that no rt
   tasks can ever be run in this cgroup.

 - cpuacct.usage: The aggregate CPU time, in nanoseconds, consumed by all tasks
   in this group.

 - cpuacct.usage_percpu: The CPU time, in nanoseconds, consumed by all tasks in
   this group, separated by CPU. The format is a space-separated array of time
   values, one for each present CPU.

 - cpuacct.stat: aggregate user and system time consumed by tasks in this group.
   The format is user: x\nsystem: y.
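
As an illustration of how the cpu.stat numbers might be consumed, here is a
minimal sketch that computes the fraction of periods in which the group was
throttled. It assumes the file holds one "name value" pair per line, which is
not spelled out above; the function names and the sample text are hypothetical
and no real cgroup mount is read.

```python
def parse_cpu_stat(text):
    """Parse 'name value' lines into a dict of integers (assumed layout)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            stats[fields[0]] = int(fields[1])
    return stats


def throttled_ratio(stats):
    """Fraction of elapsed periods in which the quota was exhausted."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no quota set, or no period has elapsed yet
    return stats.get("nr_throttled", 0) / periods


# Made-up sample contents, for illustration only.
sample = "nr_periods 200\nnr_throttled 50\nthrottled_time 1234567\n"
stats = parse_cpu_stat(sample)
print(throttled_ratio(stats))
```

A ratio close to 1 suggests the group is hitting its quota in almost every
period, and that cpu.cfs_quota_us may be too small for the workload.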

