* [patch 00/15] CFS Bandwidth Control V6
@ 2011-05-03 9:28 Paul Turner
2011-05-03 9:28 ` [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent Paul Turner
` (16 more replies)
0 siblings, 17 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[ Apologies if you're receiving this twice, the previous mailing did not seem
to make it to the list for some reason ].
Hi all,
Please find attached the latest iteration of bandwidth control (v6).
Where the previous release cleaned up many of the semantics surrounding the
update_curr() path and throttling, this release is focused on cleaning up the
patchset itself. Elements such as the notion of expiring bandwidth from
previous quota periods, as well as some of the core accounting changes, have
been pushed up (and re-written for clarity) within the patchset, reducing the
patch-to-patch churn significantly.
While this restructuring was fairly extensive in terms of the code touched,
there are no major behavioral changes beyond bug fixes.
Thanks to Hidetoshi Seto for identifying the throttle list corruption.
Notable changes:
- Runtime is now actively expired, taking advantage of the bounds placed on
sched_clock synchronization.
- distribute_cfs_runtime() no longer races with throttles around the period
boundary.
- Major code cleanup
Bug fixes:
- several interactions with active load-balance have been corrected. These
previously manifested as throttle_list corruption and crashes.
Interface:
----------
Three new cgroupfs files are exported by the cpu subsystem:
cpu.cfs_period_us : period over which bandwidth is to be regulated
cpu.cfs_quota_us : bandwidth available for consumption per period
cpu.stat : statistics (such as number of throttled periods and
total throttled time)
One important interface change that this introduces (versus the rate limits
proposal) is that the defined bandwidth becomes an absolute quantifier.
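For illustration, restricting a group to half a cpu can be done by writing a
quota of 250ms against the default 500ms period. A minimal user-space sketch
(the /dev/cgroup mount point and the "limited" group here are assumptions for
illustration, not part of the patchset):

/* sketch: cap cgroup "limited" at half a cpu (250ms per 500ms period) */
#include <stdio.h>

static int write_val(const char *path, long val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%ld\n", val);
	return fclose(f);
}

int main(void)
{
	write_val("/dev/cgroup/limited/cpu.cfs_period_us", 500000);
	write_val("/dev/cgroup/limited/cpu.cfs_quota_us", 250000);
	return 0;
}

Because the quota is absolute, such a group receives at most half a cpu of
service per period regardless of how many cpus the machine has or how loaded
its siblings are.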
Previous postings:
-----------------
v5:
https://lkml.org/lkml/2011/3/22/477
v4:
https://lkml.org/lkml/2011/2/23/44
v3:
https://lkml.org/lkml/2010/10/12/44
v2:
http://lkml.org/lkml/2010/4/28/88
Original posting:
http://lkml.org/lkml/2010/2/12/393
Prior approaches:
http://lkml.org/lkml/2010/1/5/44 ["CFS Hard limits v5"]
Thanks,
- Paul
* [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:14 ` Hidetoshi Seto
2011-05-03 9:28 ` [patch 02/15] sched: hierarchical task accounting for SCHED_OTHER Paul Turner
` (15 subsequent siblings)
16 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[-- Attachment #1: sched-bwc-fix_dequeue_task_buglet.patch --]
[-- Type: text/plain, Size: 947 bytes --]
In dequeue_task_fair() we bail on dequeue when we encounter a parenting entity
with additional weight. However, we perform a double shares update on this
entity since we continue the shares update traversal from that point, despite
dequeue_entity() having already updated its queuing cfs_rq.
Avoid this by starting from the parent when we resume.
Signed-off-by: Paul Turner <pjt@google.com>
---
kernel/sched_fair.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -1355,8 +1355,10 @@ static void dequeue_task_fair(struct rq
dequeue_entity(cfs_rq, se, flags);
/* Don't dequeue parent if it has other entities besides us */
- if (cfs_rq->load.weight)
+ if (cfs_rq->load.weight) {
+ se = parent_entity(se);
break;
+ }
flags |= DEQUEUE_SLEEP;
}
* [patch 02/15] sched: hierarchical task accounting for SCHED_OTHER
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
2011-05-03 9:28 ` [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:17 ` Hidetoshi Seto
2011-05-03 9:28 ` [patch 03/15] sched: introduce primitives to account for CFS bandwidth tracking Paul Turner
` (14 subsequent siblings)
16 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[-- Attachment #1: sched-bwc-account_nr_running.patch --]
[-- Type: text/plain, Size: 4928 bytes --]
Introduce hierarchical task accounting for the group scheduling case in CFS,
and promote the responsibility for maintaining rq->nr_running to the
scheduling classes.
The primary motivation for this is that with scheduling classes supporting
bandwidth throttling it is possible for entities participating in throttled
sub-trees to have no root-visible change in rq->nr_running across activate
and de-activate operations. This in turn leads to incorrect idle and
weight-per-task load balance decisions.
This also allows us to apply a small fix to the fast path in pick_next_task()
under group scheduling.
Note: this issue also exists with the existing sched_rt throttling mechanism.
This patch does not address that.
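To illustrate the new accounting, consider the toy model below (hypothetical
structures for illustration, not the kernel's):

/*
 * toy model: a task in a nested group bumps h_nr_running at every level
 * of its cfs_rq chain, while rq->nr_running is adjusted exactly once, by
 * the owning scheduling class.
 */
struct cfs_rq_model {
	unsigned long nr_running;	/* entities queued directly here */
	unsigned long h_nr_running;	/* tasks anywhere in this subtree */
	struct cfs_rq_model *parent;
};

static void enqueue_model(struct cfs_rq_model *cfs_rq,
			  unsigned long *rq_nr_running)
{
	cfs_rq->nr_running++;		/* direct enqueue only (parents
					 * assumed already on_rq) */
	for (; cfs_rq; cfs_rq = cfs_rq->parent)
		cfs_rq->h_nr_running++;	/* every level of the hierarchy */
	(*rq_nr_running)++;		/* class-level count */
}

A throttled sub-tree can then suppress the rq->nr_running update while the
per-level h_nr_running books remain consistent.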
Signed-off-by: Paul Turner <pjt@google.com>
---
kernel/sched.c | 6 ++----
kernel/sched_fair.c | 14 ++++++++++----
kernel/sched_rt.c | 5 ++++-
kernel/sched_stoptask.c | 2 ++
4 files changed, 18 insertions(+), 9 deletions(-)
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -308,7 +308,7 @@ struct task_group root_task_group;
/* CFS-related fields in a runqueue */
struct cfs_rq {
struct load_weight load;
- unsigned long nr_running;
+ unsigned long nr_running, h_nr_running;
u64 exec_clock;
u64 min_vruntime;
@@ -1793,7 +1793,6 @@ static void activate_task(struct rq *rq,
rq->nr_uninterruptible--;
enqueue_task(rq, p, flags);
- inc_nr_running(rq);
}
/*
@@ -1805,7 +1804,6 @@ static void deactivate_task(struct rq *r
rq->nr_uninterruptible++;
dequeue_task(rq, p, flags);
- dec_nr_running(rq);
}
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
@@ -4053,7 +4051,7 @@ pick_next_task(struct rq *rq)
* Optimization: we know that if all tasks are in
* the fair class we can call that function directly:
*/
- if (likely(rq->nr_running == rq->cfs.nr_running)) {
+ if (likely(rq->nr_running == rq->cfs.h_nr_running)) {
p = fair_sched_class.pick_next_task(rq);
if (likely(p))
return p;
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -1318,7 +1318,7 @@ static inline void hrtick_update(struct
static void
enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
- struct cfs_rq *cfs_rq;
+ struct cfs_rq *cfs_rq = NULL;
struct sched_entity *se = &p->se;
for_each_sched_entity(se) {
@@ -1326,16 +1326,19 @@ enqueue_task_fair(struct rq *rq, struct
break;
cfs_rq = cfs_rq_of(se);
enqueue_entity(cfs_rq, se, flags);
+ cfs_rq->h_nr_running++;
flags = ENQUEUE_WAKEUP;
}
for_each_sched_entity(se) {
- struct cfs_rq *cfs_rq = cfs_rq_of(se);
+ cfs_rq = cfs_rq_of(se);
+ cfs_rq->h_nr_running++;
update_cfs_load(cfs_rq, 0);
update_cfs_shares(cfs_rq);
}
+ inc_nr_running(rq);
hrtick_update(rq);
}
@@ -1346,12 +1349,13 @@ enqueue_task_fair(struct rq *rq, struct
*/
static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
- struct cfs_rq *cfs_rq;
+ struct cfs_rq *cfs_rq = NULL;
struct sched_entity *se = &p->se;
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);
dequeue_entity(cfs_rq, se, flags);
+ cfs_rq->h_nr_running--;
/* Don't dequeue parent if it has other entities besides us */
if (cfs_rq->load.weight) {
@@ -1362,12 +1366,14 @@ static void dequeue_task_fair(struct rq
}
for_each_sched_entity(se) {
- struct cfs_rq *cfs_rq = cfs_rq_of(se);
+ cfs_rq = cfs_rq_of(se);
+ cfs_rq->h_nr_running--;
update_cfs_load(cfs_rq, 0);
update_cfs_shares(cfs_rq);
}
+ dec_nr_running(rq);
hrtick_update(rq);
}
Index: tip/kernel/sched_rt.c
===================================================================
--- tip.orig/kernel/sched_rt.c
+++ tip/kernel/sched_rt.c
@@ -927,6 +927,8 @@ enqueue_task_rt(struct rq *rq, struct ta
if (!task_current(rq, p) && p->rt.nr_cpus_allowed > 1)
enqueue_pushable_task(rq, p);
+
+ inc_nr_running(rq);
}
static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
@@ -937,6 +939,8 @@ static void dequeue_task_rt(struct rq *r
dequeue_rt_entity(rt_se);
dequeue_pushable_task(rq, p);
+
+ dec_nr_running(rq);
}
/*
@@ -1804,4 +1808,3 @@ static void print_rt_stats(struct seq_fi
rcu_read_unlock();
}
#endif /* CONFIG_SCHED_DEBUG */
-
Index: tip/kernel/sched_stoptask.c
===================================================================
--- tip.orig/kernel/sched_stoptask.c
+++ tip/kernel/sched_stoptask.c
@@ -35,11 +35,13 @@ static struct task_struct *pick_next_tas
static void
enqueue_task_stop(struct rq *rq, struct task_struct *p, int flags)
{
+ inc_nr_running(rq);
}
static void
dequeue_task_stop(struct rq *rq, struct task_struct *p, int flags)
{
+ dec_nr_running(rq);
}
static void yield_task_stop(struct rq *rq)
* [patch 03/15] sched: introduce primitives to account for CFS bandwidth tracking
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
2011-05-03 9:28 ` [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent Paul Turner
2011-05-03 9:28 ` [patch 02/15] sched: hierarchical task accounting for SCHED_OTHER Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:18 ` Hidetoshi Seto
2011-05-03 9:28 ` [patch 04/15] sched: validate CFS quota hierarchies Paul Turner
` (13 subsequent siblings)
16 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
[-- Attachment #1: sched-bwc-add_cfs_tg_bandwidth.patch --]
[-- Type: text/plain, Size: 9956 bytes --]
In this patch we introduce the notion of CFS bandwidth, partitioned into
globally unassigned bandwidth and locally claimed bandwidth.
- The global bandwidth is per task_group; it represents a pool of unclaimed
bandwidth that cfs_rqs can allocate from.
- The local bandwidth is tracked per cfs_rq; it represents allotments from
the global pool assigned to a specific cpu.
Bandwidth is managed via cgroupfs, adding two new interfaces to the cpu subsystem:
- cpu.cfs_period_us : the bandwidth period in usecs
- cpu.cfs_quota_us : the cpu bandwidth (in usecs) that this tg will be allowed
to consume over the period above.
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
init/Kconfig | 12 +++
kernel/sched.c | 193 ++++++++++++++++++++++++++++++++++++++++++++++++++--
kernel/sched_fair.c | 16 ++++
3 files changed, 217 insertions(+), 4 deletions(-)
Index: tip/init/Kconfig
===================================================================
--- tip.orig/init/Kconfig
+++ tip/init/Kconfig
@@ -715,6 +715,18 @@ config FAIR_GROUP_SCHED
depends on CGROUP_SCHED
default CGROUP_SCHED
+config CFS_BANDWIDTH
+ bool "CPU bandwidth provisioning for FAIR_GROUP_SCHED"
+ depends on EXPERIMENTAL
+ depends on FAIR_GROUP_SCHED
+ default n
+ help
+ This option allows users to define CPU bandwidth rates (limits) for
+ tasks running within the fair group scheduler. Groups with no limit
+ set are considered to be unconstrained and will run with no
+ restriction.
+ See tip/Documentation/scheduler/sched-bwc.txt for more information.
+
config RT_GROUP_SCHED
bool "Group scheduling for SCHED_RR/FIFO"
depends on EXPERIMENTAL
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -244,6 +244,14 @@ struct cfs_rq;
static LIST_HEAD(task_groups);
+struct cfs_bandwidth {
+#ifdef CONFIG_CFS_BANDWIDTH
+ raw_spinlock_t lock;
+ ktime_t period;
+ u64 quota;
+#endif
+};
+
/* task group related information */
struct task_group {
struct cgroup_subsys_state css;
@@ -275,6 +283,8 @@ struct task_group {
#ifdef CONFIG_SCHED_AUTOGROUP
struct autogroup *autogroup;
#endif
+
+ struct cfs_bandwidth cfs_bandwidth;
};
/* task_group_lock serializes the addition/removal of task groups */
@@ -369,9 +379,45 @@ struct cfs_rq {
unsigned long load_contribution;
#endif
+#ifdef CONFIG_CFS_BANDWIDTH
+ int runtime_enabled;
+ s64 runtime_remaining;
+#endif
#endif
};
+#ifdef CONFIG_CFS_BANDWIDTH
+static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
+{
+ return &tg->cfs_bandwidth;
+}
+
+static inline u64 default_cfs_period(void);
+
+static void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
+{
+ raw_spin_lock_init(&cfs_b->lock);
+ cfs_b->quota = RUNTIME_INF;
+ cfs_b->period = ns_to_ktime(default_cfs_period());
+}
+
+static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
+{
+ cfs_rq->runtime_remaining = 0;
+ cfs_rq->runtime_enabled = 0;
+}
+
+static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
+{}
+#else
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
+void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
+static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
+#endif /* CONFIG_FAIR_GROUP_SCHED */
+static void start_cfs_bandwidth(struct cfs_rq *cfs_rq) {}
+#endif /* CONFIG_CFS_BANDWIDTH */
+
/* Real-Time classes' related field in a runqueue: */
struct rt_rq {
struct rt_prio_array active;
@@ -8056,6 +8102,7 @@ static void init_tg_cfs_entry(struct tas
tg->cfs_rq[cpu] = cfs_rq;
init_cfs_rq(cfs_rq, rq);
cfs_rq->tg = tg;
+ init_cfs_rq_runtime(cfs_rq);
tg->se[cpu] = se;
/* se could be NULL for root_task_group */
@@ -8191,6 +8238,7 @@ void __init sched_init(void)
* We achieve this by letting root_task_group's tasks sit
* directly in rq->cfs (i.e root_task_group->se[] = NULL).
*/
+ init_cfs_bandwidth(&root_task_group.cfs_bandwidth);
init_tg_cfs_entry(&root_task_group, &rq->cfs, NULL, i, NULL);
#endif /* CONFIG_FAIR_GROUP_SCHED */
@@ -8433,6 +8481,8 @@ static void free_fair_sched_group(struct
{
int i;
+ destroy_cfs_bandwidth(tg_cfs_bandwidth(tg));
+
for_each_possible_cpu(i) {
if (tg->cfs_rq)
kfree(tg->cfs_rq[i]);
@@ -8460,6 +8510,8 @@ int alloc_fair_sched_group(struct task_g
tg->shares = NICE_0_LOAD;
+ init_cfs_bandwidth(tg_cfs_bandwidth(tg));
+
for_each_possible_cpu(i) {
cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
GFP_KERNEL, cpu_to_node(i));
@@ -8837,7 +8889,7 @@ static int __rt_schedulable(struct task_
return walk_tg_tree(tg_schedulable, tg_nop, &data);
}
-static int tg_set_bandwidth(struct task_group *tg,
+static int tg_set_rt_bandwidth(struct task_group *tg,
u64 rt_period, u64 rt_runtime)
{
int i, err = 0;
@@ -8876,7 +8928,7 @@ int sched_group_set_rt_runtime(struct ta
if (rt_runtime_us < 0)
rt_runtime = RUNTIME_INF;
- return tg_set_bandwidth(tg, rt_period, rt_runtime);
+ return tg_set_rt_bandwidth(tg, rt_period, rt_runtime);
}
long sched_group_rt_runtime(struct task_group *tg)
@@ -8901,7 +8953,7 @@ int sched_group_set_rt_period(struct tas
if (rt_period == 0)
return -EINVAL;
- return tg_set_bandwidth(tg, rt_period, rt_runtime);
+ return tg_set_rt_bandwidth(tg, rt_period, rt_runtime);
}
long sched_group_rt_period(struct task_group *tg)
@@ -9123,6 +9175,128 @@ static u64 cpu_shares_read_u64(struct cg
return (u64) tg->shares;
}
+
+#ifdef CONFIG_CFS_BANDWIDTH
+const u64 max_cfs_quota_period = 1 * NSEC_PER_SEC; /* 1s */
+const u64 min_cfs_quota_period = 1 * NSEC_PER_MSEC; /* 1ms */
+
+static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
+{
+ int i;
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
+ static DEFINE_MUTEX(mutex);
+
+ if (tg == &root_task_group)
+ return -EINVAL;
+
+ /*
+ * Ensure we have at some amount of bandwidth every period. This is
+ * to prevent reaching a state of large arrears when throttled via
+ * entity_tick() resulting in prolonged exit starvation.
+ */
+ if (quota < min_cfs_quota_period || period < min_cfs_quota_period)
+ return -EINVAL;
+
+ /*
+ * Likewise, bound things on the other side by preventing insane quota
+ * periods. This also allows us to normalize in computing quota
+ * feasibility.
+ */
+ if (period > max_cfs_quota_period)
+ return -EINVAL;
+
+ mutex_lock(&mutex);
+ raw_spin_lock_irq(&cfs_b->lock);
+ cfs_b->period = ns_to_ktime(period);
+ cfs_b->quota = quota;
+ raw_spin_unlock_irq(&cfs_b->lock);
+
+ for_each_possible_cpu(i) {
+ struct cfs_rq *cfs_rq = tg->cfs_rq[i];
+ struct rq *rq = rq_of(cfs_rq);
+
+ raw_spin_lock_irq(&rq->lock);
+ cfs_rq->runtime_enabled = quota != RUNTIME_INF;
+ cfs_rq->runtime_remaining = 0;
+ raw_spin_unlock_irq(&rq->lock);
+ }
+ mutex_unlock(&mutex);
+
+ return 0;
+}
+
+int tg_set_cfs_quota(struct task_group *tg, long cfs_quota_us)
+{
+ u64 quota, period;
+
+ period = ktime_to_ns(tg_cfs_bandwidth(tg)->period);
+ if (cfs_quota_us < 0)
+ quota = RUNTIME_INF;
+ else
+ quota = (u64)cfs_quota_us * NSEC_PER_USEC;
+
+ return tg_set_cfs_bandwidth(tg, period, quota);
+}
+
+long tg_get_cfs_quota(struct task_group *tg)
+{
+ u64 quota_us;
+
+ if (tg_cfs_bandwidth(tg)->quota == RUNTIME_INF)
+ return -1;
+
+ quota_us = tg_cfs_bandwidth(tg)->quota;
+ do_div(quota_us, NSEC_PER_USEC);
+
+ return quota_us;
+}
+
+int tg_set_cfs_period(struct task_group *tg, long cfs_period_us)
+{
+ u64 quota, period;
+
+ period = (u64)cfs_period_us * NSEC_PER_USEC;
+ quota = tg_cfs_bandwidth(tg)->quota;
+
+ if (period <= 0)
+ return -EINVAL;
+
+ return tg_set_cfs_bandwidth(tg, period, quota);
+}
+
+long tg_get_cfs_period(struct task_group *tg)
+{
+ u64 cfs_period_us;
+
+ cfs_period_us = ktime_to_ns(tg_cfs_bandwidth(tg)->period);
+ do_div(cfs_period_us, NSEC_PER_USEC);
+
+ return cfs_period_us;
+}
+
+static s64 cpu_cfs_quota_read_s64(struct cgroup *cgrp, struct cftype *cft)
+{
+ return tg_get_cfs_quota(cgroup_tg(cgrp));
+}
+
+static int cpu_cfs_quota_write_s64(struct cgroup *cgrp, struct cftype *cftype,
+ s64 cfs_quota_us)
+{
+ return tg_set_cfs_quota(cgroup_tg(cgrp), cfs_quota_us);
+}
+
+static u64 cpu_cfs_period_read_u64(struct cgroup *cgrp, struct cftype *cft)
+{
+ return tg_get_cfs_period(cgroup_tg(cgrp));
+}
+
+static int cpu_cfs_period_write_u64(struct cgroup *cgrp, struct cftype *cftype,
+ u64 cfs_period_us)
+{
+ return tg_set_cfs_period(cgroup_tg(cgrp), cfs_period_us);
+}
+
+#endif /* CONFIG_CFS_BANDWIDTH */
#endif /* CONFIG_FAIR_GROUP_SCHED */
#ifdef CONFIG_RT_GROUP_SCHED
@@ -9157,6 +9331,18 @@ static struct cftype cpu_files[] = {
.write_u64 = cpu_shares_write_u64,
},
#endif
+#ifdef CONFIG_CFS_BANDWIDTH
+ {
+ .name = "cfs_quota_us",
+ .read_s64 = cpu_cfs_quota_read_s64,
+ .write_s64 = cpu_cfs_quota_write_s64,
+ },
+ {
+ .name = "cfs_period_us",
+ .read_u64 = cpu_cfs_period_read_u64,
+ .write_u64 = cpu_cfs_period_write_u64,
+ },
+#endif
#ifdef CONFIG_RT_GROUP_SCHED
{
.name = "rt_runtime_us",
@@ -9466,4 +9652,3 @@ struct cgroup_subsys cpuacct_subsys = {
.subsys_id = cpuacct_subsys_id,
};
#endif /* CONFIG_CGROUP_CPUACCT */
-
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -1250,6 +1250,22 @@ entity_tick(struct cfs_rq *cfs_rq, struc
check_preempt_tick(cfs_rq, curr);
}
+
+/**************************************************
+ * CFS bandwidth control machinery
+ */
+
+#ifdef CONFIG_CFS_BANDWIDTH
+/*
+ * default period for cfs group bandwidth.
+ * default: 0.5s, units: nanoseconds
+ */
+static inline u64 default_cfs_period(void)
+{
+ return 500000000ULL;
+}
+#endif
+
/**************************************************
* CFS operations on tasks:
*/
* [patch 04/15] sched: validate CFS quota hierarchies
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (2 preceding siblings ...)
2011-05-03 9:28 ` [patch 03/15] sched: introduce primitives to account for CFS bandwidth tracking Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:20 ` Hidetoshi Seto
` (2 more replies)
2011-05-03 9:28 ` [patch 05/15] sched: add a timer to handle CFS bandwidth refresh Paul Turner
` (12 subsequent siblings)
16 siblings, 3 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[-- Attachment #1: sched-bwc-consistent_quota.patch --]
[-- Type: text/plain, Size: 8005 bytes --]
Add constraint validation for CFS bandwidth hierarchies.
Validate that:
sum(child bandwidth) <= parent_bandwidth
In a quota-limited hierarchy, an unconstrained entity
(e.g. bandwidth==RUNTIME_INF) inherits the bandwidth of its parent.
Since bandwidth periods may be non-uniform we normalize to the maximum allowed
period, 1 second.
This behavior may be disabled (allowing child bandwidth to exceed parent) via
kernel.sched_cfs_bandwidth_consistent=0
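As a worked example of the normalization, here is a user-space mirror of
to_ratio() from the patch below (the 20-bit fixed-point shift is taken from
the kernel code; values are in usecs as in __cfs_schedulable()):

#include <stdio.h>
#include <stdint.h>

/* bandwidth expressed as a 20-bit fixed-point fraction of one period */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	return (runtime << 20) / period;
}

int main(void)
{
	/*
	 * parent: 100ms per 1s    -> ~0.10 of a cpu (104857)
	 * child:   30ms per 100ms -> ~0.30 of a cpu (314572)
	 * normalized to the 1s maximum period, the child alone exceeds
	 * the parent, so consistent mode rejects it with -EINVAL
	 */
	printf("parent %llu\n", (unsigned long long)to_ratio(1000000, 100000));
	printf("child  %llu\n", (unsigned long long)to_ratio(100000, 30000));
	return 0;
}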
Signed-off-by: Paul Turner <pjt@google.com>
---
include/linux/sched.h | 8 ++
kernel/sched.c | 137 +++++++++++++++++++++++++++++++++++++++++++++-----
kernel/sched_fair.c | 8 ++
kernel/sysctl.c | 11 ++++
4 files changed, 151 insertions(+), 13 deletions(-)
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -249,6 +249,7 @@ struct cfs_bandwidth {
raw_spinlock_t lock;
ktime_t period;
u64 quota;
+ s64 hierarchal_quota;
#endif
};
@@ -8789,12 +8790,7 @@ unsigned long sched_group_shares(struct
}
#endif
-#ifdef CONFIG_RT_GROUP_SCHED
-/*
- * Ensure that the real time constraints are schedulable.
- */
-static DEFINE_MUTEX(rt_constraints_mutex);
-
+#if defined(CONFIG_RT_GROUP_SCHED) || defined(CONFIG_CFS_BANDWIDTH)
static unsigned long to_ratio(u64 period, u64 runtime)
{
if (runtime == RUNTIME_INF)
@@ -8802,6 +8798,13 @@ static unsigned long to_ratio(u64 period
return div64_u64(runtime << 20, period);
}
+#endif
+
+#ifdef CONFIG_RT_GROUP_SCHED
+/*
+ * Ensure that the real time constraints are schedulable.
+ */
+static DEFINE_MUTEX(rt_constraints_mutex);
/* Must be called with tasklist_lock held */
static inline int tg_has_rt_tasks(struct task_group *tg)
@@ -8822,7 +8825,7 @@ struct rt_schedulable_data {
u64 rt_runtime;
};
-static int tg_schedulable(struct task_group *tg, void *data)
+static int tg_rt_schedulable(struct task_group *tg, void *data)
{
struct rt_schedulable_data *d = data;
struct task_group *child;
@@ -8886,7 +8889,7 @@ static int __rt_schedulable(struct task_
.rt_runtime = runtime,
};
- return walk_tg_tree(tg_schedulable, tg_nop, &data);
+ return walk_tg_tree(tg_rt_schedulable, tg_nop, &data);
}
static int tg_set_rt_bandwidth(struct task_group *tg,
@@ -9177,14 +9180,17 @@ static u64 cpu_shares_read_u64(struct cg
}
#ifdef CONFIG_CFS_BANDWIDTH
+static DEFINE_MUTEX(cfs_constraints_mutex);
+
const u64 max_cfs_quota_period = 1 * NSEC_PER_SEC; /* 1s */
const u64 min_cfs_quota_period = 1 * NSEC_PER_MSEC; /* 1ms */
+static int __cfs_schedulable(struct task_group *tg, u64 period, u64 runtime);
+
static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
{
- int i;
+ int i, ret = 0;
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
- static DEFINE_MUTEX(mutex);
if (tg == &root_task_group)
return -EINVAL;
@@ -9205,7 +9211,13 @@ static int tg_set_cfs_bandwidth(struct t
if (period > max_cfs_quota_period)
return -EINVAL;
- mutex_lock(&mutex);
+ mutex_lock(&cfs_constraints_mutex);
+ if (sysctl_sched_cfs_bandwidth_consistent) {
+ ret = __cfs_schedulable(tg, period, quota);
+ if (ret)
+ goto out_unlock;
+ }
+
raw_spin_lock_irq(&cfs_b->lock);
cfs_b->period = ns_to_ktime(period);
cfs_b->quota = quota;
@@ -9220,9 +9232,10 @@ static int tg_set_cfs_bandwidth(struct t
cfs_rq->runtime_remaining = 0;
raw_spin_unlock_irq(&rq->lock);
}
- mutex_unlock(&mutex);
+out_unlock:
+ mutex_unlock(&cfs_constraints_mutex);
- return 0;
+ return ret;
}
int tg_set_cfs_quota(struct task_group *tg, long cfs_quota_us)
@@ -9296,6 +9309,104 @@ static int cpu_cfs_period_write_u64(stru
return tg_set_cfs_period(cgroup_tg(cgrp), cfs_period_us);
}
+
+struct cfs_schedulable_data {
+ struct task_group *tg;
+ u64 period, quota;
+};
+
+/*
+ * normalize group quota/period to be quota/max_period
+ * note: units are usecs
+ */
+static u64 normalize_cfs_quota(struct task_group *tg,
+ struct cfs_schedulable_data *d)
+{
+ u64 quota, period;
+
+ if (tg == d->tg) {
+ period = d->period;
+ quota = d->quota;
+ } else {
+ period = tg_get_cfs_period(tg);
+ quota = tg_get_cfs_quota(tg);
+ }
+
+ if (quota == RUNTIME_INF)
+ return RUNTIME_INF;
+
+ return to_ratio(period, quota);
+}
+
+static int tg_cfs_schedulable_down(struct task_group *tg, void *data)
+{
+ struct cfs_schedulable_data *d = data;
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
+ s64 quota = 0, parent_quota = -1;
+
+ quota = normalize_cfs_quota(tg, d);
+ if (!tg->parent) {
+ quota = RUNTIME_INF;
+ } else {
+ struct cfs_bandwidth *parent_b = tg_cfs_bandwidth(tg->parent);
+
+ parent_quota = parent_b->hierarchal_quota;
+ if (parent_quota != RUNTIME_INF) {
+ parent_quota -= quota;
+ /* invalid hierarchy, child bandwidth exceeds parent */
+ if (parent_quota < 0)
+ return -EINVAL;
+ }
+
+ /* if no inherent limit then inherit parent quota */
+ if (quota == RUNTIME_INF)
+ quota = parent_quota;
+ parent_b->hierarchal_quota = parent_quota;
+ }
+ cfs_b->hierarchal_quota = quota;
+
+ return 0;
+}
+
+static int __cfs_schedulable(struct task_group *tg, u64 period, u64 quota)
+{
+ struct cfs_schedulable_data data = {
+ .tg = tg,
+ .period = period,
+ .quota = quota,
+ };
+
+ if (!sysctl_sched_cfs_bandwidth_consistent)
+ return 0;
+
+ if (quota != RUNTIME_INF) {
+ do_div(data.period, NSEC_PER_USEC);
+ do_div(data.quota, NSEC_PER_USEC);
+ }
+
+ return walk_tg_tree(tg_cfs_schedulable_down, tg_nop, &data);
+}
+
+int sched_cfs_consistent_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+ int ret;
+
+ mutex_lock(&cfs_constraints_mutex);
+ ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+ if (!ret && write && sysctl_sched_cfs_bandwidth_consistent) {
+ ret = __cfs_schedulable(NULL, 0, 0);
+
+ /* must be consistent to enable */
+ if (ret)
+ sysctl_sched_cfs_bandwidth_consistent = 0;
+ }
+ mutex_unlock(&cfs_constraints_mutex);
+
+ return ret;
+}
#endif /* CONFIG_CFS_BANDWIDTH */
#endif /* CONFIG_FAIR_GROUP_SCHED */
Index: tip/kernel/sysctl.c
===================================================================
--- tip.orig/kernel/sysctl.c
+++ tip/kernel/sysctl.c
@@ -367,6 +367,17 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = sched_rt_handler,
},
+#ifdef CONFIG_CFS_BANDWIDTH
+ {
+ .procname = "sched_cfs_bandwidth_consistent",
+ .data = &sysctl_sched_cfs_bandwidth_consistent,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = sched_cfs_consistent_handler,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+#endif
#ifdef CONFIG_SCHED_AUTOGROUP
{
.procname = "sched_autogroup_enabled",
Index: tip/include/linux/sched.h
===================================================================
--- tip.orig/include/linux/sched.h
+++ tip/include/linux/sched.h
@@ -1950,6 +1950,14 @@ int sched_rt_handler(struct ctl_table *t
void __user *buffer, size_t *lenp,
loff_t *ppos);
+#ifdef CONFIG_CFS_BANDWIDTH
+extern unsigned int sysctl_sched_cfs_bandwidth_consistent;
+
+int sched_cfs_consistent_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos);
+#endif
+
#ifdef CONFIG_SCHED_AUTOGROUP
extern unsigned int sysctl_sched_autogroup_enabled;
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -88,6 +88,14 @@ const_debug unsigned int sysctl_sched_mi
*/
unsigned int __read_mostly sysctl_sched_shares_window = 10000000UL;
+#ifdef CONFIG_CFS_BANDWIDTH
+/*
+ * Whether a CFS bandwidth hierarchy is required to be consistent, that is:
+ * sum(child_bandwidth) <= parent_bandwidth
+ */
+unsigned int sysctl_sched_cfs_bandwidth_consistent = 1;
+#endif
+
static const struct sched_class fair_sched_class;
/**************************************************************
* [patch 05/15] sched: add a timer to handle CFS bandwidth refresh
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (3 preceding siblings ...)
2011-05-03 9:28 ` [patch 04/15] sched: validate CFS quota hierarchies Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:21 ` Hidetoshi Seto
2011-05-16 10:18 ` Peter Zijlstra
2011-05-03 9:28 ` [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth Paul Turner
` (11 subsequent siblings)
16 siblings, 2 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[-- Attachment #1: sched-bwc-bandwidth_timers.patch --]
[-- Type: text/plain, Size: 4945 bytes --]
This patch adds a per-task_group timer which handles the refresh of the global
CFS bandwidth pool.
Since the RT pool is using a similar timer there's some small refactoring to
share this support.
Signed-off-by: Paul Turner <pjt@google.com>
---
kernel/sched.c | 87 ++++++++++++++++++++++++++++++++++++++++------------
kernel/sched_fair.c | 9 +++++
2 files changed, 77 insertions(+), 19 deletions(-)
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -193,10 +193,28 @@ static inline int rt_bandwidth_enabled(v
return sysctl_sched_rt_runtime >= 0;
}
-static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
+static void start_bandwidth_timer(struct hrtimer *period_timer, ktime_t period)
{
- ktime_t now;
+ unsigned long delta;
+ ktime_t soft, hard, now;
+
+ for (;;) {
+ if (hrtimer_active(period_timer))
+ break;
+
+ now = hrtimer_cb_get_time(period_timer);
+ hrtimer_forward(period_timer, now, period);
+ soft = hrtimer_get_softexpires(period_timer);
+ hard = hrtimer_get_expires(period_timer);
+ delta = ktime_to_ns(ktime_sub(hard, soft));
+ __hrtimer_start_range_ns(period_timer, soft, delta,
+ HRTIMER_MODE_ABS_PINNED, 0);
+ }
+}
+
+static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
+{
if (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF)
return;
@@ -204,22 +222,7 @@ static void start_rt_bandwidth(struct rt
return;
raw_spin_lock(&rt_b->rt_runtime_lock);
- for (;;) {
- unsigned long delta;
- ktime_t soft, hard;
-
- if (hrtimer_active(&rt_b->rt_period_timer))
- break;
-
- now = hrtimer_cb_get_time(&rt_b->rt_period_timer);
- hrtimer_forward(&rt_b->rt_period_timer, now, rt_b->rt_period);
-
- soft = hrtimer_get_softexpires(&rt_b->rt_period_timer);
- hard = hrtimer_get_expires(&rt_b->rt_period_timer);
- delta = ktime_to_ns(ktime_sub(hard, soft));
- __hrtimer_start_range_ns(&rt_b->rt_period_timer, soft, delta,
- HRTIMER_MODE_ABS_PINNED, 0);
- }
+ start_bandwidth_timer(&rt_b->rt_period_timer, rt_b->rt_period);
raw_spin_unlock(&rt_b->rt_runtime_lock);
}
@@ -250,6 +253,9 @@ struct cfs_bandwidth {
ktime_t period;
u64 quota;
s64 hierarchal_quota;
+
+ int idle;
+ struct hrtimer period_timer;
#endif
};
@@ -394,12 +400,38 @@ static inline struct cfs_bandwidth *tg_c
#ifdef CONFIG_CFS_BANDWIDTH
static inline u64 default_cfs_period(void);
+static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun);
+
+static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
+{
+ struct cfs_bandwidth *cfs_b =
+ container_of(timer, struct cfs_bandwidth, period_timer);
+ ktime_t now;
+ int overrun;
+ int idle = 0;
+
+ for (;;) {
+ now = hrtimer_cb_get_time(timer);
+ overrun = hrtimer_forward(timer, now, cfs_b->period);
+
+ if (!overrun)
+ break;
+
+ idle = do_sched_cfs_period_timer(cfs_b, overrun);
+ }
+
+ return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;
+}
static void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
{
raw_spin_lock_init(&cfs_b->lock);
cfs_b->quota = RUNTIME_INF;
cfs_b->period = ns_to_ktime(default_cfs_period());
+
+ hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ cfs_b->period_timer.function = sched_cfs_period_timer;
+
}
static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
@@ -411,8 +443,25 @@ static void init_cfs_rq_runtime(struct c
cfs_rq->runtime_enabled = 1;
}
+static void start_cfs_bandwidth(struct cfs_rq *cfs_rq)
+{
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
+
+ if (cfs_b->quota == RUNTIME_INF)
+ return;
+
+ if (hrtimer_active(&cfs_b->period_timer))
+ return;
+
+ raw_spin_lock(&cfs_b->lock);
+ start_bandwidth_timer(&cfs_b->period_timer, cfs_b->period);
+ raw_spin_unlock(&cfs_b->lock);
+}
+
static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
-{}
+{
+ hrtimer_cancel(&cfs_b->period_timer);
+}
#else
#ifdef CONFIG_FAIR_GROUP_SCHED
static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -1003,6 +1003,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
if (cfs_rq->nr_running == 1)
list_add_leaf_cfs_rq(cfs_rq);
+
+ start_cfs_bandwidth(cfs_rq);
}
static void __clear_buddies_last(struct sched_entity *se)
@@ -1220,6 +1222,8 @@ static void put_prev_entity(struct cfs_r
update_stats_wait_start(cfs_rq, prev);
/* Put 'current' back into the tree. */
__enqueue_entity(cfs_rq, prev);
+
+ start_cfs_bandwidth(cfs_rq);
}
cfs_rq->curr = NULL;
}
@@ -1272,6 +1276,11 @@ static inline u64 default_cfs_period(voi
{
return 500000000ULL;
}
+
+static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
+{
+ return 1;
+}
#endif
/**************************************************
* [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (4 preceding siblings ...)
2011-05-03 9:28 ` [patch 05/15] sched: add a timer to handle CFS bandwidth refresh Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:22 ` Hidetoshi Seto
` (2 more replies)
2011-05-03 9:28 ` [patch 07/15] sched: expire invalid runtime Paul Turner
` (10 subsequent siblings)
16 siblings, 3 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
[-- Attachment #1: sched-bwc-account_cfs_rq_runtime.patch --]
[-- Type: text/plain, Size: 5873 bytes --]
Account bandwidth usage at the cfs_rq level versus the task_groups to which
they belong. Whether we are tracking bandwidth on a given cfs_rq is maintained
under cfs_rq->runtime_enabled.
cfs_rqs which belong to a bandwidth-constrained task_group have their runtime
accounted via the update_curr() path, which withdraws bandwidth from the global
pool as desired. Updates involving the global pool are currently protected
under cfs_bandwidth->lock, local runtime is protected by rq->lock.
This patch only attempts to assign and track quota; no action is taken in the
case that cfs_rq->runtime_used exceeds cfs_rq->runtime_assigned.
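A user-space model of the global-to-local handoff may make the flow clearer
(a sketch only; the kernel version below runs under cfs_b->lock with rq->lock
held by the caller):

#include <stdint.h>

#define SLICE_NS 5000000LL	/* mirrors the 5ms default slice */

struct global_pool {
	int64_t runtime;	/* per-task_group unclaimed runtime */
};

/*
 * model of assign_cfs_rq_runtime(): top the local balance back up to one
 * slice, drawing whatever the global pool can still supply
 */
static void assign_runtime(struct global_pool *g, int64_t *local_remaining)
{
	/* called with *local_remaining <= 0, so this is a positive sum */
	int64_t want = SLICE_NS - *local_remaining;

	if (g->runtime > 0) {
		int64_t amount = want < g->runtime ? want : g->runtime;

		g->runtime -= amount;
		*local_remaining += amount;
	}
}

int main(void)
{
	struct global_pool g = { .runtime = 3000000 };	/* 3ms left globally */
	int64_t local = -500000;			/* 0.5ms overdrawn */

	assign_runtime(&g, &local);
	/* local is now 2.5ms and the global pool is drained */
	return 0;
}

If the global pool cannot cover the request the local balance stays
non-positive, which later patches in the series use as the throttle signal.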
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
include/linux/sched.h | 4 ++
kernel/sched.c | 2 +
kernel/sched_fair.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++--
kernel/sysctl.c | 8 ++++
4 files changed, 96 insertions(+), 3 deletions(-)
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -96,6 +96,15 @@ unsigned int __read_mostly sysctl_sched_
unsigned int sysctl_sched_cfs_bandwidth_consistent = 1;
#endif
+#ifdef CONFIG_CFS_BANDWIDTH
+/*
+ * amount of quota to allocate from global tg to local cfs_rq pool on each
+ * refresh
+ * default: 5ms, units: microseconds
+ */
+unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL;
+#endif
+
static const struct sched_class fair_sched_class;
/**************************************************************
@@ -312,6 +321,8 @@ find_matching_se(struct sched_entity **s
#endif /* CONFIG_FAIR_GROUP_SCHED */
+static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq,
+ unsigned long delta_exec);
/**************************************************************
* Scheduling class tree data structure manipulation methods:
@@ -605,6 +616,8 @@ static void update_curr(struct cfs_rq *c
cpuacct_charge(curtask, delta_exec);
account_group_exec_runtime(curtask, delta_exec);
}
+
+ account_cfs_rq_runtime(cfs_rq, delta_exec);
}
static inline void
@@ -1277,10 +1290,68 @@ static inline u64 default_cfs_period(voi
return 500000000ULL;
}
+static inline u64 sched_cfs_bandwidth_slice(void)
+{
+ return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
+}
+
+static void assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
+{
+ struct task_group *tg = cfs_rq->tg;
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
+ u64 amount = 0, min_amount;
+
+ /* note: this is a positive sum, runtime_remaining <= 0 */
+ min_amount = sched_cfs_bandwidth_slice() - cfs_rq->runtime_remaining;
+
+ raw_spin_lock(&cfs_b->lock);
+ if (cfs_b->quota == RUNTIME_INF)
+ amount = min_amount;
+ else if (cfs_b->runtime > 0) {
+ amount = min(cfs_b->runtime, min_amount);
+ cfs_b->runtime -= amount;
+ }
+ cfs_b->idle = 0;
+ raw_spin_unlock(&cfs_b->lock);
+
+ cfs_rq->runtime_remaining += amount;
+}
+
+static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq,
+ unsigned long delta_exec)
+{
+ if (!cfs_rq->runtime_enabled)
+ return;
+
+ cfs_rq->runtime_remaining -= delta_exec;
+ if (cfs_rq->runtime_remaining > 0)
+ return;
+
+ assign_cfs_rq_runtime(cfs_rq);
+}
+
static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
{
- return 1;
+ u64 quota, runtime = 0;
+ int idle = 0;
+
+ raw_spin_lock(&cfs_b->lock);
+ quota = cfs_b->quota;
+
+ if (quota != RUNTIME_INF) {
+ runtime = quota;
+ cfs_b->runtime = runtime;
+
+ idle = cfs_b->idle;
+ cfs_b->idle = 1;
+ }
+ raw_spin_unlock(&cfs_b->lock);
+
+ return idle;
}
+#else
+static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq,
+ unsigned long delta_exec) {}
#endif
/**************************************************
@@ -4222,8 +4293,16 @@ static void set_curr_task_fair(struct rq
{
struct sched_entity *se = &rq->curr->se;
- for_each_sched_entity(se)
- set_next_entity(cfs_rq_of(se), se);
+ for_each_sched_entity(se) {
+ struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+ set_next_entity(cfs_rq, se);
+ /*
+ * if bandwidth is enabled, make sure it is up-to-date or
+ * reschedule for the case of a move into a throttled cpu.
+ */
+ account_cfs_rq_runtime(cfs_rq, 0);
+ }
}
#ifdef CONFIG_FAIR_GROUP_SCHED
Index: tip/kernel/sysctl.c
===================================================================
--- tip.orig/kernel/sysctl.c
+++ tip/kernel/sysctl.c
@@ -377,6 +377,14 @@ static struct ctl_table kern_table[] = {
.extra1 = &zero,
.extra2 = &one,
},
+ {
+ .procname = "sched_cfs_bandwidth_slice_us",
+ .data = &sysctl_sched_cfs_bandwidth_slice,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &one,
+ },
#endif
#ifdef CONFIG_SCHED_AUTOGROUP
{
Index: tip/include/linux/sched.h
===================================================================
--- tip.orig/include/linux/sched.h
+++ tip/include/linux/sched.h
@@ -1958,6 +1958,10 @@ int sched_cfs_consistent_handler(struct
loff_t *ppos);
#endif
+#ifdef CONFIG_CFS_BANDWIDTH
+extern unsigned int sysctl_sched_cfs_bandwidth_slice;
+#endif
+
#ifdef CONFIG_SCHED_AUTOGROUP
extern unsigned int sysctl_sched_autogroup_enabled;
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -252,6 +252,7 @@ struct cfs_bandwidth {
raw_spinlock_t lock;
ktime_t period;
u64 quota;
+ u64 runtime;
s64 hierarchal_quota;
int idle;
@@ -426,6 +427,7 @@ static enum hrtimer_restart sched_cfs_pe
static void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
{
raw_spin_lock_init(&cfs_b->lock);
+ cfs_b->runtime = 0;
cfs_b->quota = RUNTIME_INF;
cfs_b->period = ns_to_ktime(default_cfs_period());
* [patch 07/15] sched: expire invalid runtime
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (5 preceding siblings ...)
2011-05-03 9:28 ` [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:22 ` Hidetoshi Seto
` (2 more replies)
2011-05-03 9:28 ` [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime Paul Turner
` (9 subsequent siblings)
16 siblings, 3 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[-- Attachment #1: sched-bwc-expire_cfs_rq_runtime.patch --]
[-- Type: text/plain, Size: 4773 bytes --]
With the global quota pool, one challenge is determining when the runtime we
have received from it is still valid. Fortunately we can take advantage of
sched_clock synchronization around the jiffy to do this cheaply.
The one catch is that we don't know whether our local clock is behind or ahead
of the cpu setting the expiration time (relative to its own clock).
Fortunately we can detect which of these is the case by determining whether the
global deadline has advanced. If it has not, then we assume we are behind, and
advance our local expiration; otherwise, we know the deadline has truly passed
and we expire our local runtime.
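The decision reduces to a comparison of the two deadlines; a stand-alone
sketch of the logic (TICK_NS here stands in for the kernel's TICK_NSEC drift
bound, with a 1ms tick assumed):

#include <stdint.h>

#define TICK_NS 1000000LL	/* assumed 1ms tick, for illustration */

static void expire_local(int64_t now, int64_t *local_expires,
			 int64_t global_expires, int64_t *local_runtime)
{
	if (now < *local_expires)
		return;		/* still within our local deadline */

	if (*local_expires >= global_expires) {
		/*
		 * the global deadline has not advanced: our clock must be
		 * ahead, so extend the local deadline by the drift bound
		 */
		*local_expires += TICK_NS;
	} else if (*local_runtime > 0) {
		/* the global deadline advanced: quota has truly expired */
		*local_runtime = 0;
	}
}

int main(void)
{
	int64_t local_expires = 2000000000LL;	/* local deadline at t=2s */
	int64_t runtime = 100000LL;

	/* t=2.0005s has passed the local deadline, but the global deadline
	 * has not advanced beyond it: the local deadline is extended */
	expire_local(2000500000LL, &local_expires, 2000000000LL, &runtime);
	return 0;
}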
Signed-off-by: Paul Turner <pjt@google.com>
---
kernel/sched.c | 8 +++++++-
kernel/sched_fair.c | 42 +++++++++++++++++++++++++++++++++++++++---
2 files changed, 46 insertions(+), 4 deletions(-)
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -1299,7 +1299,7 @@ static void assign_cfs_rq_runtime(struct
{
struct task_group *tg = cfs_rq->tg;
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
- u64 amount = 0, min_amount;
+ u64 amount = 0, min_amount, expires;
/* note: this is a positive sum, runtime_remaining <= 0 */
min_amount = sched_cfs_bandwidth_slice() - cfs_rq->runtime_remaining;
@@ -1312,9 +1312,38 @@ static void assign_cfs_rq_runtime(struct
cfs_b->runtime -= amount;
}
cfs_b->idle = 0;
+ expires = cfs_b->runtime_expires;
raw_spin_unlock(&cfs_b->lock);
cfs_rq->runtime_remaining += amount;
+ cfs_rq->runtime_expires = max(cfs_rq->runtime_expires, expires);
+}
+
+static void expire_cfs_rq_runtime(struct cfs_rq *cfs_rq)
+{
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
+ struct rq *rq = rq_of(cfs_rq);
+
+ if (rq->clock < cfs_rq->runtime_expires)
+ return;
+
+ /*
+ * If the local deadline has passed we have to cover for the
+ * possibility that our sched_clock is ahead and the global deadline
+ * has not truly expired.
+ *
+ * Fortunately we can check which of these is the case by determining
+ * whether the global deadline has advanced.
+ */
+
+ if (cfs_rq->runtime_expires >= cfs_b->runtime_expires) {
+ /* extend local deadline, drift is bounded above by 2 ticks */
+ cfs_rq->runtime_expires += TICK_NSEC;
+ } else {
+ /* global deadline is ahead, deadline must have passed */
+ if (cfs_rq->runtime_remaining > 0)
+ cfs_rq->runtime_remaining = 0;
+ }
}
static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq,
@@ -1324,6 +1353,9 @@ static void account_cfs_rq_runtime(struc
return;
cfs_rq->runtime_remaining -= delta_exec;
+ /* dock delta_exec before expiring quota (as it could span periods) */
+ expire_cfs_rq_runtime(cfs_rq);
+
if (cfs_rq->runtime_remaining > 0)
return;
@@ -1332,16 +1364,20 @@ static void account_cfs_rq_runtime(struc
static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
{
- u64 quota, runtime = 0;
+ u64 quota, runtime = 0, runtime_expires;
int idle = 0;
+ runtime_expires = sched_clock_cpu(smp_processor_id());
+
raw_spin_lock(&cfs_b->lock);
quota = cfs_b->quota;
if (quota != RUNTIME_INF) {
runtime = quota;
- cfs_b->runtime = runtime;
+ runtime_expires += ktime_to_ns(cfs_b->period);
+ cfs_b->runtime = runtime;
+ cfs_b->runtime_expires = runtime_expires;
idle = cfs_b->idle;
cfs_b->idle = 1;
}
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -253,6 +253,7 @@ struct cfs_bandwidth {
ktime_t period;
u64 quota;
u64 runtime;
+ u64 runtime_expires;
s64 hierarchal_quota;
int idle;
@@ -389,6 +390,7 @@ struct cfs_rq {
#endif
#ifdef CONFIG_CFS_BANDWIDTH
int runtime_enabled;
+ u64 runtime_expires;
s64 runtime_remaining;
#endif
#endif
@@ -9242,6 +9244,7 @@ static int tg_set_cfs_bandwidth(struct t
{
int i, ret = 0;
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
+ u64 runtime_expires;
if (tg == &root_task_group)
return -EINVAL;
@@ -9271,7 +9274,9 @@ static int tg_set_cfs_bandwidth(struct t
raw_spin_lock_irq(&cfs_b->lock);
cfs_b->period = ns_to_ktime(period);
- cfs_b->quota = quota;
+ cfs_b->quota = cfs_b->runtime = quota;
+ runtime_expires = sched_clock_cpu(smp_processor_id()) + period;
+ cfs_b->runtime_expires = runtime_expires;
raw_spin_unlock_irq(&cfs_b->lock);
for_each_possible_cpu(i) {
@@ -9281,6 +9286,7 @@ static int tg_set_cfs_bandwidth(struct t
raw_spin_lock_irq(&rq->lock);
cfs_rq->runtime_enabled = quota != RUNTIME_INF;
cfs_rq->runtime_remaining = 0;
+ cfs_rq->runtime_expires = runtime_expires;
raw_spin_unlock_irq(&rq->lock);
}
out_unlock:
* [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (6 preceding siblings ...)
2011-05-03 9:28 ` [patch 07/15] sched: expire invalid runtime Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:23 ` Hidetoshi Seto
` (2 more replies)
2011-05-03 9:28 ` [patch 09/15] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh Paul Turner
` (8 subsequent siblings)
16 siblings, 3 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
[-- Attachment #1: sched-bwc-throttle_entities.patch --]
[-- Type: text/plain, Size: 8090 bytes --]
In account_cfs_rq_runtime() (via update_curr()) we track consumption versus a
cfs_rq's locally assigned runtime and whether there is global runtime available
to provide a refill when it runs out.
In the case that there is no runtime remaining it's necessary to throttle so
that execution ceases until the subsequent period. While it is at this
boundary that we detect (and signal for, via resched_task) that a throttle is
required, the actual operation is deferred until put_prev_entity().
At this point the cfs_rq is marked as throttled and not re-enqueued; this
avoids potential interactions with throttled runqueues in the event that we
are not immediately able to evict the running task.
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
kernel/sched.c | 7 ++
kernel/sched_fair.c | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 133 insertions(+), 5 deletions(-)
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -985,6 +985,8 @@ place_entity(struct cfs_rq *cfs_rq, stru
se->vruntime = vruntime;
}
+static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
+
static void
enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
@@ -1014,8 +1016,10 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
__enqueue_entity(cfs_rq, se);
se->on_rq = 1;
- if (cfs_rq->nr_running == 1)
+ if (cfs_rq->nr_running == 1) {
list_add_leaf_cfs_rq(cfs_rq);
+ check_enqueue_throttle(cfs_rq);
+ }
start_cfs_bandwidth(cfs_rq);
}
@@ -1221,6 +1225,8 @@ static struct sched_entity *pick_next_en
return se;
}
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq);
+
static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
{
/*
@@ -1230,6 +1236,9 @@ static void put_prev_entity(struct cfs_r
if (prev->on_rq)
update_curr(cfs_rq);
+ /* throttle cfs_rqs exceeding runtime */
+ check_cfs_rq_runtime(cfs_rq);
+
check_spread(cfs_rq, prev);
if (prev->on_rq) {
update_stats_wait_start(cfs_rq, prev);
@@ -1295,7 +1304,7 @@ static inline u64 sched_cfs_bandwidth_sl
return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
}
-static void assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
+static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
{
struct task_group *tg = cfs_rq->tg;
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
@@ -1317,6 +1326,8 @@ static void assign_cfs_rq_runtime(struct
cfs_rq->runtime_remaining += amount;
cfs_rq->runtime_expires = max(cfs_rq->runtime_expires, expires);
+
+ return cfs_rq->runtime_remaining > 0;
}
static void expire_cfs_rq_runtime(struct cfs_rq *cfs_rq)
@@ -1359,7 +1370,90 @@ static void account_cfs_rq_runtime(struc
if (cfs_rq->runtime_remaining > 0)
return;
- assign_cfs_rq_runtime(cfs_rq);
+ /*
+ * if we're unable to extend our runtime we resched so that the active
+ * hierarchy can be throttled
+ */
+ if (!assign_cfs_rq_runtime(cfs_rq))
+ resched_task(rq_of(cfs_rq)->curr);
+}
+
+static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
+{
+ return cfs_rq->throttled;
+}
+
+static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
+{
+ struct rq *rq = rq_of(cfs_rq);
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
+ struct sched_entity *se;
+ long task_delta, dequeue = 1;
+
+ se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
+
+ /* account load preceding throttle */
+ update_cfs_load(cfs_rq, 0);
+
+ task_delta = -cfs_rq->h_nr_running;
+ for_each_sched_entity(se) {
+ struct cfs_rq *qcfs_rq = cfs_rq_of(se);
+ /* throttled entity or throttle-on-deactivate */
+ if (!se->on_rq)
+ break;
+
+ if (dequeue)
+ dequeue_entity(qcfs_rq, se, DEQUEUE_SLEEP);
+ qcfs_rq->h_nr_running += task_delta;
+
+ if (qcfs_rq->load.weight)
+ dequeue = 0;
+ }
+
+ if (!se)
+ rq->nr_running += task_delta;
+
+ cfs_rq->throttled = 1;
+ raw_spin_lock(&cfs_b->lock);
+ list_add_tail_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
+ raw_spin_unlock(&cfs_b->lock);
+}
+
+/* conditionally throttle active cfs_rq's from put_prev_entity() */
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq)
+{
+ if (!cfs_rq->runtime_enabled || cfs_rq->runtime_remaining > 0)
+ return;
+
+ /*
+ * it's possible active load balance has forced a throttled cfs_rq to
+ * run again; we don't want to re-throttle in this case.
+ */
+ if (cfs_rq_throttled(cfs_rq))
+ return;
+
+ throttle_cfs_rq(cfs_rq);
+}
+
+/*
+ * When a group wakes up we want to make sure that its quota is not already
+ * expired, otherwise it may be allowed to steal additional ticks of runtime
+ * since update_curr() throttling can not trigger until it's on-rq.
+ */
+static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
+{
+ /* an active group must be handled by the update_curr()->put() path */
+ if (cfs_rq->curr || !cfs_rq->runtime_enabled)
+ return;
+
+ /* ensure the group is not already throttled */
+ if (cfs_rq_throttled(cfs_rq))
+ return;
+
+ /* update runtime allocation */
+ account_cfs_rq_runtime(cfs_rq, 0);
+ if (cfs_rq->runtime_remaining <= 0)
+ throttle_cfs_rq(cfs_rq);
}
static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
@@ -1389,6 +1483,14 @@ static int do_sched_cfs_period_timer(str
#else
static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq,
unsigned long delta_exec) {}
+
+static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
+{
+ return 0;
+}
+
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
+static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
#endif
/**************************************************
@@ -1468,6 +1570,12 @@ enqueue_task_fair(struct rq *rq, struct
cfs_rq = cfs_rq_of(se);
enqueue_entity(cfs_rq, se, flags);
cfs_rq->h_nr_running++;
+
+ /* end evaluation on throttled cfs_rq */
+ if (cfs_rq_throttled(cfs_rq)) {
+ se = NULL;
+ break;
+ }
flags = ENQUEUE_WAKEUP;
}
@@ -1475,11 +1583,15 @@ enqueue_task_fair(struct rq *rq, struct
cfs_rq = cfs_rq_of(se);
cfs_rq->h_nr_running++;
+ if (cfs_rq_throttled(cfs_rq))
+ break;
+
update_cfs_load(cfs_rq, 0);
update_cfs_shares(cfs_rq);
}
- inc_nr_running(rq);
+ if (!se)
+ inc_nr_running(rq);
hrtick_update(rq);
}
@@ -1498,6 +1610,11 @@ static void dequeue_task_fair(struct rq
dequeue_entity(cfs_rq, se, flags);
cfs_rq->h_nr_running--;
+ /* end evaluation on throttled cfs_rq */
+ if (cfs_rq_throttled(cfs_rq)) {
+ se = NULL;
+ break;
+ }
/* Don't dequeue parent if it has other entities besides us */
if (cfs_rq->load.weight) {
se = parent_entity(se);
@@ -1510,11 +1627,15 @@ static void dequeue_task_fair(struct rq
cfs_rq = cfs_rq_of(se);
cfs_rq->h_nr_running--;
+ if (cfs_rq_throttled(cfs_rq))
+ break;
+
update_cfs_load(cfs_rq, 0);
update_cfs_shares(cfs_rq);
}
- dec_nr_running(rq);
+ if (!se)
+ dec_nr_running(rq);
hrtick_update(rq);
}
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -258,6 +258,8 @@ struct cfs_bandwidth {
int idle;
struct hrtimer period_timer;
+ struct list_head throttled_cfs_rq;
+
#endif
};
@@ -392,6 +394,9 @@ struct cfs_rq {
int runtime_enabled;
u64 runtime_expires;
s64 runtime_remaining;
+
+ int throttled;
+ struct list_head throttled_list;
#endif
#endif
};
@@ -433,6 +438,7 @@ static void init_cfs_bandwidth(struct cf
cfs_b->quota = RUNTIME_INF;
cfs_b->period = ns_to_ktime(default_cfs_period());
+ INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
cfs_b->period_timer.function = sched_cfs_period_timer;
@@ -442,6 +448,7 @@ static void init_cfs_rq_runtime(struct c
{
cfs_rq->runtime_remaining = 0;
cfs_rq->runtime_enabled = 0;
+ INIT_LIST_HEAD(&cfs_rq->throttled_list);
}
static void start_cfs_bandwidth(struct cfs_rq *cfs_rq)
* [patch 09/15] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (7 preceding siblings ...)
2011-05-03 9:28 ` [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:24 ` Hidetoshi Seto
2011-05-03 9:28 ` [patch 10/15] sched: allow for positional tg_tree walks Paul Turner
` (7 subsequent siblings)
16 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
[-- Attachment #1: sched-bwc-unthrottle_entities.patch --]
[-- Type: text/plain, Size: 4346 bytes --]
At the start of a new period there are several actions we must take: refresh
the global bandwidth pool and unthrottle any cfs_rq entities that previously
ran out of bandwidth (as quota permits).
Unthrottled entities have the cfs_rq->throttled flag cleared and are re-enqueued
into the cfs entity hierarchy.
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
kernel/sched.c | 3 +
kernel/sched_fair.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 107 insertions(+), 1 deletion(-)
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -9294,6 +9294,9 @@ static int tg_set_cfs_bandwidth(struct t
cfs_rq->runtime_enabled = quota != RUNTIME_INF;
cfs_rq->runtime_remaining = 0;
cfs_rq->runtime_expires = runtime_expires;
+
+ if (cfs_rq_throttled(cfs_rq))
+ unthrottle_cfs_rq(cfs_rq);
raw_spin_unlock_irq(&rq->lock);
}
out_unlock:
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -1456,10 +1456,88 @@ static void check_enqueue_throttle(struc
throttle_cfs_rq(cfs_rq);
}
+static void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
+{
+ struct rq *rq = rq_of(cfs_rq);
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
+ struct sched_entity *se;
+ int enqueue = 1;
+ long task_delta;
+
+ se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
+
+ cfs_rq->throttled = 0;
+ raw_spin_lock(&cfs_b->lock);
+ list_del_rcu(&cfs_rq->throttled_list);
+ raw_spin_unlock(&cfs_b->lock);
+
+ if (!cfs_rq->load.weight)
+ return;
+
+ task_delta = cfs_rq->h_nr_running;
+ for_each_sched_entity(se) {
+ if (se->on_rq)
+ enqueue = 0;
+
+ cfs_rq = cfs_rq_of(se);
+ if (enqueue)
+ enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
+ cfs_rq->h_nr_running += task_delta;
+
+ if (cfs_rq_throttled(cfs_rq))
+ break;
+ }
+
+ if (!se)
+ rq->nr_running += task_delta;
+
+ /* determine whether we need to wake up potentially idle cpu */
+ if (rq->curr == rq->idle && rq->cfs.nr_running)
+ resched_task(rq->curr);
+}
+
+static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
+ u64 remaining, u64 expires)
+{
+ struct cfs_rq *cfs_rq;
+ u64 runtime = remaining;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq,
+ throttled_list) {
+ struct rq *rq = rq_of(cfs_rq);
+
+ raw_spin_lock(&rq->lock);
+ if (!cfs_rq_throttled(cfs_rq))
+ goto next;
+
+ runtime = -cfs_rq->runtime_remaining + 1;
+ if (runtime > remaining)
+ runtime = remaining;
+ remaining -= runtime;
+
+ cfs_rq->runtime_remaining += runtime;
+ cfs_rq->runtime_expires = expires;
+
+ /* we check whether we're throttled above */
+ if (cfs_rq->runtime_remaining > 0)
+ unthrottle_cfs_rq(cfs_rq);
+
+next:
+ raw_spin_unlock(&rq->lock);
+
+ if (!remaining)
+ break;
+ }
+ rcu_read_unlock();
+
+ return remaining;
+}
+
static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
{
u64 quota, runtime = 0, runtime_expires;
- int idle = 0;
+ int idle = 0, throttled = 0;
runtime_expires = sched_clock_cpu(smp_processor_id());
@@ -1469,6 +1547,7 @@ static int do_sched_cfs_period_timer(str
if (quota != RUNTIME_INF) {
runtime = quota;
runtime_expires += ktime_to_ns(cfs_b->period);
+ throttled = !list_empty(&cfs_b->throttled_cfs_rq);
cfs_b->runtime = runtime;
cfs_b->runtime_expires = runtime_expires;
@@ -1477,6 +1556,30 @@ static int do_sched_cfs_period_timer(str
}
raw_spin_unlock(&cfs_b->lock);
+ if (!throttled || quota == RUNTIME_INF)
+ goto out;
+ idle = 0;
+
+retry:
+ runtime = distribute_cfs_runtime(cfs_b, runtime, runtime_expires);
+
+ raw_spin_lock(&cfs_b->lock);
+ /* new new bandwidth may have been set */
+ if (unlikely(runtime_expires != cfs_b->runtime_expires))
+ goto out_unlock;
+ /*
+ * make sure no-one was throttled while we were handing out the new
+ * runtime.
+ */
+ if (runtime > 0 && !list_empty(&cfs_b->throttled_cfs_rq)) {
+ raw_spin_unlock(&cfs_b->lock);
+ goto retry;
+ }
+ cfs_b->runtime = runtime;
+ cfs_b->idle = idle;
+out_unlock:
+ raw_spin_unlock(&cfs_b->lock);
+out:
return idle;
}
#else
^ permalink raw reply [flat|nested] 129+ messages in thread
* [patch 10/15] sched: allow for positional tg_tree walks
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (8 preceding siblings ...)
2011-05-03 9:28 ` [patch 09/15] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:24 ` Hidetoshi Seto
2011-05-17 13:31 ` Peter Zijlstra
2011-05-03 9:28 ` [patch 11/15] sched: prevent interactions between throttled entities and load-balance Paul Turner
` (6 subsequent siblings)
16 siblings, 2 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[-- Attachment #1: sched-bwc-refactor-walk_tg_tree.patch --]
[-- Type: text/plain, Size: 2015 bytes --]
Extend walk_tg_tree to accept a positional argument
static int walk_tg_tree_from(struct task_group *from,
tg_visitor down, tg_visitor up, void *data)
Existing semantics are preserved; the caller must hold rcu_read_lock() or a
sufficient analogue.
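A hypothetical caller (not part of this patch) might look as follows; the
tg_visitor typedef and the existing tg_nop() visitor are assumed from
kernel/sched.c:

/* illustrative visitor: count the groups in a sub-tree */
static int tg_count_down(struct task_group *tg, void *data)
{
	(*(int *)data)++;
	return 0;	/* a non-zero return would abort the walk */
}

static int count_tg_subtree(struct task_group *from)
{
	int count = 0;

	rcu_read_lock();
	walk_tg_tree_from(from, tg_count_down, tg_nop, &count);
	rcu_read_unlock();

	return count;
}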
Signed-off-by: Paul Turner <pjt@google.com>
---
kernel/sched.c | 34 +++++++++++++++++++++++-----------
1 file changed, 23 insertions(+), 11 deletions(-)
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -1430,21 +1430,19 @@ static inline void dec_cpu_load(struct r
#if (defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)) || defined(CONFIG_RT_GROUP_SCHED)
typedef int (*tg_visitor)(struct task_group *, void *);
-/*
- * Iterate the full tree, calling @down when first entering a node and @up when
- * leaving it for the final time.
- */
-static int walk_tg_tree(tg_visitor down, tg_visitor up, void *data)
+/* Iterate task_group tree rooted at *from */
+static int walk_tg_tree_from(struct task_group *from,
+ tg_visitor down, tg_visitor up, void *data)
{
struct task_group *parent, *child;
int ret;
- rcu_read_lock();
- parent = &root_task_group;
+ parent = from;
+
down:
ret = (*down)(parent, data);
if (ret)
- goto out_unlock;
+ goto out;
list_for_each_entry_rcu(child, &parent->children, siblings) {
parent = child;
goto down;
@@ -1453,14 +1451,28 @@ up:
continue;
}
ret = (*up)(parent, data);
- if (ret)
- goto out_unlock;
+ if (ret || parent == from)
+ goto out;
child = parent;
parent = parent->parent;
if (parent)
goto up;
-out_unlock:
+out:
+ return ret;
+}
+
+/*
+ * Iterate the full tree, calling @down when first entering a node and @up when
+ * leaving it for the final time.
+ */
+
+static inline int walk_tg_tree(tg_visitor down, tg_visitor up, void *data)
+{
+ int ret;
+
+ rcu_read_lock();
+ ret = walk_tg_tree_from(&root_task_group, down, up, data);
rcu_read_unlock();
return ret;
^ permalink raw reply [flat|nested] 129+ messages in thread
* [patch 11/15] sched: prevent interactions between throttled entities and load-balance
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (9 preceding siblings ...)
2011-05-03 9:28 ` [patch 10/15] sched: allow for positional tg_tree walks Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:26 ` Hidetoshi Seto
2011-05-03 9:28 ` [patch 12/15] sched: migrate throttled tasks on HOTPLUG Paul Turner
` (5 subsequent siblings)
16 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[-- Attachment #1: sched-bwc-throttled_shares.patch --]
[-- Type: text/plain, Size: 5368 bytes --]
From the perspective of load-balance and shares distribution, throttled
entities should be invisible.
However, both of these operations work on 'active' lists and are not
inherently aware of what group hierarchies may be present. In some cases this
may be side-stepped (e.g. we could sideload via tg_load_down in load balance)
while in others (e.g. update_shares()) it is more difficult to compute without
incurring some O(n^2) costs.
Instead, track hierarchal throttled state at time of transition. This allows
us to easily identify whether an entity belongs to a throttled hierarchy and
avoid incorrect interactions with it.
Also, when an entity leaves a throttled hierarchy we need to advance its
time averaging for shares averaging so that the elapsed throttled time is not
considered as part of the cfs_rq's operation.
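As a worked (hypothetical) example: for nested groups A/B/C on one cpu,
throttling A runs the down-visitor over {A, B, C}, leaving each with
throttle_count == 1; subsequently throttling B raises B and C to 2.
Unthrottling A then drops each count by one, so throttled_hierarchy()
stays non-zero for B's sub-tree until B itself is unthrottled, which is
exactly the invisibility that load-balance and update_shares() require.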
Signed-off-by: Paul Turner <pjt@google.com>
---
kernel/sched.c | 2 -
kernel/sched_fair.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 71 insertions(+), 7 deletions(-)
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -739,13 +739,15 @@ static void update_cfs_rq_load_contribut
}
}
+static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
+
static void update_cfs_load(struct cfs_rq *cfs_rq, int global_update)
{
u64 period = sysctl_sched_shares_window;
u64 now, delta;
unsigned long load = cfs_rq->load.weight;
- if (cfs_rq->tg == &root_task_group)
+ if (cfs_rq->tg == &root_task_group || throttled_hierarchy(cfs_rq))
return;
now = rq_of(cfs_rq)->clock_task;
@@ -1383,6 +1385,46 @@ static inline int cfs_rq_throttled(struc
return cfs_rq->throttled;
}
+static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
+{
+ return cfs_rq->throttle_count;
+}
+
+struct tg_unthrottle_down_data {
+ int cpu;
+ u64 now;
+};
+
+static int tg_unthrottle_down(struct task_group *tg, void *data)
+{
+ struct tg_unthrottle_down_data *udd = data;
+ struct cfs_rq *cfs_rq = tg->cfs_rq[udd->cpu];
+ u64 delta;
+
+ cfs_rq->throttle_count--;
+ if (!cfs_rq->throttle_count) {
+ /* leaving throttled state, move up windows */
+ delta = udd->now - cfs_rq->load_stamp;
+ cfs_rq->load_stamp += delta;
+ cfs_rq->load_last += delta;
+ }
+
+ return 0;
+}
+
+static int tg_throttle_down(struct task_group *tg, void *data)
+{
+ long cpu = (long)data;
+ struct cfs_rq *cfs_rq = tg->cfs_rq[cpu];
+
+ /* group is entering throttled state, record last load */
+ if (!cfs_rq->throttle_count)
+ update_cfs_load(cfs_rq, 0);
+ cfs_rq->throttle_count++;
+
+ return 0;
+}
+
static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
{
struct rq *rq = rq_of(cfs_rq);
@@ -1393,7 +1435,10 @@ static void throttle_cfs_rq(struct cfs_r
se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
/* account load preceding throttle */
- update_cfs_load(cfs_rq, 0);
+ rcu_read_lock();
+ walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop,
+ (void *)(long)rq_of(cfs_rq)->cpu);
+ rcu_read_unlock();
task_delta = -cfs_rq->h_nr_running;
for_each_sched_entity(se) {
@@ -1463,6 +1508,7 @@ static void unthrottle_cfs_rq(struct cfs
struct sched_entity *se;
int enqueue = 1;
long task_delta;
+ struct tg_unthrottle_down_data udd;
se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
@@ -1471,6 +1517,13 @@ static void unthrottle_cfs_rq(struct cfs
list_del_rcu(&cfs_rq->throttled_list);
raw_spin_unlock(&cfs_b->lock);
+ update_rq_clock(rq);
+ /* don't include throttled window for load statistics */
+ udd.cpu = rq->cpu;
+ udd.now = rq->clock_task;
+ walk_tg_tree_from(cfs_rq->tg, tg_unthrottle_down, tg_nop,
+ (void *)&udd);
+
if (!cfs_rq->load.weight)
return;
@@ -1591,6 +1644,11 @@ static inline int cfs_rq_throttled(struc
return 0;
}
+static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
+{
+ return 0;
+}
+
static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
#endif
@@ -2449,6 +2507,9 @@ move_one_task(struct rq *this_rq, int th
int pinned = 0;
for_each_leaf_cfs_rq(busiest, cfs_rq) {
+ if (throttled_hierarchy(cfs_rq))
+ continue;
+
list_for_each_entry_safe(p, n, &cfs_rq->tasks, se.group_node) {
if (!can_migrate_task(p, busiest, this_cpu,
@@ -2548,8 +2609,10 @@ static int update_shares_cpu(struct task
raw_spin_lock_irqsave(&rq->lock, flags);
- update_rq_clock(rq);
- update_cfs_load(cfs_rq, 1);
+ if (!throttled_hierarchy(cfs_rq)) {
+ update_rq_clock(rq);
+ update_cfs_load(cfs_rq, 1);
+ }
/*
* We need to update shares after updating tg->load_weight in
@@ -2593,9 +2656,10 @@ load_balance_fair(struct rq *this_rq, in
u64 rem_load, moved_load;
/*
- * empty group
+ * empty group or part of a throttled hierarchy
*/
- if (!busiest_cfs_rq->task_weight)
+ if (!busiest_cfs_rq->task_weight ||
+ throttled_hierarchy(busiest_cfs_rq))
continue;
rem_load = (u64)rem_load_move * busiest_weight;
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -395,7 +395,7 @@ struct cfs_rq {
u64 runtime_expires;
s64 runtime_remaining;
- int throttled;
+ int throttled, throttle_count;
struct list_head throttled_list;
#endif
#endif
^ permalink raw reply [flat|nested] 129+ messages in thread
* [patch 12/15] sched: migrate throttled tasks on HOTPLUG
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (10 preceding siblings ...)
2011-05-03 9:28 ` [patch 11/15] sched: prevent interactions between throttled entities and load-balance Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:27 ` Hidetoshi Seto
2011-05-03 9:28 ` [patch 13/15] sched: add exports tracking cfs bandwidth control statistics Paul Turner
` (4 subsequent siblings)
16 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[-- Attachment #1: sched-bwc-migrate_dead.patch --]
[-- Type: text/plain, Size: 1735 bytes --]
Throttled tasks are invisible to cpu-offline since they are not eligible for
selection by pick_next_task(). The regular 'escape' path for a thread that is
blocked at offline is via ttwu->select_task_rq, however this will not handle a
throttled group since there are no individual thread wakeups on an unthrottle.
Resolve this by unthrottling offline cpus so that threads can be migrated.
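Consider, for example (hypothetical scenario), a task affine to the
offlining cpu inside a group that is currently throttled: it sits on no
active runqueue and receives no individual wakeup at unthrottle, so
without this change migrate_tasks() could never reach it.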
Signed-off-by: Paul Turner <pjt@google.com>
---
kernel/sched.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -6145,6 +6145,32 @@ static void calc_global_load_remove(stru
rq->calc_load_active = 0;
}
+#ifdef CONFIG_CFS_BANDWIDTH
+static void unthrottle_offline_cfs_rqs(struct rq *rq)
+{
+ struct cfs_rq *cfs_rq;
+
+ for_each_leaf_cfs_rq(rq, cfs_rq) {
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
+
+ if (!cfs_rq->runtime_enabled)
+ continue;
+
+ /*
+ * clock_task is not advancing so we just need to make sure
+ * there's some valid quota amount
+ */
+ cfs_rq->runtime_remaining = cfs_b->quota;
+ if (cfs_rq_throttled(cfs_rq))
+ unthrottle_cfs_rq(cfs_rq);
+ }
+}
+#else
+static void unthrottle_offline_cfs_rqs(struct rq *rq)
+{
+}
+#endif
+
/*
* Migrate all tasks from the rq, sleeping tasks will be migrated by
* try_to_wake_up()->select_task_rq().
@@ -6170,6 +6196,9 @@ static void migrate_tasks(unsigned int d
*/
rq->stop = NULL;
+ /* Ensure any throttled groups are reachable by pick_next_task */
+ unthrottle_offline_cfs_rqs(rq);
+
for ( ; ; ) {
/*
* There's this thread running, bail when that's the only
^ permalink raw reply [flat|nested] 129+ messages in thread
* [patch 13/15] sched: add exports tracking cfs bandwidth control statistics
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (11 preceding siblings ...)
2011-05-03 9:28 ` [patch 12/15] sched: migrate throttled tasks on HOTPLUG Paul Turner
@ 2011-05-03 9:28 ` Paul Turner
2011-05-10 7:27 ` Hidetoshi Seto
2011-05-11 7:56 ` Hidetoshi Seto
2011-05-03 9:29 ` [patch 14/15] sched: return unused runtime on voluntary sleep Paul Turner
` (3 subsequent siblings)
16 siblings, 2 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:28 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
[-- Attachment #1: sched-bwc-throttle_stats.patch --]
[-- Type: text/plain, Size: 3161 bytes --]
From: Nikhil Rao <ncrao@google.com>
This change introduces statistics exports for the cpu sub-system, these are
added through the use of a stat file similar to that exported by other
subsystems.
The following exports are included:
nr_periods: number of periods in which execution occurred
nr_throttled: the number of periods above in which execution was throttled
throttled_time: cumulative wall-time that any cpus have been throttled for
this group
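A hypothetical read of the new file (values purely illustrative):

 # cat /cgroup/a/cpu.stat
 nr_periods 120
 nr_throttled 40
 throttled_time 8500000000

i.e. the group was throttled in 40 of the 120 enforcement intervals that
have elapsed, accumulating 8.5s of cumulative throttled wall-time.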
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
kernel/sched.c | 22 ++++++++++++++++++++++
kernel/sched_fair.c | 9 +++++++++
2 files changed, 31 insertions(+)
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -260,6 +260,10 @@ struct cfs_bandwidth {
struct hrtimer period_timer;
struct list_head throttled_cfs_rq;
+ /* statistics */
+ int nr_periods, nr_throttled;
+ u64 throttled_time;
+
#endif
};
@@ -395,6 +399,7 @@ struct cfs_rq {
u64 runtime_expires;
s64 runtime_remaining;
+ u64 throttled_timestamp;
int throttled, throttle_count;
struct list_head throttled_list;
#endif
@@ -9517,6 +9522,19 @@ int sched_cfs_consistent_handler(struct
return ret;
}
+
+static int cpu_stats_show(struct cgroup *cgrp, struct cftype *cft,
+ struct cgroup_map_cb *cb)
+{
+ struct task_group *tg = cgroup_tg(cgrp);
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
+
+ cb->fill(cb, "nr_periods", cfs_b->nr_periods);
+ cb->fill(cb, "nr_throttled", cfs_b->nr_throttled);
+ cb->fill(cb, "throttled_time", cfs_b->throttled_time);
+
+ return 0;
+}
#endif /* CONFIG_CFS_BANDWIDTH */
#endif /* CONFIG_FAIR_GROUP_SCHED */
@@ -9563,6 +9581,10 @@ static struct cftype cpu_files[] = {
.read_u64 = cpu_cfs_period_read_u64,
.write_u64 = cpu_cfs_period_write_u64,
},
+ {
+ .name = "stat",
+ .read_map = cpu_stats_show,
+ },
#endif
#ifdef CONFIG_RT_GROUP_SCHED
{
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -1459,6 +1459,7 @@ static void throttle_cfs_rq(struct cfs_r
rq->nr_running += task_delta;
cfs_rq->throttled = 1;
+ cfs_rq->throttled_timestamp = rq->clock;
raw_spin_lock(&cfs_b->lock);
list_add_tail_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
raw_spin_unlock(&cfs_b->lock);
@@ -1514,8 +1515,10 @@ static void unthrottle_cfs_rq(struct cfs
cfs_rq->throttled = 0;
raw_spin_lock(&cfs_b->lock);
+ cfs_b->throttled_time += rq->clock - cfs_rq->throttled_timestamp;
list_del_rcu(&cfs_rq->throttled_list);
raw_spin_unlock(&cfs_b->lock);
+ cfs_rq->throttled_timestamp = 0;
update_rq_clock(rq);
/* don't include throttled window for load statistics */
@@ -1628,6 +1631,12 @@ retry:
raw_spin_unlock(&cfs_b->lock);
goto retry;
}
+
+ /* update throttled stats */
+ cfs_b->nr_periods += overrun;
+ if (throttled)
+ cfs_b->nr_throttled += overrun;
+
cfs_b->runtime = runtime;
cfs_b->idle = idle;
out_unlock:
^ permalink raw reply [flat|nested] 129+ messages in thread
* [patch 14/15] sched: return unused runtime on voluntary sleep
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (12 preceding siblings ...)
2011-05-03 9:28 ` [patch 13/15] sched: add exports tracking cfs bandwidth control statistics Paul Turner
@ 2011-05-03 9:29 ` Paul Turner
2011-05-10 7:28 ` Hidetoshi Seto
2011-05-03 9:29 ` [patch 15/15] sched: add documentation for bandwidth control Paul Turner
` (2 subsequent siblings)
16 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:29 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[-- Attachment #1: sched-bwc-simple_return_quota.patch --]
[-- Type: text/plain, Size: 7424 bytes --]
When a local cfs_rq blocks we return the majority of its remaining quota to the
global bandwidth pool for use by other runqueues.
We do this only when the quota is current and there is more than
min_cfs_rq_quota [1ms by default] of runtime remaining on the rq.
In the case where there are throttled runqueues and we have sufficient
bandwidth to meter out a slice, a second timer is kicked off to handle this
delivery, unthrottling where appropriate.
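To make the thresholds concrete (illustrative numbers): with the default
5ms bandwidth slice and the 1ms min_cfs_rq_quota below, a cfs_rq that
blocks while holding 3ms of current runtime returns 3ms - 1ms = 2ms to
the global pool. The slack timer is then armed only if the pool exceeds
one slice, throttled runqueues exist, and the next quota refresh is more
than 7ms away (the 5ms slack period plus the 2ms min_bandwidth_expiration).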
Using a 'worst case' antagonist which executes on each cpu
for 1ms before moving onto the next on a fairly large machine:
no quota generations:
197.47 ms /cgroup/a/cpuacct.usage
199.46 ms /cgroup/a/cpuacct.usage
205.46 ms /cgroup/a/cpuacct.usage
198.46 ms /cgroup/a/cpuacct.usage
208.39 ms /cgroup/a/cpuacct.usage
Since we are allowed to use "stale" quota our usage is effectively bounded by
the rate of input into the global pool and performance is relatively stable.
with quota generations [1s increments]:
119.58 ms /cgroup/a/cpuacct.usage
119.65 ms /cgroup/a/cpuacct.usage
119.64 ms /cgroup/a/cpuacct.usage
119.63 ms /cgroup/a/cpuacct.usage
119.60 ms /cgroup/a/cpuacct.usage
The large deficit here is due to quota generations (/intentionally/) preventing
us from now using previously stranded slack quota. The cost is that this quota
becomes unavailable.
with quota generations and quota return:
200.09 ms /cgroup/a/cpuacct.usage
200.09 ms /cgroup/a/cpuacct.usage
198.09 ms /cgroup/a/cpuacct.usage
200.09 ms /cgroup/a/cpuacct.usage
200.06 ms /cgroup/a/cpuacct.usage
By returning unused quota we're able to both stably consume our desired quota
and prevent unintentional overages due to the abuse of slack quota from
previous quota periods (especially on a large machine).
Signed-off-by: Paul Turner <pjt@google.com>
---
kernel/sched.c | 15 +++++++
kernel/sched_fair.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 113 insertions(+), 4 deletions(-)
Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -257,7 +257,7 @@ struct cfs_bandwidth {
s64 hierarchal_quota;
int idle;
- struct hrtimer period_timer;
+ struct hrtimer period_timer, slack_timer;
struct list_head throttled_cfs_rq;
/* statistics */
@@ -414,6 +414,16 @@ static inline struct cfs_bandwidth *tg_c
static inline u64 default_cfs_period(void);
static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun);
+static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b);
+
+static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
+{
+ struct cfs_bandwidth *cfs_b =
+ container_of(timer, struct cfs_bandwidth, slack_timer);
+ do_sched_cfs_slack_timer(cfs_b);
+
+ return HRTIMER_NORESTART;
+}
static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
{
@@ -446,6 +456,8 @@ static void init_cfs_bandwidth(struct cf
INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
cfs_b->period_timer.function = sched_cfs_period_timer;
+ hrtimer_init(&cfs_b->slack_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ cfs_b->slack_timer.function = sched_cfs_slack_timer;
}
@@ -474,6 +486,7 @@ static void start_cfs_bandwidth(struct c
static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
{
hrtimer_cancel(&cfs_b->period_timer);
+ hrtimer_cancel(&cfs_b->slack_timer);
}
#else
#ifdef CONFIG_FAIR_GROUP_SCHED
Index: tip/kernel/sched_fair.c
===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -1465,20 +1465,25 @@ static void throttle_cfs_rq(struct cfs_r
raw_spin_unlock(&cfs_b->lock);
}
+static void return_cfs_rq_quota(struct cfs_rq *cfs_rq);
+
/* conditionally throttle active cfs_rq's from put_prev_entity() */
static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq)
{
- if (!cfs_rq->runtime_enabled || cfs_rq->runtime_remaining > 0)
+ if (!cfs_rq->runtime_enabled)
return;
/*
* it's possible active load balance has forced a throttled cfs_rq to
- * run again, we don't want to re-throttled in this case.
+ * run again, we don't want to re-throttle in this case.
*/
if (cfs_rq_throttled(cfs_rq))
return;
- throttle_cfs_rq(cfs_rq);
+ if (cfs_rq->runtime_remaining <= 0)
+ throttle_cfs_rq(cfs_rq);
+ else if (!cfs_rq->load.weight)
+ return_cfs_rq_quota(cfs_rq);
}
/*
@@ -1644,6 +1649,97 @@ out_unlock:
out:
return idle;
}
+
+/* a cfs_rq won't donate quota below this amount */
+static const u64 min_cfs_rq_quota = 1 * NSEC_PER_MSEC;
+/* minimum remaining period time to redistribute slack quota */
+static const u64 min_bandwidth_expiration = 2 * NSEC_PER_MSEC;
+/* how long we wait to gather additional slack before distributing */
+static const u64 cfs_bandwidth_slack_period = 5 * NSEC_PER_MSEC;
+
+/* are we near the end of the current quota period? */
+static int runtime_refresh_within(struct cfs_bandwidth *cfs_b, u64 min_expire)
+{
+ struct hrtimer *refresh_timer = &cfs_b->period_timer;
+ u64 remaining;
+
+ /* if the callback is running a quota refresh is occurring */
+ if (hrtimer_callback_running(refresh_timer))
+ return 1;
+
+ /* is a quota refresh about to occur? */
+ remaining = ktime_to_ns(hrtimer_expires_remaining(refresh_timer));
+ if (remaining < min_expire)
+ return 1;
+
+ return 0;
+}
+
+static void start_cfs_slack_bandwidth(struct cfs_bandwidth *cfs_b)
+{
+ u64 min_left = cfs_bandwidth_slack_period + min_bandwidth_expiration;
+
+ /* if there's a quota refresh soon don't bother with slack */
+ if (runtime_refresh_within(cfs_b, min_left))
+ return;
+
+ start_bandwidth_timer(&cfs_b->slack_timer,
+ ns_to_ktime(cfs_bandwidth_slack_period));
+}
+
+static void return_cfs_rq_quota(struct cfs_rq *cfs_rq)
+{
+ struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
+ s64 slack_runtime = cfs_rq->runtime_remaining - min_cfs_rq_quota;
+
+ if (!cfs_rq->runtime_enabled || cfs_rq->load.weight)
+ return;
+
+ if (slack_runtime <= 0)
+ return;
+
+ raw_spin_lock(&cfs_b->lock);
+ if (cfs_b->quota != RUNTIME_INF &&
+ cfs_b->runtime_expires == cfs_rq->runtime_expires) {
+ cfs_b->runtime += slack_runtime;
+
+ if (cfs_b->runtime > sched_cfs_bandwidth_slice() &&
+ !list_empty(&cfs_b->throttled_cfs_rq))
+ start_cfs_slack_bandwidth(cfs_b);
+ }
+ raw_spin_unlock(&cfs_b->lock);
+
+ cfs_rq->runtime_remaining -= slack_runtime;
+}
+
+static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
+{
+ u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
+ u64 expires;
+
+ /* confirm we're still not at a refresh boundary */
+ if (runtime_refresh_within(cfs_b, min_bandwidth_expiration))
+ return;
+
+ raw_spin_lock(&cfs_b->lock);
+ if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice) {
+ runtime = cfs_b->runtime;
+ cfs_b->runtime = 0;
+ }
+ expires = cfs_b->runtime_expires;
+ raw_spin_unlock(&cfs_b->lock);
+
+ if (!runtime)
+ return;
+
+ runtime = distribute_cfs_runtime(cfs_b, runtime, expires);
+
+ raw_spin_lock(&cfs_b->lock);
+ if (expires == cfs_b->runtime_expires)
+ cfs_b->runtime = runtime;
+ raw_spin_unlock(&cfs_b->lock);
+}
+
#else
static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq,
unsigned long delta_exec) {}
^ permalink raw reply [flat|nested] 129+ messages in thread
* [patch 15/15] sched: add documentation for bandwidth control
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (13 preceding siblings ...)
2011-05-03 9:29 ` [patch 14/15] sched: return unused runtime on voluntary sleep Paul Turner
@ 2011-05-03 9:29 ` Paul Turner
2011-05-10 7:29 ` Hidetoshi Seto
2011-06-07 15:45 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Kamalesh Babulal
2011-06-14 6:58 ` [patch 00/15] CFS Bandwidth Control V6 Hu Tao
16 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-03 9:29 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
[-- Attachment #1: sched-bwc-documentation.patch --]
[-- Type: text/plain, Size: 4888 bytes --]
From: Bharata B Rao <bharata@linux.vnet.ibm.com>
Basic description of usage and effect for CFS Bandwidth Control.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Paul Turner <pjt@google.com>
---
Documentation/scheduler/sched-bwc.txt | 104 ++++++++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)
Index: tip/Documentation/scheduler/sched-bwc.txt
===================================================================
--- /dev/null
+++ tip/Documentation/scheduler/sched-bwc.txt
@@ -0,0 +1,104 @@
+CFS Bandwidth Control (aka CPU hard limits)
+===========================================
+
+[ This document talks about CPU bandwidth control of CFS groups only.
+ The bandwidth control of RT groups is explained in
+ Documentation/scheduler/sched-rt-group.txt ]
+
+CFS bandwidth control is a group scheduler extension that can be used to
+control the maximum CPU bandwidth obtained by a CPU cgroup.
+
+Bandwidth allowed for a group is specified using quota and period. Within
+a given "period" (microseconds), a group is allowed to consume up to "quota"
+microseconds of CPU time, which is the upper limit or the hard limit. When the
+CPU bandwidth consumption of a group exceeds the hard limit, the tasks in the
+group are throttled and are not allowed to run until the end of the period at
+which time the group's quota is replenished.
+
+Runtime available to the group is tracked globally. At the beginning of
+every period, group's global runtime pool is replenished with "quota"
+microseconds worth of runtime. The runtime consumption happens locally at each
+CPU by fetching runtimes in "slices" from the global pool.
+
+Interface
+---------
+Quota and period can be set via cgroup files.
+
+cpu.cfs_quota_us: the maximum allowed bandwidth (microseconds)
+cpu.cfs_period_us: the enforcement interval (microseconds)
+
+Within a period of cpu.cfs_period_us, the group as a whole will not be allowed
+to consume more than cpu.cfs_quota_us worth of runtime.
+
+The default value of cpu.cfs_period_us is 500ms and the default value
+for cpu.cfs_quota_us is -1.
+
+A group with cpu.cfs_quota_us as -1 indicates that the group has infinite
+bandwidth, which means that it is not bandwidth controlled.
+
+Writing any negative value to cpu.cfs_quota_us will turn the group into
+an infinite bandwidth group. Reading cpu.cfs_quota_us for an infinite
+bandwidth group will always return -1.
+
+System wide settings
+--------------------
+The amount of runtime obtained from the global pool every time a CPU wants the
+group quota locally is controlled by a sysctl parameter
+sched_cfs_bandwidth_slice_us. The current default is 5ms. This can be changed
+by writing to /proc/sys/kernel/sched_cfs_bandwidth_slice_us.
+
+A quota hierarchy is defined to be consistent if the sum of child reservations
+does not exceed the bandwidth allocated to its parent. An entity with no
+explicit bandwidth reservation (e.g. no limit) is considered to inherit its
+parent's limits. This behavior may be managed using
+/proc/sys/kernel/sched_cfs_bandwidth_consistent
+
+Statistics
+----------
+cpu.stat file lists three different stats related to CPU bandwidth control.
+
+nr_periods: Number of enforcement intervals that have elapsed.
+nr_throttled: Number of times the group has been throttled/limited.
+throttled_time: The total time duration (in nanoseconds) for which the group
+remained throttled.
+
+These files are read-only.
+
+Hierarchy considerations
+------------------------
+Each group's bandwidth (quota and period) can be set independent of its
+parent or child groups. There are two ways in which a group can get
+throttled:
+
+- it consumed its quota within the period
+- it has quota left but the parent's quota is exhausted.
+
+In the 2nd case, even though the child has quota left, it will not be
+able to run since the parent itself is throttled. Similarly groups that are
+not bandwidth constrained might end up being throttled if any parent
+in their hierarchy is throttled.
+
+Examples
+--------
+1. Limit a group to 1 CPU worth of runtime.
+
+ If period is 500ms and quota is also 500ms, the group will get
+ 1 CPU worth of runtime every 500ms.
+
+ # echo 500000 > cpu.cfs_quota_us /* quota = 500ms */
+ # echo 500000 > cpu.cfs_period_us /* period = 500ms */
+
+2. Limit a group to 2 CPUs worth of runtime on a multi-CPU machine.
+
+ With 500ms period and 1000ms quota, the group can get 2 CPUs worth of
+ runtime every 500ms.
+
+ # echo 1000000 > cpu.cfs_quota_us /* quota = 1000ms */
+ # echo 500000 > cpu.cfs_period_us /* period = 500ms */
+
+3. Limit a group to 20% of 1 CPU.
+
+ With 500ms period, 100ms quota will be equivalent to 20% of 1 CPU.
+
+ # echo 100000 > cpu.cfs_quota_us /* quota = 100ms */
+ # echo 500000 > cpu.cfs_period_us /* period = 500ms */
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent
2011-05-03 9:28 ` [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent Paul Turner
@ 2011-05-10 7:14 ` Hidetoshi Seto
2011-05-10 8:32 ` Mike Galbraith
0 siblings, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:14 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
(2011/05/03 18:28), Paul Turner wrote:
> In dequeue_task_fair() we bail on dequeue when we encounter a parenting entity
> with additional weight. However, we perform a double shares update on this
> entity since we continue the shares update traversal from that point, despite
> dequeue_entity() having already updated its queuing cfs_rq.
>
> Avoid this by starting from the parent when we resume.
>
> Signed-off-by: Paul Turner <pjt@google.com>
> ---
> kernel/sched_fair.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> Index: tip/kernel/sched_fair.c
> ===================================================================
> --- tip.orig/kernel/sched_fair.c
> +++ tip/kernel/sched_fair.c
> @@ -1355,8 +1355,10 @@ static void dequeue_task_fair(struct rq
> dequeue_entity(cfs_rq, se, flags);
>
> /* Don't dequeue parent if it has other entities besides us */
> - if (cfs_rq->load.weight)
> + if (cfs_rq->load.weight) {
> + se = parent_entity(se);
> break;
> + }
> flags |= DEQUEUE_SLEEP;
> }
>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
This small fixlet can stand alone.
Peter, how about getting this into git tree first?
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 02/15] sched: hierarchical task accounting for SCHED_OTHER
2011-05-03 9:28 ` [patch 02/15] sched: hierarchical task accounting for SCHED_OTHER Paul Turner
@ 2011-05-10 7:17 ` Hidetoshi Seto
0 siblings, 0 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:17 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
Some typos in the description.
(2011/05/03 18:28), Paul Turner wrote:
> Introduce hierarchal task accounting for the group scheduling case in CFS, as
hierarchical
> well as promoting the responsibility for maintaining rq->nr_running to the
> scheduling classes.
>
> The primary motivation for this is that with scheduling classes supporting
> bandwidht throttling it is possible for entities participating in trottled
bandwidth throttled
> sub-trees to not have root visible changes in rq->nr_running across activate
> and de-activate operations. This in turn leads to incorrect idle and
> weight-per-task load balance decisions.
>
> This also allows us to make a small fixlet to the fastpath in pick_next_task()
> under group scheduling.
>
> Note: this issue also exists with the existing sched_rt throttling mechanism.
> This patch does not address that.
>
> Signed-off-by: Paul Turner <pjt@google.com>
>
> ---
The patch is good.
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 03/15] sched: introduce primitives to account for CFS bandwidth tracking
2011-05-03 9:28 ` [patch 03/15] sched: introduce primitives to account for CFS bandwidth tracking Paul Turner
@ 2011-05-10 7:18 ` Hidetoshi Seto
0 siblings, 0 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:18 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov, Nikhil Rao
One nitpicking...
(2011/05/03 18:28), Paul Turner wrote:
> In this patch we introduce the notion of CFS bandwidth, partitioned into
> globally unassigned bandwidth, and locally claimed bandwidth.
>
> - The global bandwidth is per task_group, it represents a pool of unclaimed
> bandwidth that cfs_rqs can allocate from.
> - The local bandwidth is tracked per-cfs_rq, this represents allotments from
> the global pool bandwidth assigned to a specific cpu.
>
> Bandwidth is managed via cgroupfs, adding two new interfaces to the cpu subsystem:
> - cpu.cfs_period_us : the bandwidth period in usecs
> - cpu.cfs_quota_us : the cpu bandwidth (in usecs) that this tg will be allowed
> to consume over period above.
>
> Signed-off-by: Paul Turner <pjt@google.com>
> Signed-off-by: Nikhil Rao <ncrao@google.com>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
(snip)
> @@ -369,9 +379,45 @@ struct cfs_rq {
>
> unsigned long load_contribution;
> #endif
> +#ifdef CONFIG_CFS_BANDWIDTH
> + int runtime_enabled;
> + s64 runtime_remaining;
> +#endif
> #endif
> };
>
> +#ifdef CONFIG_CFS_BANDWIDTH
> +static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
> +{
> + return &tg->cfs_bandwidth;
> +}
> +
> +static inline u64 default_cfs_period(void);
> +
> +static void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
> +{
> + raw_spin_lock_init(&cfs_b->lock);
> + cfs_b->quota = RUNTIME_INF;
> + cfs_b->period = ns_to_ktime(default_cfs_period());
> +}
> +
> +static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
> +{
> + cfs_rq->runtime_remaining = 0;
> + cfs_rq->runtime_enabled = 0;
> +}
> +
> +static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
> +{}
> +#else
> +#ifdef CONFIG_FAIR_GROUP_SCHED
> +static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
> +void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
Nit: why not static?
> +static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
> +#endif /* CONFIG_FAIR_GROUP_SCHED */
> +static void start_cfs_bandwidth(struct cfs_rq *cfs_rq) {}
> +#endif /* CONFIG_CFS_BANDWIDTH */
> +
> /* Real-Time classes' related field in a runqueue: */
> struct rt_rq {
> struct rt_prio_array active;
The rest looks good for me.
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 04/15] sched: validate CFS quota hierarchies
2011-05-03 9:28 ` [patch 04/15] sched: validate CFS quota hierarchies Paul Turner
@ 2011-05-10 7:20 ` Hidetoshi Seto
2011-05-11 9:37 ` Paul Turner
2011-05-16 9:30 ` Peter Zijlstra
2011-05-16 9:43 ` Peter Zijlstra
2 siblings, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:20 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
Description typos + one bug.
(2011/05/03 18:28), Paul Turner wrote:
> Add constraints validation for CFS bandwidth hierachies.
hierarchies
>
> Validate that:
> sum(child bandwidth) <= parent_bandwidth
>
> In a quota limited hierarchy, an unconstrainted entity
unconstrained
> (e.g. bandwidth==RUNTIME_INF) inherits the bandwidth of its parent.
>
> Since bandwidth periods may be non-uniform we normalize to the maximum allowed
> period, 1 second.
>
> This behavior may be disabled (allowing child bandwidth to exceed parent) via
> kernel.sched_cfs_bandwidth_consistent=0
>
> Signed-off-by: Paul Turner <pjt@google.com>
>
> ---
(snip)
> +/*
> + * normalize group quota/period to be quota/max_period
> + * note: units are usecs
> + */
> +static u64 normalize_cfs_quota(struct task_group *tg,
> + struct cfs_schedulable_data *d)
> +{
> + u64 quota, period;
> +
> + if (tg == d->tg) {
> + period = d->period;
> + quota = d->quota;
> + } else {
> + period = tg_get_cfs_period(tg);
> + quota = tg_get_cfs_quota(tg);
> + }
> +
> + if (quota == RUNTIME_INF)
> + return RUNTIME_INF;
> +
> + return to_ratio(period, quota);
> +}
Since tg_get_cfs_quota() doesn't return RUNTIME_INF but -1,
this function needs a fix like the following.
For fixed version, feel free to add:
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
---
kernel/sched.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index d2562aa..f171ba5 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -9465,16 +9465,17 @@ static u64 normalize_cfs_quota(struct task_group *tg,
u64 quota, period;
if (tg == d->tg) {
+ if (d->quota == RUNTIME_INF)
+ return RUNTIME_INF;
period = d->period;
quota = d->quota;
} else {
+ if (tg_cfs_bandwidth(tg)->quota == RUNTIME_INF)
+ return RUNTIME_INF;
period = tg_get_cfs_period(tg);
quota = tg_get_cfs_quota(tg);
}
- if (quota == RUNTIME_INF)
- return RUNTIME_INF;
-
return to_ratio(period, quota);
}
^ permalink raw reply related [flat|nested] 129+ messages in thread
* Re: [patch 05/15] sched: add a timer to handle CFS bandwidth refresh
2011-05-03 9:28 ` [patch 05/15] sched: add a timer to handle CFS bandwidth refresh Paul Turner
@ 2011-05-10 7:21 ` Hidetoshi Seto
2011-05-11 9:27 ` Paul Turner
2011-05-16 10:18 ` Peter Zijlstra
1 sibling, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:21 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
(2011/05/03 18:28), Paul Turner wrote:
> @@ -250,6 +253,9 @@ struct cfs_bandwidth {
> ktime_t period;
> u64 quota;
> s64 hierarchal_quota;
> +
> + int idle;
> + struct hrtimer period_timer;
> #endif
> };
>
"idle" is not used yet. How about adding it in later patch?
Plus, comment explaining how it is used would be appreciated.
> static void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
> {
> raw_spin_lock_init(&cfs_b->lock);
> cfs_b->quota = RUNTIME_INF;
> cfs_b->period = ns_to_ktime(default_cfs_period());
> +
> + hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + cfs_b->period_timer.function = sched_cfs_period_timer;
> +
> }
Nit: blank line?
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth
2011-05-03 9:28 ` [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth Paul Turner
@ 2011-05-10 7:22 ` Hidetoshi Seto
2011-05-11 9:25 ` Paul Turner
2011-05-16 10:27 ` Peter Zijlstra
2011-05-16 10:32 ` Peter Zijlstra
2 siblings, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:22 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov, Nikhil Rao
(2011/05/03 18:28), Paul Turner wrote:
> Index: tip/include/linux/sched.h
> ===================================================================
> --- tip.orig/include/linux/sched.h
> +++ tip/include/linux/sched.h
> @@ -1958,6 +1958,10 @@ int sched_cfs_consistent_handler(struct
> loff_t *ppos);
> #endif
>
> +#ifdef CONFIG_CFS_BANDWIDTH
> +extern unsigned int sysctl_sched_cfs_bandwidth_slice;
> +#endif
> +
> #ifdef CONFIG_SCHED_AUTOGROUP
> extern unsigned int sysctl_sched_autogroup_enabled;
>
Nit: you can reuse ifdef just above here.
+#ifdef CONFIG_CFS_BANDWIDTH
+extern unsigned int sysctl_sched_cfs_bandwidth_consistent;
+
+int sched_cfs_consistent_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos);
+#endif
+
+#ifdef CONFIG_CFS_BANDWIDTH
+extern unsigned int sysctl_sched_cfs_bandwidth_slice;
+#endif
+
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 07/15] sched: expire invalid runtime
2011-05-03 9:28 ` [patch 07/15] sched: expire invalid runtime Paul Turner
@ 2011-05-10 7:22 ` Hidetoshi Seto
2011-05-16 11:05 ` Peter Zijlstra
2011-05-16 11:07 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:22 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
(2011/05/03 18:28), Paul Turner wrote:
> With the global quota pool, one challenge is determining when the runtime we
> have received from it is still valid. Fortunately we can take advantage of
> sched_clock synchronization around the jiffy to do this cheaply.
>
> The one catch is that we don't know whether our local clock is behind or ahead
> of the cpu setting the expiration time (relative to its own clock).
>
> Fortunately we can detect which of these is the case by determining whether the
> global deadline has advanced. If it has not, then we assume we are behind, and
> advance our local expiration; otherwise, we know the deadline has truly passed
> and we expire our local runtime.
>
> Signed-off-by: Paul Turner <pjt@google.com>
>
> ---
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime
2011-05-03 9:28 ` [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime Paul Turner
@ 2011-05-10 7:23 ` Hidetoshi Seto
2011-05-16 15:58 ` Peter Zijlstra
2011-05-16 16:05 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:23 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov, Nikhil Rao
(2011/05/03 18:28), Paul Turner wrote:
> Index: tip/kernel/sched.c
> ===================================================================
> --- tip.orig/kernel/sched.c
> +++ tip/kernel/sched.c
> @@ -258,6 +258,8 @@ struct cfs_bandwidth {
>
> int idle;
> struct hrtimer period_timer;
> + struct list_head throttled_cfs_rq;
> +
> #endif
> };
>
Nit: blank line?
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 09/15] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh
2011-05-03 9:28 ` [patch 09/15] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh Paul Turner
@ 2011-05-10 7:24 ` Hidetoshi Seto
2011-05-11 9:24 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:24 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov, Nikhil Rao
Some comments...
(2011/05/03 18:28), Paul Turner wrote:
> At the start of each new period there are several actions we must take: refresh
> the global bandwidth pool as well as unthrottle any cfs_rq entities that
> previously ran out of bandwidth (as quota permits).
>
> Unthrottled entities have the cfs_rq->throttled flag cleared and are re-enqueued
> into the cfs entity hierarchy.
>
> Signed-off-by: Paul Turner <pjt@google.com>
> Signed-off-by: Nikhil Rao <ncrao@google.com>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
> kernel/sched.c | 3 +
> kernel/sched_fair.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 107 insertions(+), 1 deletion(-)
>
> Index: tip/kernel/sched.c
> ===================================================================
> --- tip.orig/kernel/sched.c
> +++ tip/kernel/sched.c
> @@ -9294,6 +9294,9 @@ static int tg_set_cfs_bandwidth(struct t
> cfs_rq->runtime_enabled = quota != RUNTIME_INF;
> cfs_rq->runtime_remaining = 0;
> cfs_rq->runtime_expires = runtime_expires;
> +
> + if (cfs_rq_throttled(cfs_rq))
> + unthrottle_cfs_rq(cfs_rq);
> raw_spin_unlock_irq(&rq->lock);
> }
> out_unlock:
> Index: tip/kernel/sched_fair.c
> ===================================================================
> --- tip.orig/kernel/sched_fair.c
> +++ tip/kernel/sched_fair.c
> @@ -1456,10 +1456,88 @@ static void check_enqueue_throttle(struc
> throttle_cfs_rq(cfs_rq);
> }
>
> +static void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
> +{
> + struct rq *rq = rq_of(cfs_rq);
> + struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
> + struct sched_entity *se;
> + int enqueue = 1;
> + long task_delta;
> +
> + se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
> +
> + cfs_rq->throttled = 0;
> + raw_spin_lock(&cfs_b->lock);
> + list_del_rcu(&cfs_rq->throttled_list);
> + raw_spin_unlock(&cfs_b->lock);
> +
> + if (!cfs_rq->load.weight)
> + return;
> +
> + task_delta = cfs_rq->h_nr_running;
> + for_each_sched_entity(se) {
> + if (se->on_rq)
> + enqueue = 0;
> +
> + cfs_rq = cfs_rq_of(se);
> + if (enqueue)
> + enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
> + cfs_rq->h_nr_running += task_delta;
> +
> + if (cfs_rq_throttled(cfs_rq))
> + break;
> + }
> +
> + if (!se)
> + rq->nr_running += task_delta;
> +
> + /* determine whether we need to wake up potentially idle cpu */
> + if (rq->curr == rq->idle && rq->cfs.nr_running)
> + resched_task(rq->curr);
> +}
> +
> +static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
> + u64 remaining, u64 expires)
> +{
> + struct cfs_rq *cfs_rq;
> + u64 runtime = remaining;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq,
> + throttled_list) {
> + struct rq *rq = rq_of(cfs_rq);
> +
> + raw_spin_lock(&rq->lock);
> + if (!cfs_rq_throttled(cfs_rq))
> + goto next;
> +
> + runtime = -cfs_rq->runtime_remaining + 1;
It would be helpful if a comment could explain the negation and the +1.
> + if (runtime > remaining)
> + runtime = remaining;
> + remaining -= runtime;
> +
> + cfs_rq->runtime_remaining += runtime;
> + cfs_rq->runtime_expires = expires;
> +
> + /* we check whether we're throttled above */
> + if (cfs_rq->runtime_remaining > 0)
> + unthrottle_cfs_rq(cfs_rq);
> +
> +next:
> + raw_spin_unlock(&rq->lock);
> +
> + if (!remaining)
> + break;
> + }
> + rcu_read_unlock();
> +
> + return remaining;
> +}
> +
> static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
> {
> u64 quota, runtime = 0, runtime_expires;
> - int idle = 0;
> + int idle = 0, throttled = 0;
>
> runtime_expires = sched_clock_cpu(smp_processor_id());
>
> @@ -1469,6 +1547,7 @@ static int do_sched_cfs_period_timer(str
> if (quota != RUNTIME_INF) {
> runtime = quota;
> runtime_expires += ktime_to_ns(cfs_b->period);
> + throttled = !list_empty(&cfs_b->throttled_cfs_rq);
>
> cfs_b->runtime = runtime;
> cfs_b->runtime_expires = runtime_expires;
> @@ -1477,6 +1556,30 @@ static int do_sched_cfs_period_timer(str
> }
> raw_spin_unlock(&cfs_b->lock);
>
> + if (!throttled || quota == RUNTIME_INF)
> + goto out;
> + idle = 0;
> +
> +retry:
> + runtime = distribute_cfs_runtime(cfs_b, runtime, runtime_expires);
> +
> + raw_spin_lock(&cfs_b->lock);
> + /* new new bandwidth may have been set */
Typo? new, newer, newest...?
> + if (unlikely(runtime_expires != cfs_b->runtime_expires))
> + goto out_unlock;
> + /*
> + * make sure no-one was throttled while we were handing out the new
> + * runtime.
> + */
> + if (runtime > 0 && !list_empty(&cfs_b->throttled_cfs_rq)) {
> + raw_spin_unlock(&cfs_b->lock);
> + goto retry;
> + }
> + cfs_b->runtime = runtime;
> + cfs_b->idle = idle;
> +out_unlock:
> + raw_spin_unlock(&cfs_b->lock);
> +out:
> return idle;
> }
> #else
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
It would be better if this unthrottle patch (09/15) came before the
throttle patch (08/15) in this series, so as not to create a small window
in the history where a throttled entity can never get back onto the run
queue. But I'm just being paranoid...
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 10/15] sched: allow for positional tg_tree walks
2011-05-03 9:28 ` [patch 10/15] sched: allow for positional tg_tree walks Paul Turner
@ 2011-05-10 7:24 ` Hidetoshi Seto
2011-05-17 13:31 ` Peter Zijlstra
1 sibling, 0 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:24 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
(2011/05/03 18:28), Paul Turner wrote:
> Extend walk_tg_tree to accept a positional argument
>
> static int walk_tg_tree_from(struct task_group *from,
> tg_visitor down, tg_visitor up, void *data)
>
> Existing semantics are preserved; the caller must hold rcu_read_lock() or a
> sufficient analogue.
>
> Signed-off-by: Paul Turner <pjt@google.com>
> ---
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Yeah, it's nice to have.
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 11/15] sched: prevent interactions between throttled entities and load-balance
2011-05-03 9:28 ` [patch 11/15] sched: prevent interactions between throttled entities and load-balance Paul Turner
@ 2011-05-10 7:26 ` Hidetoshi Seto
2011-05-11 9:11 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:26 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
(2011/05/03 18:28), Paul Turner wrote:
> From the perspective of load-balance and shares distribution, throttled
> entities should be invisible.
>
> However, both of these operations work on 'active' lists and are not
> inherently aware of what group hierarchies may be present. In some cases this
> may be side-stepped (e.g. we could sideload via tg_load_down in load balance)
> while in others (e.g. update_shares()) it is more difficult to compute without
> incurring some O(n^2) costs.
>
> Instead, track hierarchal throttled state at time of transition. This allows
hierarchical
> us to easily identify whether an entity belongs to a throttled hierarchy and
> avoid incorrect interactions with it.
>
> Also, when an entity leaves a throttled hierarchy we need to advance its
> time averaging for shares averaging so that the elapsed throttled time is not
> considered as part of the cfs_rq's operation.
>
> Signed-off-by: Paul Turner <pjt@google.com>
> ---
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 12/15] sched: migrate throttled tasks on HOTPLUG
2011-05-03 9:28 ` [patch 12/15] sched: migrate throttled tasks on HOTPLUG Paul Turner
@ 2011-05-10 7:27 ` Hidetoshi Seto
2011-05-11 9:10 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:27 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
(2011/05/03 18:28), Paul Turner wrote:
> +#else
> +static void unthrottle_offline_cfs_rqs(struct rq *rq)
> +{
> +}
> +#endif
> +
Nit: To follow the others, an alternative style is a single line:
+static void unthrottle_offline_cfs_rqs(struct rq *rq) {}
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 13/15] sched: add exports tracking cfs bandwidth control statistics
2011-05-03 9:28 ` [patch 13/15] sched: add exports tracking cfs bandwidth control statistics Paul Turner
@ 2011-05-10 7:27 ` Hidetoshi Seto
2011-05-11 7:56 ` Hidetoshi Seto
1 sibling, 0 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:27 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov, Nikhil Rao
(2011/05/03 18:28), Paul Turner wrote:
> From: Nikhil Rao <ncrao@google.com>
>
>> This change introduces statistics exports for the cpu sub-system; these are
> added through the use of a stat file similar to that exported by other
> subsystems.
>
> The following exports are included:
>
> nr_periods: number of periods in which execution occurred
> nr_throttled: the number of periods above in which execution was throttled
> throttled_time: cumulative wall-time that any cpus have been throttled for
> this group
>
> Signed-off-by: Nikhil Rao <ncrao@google.com>
> Signed-off-by: Paul Turner <pjt@google.com>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
> kernel/sched.c | 22 ++++++++++++++++++++++
> kernel/sched_fair.c | 9 +++++++++
> 2 files changed, 31 insertions(+)
>
> Index: tip/kernel/sched.c
> ===================================================================
> --- tip.orig/kernel/sched.c
> +++ tip/kernel/sched.c
> @@ -260,6 +260,10 @@ struct cfs_bandwidth {
> struct hrtimer period_timer;
> struct list_head throttled_cfs_rq;
>
> + /* statistics */
> + int nr_periods, nr_throttled;
> + u64 throttled_time;
> +
> #endif
> };
>
Nit: blank line?
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
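To illustrate the consumer side, a small userspace sketch that reads the new
stat file; the /cgroup mount point follows the paths used elsewhere in this
thread and is an assumption:

#include <stdio.h>

int main(void)
{
	char key[32];
	long long val;
	FILE *f = fopen("/cgroup/a/cpu.stat", "r");

	if (!f)
		return 1;
	/* cpu.stat is "name value" pairs: nr_periods, nr_throttled,
	 * and throttled_time (cumulative, in ns) */
	while (fscanf(f, "%31s %lld", key, &val) == 2)
		printf("%s = %lld\n", key, val);
	fclose(f);
	return 0;
}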
* Re: [patch 14/15] sched: return unused runtime on voluntary sleep
2011-05-03 9:29 ` [patch 14/15] sched: return unused runtime on voluntary sleep Paul Turner
@ 2011-05-10 7:28 ` Hidetoshi Seto
0 siblings, 0 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:28 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
(2011/05/03 18:29), Paul Turner wrote:
> When a local cfs_rq blocks we return the majority of its remaining quota to the
> global bandwidth pool for use by other runqueues.
>
> We do this only when the quota is current and there is more than
> min_cfs_rq_quota [1ms by default] of runtime remaining on the rq.
>
> In the case where there are throttled runqueues and we have sufficient
> bandwidth to meter out a slice, a second timer is kicked off to handle this
> delivery, unthrottling where appropriate.
>
> Using a 'worst case' antagonist which executes on each cpu
> for 1ms before moving onto the next on a fairly large machine:
>
> no quota generations:
> 197.47 ms /cgroup/a/cpuacct.usage
> 199.46 ms /cgroup/a/cpuacct.usage
> 205.46 ms /cgroup/a/cpuacct.usage
> 198.46 ms /cgroup/a/cpuacct.usage
> 208.39 ms /cgroup/a/cpuacct.usage
> Since we are allowed to use "stale" quota our usage is effectively bounded by
> the rate of input into the global pool and performance is relatively stable.
>
> with quota generations [1s increments]:
> 119.58 ms /cgroup/a/cpuacct.usage
> 119.65 ms /cgroup/a/cpuacct.usage
> 119.64 ms /cgroup/a/cpuacct.usage
> 119.63 ms /cgroup/a/cpuacct.usage
> 119.60 ms /cgroup/a/cpuacct.usage
> The large deficit here is due to quota generations (/intentionally/) preventing
> us from now using previously stranded slack quota. The cost is that this quota
> becomes unavailable.
>
> with quota generations and quota return:
> 200.09 ms /cgroup/a/cpuacct.usage
> 200.09 ms /cgroup/a/cpuacct.usage
> 198.09 ms /cgroup/a/cpuacct.usage
> 200.09 ms /cgroup/a/cpuacct.usage
> 200.06 ms /cgroup/a/cpuacct.usage
> By returning unused quota we're able to both stably consume our desired quota
> and prevent unintentional overages due to the abuse of slack quota from
> previous quota periods (especially on a large machine).
>
> Signed-off-by: Paul Turner <pjt@google.com>
>
> ---
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
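A sketch of the return-on-sleep path the changelog describes: everything
above a 1ms floor goes back to the global pool, provided the local quota has
not expired (function and field names are illustrative assumptions):

static const u64 min_cfs_rq_runtime = NSEC_PER_MSEC;	/* 1ms floor */

static void return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
{
	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
	s64 slack = cfs_rq->runtime_remaining - min_cfs_rq_runtime;

	/* nothing to return unless we hold more than the floor */
	if (slack <= 0)
		return;

	raw_spin_lock(&cfs_b->lock);
	/* only return quota that is still current */
	if (cfs_b->quota != RUNTIME_INF &&
	    cfs_b->runtime_expires == cfs_rq->runtime_expires)
		cfs_b->runtime += slack;
	raw_spin_unlock(&cfs_b->lock);

	cfs_rq->runtime_remaining = min_cfs_rq_runtime;
}

The second timer mentioned in the changelog would then hand this returned
quota to any still-throttled runqueues via distribute_cfs_runtime().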
* Re: [patch 15/15] sched: add documentation for bandwidth control
2011-05-03 9:29 ` [patch 15/15] sched: add documentation for bandwidth control Paul Turner
@ 2011-05-10 7:29 ` Hidetoshi Seto
2011-05-11 9:09 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-10 7:29 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
(2011/05/03 18:29), Paul Turner wrote:
> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
>
> Basic description of usage and effect for CFS Bandwidth Control.
>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Paul Turner <pjt@google.com>
> ---
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Thank you very much for your great work, Paul!
I've run some tests on this version and found no problems so far
(other than the minor bug pointed out in 04/15).
Things are definitely getting better.
I'll keep testing and let you know if anything turns up.
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent
2011-05-10 7:14 ` Hidetoshi Seto
@ 2011-05-10 8:32 ` Mike Galbraith
2011-05-11 7:55 ` Hidetoshi Seto
0 siblings, 1 reply; 129+ messages in thread
From: Mike Galbraith @ 2011-05-10 8:32 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Kamalesh Babulal, Ingo Molnar,
Pavel Emelyanov
On Tue, 2011-05-10 at 16:14 +0900, Hidetoshi Seto wrote:
> (2011/05/03 18:28), Paul Turner wrote:
> > In dequeue_task_fair() we bail on dequeue when we encounter a parenting entity
> > with additional weight. However, we perform a double shares update on this
> > entity since we continue the shares update traversal from that point, despite
> > dequeue_entity() having already updated its queuing cfs_rq.
> >
> > Avoid this by starting from the parent when we resume.
> >
> > Signed-off-by: Paul Turner <pjt@google.com>
> > ---
> > kernel/sched_fair.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > Index: tip/kernel/sched_fair.c
> > ===================================================================
> > --- tip.orig/kernel/sched_fair.c
> > +++ tip/kernel/sched_fair.c
> > @@ -1355,8 +1355,10 @@ static void dequeue_task_fair(struct rq
> > dequeue_entity(cfs_rq, se, flags);
> >
> > /* Don't dequeue parent if it has other entities besides us */
> > - if (cfs_rq->load.weight)
> > + if (cfs_rq->load.weight) {
> > + se = parent_entity(se);
> > break;
> > + }
> > flags |= DEQUEUE_SLEEP;
> > }
> >
>
> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
> This small fixlet can stand alone.
> Peter, how about getting this into git tree first?
tip 2f36825b176f67e5c5228aa33d828bc39718811f contains the below.
/* Don't dequeue parent if it has other entities besides us */
- if (cfs_rq->load.weight)
+ if (cfs_rq->load.weight) {
+ /*
+ * Bias pick_next to pick a task from this cfs_rq, as
+ * p is sleeping when it is within its sched_slice.
+ */
+ if (task_sleep && parent_entity(se))
+ set_next_buddy(parent_entity(se));
break;
+ }
flags |= DEQUEUE_SLEEP;
}
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent
2011-05-10 8:32 ` Mike Galbraith
@ 2011-05-11 7:55 ` Hidetoshi Seto
2011-05-11 8:13 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-11 7:55 UTC (permalink / raw)
To: Mike Galbraith
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Kamalesh Babulal, Ingo Molnar,
Pavel Emelyanov
(2011/05/10 17:32), Mike Galbraith wrote:
> On Tue, 2011-05-10 at 16:14 +0900, Hidetoshi Seto wrote:
>> This small fixlet can stand alone.
>> Peter, how about getting this into git tree first?
>
> tip 2f36825b176f67e5c5228aa33d828bc39718811f contains the below.
>
> /* Don't dequeue parent if it has other entities besides us */
> - if (cfs_rq->load.weight)
> + if (cfs_rq->load.weight) {
> + /*
> + * Bias pick_next to pick a task from this cfs_rq, as
> + * p is sleeping when it is within its sched_slice.
> + */
> + if (task_sleep && parent_entity(se))
> + set_next_buddy(parent_entity(se));
> break;
> + }
> flags |= DEQUEUE_SLEEP;
> }
Oh, thanks Mike!
It seems that this change in tip is the better one.
Paul, would you mind rebasing your patches onto tip/sched/core next time?
(...or is there a better branch for rebasing?)
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 13/15] sched: add exports tracking cfs bandwidth control statistics
2011-05-03 9:28 ` [patch 13/15] sched: add exports tracking cfs bandwidth control statistics Paul Turner
2011-05-10 7:27 ` Hidetoshi Seto
@ 2011-05-11 7:56 ` Hidetoshi Seto
2011-05-11 9:09 ` Paul Turner
1 sibling, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-11 7:56 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov, Nikhil Rao
Oops, I found an issue here.
(2011/05/03 18:28), Paul Turner wrote:
> @@ -1628,6 +1631,12 @@ retry:
> raw_spin_unlock(&cfs_b->lock);
> goto retry;
> }
> +
> + /* update throttled stats */
> + cfs_b->nr_periods += overrun;
> + if (throttled)
> + cfs_b->nr_throttled += overrun;
> +
> cfs_b->runtime = runtime;
> cfs_b->idle = idle;
> out_unlock:
Quoting from patch 09/15:
+ if (!throttled || quota == RUNTIME_INF)
+ goto out;
+ idle = 0;
+
+retry:
+ runtime = distribute_cfs_runtime(cfs_b, runtime, runtime_expires);
+
+ raw_spin_lock(&cfs_b->lock);
+ /* new new bandwidth may have been set */
+ if (unlikely(runtime_expires != cfs_b->runtime_expires))
+ goto out_unlock;
+ /*
+ * make sure no-one was throttled while we were handing out the new
+ * runtime.
+ */
+ if (runtime > 0 && !list_empty(&cfs_b->throttled_cfs_rq)) {
+ raw_spin_unlock(&cfs_b->lock);
+ goto retry;
+ }
+ cfs_b->runtime = runtime;
+ cfs_b->idle = idle;
+out_unlock:
+ raw_spin_unlock(&cfs_b->lock);
+out:
Since we skip distributing runtime (by "goto out") when !throttled,
the new block inserted by this patch is reached only when throttled.
As a result, nr_periods and nr_throttled always end up the same.
Maybe we should move this block up, like the following.
Thanks,
H.Seto
---
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1620,6 +1620,12 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
idle = cfs_b->idle;
cfs_b->idle = 1;
}
+
+ /* update throttled stats */
+ cfs_b->nr_periods += overrun;
+ if (throttled)
+ cfs_b->nr_throttled += overrun;
+
raw_spin_unlock(&cfs_b->lock);
if (!throttled || quota == RUNTIME_INF)
@@ -1642,11 +1648,6 @@ retry:
goto retry;
}
- /* update throttled stats */
- cfs_b->nr_periods += overrun;
- if (throttled)
- cfs_b->nr_throttled += overrun;
-
cfs_b->runtime = runtime;
cfs_b->idle = idle;
out_unlock:
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent
2011-05-11 7:55 ` Hidetoshi Seto
@ 2011-05-11 8:13 ` Paul Turner
2011-05-11 8:45 ` Mike Galbraith
0 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-11 8:13 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Mike Galbraith, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Kamalesh Babulal, Ingo Molnar,
Pavel Emelyanov
On Wed, May 11, 2011 at 12:55 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> (2011/05/10 17:32), Mike Galbraith wrote:
>> On Tue, 2011-05-10 at 16:14 +0900, Hidetoshi Seto wrote:
>>> This small fixlet can stand alone.
>>> Peter, how about getting this into git tree first?
>>
>> tip 2f36825b176f67e5c5228aa33d828bc39718811f contains the below.
>>
>> /* Don't dequeue parent if it has other entities besides us */
>> - if (cfs_rq->load.weight)
>> + if (cfs_rq->load.weight) {
>> + /*
>> + * Bias pick_next to pick a task from this cfs_rq, as
>> + * p is sleeping when it is within its sched_slice.
>> + */
>> + if (task_sleep && parent_entity(se))
>> + set_next_buddy(parent_entity(se));
>> break;
>> + }
>> flags |= DEQUEUE_SLEEP;
>> }
>
> Oh, thanks Mike!
> It seems that this change in tip is the better one.
>
> Paul, would you mind rebasing your patches onto tip/sched/core next time?
> (...or is there a better branch for rebasing?)
>
I thought I had but apparently I missed this.
We still need to set se = parent_entity(se) to avoid the pointless
double update below.
Will definitely rebase.
Thanks!
>
> Thanks,
> H.Seto
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent
2011-05-11 8:13 ` Paul Turner
@ 2011-05-11 8:45 ` Mike Galbraith
2011-05-11 8:59 ` Hidetoshi Seto
0 siblings, 1 reply; 129+ messages in thread
From: Mike Galbraith @ 2011-05-11 8:45 UTC (permalink / raw)
To: Paul Turner
Cc: Hidetoshi Seto, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Kamalesh Babulal, Ingo Molnar,
Pavel Emelyanov
On Wed, 2011-05-11 at 01:13 -0700, Paul Turner wrote:
> On Wed, May 11, 2011 at 12:55 AM, Hidetoshi Seto
> <seto.hidetoshi@jp.fujitsu.com> wrote:
> > (2011/05/10 17:32), Mike Galbraith wrote:
> >> On Tue, 2011-05-10 at 16:14 +0900, Hidetoshi Seto wrote:
> >>> This small fixlet can stand alone.
> >>> Peter, how about getting this into git tree first?
> >>
> >> tip 2f36825b176f67e5c5228aa33d828bc39718811f contains the below.
> >>
> >> /* Don't dequeue parent if it has other entities besides us */
> >> - if (cfs_rq->load.weight)
> >> + if (cfs_rq->load.weight) {
> >> + /*
> >> + * Bias pick_next to pick a task from this cfs_rq, as
> >> + * p is sleeping when it is within its sched_slice.
> >> + */
> >> + if (task_sleep && parent_entity(se))
> >> + set_next_buddy(parent_entity(se));
> >> break;
> >> + }
> >> flags |= DEQUEUE_SLEEP;
> >> }
> >
> > Oh, thanks Mike!
> > It seems that this change in tip is the better one.
> >
> > Paul, would you mind rebasing your patches onto tip/sched/core next time?
> > (...or is there a better branch for rebasing?)
> >
>
> I thought I had but apparently I missed this.
>
> We still need to set se = parent_entity(se) to avoid the pointless
> double update below.
>
> Will definitely rebase.
Wish I could, wouldn't have 114 other patches just to get evaluation
tree up to speed :)
Index: linux-2.6.32/kernel/sched_fair.c
===================================================================
--- linux-2.6.32.orig/kernel/sched_fair.c
+++ linux-2.6.32/kernel/sched_fair.c
@@ -1308,12 +1308,15 @@ static void dequeue_task_fair(struct rq
/* Don't dequeue parent if it has other entities besides us */
if (cfs_rq->load.weight) {
+ /* Avoid double update below. */
+ se = parent_entity(se);
+
/*
* Bias pick_next to pick a task from this cfs_rq, as
* p is sleeping when it is within its sched_slice.
*/
- if (task_sleep && parent_entity(se))
- set_next_buddy(parent_entity(se));
+ if (task_sleep && se)
+ set_next_buddy(se);
break;
}
flags |= DEQUEUE_SLEEP;
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent
2011-05-11 8:45 ` Mike Galbraith
@ 2011-05-11 8:59 ` Hidetoshi Seto
0 siblings, 0 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-05-11 8:59 UTC (permalink / raw)
To: Mike Galbraith
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Kamalesh Babulal, Ingo Molnar,
Pavel Emelyanov
(2011/05/11 17:45), Mike Galbraith wrote:
> On Wed, 2011-05-11 at 01:13 -0700, Paul Turner wrote:
>> On Wed, May 11, 2011 at 12:55 AM, Hidetoshi Seto
>> <seto.hidetoshi@jp.fujitsu.com> wrote:
>>> (2011/05/10 17:32), Mike Galbraith wrote:
>>>> On Tue, 2011-05-10 at 16:14 +0900, Hidetoshi Seto wrote:
>>>>> This small fixlet can stand alone.
>>>>> Peter, how about getting this into git tree first?
>>>>
>>>> tip 2f36825b176f67e5c5228aa33d828bc39718811f contains the below.
>>>>
>>>> /* Don't dequeue parent if it has other entities besides us */
>>>> - if (cfs_rq->load.weight)
>>>> + if (cfs_rq->load.weight) {
>>>> + /*
>>>> + * Bias pick_next to pick a task from this cfs_rq, as
>>>> + * p is sleeping when it is within its sched_slice.
>>>> + */
>>>> + if (task_sleep && parent_entity(se))
>>>> + set_next_buddy(parent_entity(se));
>>>> break;
>>>> + }
>>>> flags |= DEQUEUE_SLEEP;
>>>> }
>>>
>>> Oh, thanks Mike!
>>> It seems that this change in tip is the better one.
>>>
>>> Paul, would you mind rebasing your patches onto tip/sched/core next time?
>>> (...or is there a better branch for rebasing?)
>>>
>>
>> I thought I had but apparently I missed this.
>>
>> We still need to set se = parent_entity(se) to avoid the pointless
>> double update below.
>>
>> Will definitely rebase.
>
> Wish I could, wouldn't have 114 other patches just to get evaluation
> tree up to speed :)
>
> Index: linux-2.6.32/kernel/sched_fair.c
> ===================================================================
> --- linux-2.6.32.orig/kernel/sched_fair.c
> +++ linux-2.6.32/kernel/sched_fair.c
> @@ -1308,12 +1308,15 @@ static void dequeue_task_fair(struct rq
>
> /* Don't dequeue parent if it has other entities besides us */
> if (cfs_rq->load.weight) {
> + /* Avoid double update below. */
> + se = parent_entity(se);
> +
> /*
> * Bias pick_next to pick a task from this cfs_rq, as
> * p is sleeping when it is within its sched_slice.
> */
> - if (task_sleep && parent_entity(se))
> - set_next_buddy(parent_entity(se));
> + if (task_sleep && se)
> + set_next_buddy(se);
> break;
> }
> flags |= DEQUEUE_SLEEP;
Nice!
It would be better to take this fixlet out of the cfs-bandwidth series
and post it as a standalone patch.
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 13/15] sched: add exports tracking cfs bandwidth control statistics
2011-05-11 7:56 ` Hidetoshi Seto
@ 2011-05-11 9:09 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-11 9:09 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov, Nikhil Rao
On Wed, May 11, 2011 at 12:56 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> Oops, I found an issue here.
>
> (2011/05/03 18:28), Paul Turner wrote:
>> @@ -1628,6 +1631,12 @@ retry:
>> raw_spin_unlock(&cfs_b->lock);
>> goto retry;
>> }
>> +
>> + /* update throttled stats */
>> + cfs_b->nr_periods += overrun;
>> + if (throttled)
>> + cfs_b->nr_throttled += overrun;
>> +
>> cfs_b->runtime = runtime;
>> cfs_b->idle = idle;
>> out_unlock:
>
> Quoting from patch 09/15:
>
> + if (!throttled || quota == RUNTIME_INF)
> + goto out;
> + idle = 0;
> +
> +retry:
> + runtime = distribute_cfs_runtime(cfs_b, runtime, runtime_expires);
> +
> + raw_spin_lock(&cfs_b->lock);
> + /* new new bandwidth may have been set */
> + if (unlikely(runtime_expires != cfs_b->runtime_expires))
> + goto out_unlock;
> + /*
> + * make sure no-one was throttled while we were handing out the new
> + * runtime.
> + */
> + if (runtime > 0 && !list_empty(&cfs_b->throttled_cfs_rq)) {
> + raw_spin_unlock(&cfs_b->lock);
> + goto retry;
> + }
> + cfs_b->runtime = runtime;
> + cfs_b->idle = idle;
> +out_unlock:
> + raw_spin_unlock(&cfs_b->lock);
> +out:
>
> Since we skip distributing runtime (by "goto out") when !throttled,
> the new block inserted by this patch is reached only when throttled.
> As a result, nr_periods and nr_throttled always end up the same.
>
> Maybe we should move this block up, like the following.
>
Yes, makes sense, incorporated -- thanks!
> Thanks,
> H.Seto
>
> ---
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1620,6 +1620,12 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
> idle = cfs_b->idle;
> cfs_b->idle = 1;
> }
> +
> + /* update throttled stats */
> + cfs_b->nr_periods += overrun;
> + if (throttled)
> + cfs_b->nr_throttled += overrun;
> +
> raw_spin_unlock(&cfs_b->lock);
>
> if (!throttled || quota == RUNTIME_INF)
> @@ -1642,11 +1648,6 @@ retry:
> goto retry;
> }
>
> - /* update throttled stats */
> - cfs_b->nr_periods += overrun;
> - if (throttled)
> - cfs_b->nr_throttled += overrun;
> -
> cfs_b->runtime = runtime;
> cfs_b->idle = idle;
> out_unlock:
>
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 15/15] sched: add documentation for bandwidth control
2011-05-10 7:29 ` Hidetoshi Seto
@ 2011-05-11 9:09 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-11 9:09 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
On Tue, May 10, 2011 at 12:29 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> (2011/05/03 18:29), Paul Turner wrote:
>> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>
>> Basic description of usage and effect for CFS Bandwidth Control.
>>
>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>> Signed-off-by: Paul Turner <pjt@google.com>
>> ---
>
> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
> Thank you very much for your great work, Paul!
>
> I've run some tests on this version and found no problems so far
> (other than the minor bug pointed out in 04/15).
> Things are definitely getting better.
>
> I'll keep testing and let you know if anything turns up.
>
Thank you for taking the time to review and test!
Very much appreciated!
>
> Thanks,
> H.Seto
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 12/15] sched: migrate throttled tasks on HOTPLUG
2011-05-10 7:27 ` Hidetoshi Seto
@ 2011-05-11 9:10 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-11 9:10 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
On Tue, May 10, 2011 at 12:27 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> (2011/05/03 18:28), Paul Turner wrote:
>> +#else
>> +static void unthrottle_offline_cfs_rqs(struct rq *rq)
>> +{
>> +}
>> +#endif
>> +
>
> Nit: To follow others, alternative style is in a line:
>
> +static void unthrottle_offline_cfs_rqs(struct rq *rq) {}
>
Agree, updated. Thanks
> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
>
> Thanks,
> H.Seto
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 11/15] sched: prevent interactions between throttled entities and load-balance
2011-05-10 7:26 ` Hidetoshi Seto
@ 2011-05-11 9:11 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-11 9:11 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
On Tue, May 10, 2011 at 12:26 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> (2011/05/03 18:28), Paul Turner wrote:
>> From the perspective of load-balance and shares distribution, throttled
>> entities should be invisible.
>>
>> However, both of these operations work on 'active' lists and are not
>> inherently aware of what group hierarchies may be present. In some cases this
>> may be side-stepped (e.g. we could sideload via tg_load_down in load balance)
>> while in others (e.g. update_shares()) it is more difficult to compute without
>> incurring some O(n^2) costs.
>>
>> Instead, track hierarchal throttled state at time of transition. This allows
>
> hierarchical
Fixed, Thanks
>
>> us to easily identify whether an entity belongs to a throttled hierarchy and
>> avoid incorrect interactions with it.
>>
>> Also, when an entity leaves a throttled hierarchy we need to advance its
>> time averaging for shares averaging so that the elapsed throttled time is not
>> considered as part of the cfs_rq's operation.
>>
>> Signed-off-by: Paul Turner <pjt@google.com>
>> ---
>
> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
> Thanks,
> H.Seto
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 09/15] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh
2011-05-10 7:24 ` Hidetoshi Seto
@ 2011-05-11 9:24 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-11 9:24 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov, Nikhil Rao
On Tue, May 10, 2011 at 12:24 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> Some comments...
>
> (2011/05/03 18:28), Paul Turner wrote:
>> At the start of a new period there are several actions we must take: refresh
>> the global bandwidth pool as well as unthrottle any cfs_rq entities that
>> previously ran out of bandwidth (as quota permits).
>>
>> Unthrottled entities have the cfs_rq->throttled flag cleared and are re-enqueued
>> into the cfs entity hierarchy.
>>
>> Signed-off-by: Paul Turner <pjt@google.com>
>> Signed-off-by: Nikhil Rao <ncrao@google.com>
>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>> ---
>> kernel/sched.c | 3 +
>> kernel/sched_fair.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 107 insertions(+), 1 deletion(-)
>>
>> Index: tip/kernel/sched.c
>> ===================================================================
>> --- tip.orig/kernel/sched.c
>> +++ tip/kernel/sched.c
>> @@ -9294,6 +9294,9 @@ static int tg_set_cfs_bandwidth(struct t
>> cfs_rq->runtime_enabled = quota != RUNTIME_INF;
>> cfs_rq->runtime_remaining = 0;
>> cfs_rq->runtime_expires = runtime_expires;
>> +
>> + if (cfs_rq_throttled(cfs_rq))
>> + unthrottle_cfs_rq(cfs_rq);
>> raw_spin_unlock_irq(&rq->lock);
>> }
>> out_unlock:
>> Index: tip/kernel/sched_fair.c
>> ===================================================================
>> --- tip.orig/kernel/sched_fair.c
>> +++ tip/kernel/sched_fair.c
>> @@ -1456,10 +1456,88 @@ static void check_enqueue_throttle(struc
>> throttle_cfs_rq(cfs_rq);
>> }
>>
>> +static void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>> +{
>> + struct rq *rq = rq_of(cfs_rq);
>> + struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
>> + struct sched_entity *se;
>> + int enqueue = 1;
>> + long task_delta;
>> +
>> + se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
>> +
>> + cfs_rq->throttled = 0;
>> + raw_spin_lock(&cfs_b->lock);
>> + list_del_rcu(&cfs_rq->throttled_list);
>> + raw_spin_unlock(&cfs_b->lock);
>> +
>> + if (!cfs_rq->load.weight)
>> + return;
>> +
>> + task_delta = cfs_rq->h_nr_running;
>> + for_each_sched_entity(se) {
>> + if (se->on_rq)
>> + enqueue = 0;
>> +
>> + cfs_rq = cfs_rq_of(se);
>> + if (enqueue)
>> + enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
>> + cfs_rq->h_nr_running += task_delta;
>> +
>> + if (cfs_rq_throttled(cfs_rq))
>> + break;
>> + }
>> +
>> + if (!se)
>> + rq->nr_running += task_delta;
>> +
>> + /* determine whether we need to wake up potentially idle cpu */
>> + if (rq->curr == rq->idle && rq->cfs.nr_running)
>> + resched_task(rq->curr);
>> +}
>> +
>> +static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
>> + u64 remaining, u64 expires)
>> +{
>> + struct cfs_rq *cfs_rq;
>> + u64 runtime = remaining;
>> +
>> + rcu_read_lock();
>> + list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq,
>> + throttled_list) {
>> + struct rq *rq = rq_of(cfs_rq);
>> +
>> + raw_spin_lock(&rq->lock);
>> + if (!cfs_rq_throttled(cfs_rq))
>> + goto next;
>> +
>> + runtime = -cfs_rq->runtime_remaining + 1;
>
> It would be helpful if a comment explained the negation and the +1.
Remaining runtime of <= 0 implies that there was no bandwidth
available. See checks below et al. in check_... functions.
We choose the minimum amount here to return to a positive quota state.
Originally I had elected to take a full slice here. The limitation
became that this then effectively duplicated the assign_cfs_rq_runtime
path, and would require the quota handed out in each to be in
lockstep. Another trade-off is that when we're in a large state of
arrears, handing out this extra bandwidth (in excess of the minimum
+1) up-front may prevent us from unthrottling another cfs_rq.
Will add a comment explaining that what is chosen above is the minimum
amount required to leave arrears.
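Concretely, from the quoted code: a cfs_rq throttled with
runtime_remaining == -5000 (ns) receives runtime = 5001, leaving
runtime_remaining == 1, the smallest positive balance that permits
unthrottling while keeping the maximum amount in the pool for other
throttled runqueues.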
>
>> + if (runtime > remaining)
>> + runtime = remaining;
>> + remaining -= runtime;
>> +
>> + cfs_rq->runtime_remaining += runtime;
>> + cfs_rq->runtime_expires = expires;
>> +
>> + /* we check whether we're throttled above */
>> + if (cfs_rq->runtime_remaining > 0)
>> + unthrottle_cfs_rq(cfs_rq);
>> +
>> +next:
>> + raw_spin_unlock(&rq->lock);
>> +
>> + if (!remaining)
>> + break;
>> + }
>> + rcu_read_unlock();
>> +
>> + return remaining;
>> +}
>> +
>> static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
>> {
>> u64 quota, runtime = 0, runtime_expires;
>> - int idle = 0;
>> + int idle = 0, throttled = 0;
>>
>> runtime_expires = sched_clock_cpu(smp_processor_id());
>>
>> @@ -1469,6 +1547,7 @@ static int do_sched_cfs_period_timer(str
>> if (quota != RUNTIME_INF) {
>> runtime = quota;
>> runtime_expires += ktime_to_ns(cfs_b->period);
>> + throttled = !list_empty(&cfs_b->throttled_cfs_rq);
>>
>> cfs_b->runtime = runtime;
>> cfs_b->runtime_expires = runtime_expires;
>> @@ -1477,6 +1556,30 @@ static int do_sched_cfs_period_timer(str
>> }
>> raw_spin_unlock(&cfs_b->lock);
>>
>> + if (!throttled || quota == RUNTIME_INF)
>> + goto out;
>> + idle = 0;
>> +
>> +retry:
>> + runtime = distribute_cfs_runtime(cfs_b, runtime, runtime_expires);
>> +
>> + raw_spin_lock(&cfs_b->lock);
>> + /* new new bandwidth may have been set */
>
> Typo? new, newer, newest...?
>
s/new new/new/ :)
>> + if (unlikely(runtime_expires != cfs_b->runtime_expires))
>> + goto out_unlock;
>> + /*
>> + * make sure no-one was throttled while we were handing out the new
>> + * runtime.
>> + */
>> + if (runtime > 0 && !list_empty(&cfs_b->throttled_cfs_rq)) {
>> + raw_spin_unlock(&cfs_b->lock);
>> + goto retry;
>> + }
>> + cfs_b->runtime = runtime;
>> + cfs_b->idle = idle;
>> +out_unlock:
>> + raw_spin_unlock(&cfs_b->lock);
>> +out:
>> return idle;
>> }
>> #else
>
> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
> It would be better if this unthrottle patch (09/15) came before the
> throttle patch (08/15) in this series, so as not to create a small window
> in the history where a throttled entity can never get back on the run queue.
> But I'm just paranoid...
>
The feature is inert unless bandwidth is set so this should be safe.
The trade-off with reversing the order is that a patch undoing state
that doesn't yet exist looks very strange :). If the above is a
concern I'd probably prefer to separate it into 3 parts:
1. add throttle
2. add unthrottle
3. enable throttle
Where (3) would consist only of the enqueue/put checks to trigger throttling.
>
> Thanks,
> H.Seto
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth
2011-05-10 7:22 ` Hidetoshi Seto
@ 2011-05-11 9:25 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-11 9:25 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov, Nikhil Rao
On Tue, May 10, 2011 at 12:22 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> (2011/05/03 18:28), Paul Turner wrote:
>> Index: tip/include/linux/sched.h
>> ===================================================================
>> --- tip.orig/include/linux/sched.h
>> +++ tip/include/linux/sched.h
>> @@ -1958,6 +1958,10 @@ int sched_cfs_consistent_handler(struct
>> loff_t *ppos);
>> #endif
>>
>> +#ifdef CONFIG_CFS_BANDWIDTH
>> +extern unsigned int sysctl_sched_cfs_bandwidth_slice;
>> +#endif
>> +
>> #ifdef CONFIG_SCHED_AUTOGROUP
>> extern unsigned int sysctl_sched_autogroup_enabled;
>>
>
> Nit: you can reuse ifdef just above here.
Thanks! I think this was actually a quilt-mis-merge when I was
shuffling the order of things around. Definitely makes sense to
combine them.
>
> +#ifdef CONFIG_CFS_BANDWIDTH
> +extern unsigned int sysctl_sched_cfs_bandwidth_consistent;
> +
> +int sched_cfs_consistent_handler(struct ctl_table *table, int write,
> + void __user *buffer, size_t *lenp,
> + loff_t *ppos);
> +#endif
> +
> +#ifdef CONFIG_CFS_BANDWIDTH
> +extern unsigned int sysctl_sched_cfs_bandwidth_slice;
> +#endif
> +
>
> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
>
> Thanks,
> H.Seto
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 05/15] sched: add a timer to handle CFS bandwidth refresh
2011-05-10 7:21 ` Hidetoshi Seto
@ 2011-05-11 9:27 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-11 9:27 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
On Tue, May 10, 2011 at 12:21 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> (2011/05/03 18:28), Paul Turner wrote:
>> @@ -250,6 +253,9 @@ struct cfs_bandwidth {
>> ktime_t period;
>> u64 quota;
>> s64 hierarchal_quota;
>> +
>> + int idle;
>> + struct hrtimer period_timer;
>> #endif
>> };
>>
>
> "idle" is not used yet. How about adding it in later patch?
> Plus, comment explaining how it is used would be appreciated.
Fixed both. (idle belongs to the accumulate patch)
>
>> static void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>> {
>> raw_spin_lock_init(&cfs_b->lock);
>> cfs_b->quota = RUNTIME_INF;
>> cfs_b->period = ns_to_ktime(default_cfs_period());
>> +
>> + hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>> + cfs_b->period_timer.function = sched_cfs_period_timer;
>> +
>> }
>
> Nit: blank line?
>
> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
>
> Thanks,
> H.Seto
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 04/15] sched: validate CFS quota hierarchies
2011-05-10 7:20 ` Hidetoshi Seto
@ 2011-05-11 9:37 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-11 9:37 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Kamalesh Babulal, Ingo Molnar, Pavel Emelyanov
On Tue, May 10, 2011 at 12:20 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> Description typos + one bug.
>
> (2011/05/03 18:28), Paul Turner wrote:
>> Add constraints validation for CFS bandwidth hierachies.
>
> hierarchies
>
>>
>> Validate that:
>> sum(child bandwidth) <= parent_bandwidth
>>
>> In a quota limited hierarchy, an unconstrainted entity
>
> unconstrained
>
>> (e.g. bandwidth==RUNTIME_INF) inherits the bandwidth of its parent.
>>
>> Since bandwidth periods may be non-uniform we normalize to the maximum allowed
>> period, 1 second.
>>
>> This behavior may be disabled (allowing child bandwidth to exceed parent) via
>> kernel.sched_cfs_bandwidth_consistent=0
>>
>> Signed-off-by: Paul Turner <pjt@google.com>
>>
>> ---
> (snip)
>> +/*
>> + * normalize group quota/period to be quota/max_period
>> + * note: units are usecs
>> + */
>> +static u64 normalize_cfs_quota(struct task_group *tg,
>> + struct cfs_schedulable_data *d)
>> +{
>> + u64 quota, period;
>> +
>> + if (tg == d->tg) {
>> + period = d->period;
>> + quota = d->quota;
>> + } else {
>> + period = tg_get_cfs_period(tg);
>> + quota = tg_get_cfs_quota(tg);
>> + }
>> +
>> + if (quota == RUNTIME_INF)
>> + return RUNTIME_INF;
>> +
>> + return to_ratio(period, quota);
>> +}
>
> Since tg_get_cfs_quota() doesn't return RUNTIME_INF but -1,
> this function needs a fix like the following.
>
> For fixed version, feel free to add:
>
> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
> Thanks,
> H.Seto
>
> ---
> kernel/sched.c | 7 ++++---
> 1 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index d2562aa..f171ba5 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -9465,16 +9465,17 @@ static u64 normalize_cfs_quota(struct task_group *tg,
> u64 quota, period;
>
> if (tg == d->tg) {
> + if (d->quota == RUNTIME_INF)
> + return RUNTIME_INF;
> period = d->period;
> quota = d->quota;
> } else {
> + if (tg_cfs_bandwidth(tg)->quota == RUNTIME_INF)
> + return RUNTIME_INF;
> period = tg_get_cfs_period(tg);
> quota = tg_get_cfs_quota(tg);
> }
>
Good catch!
Just modifying:
+if (quota == RUNTIME_INF || quota == -1)
+ return RUNTIME_INF;
Seems simpler.
Although really there's no reason for tg_get_cfs_runtime (and
sched_group_rt_runtime from which it's cloned) not to be returning
RUNTIME_INF and then doing the conversion within the cgroupfs handler.
Fixing both is probably a better clean-up.
> - if (quota == RUNTIME_INF)
> - return RUNTIME_INF;
> -
> return to_ratio(period, quota);
> }
>
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
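For reference on what the validation compares: to_ratio() (shared with the
existing rt bandwidth constraints) reduces each quota/period pair to a
fixed-point fraction, roughly (quota << 20) / period. A child with
quota = 25ms over period = 100ms thus normalizes to 0.25 * 2^20 = 262144,
and consistency requires the children's ratios to sum to no more than the
parent's.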
* Re: [patch 04/15] sched: validate CFS quota hierarchies
2011-05-03 9:28 ` [patch 04/15] sched: validate CFS quota hierarchies Paul Turner
2011-05-10 7:20 ` Hidetoshi Seto
@ 2011-05-16 9:30 ` Peter Zijlstra
2011-05-16 9:43 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-16 9:30 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> Since bandwidth periods may be non-uniform we normalize to the maximum allowed
> period, 1 second.
I'm still somewhat confused on this point, what does it mean to have a
(parent) group with 0.1s period with child-groups that have 1s periods?
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 04/15] sched: validate CFS quota hierarchies
2011-05-03 9:28 ` [patch 04/15] sched: validate CFS quota hierarchies Paul Turner
2011-05-10 7:20 ` Hidetoshi Seto
2011-05-16 9:30 ` Peter Zijlstra
@ 2011-05-16 9:43 ` Peter Zijlstra
2011-05-16 12:32 ` Paul Turner
2 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-16 9:43 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> This behavior may be disabled (allowing child bandwidth to exceed parent) via
> kernel.sched_cfs_bandwidth_consistent=0
why? this needs very good justification.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 05/15] sched: add a timer to handle CFS bandwidth refresh
2011-05-03 9:28 ` [patch 05/15] sched: add a timer to handle CFS bandwidth refresh Paul Turner
2011-05-10 7:21 ` Hidetoshi Seto
@ 2011-05-16 10:18 ` Peter Zijlstra
2011-05-16 12:56 ` Paul Turner
1 sibling, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-16 10:18 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> @@ -1003,6 +1003,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
>
> if (cfs_rq->nr_running == 1)
> list_add_leaf_cfs_rq(cfs_rq);
> +
> + start_cfs_bandwidth(cfs_rq);
> }
>
> static void __clear_buddies_last(struct sched_entity *se)
> @@ -1220,6 +1222,8 @@ static void put_prev_entity(struct cfs_r
> update_stats_wait_start(cfs_rq, prev);
> /* Put 'current' back into the tree. */
> __enqueue_entity(cfs_rq, prev);
> +
> + start_cfs_bandwidth(cfs_rq);
> }
> cfs_rq->curr = NULL;
> }
OK, so while the first made sense the second had me go wtf?!, now I
_think_ you do that because do_sched_cfs_period_timer() can return idle
and stop the timer when no bandwidth consumption is seen for a while,
and thus we need to re-start the timer when we put the entity to sleep,
since that could have been a throttle.
If that's so then neither really do make sense and a big fat comment is
missing.
So why not start on the same (but inverse) condition that makes it stop?
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth
2011-05-03 9:28 ` [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth Paul Turner
2011-05-10 7:22 ` Hidetoshi Seto
@ 2011-05-16 10:27 ` Peter Zijlstra
2011-05-16 12:59 ` Paul Turner
2011-05-16 10:32 ` Peter Zijlstra
2 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-16 10:27 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> +unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL;
What happens when the period is smaller than the slice?
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth
2011-05-03 9:28 ` [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth Paul Turner
2011-05-10 7:22 ` Hidetoshi Seto
2011-05-16 10:27 ` Peter Zijlstra
@ 2011-05-16 10:32 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-16 10:32 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int
> overrun)
> {
> - return 1;
> + u64 quota, runtime = 0;
> + int idle = 0;
> +
> + raw_spin_lock(&cfs_b->lock);
> + quota = cfs_b->quota;
> +
> + if (quota != RUNTIME_INF) {
> + runtime = quota;
> + cfs_b->runtime = runtime;
> +
> + idle = cfs_b->idle;
> + cfs_b->idle = 1;
> + }
> + raw_spin_unlock(&cfs_b->lock);
> +
> + return idle;
> }
Shouldn't that also return 'idle' when quota is INF? No point in keeping
that timer ticking when there's no actual accounting anymore.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 07/15] sched: expire invalid runtime
2011-05-03 9:28 ` [patch 07/15] sched: expire invalid runtime Paul Turner
2011-05-10 7:22 ` Hidetoshi Seto
@ 2011-05-16 11:05 ` Peter Zijlstra
2011-05-16 11:07 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-16 11:05 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> + cfs_rq->runtime_expires = max(cfs_rq->runtime_expires, expires);
That doesn't work well when the clock wraps.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 07/15] sched: expire invalid runtime
2011-05-03 9:28 ` [patch 07/15] sched: expire invalid runtime Paul Turner
2011-05-10 7:22 ` Hidetoshi Seto
2011-05-16 11:05 ` Peter Zijlstra
@ 2011-05-16 11:07 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-16 11:07 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> With the global quota pool, one challenge is determining when the runtime we
> have received from it is still valid. Fortunately we can take advantage of
> sched_clock synchronization around the jiffy to do this cheaply.
>
> The one catch is that we don't know whether our local clock is behind or ahead
> of the cpu setting the expiration time (relative to its own clock).
>
> Fortunately we can detect which of these is the case by determining whether the
> global deadline has advanced. If it has not, then we assume we are behind, and
> advance our local expiration; otherwise, we know the deadline has truly passed
> and we expire our local runtime.
This needs a few words explaining why we need to do all this. It only
explains the how of it.
^ permalink raw reply [flat|nested] 129+ messages in thread
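For reference, a sketch of the detection the quoted changelog describes;
names mirror the quoted discussion, while the one-tick extension and the
signed wrap-safe comparison are illustrative assumptions:

static void expire_cfs_rq_runtime(struct cfs_rq *cfs_rq)
{
	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);

	/* local deadline still ahead of our clock: nothing to do;
	 * the signed comparison also behaves across clock wrap */
	if ((s64)(rq_of(cfs_rq)->clock - cfs_rq->runtime_expires) < 0)
		return;

	if (cfs_rq->runtime_expires == cfs_b->runtime_expires) {
		/*
		 * The global deadline has not advanced, so our local
		 * clock is merely ahead of the cpu that set it; extend
		 * the local expiration and keep running.
		 */
		cfs_rq->runtime_expires += TICK_NSEC;
	} else {
		/* the deadline has truly passed: expire local runtime */
		cfs_rq->runtime_remaining = 0;
	}
}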
* Re: [patch 04/15] sched: validate CFS quota hierarchies
2011-05-16 9:43 ` Peter Zijlstra
@ 2011-05-16 12:32 ` Paul Turner
2011-05-17 15:26 ` Peter Zijlstra
0 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-16 12:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Mon, May 16, 2011 at 2:43 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> > This behavior may be disabled (allowing child bandwidth to exceed parent) via
> > kernel.sched_cfs_bandwidth_consistent=0
>
> why? this needs very good justification.
I think it was lost in earlier discussion, but there are two useful
use-cases for it:
Posting (condensed) relevant snippet:
-----------------------------------------------------------
Consider:
- I have some application that I want to limit to 3 cpus
I have a 2 workers in that application, across a period I would like
those workers to use a maximum of say 2.5 cpus each (suppose they
serve some sort of co-processor request per user and we want to
prevent a single user eating our entire limit and starving out
everything else).
The goal in this case is not preventing increasing availability within a
given limit, while not destroying the (relatively) work-conserving aspect of
its performance in general.
(...)
- There's also the case of managing an abusive user, use cases such
as the above means that users can usefully be given write permission
to their relevant sub-hierarchy.
If the system size changes, or a user becomes newly abusive, then being
able to set a non-conformant constraint avoids the adversarial problem of
having to find and bring all of the (possibly maliciously large) limits
they have set back within the global limit.
-----------------------------------------------------------
(Previously: https://lkml.org/lkml/2011/2/24/477)
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 05/15] sched: add a timer to handle CFS bandwidth refresh
2011-05-16 10:18 ` Peter Zijlstra
@ 2011-05-16 12:56 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-16 12:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Mon, May 16, 2011 at 3:18 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
>> @@ -1003,6 +1003,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
>>
>> if (cfs_rq->nr_running == 1)
>> list_add_leaf_cfs_rq(cfs_rq);
>> +
>> + start_cfs_bandwidth(cfs_rq);
>> }
>>
>> static void __clear_buddies_last(struct sched_entity *se)
>> @@ -1220,6 +1222,8 @@ static void put_prev_entity(struct cfs_r
>> update_stats_wait_start(cfs_rq, prev);
>> /* Put 'current' back into the tree. */
>> __enqueue_entity(cfs_rq, prev);
>> +
>> + start_cfs_bandwidth(cfs_rq);
>> }
>> cfs_rq->curr = NULL;
>> }
>
> OK, so while the first made sense the second had me go wtf?!, now I
> _think_ you do that because do_sched_cfs_period_timer() can return idle
> and stop the timer when no bandwidth consumption is seen for a while,
> and thus we need to re-start the timer when we put the entity to sleep,
> since that could have been a throttle.
>
> If that's so then neither really do make sense and a big fat comment is
> missing.
>
> So why not start on the same (but inverse) condition that makes it stop?
>
This was originally to guard against the case where an entity was running
on stale quota (from a previous period), resulting in cfs_bandwidth->idle
being set and the timer not being re-armed.
Now that expiration is properly integrated I think the two cases are
analogous and that this can be dropped (and the start moved into the
(nr_running == 1) entity case on enqueue).
I think this is correct but my brain's a little fuzzy right now, will
confirm in the morning.
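In other words, something along these lines, invoked only from the
nr_running == 1 enqueue path (a sketch; hrtimer handling simplified):

static void start_cfs_bandwidth(struct cfs_rq *cfs_rq)
{
	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);

	if (cfs_b->quota == RUNTIME_INF)
		return;

	raw_spin_lock(&cfs_b->lock);
	/* idempotent: only arm the timer if it is not already queued */
	if (!hrtimer_active(&cfs_b->period_timer))
		hrtimer_start(&cfs_b->period_timer, cfs_b->period,
			      HRTIMER_MODE_REL);
	raw_spin_unlock(&cfs_b->lock);
}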
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth
2011-05-16 10:27 ` Peter Zijlstra
@ 2011-05-16 12:59 ` Paul Turner
2011-05-17 15:28 ` Peter Zijlstra
0 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-16 12:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
On Mon, May 16, 2011 at 3:27 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
>> +unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL;
>
> What happens when the period is smaller than the slice?
>
We'll always take at most whatever's left in this case.
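Roughly, the acquisition clamps to the pool (a sketch; the helper name is
an assumption):

static u64 take_cfs_bandwidth_slice(struct cfs_bandwidth *cfs_b)
{
	u64 amount = (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;

	raw_spin_lock(&cfs_b->lock);
	if (cfs_b->quota != RUNTIME_INF) {
		/* a short period just means less remains in the pool;
		 * never hand out more than is left */
		amount = min(amount, cfs_b->runtime);
		cfs_b->runtime -= amount;
	}
	raw_spin_unlock(&cfs_b->lock);

	return amount;
}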
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime
2011-05-03 9:28 ` [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime Paul Turner
2011-05-10 7:23 ` Hidetoshi Seto
@ 2011-05-16 15:58 ` Peter Zijlstra
2011-05-16 16:05 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-16 15:58 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> + /*
> + * it's possible active load balance has forced a throttled cfs_rq to
> + * run again, we don't want to re-throttle in this case.
> + */
> + if (cfs_rq_throttled(cfs_rq))
> + return;
expand a little on this, why would load-balancing interact with a
throttled group? load-balancing should fully ignore these things,
they're not runnable after all.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime
2011-05-03 9:28 ` [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime Paul Turner
2011-05-10 7:23 ` Hidetoshi Seto
2011-05-16 15:58 ` Peter Zijlstra
@ 2011-05-16 16:05 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-16 16:05 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> + task_delta = -cfs_rq->h_nr_running;
> + for_each_sched_entity(se) {
> + struct cfs_rq *qcfs_rq = cfs_rq_of(se);
> + /* throttled entity or throttle-on-deactivate */
> + if (!se->on_rq)
> + break;
> +
> + if (dequeue)
> + dequeue_entity(qcfs_rq, se, DEQUEUE_SLEEP);
> + qcfs_rq->h_nr_running += task_delta;
> +
> + if (qcfs_rq->load.weight)
> + dequeue = 0;
> + }
> +
> + if (!se)
> + rq->nr_running += task_delta;
So throttle is like dequeue, it removes tasks, so why then insist on
writing it like it's adding tasks? (I see you're adding a negative
number, but it's all just weird).
^ permalink raw reply [flat|nested] 129+ messages in thread
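For comparison, the same loop written with a positive count, which may read
more naturally (a sketch against the quoted fragment only):

	long task_delta = cfs_rq->h_nr_running;

	for_each_sched_entity(se) {
		struct cfs_rq *qcfs_rq = cfs_rq_of(se);

		/* throttled entity or throttle-on-deactivate */
		if (!se->on_rq)
			break;

		if (dequeue)
			dequeue_entity(qcfs_rq, se, DEQUEUE_SLEEP);
		/* subtract the positive count instead of adding a
		 * negative one */
		qcfs_rq->h_nr_running -= task_delta;

		if (qcfs_rq->load.weight)
			dequeue = 0;
	}

	if (!se)
		rq->nr_running -= task_delta;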
* Re: [patch 10/15] sched: allow for positional tg_tree walks
2011-05-03 9:28 ` [patch 10/15] sched: allow for positional tg_tree walks Paul Turner
2011-05-10 7:24 ` Hidetoshi Seto
@ 2011-05-17 13:31 ` Peter Zijlstra
2011-05-18 7:18 ` Paul Turner
1 sibling, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-17 13:31 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> plain text document attachment (sched-bwc-refactor-walk_tg_tree.patch)
> Extend walk_tg_tree to accept a positional argument
>
> static int walk_tg_tree_from(struct task_group *from,
> tg_visitor down, tg_visitor up, void *data)
>
> Existing semantics are preserved, caller must hold rcu_read_lock() or a
> sufficient analogue.
>
> Signed-off-by: Paul Turner <pjt@google.com>
> ---
> kernel/sched.c | 34 +++++++++++++++++++++++-----------
> 1 file changed, 23 insertions(+), 11 deletions(-)
>
> Index: tip/kernel/sched.c
> ===================================================================
> --- tip.orig/kernel/sched.c
> +++ tip/kernel/sched.c
> @@ -1430,21 +1430,19 @@ static inline void dec_cpu_load(struct r
> #if (defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)) || defined(CONFIG_RT_GROUP_SCHED)
> typedef int (*tg_visitor)(struct task_group *, void *);
>
> -/*
> - * Iterate the full tree, calling @down when first entering a node and @up when
> - * leaving it for the final time.
> - */
> -static int walk_tg_tree(tg_visitor down, tg_visitor up, void *data)
> +/* Iterate task_group tree rooted at *from */
> +static int walk_tg_tree_from(struct task_group *from,
> + tg_visitor down, tg_visitor up, void *data)
> {
> struct task_group *parent, *child;
> int ret;
>
> - rcu_read_lock();
> - parent = &root_task_group;
> + parent = from;
> +
> down:
> ret = (*down)(parent, data);
> if (ret)
> - goto out_unlock;
> + goto out;
> list_for_each_entry_rcu(child, &parent->children, siblings) {
> parent = child;
> goto down;
> @@ -1453,14 +1451,28 @@ up:
> continue;
> }
> ret = (*up)(parent, data);
> - if (ret)
> - goto out_unlock;
> + if (ret || parent == from)
> + goto out;
>
> child = parent;
> parent = parent->parent;
> if (parent)
> goto up;
> -out_unlock:
> +out:
> + return ret;
> +}
> +
> +/*
> + * Iterate the full tree, calling @down when first entering a node and @up when
> + * leaving it for the final time.
> + */
> +
> +static inline int walk_tg_tree(tg_visitor down, tg_visitor up, void *data)
> +{
> + int ret;
> +
> + rcu_read_lock();
> + ret = walk_tg_tree_from(&root_task_group, down, up, data);
> rcu_read_unlock();
>
> return ret;
I don't much like the different locking rules for these two functions. I
don't much care which you pick, but please make them consistent.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 04/15] sched: validate CFS quota hierarchies
2011-05-16 12:32 ` Paul Turner
@ 2011-05-17 15:26 ` Peter Zijlstra
2011-05-18 7:16 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-17 15:26 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Mon, 2011-05-16 at 05:32 -0700, Paul Turner wrote:
> On Mon, May 16, 2011 at 2:43 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> > > This behavior may be disabled (allowing child bandwidth to exceed parent) via
> > > kernel.sched_cfs_bandwidth_consistent=0
> >
> > why? this needs very good justification.
>
> I think it was lost in earlier discussion, but there are two useful
> use-cases for it:
>
> Posting (condensed) relevant snippet:
Such stuff should really live in the changelog
> -----------------------------------------------------------
> Consider:
>
> - I have some application that I want to limit to 3 cpus.
> I have 2 workers in that application; across a period I would like
> those workers to use a maximum of, say, 2.5 cpus each (suppose they
> serve some sort of co-processor request per user and we want to
> prevent a single user from eating our entire limit and starving out
> everything else).
>
> The goal in this case is to not prevent increased availability within a
> given limit, while not destroying the (relatively) work-conserving aspect of
> its performance in general.
>
> (...)
>
> - There's also the case of managing an abusive user; use cases such
> as the above mean that users can usefully be given write permission
> to their relevant sub-hierarchy.
>
> If the system size changes, or a user becomes newly abusive, then being
> able to set a non-conformant constraint avoids the adversarial problem of
> having to find all of their (possibly maliciously large) configured limits
> and bring them within the global limit.
> -----------------------------------------------------------
But what about those where they want both behaviours on the same machine
but for different sub-trees?
Also, without the constraints, what does the hierarchy mean?
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth
2011-05-16 12:59 ` Paul Turner
@ 2011-05-17 15:28 ` Peter Zijlstra
2011-05-18 7:02 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-17 15:28 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
On Mon, 2011-05-16 at 05:59 -0700, Paul Turner wrote:
> On Mon, May 16, 2011 at 3:27 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
> >> +unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL;
> >
> > What happens when the period is smaller than the slice?
> >
>
> We'll always take at most whatever's left in this case.
Right, saw that, but it might be good to have a little comment
explaining the interaction between the slice and the period things.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth
2011-05-17 15:28 ` Peter Zijlstra
@ 2011-05-18 7:02 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-18 7:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov, Nikhil Rao
On Tue, May 17, 2011 at 8:28 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Mon, 2011-05-16 at 05:59 -0700, Paul Turner wrote:
>> On Mon, May 16, 2011 at 3:27 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> > On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
>> >> +unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL;
>> >
>> > What happens when the period is smaller than the slice?
>> >
>>
>> We'll always take at most whatever's left in this case.
>
> Right, saw that, but it might be good to have a little comment
> explaining the interaction between the slice and the period things.
>
Oh, sure -- easy enough :)
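Something along these lines, perhaps (a sketch; the exact wording and
identifiers may shift in the next posting):

/*
 * sysctl_sched_cfs_bandwidth_slice controls how much quota a cfs_rq
 * acquires from the global (per-tg) pool in a single grab.  It is an
 * upper bound on one refill, not a minimum: when the quota remaining
 * in the current period is smaller than a slice -- in particular when
 * cfs_period_us is configured below the slice itself -- we simply
 * hand out whatever remains.
 */
        amount = min_t(u64, remaining, sched_cfs_bandwidth_slice());

where sched_cfs_bandwidth_slice() is just the sysctl value converted
from usecs to nsecs, and `remaining' stands for the quota left in the
current period.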
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 04/15] sched: validate CFS quota hierarchies
2011-05-17 15:26 ` Peter Zijlstra
@ 2011-05-18 7:16 ` Paul Turner
2011-05-18 11:57 ` Peter Zijlstra
0 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-05-18 7:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Tue, May 17, 2011 at 8:26 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Mon, 2011-05-16 at 05:32 -0700, Paul Turner wrote:
>> On Mon, May 16, 2011 at 2:43 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> >
>> > On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
>> > > This behavior may be disabled (allowing child bandwidth to exceed parent) via
>> > > kernel.sched_cfs_bandwidth_consistent=0
>> >
>> > why? this needs very good justification.
>>
>> I think it was lost in other discussion before, but I think there are
>> two useful use-cases for it:
>>
>> Posting (condensed) relevant snippet:
>
> Such stuff should really live in the changelog
>
Given the discussion below it would seem to make sense to split the CL
into one part that adds the consistency checking, and (potentially,
depending on the discussion below) another that provides these state
semantics. This would also give us a chance to clearly call these
details out in the commit description.
>> -----------------------------------------------------------
>> Consider:
>>
>> - I have some application that I want to limit to 3 cpus.
>> I have 2 workers in that application; across a period I would like
>> those workers to use a maximum of, say, 2.5 cpus each (suppose they
>> serve some sort of co-processor request per user and we want to
>> prevent a single user from eating our entire limit and starving out
>> everything else).
>>
>> The goal in this case is to not prevent increased availability within a
>> given limit, while not destroying the (relatively) work-conserving aspect of
>> its performance in general.
>>
>> (...)
>>
>> - There's also the case of managing an abusive user; use cases such
>> as the above mean that users can usefully be given write permission
>> to their relevant sub-hierarchy.
>>
>> If the system size changes, or a user becomes newly abusive, then being
>> able to set a non-conformant constraint avoids the adversarial problem of
>> having to find all of their (possibly maliciously large) configured limits
>> and bring them within the global limit.
>> -----------------------------------------------------------
>
>
> But what about those where they want both behaviours on the same machine
> but for different sub-trees?
I originally considered a per-tg tunable. I made the assumption that
users would either handle this themselves (=0) or rely on the kernel
to do it (=1). There are some additional complexities that led me to
withdraw from the per-cg approach in this pass, given the known
resistance to it.
One concern was the potential ambiguity in the nesting of these values.
When an inconsistent entity is nested under a consistent one:
A) Do we allow this?
B) How do we treat it?
I think if this were the case it would make sense to allow it, and
that each inconsistent entity should effectively be treated as
terminal from the parent's point of view, and as the new root from the
child's point of view.
Does this make sense? While this is the most intuitive definition for
me, there are certainly several other interpretations that could be
argued for.
Would you prefer that this approach to consistency be taken per
sub-tree rather than at a global level? Do the use-cases above have
sufficient merit that we should even make this an option in the first
place? Should we just always force hierarchies to be consistent
instead? I'm open on this.
>
> Also, without the constraints, what does the hierarchy mean?
>
It's still an upper bound for usage; however, it may not be achievable
in an inconsistent hierarchy, whereas in a consistent one it should
always be achievable. (E.g. a child given 4 cpus of quota under a
parent capped at 2 can still never receive more than 2.)
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 10/15] sched: allow for positional tg_tree walks
2011-05-17 13:31 ` Peter Zijlstra
@ 2011-05-18 7:18 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-05-18 7:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Tue, May 17, 2011 at 6:31 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Tue, 2011-05-03 at 02:28 -0700, Paul Turner wrote:
(snip)
>
> I don't much like the different locking rules for these two functions. I
> don't much care which you pick, but please make them consistent.
>
Reasonable; given the call sites it would seem to make more sense to
be consistent in the direction of depending on the caller to do the
locking. Will update.
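Concretely, something like this (a sketch of the direction, not the
final patch):

/* Callers of both tree walks must hold rcu_read_lock(). */
static int walk_tg_tree_from(struct task_group *from,
                             tg_visitor down, tg_visitor up, void *data)
{
        /* body as posted, with no locking taken internally */
}

static inline int walk_tg_tree(tg_visitor down, tg_visitor up, void *data)
{
        return walk_tg_tree_from(&root_task_group, down, up, data);
}

with the rcu_read_lock()/rcu_read_unlock() pair pushed out to the
existing walk_tg_tree() call sites.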
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 04/15] sched: validate CFS quota hierarchies
2011-05-18 7:16 ` Paul Turner
@ 2011-05-18 11:57 ` Peter Zijlstra
0 siblings, 0 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-05-18 11:57 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Kamalesh Babulal,
Ingo Molnar, Pavel Emelyanov
On Wed, 2011-05-18 at 00:16 -0700, Paul Turner wrote:
> >
> > But what about those where they want both behaviours on the same machine
> > but for different sub-trees?
>
> I originally considered a per-tg tunable. I made the assumption that
> users would either handle this themselves (=0) or rely on the kernel
> to do it (=1). There are some additional complexities that lead me to
> withdraw from the per-cg approach in this pass given the known
> resistance to it.
Yeah, that's quite horrid too, you chose wisely by not going there ;-)
> One concern was the potential ambiguity in the nesting of these values.
>
> When an inconsistent entity is nested under a consistent one:
>
> A) Do we allow this?
> B) How do we treat it?
>
> I think if this was the case that it would make sense to allow it and
> that each inconsistent entity should effectively be treated as
> terminal from the parent's point of view, and as the new root from the
> child's point of view.
>
> Does this make sense? While this is the most intuitive definition for
> me there are certainly several other interpretations that could be
> argued for.
I'm not quite sure I get it; so what you're saying is: where the
semantics are violated we draw a border and we only look at local
consistency, thereby side-stepping the whole problem.
That doesn't fly for me. Also, see below: by not having any invariants
you don't have clear semantics at all.
> Would you prefer this approach be taken to consistency vs at a global
> level? Do the use-cases above have sufficient merit that we even make
> this an option in the first place? Should we just always force
> hierarchies to be consistent instead? I'm open on this.
Yeah, I think the use cases do make sense, its just that I don't like
the two different semantics and the confusion that goes with it.
> >
> > Also, without the constraints, what does the hierarchy mean?
> >
>
> It's still an upper-bound for usage, however it may not be achievable
> in an inconsistent hierarchy. Whereas in a consistent one it should
> always be achievable.
See, that doesn't quite make sense to me: if it's not achievable, it's
simply not, and the meaning is gone.
So let's consider these cases again:
> - I have some application that I want to limit to 3 cpus.
> I have 2 workers in that application; across a period I would like
> those workers to use a maximum of, say, 2.5 cpus each (suppose they
> serve some sort of co-processor request per user and we want to
> prevent a single user from eating our entire limit and starving out
> everything else).
>
> The goal in this case is to not prevent increased availability within a
> given limit, while not destroying the (relatively) work-conserving aspect of
> its performance in general.
So the problem here is that 2.5+2.5 > 3, right? So maybe our constraint
isn't quite right, since clearly the whole SCHED_OTHER bandwidth crap
has the purpose of allowing overload.
What if instead of using \Sum u_i <= U we use max(u_i) <= U? That
would allow the above case, and mean that the bandwidth limit placed on
the parent is the maximum allowed limit in that subtree. In overload
situations things go back to proportional parts of the subtree limit.
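As a per-level check that's roughly the following (illustrative only;
tg_ratio(), returning quota/period in fixed point, is a made-up
helper):

static int tg_check_level(struct task_group *parent, bool use_max)
{
        struct task_group *child;
        u64 U = tg_ratio(parent), sum = 0, worst = 0;

        list_for_each_entry_rcu(child, &parent->children, siblings) {
                u64 u = tg_ratio(child);

                sum += u;
                if (u > worst)
                        worst = u;
        }

        /*
         * \Sum u_i <= U rejects the 2.5 + 2.5 > 3 case above, while
         * max(u_i) <= U permits it but still bounds every child by
         * the parent's limit.
         */
        return (use_max ? worst : sum) <= U;
}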
> >> - There's also the case of managing an abusive user; use cases such
> >> as the above mean that users can usefully be given write permission
> >> to their relevant sub-hierarchy.
> >>
> >> If the system size changes, or a user becomes newly abusive, then being
> >> able to set a non-conformant constraint avoids the adversarial problem of
> >> having to find all of their (possibly maliciously large) configured limits
> >> and bring them within the global limit.
Right, so this example is a little more contrived in that if you had
managed it from the get-go the problem wouldn't be that big (you'd have
had sane limits to begin with).
So one solution is to co-mount the freezer cgroup with your cpu cgroup
and simply freeze the whole subtree while you sort out the settings :-)
Another possibility would be to allow something like:
$ echo force:50000 > cfs_quota_us
Where the "force:" thing requires CAP_SYS_ADMIN and updates the entire
sub-tree such that the above invariant is kept.
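I.e. roughly the following, where everything -- including the
tg_set_cfs_quota_checked() helper at the end -- is illustrative rather
than a worked-out interface:

static int cpu_cfs_quota_write_str(struct cgroup *cgrp, struct cftype *cft,
                                   const char *buf)
{
        bool force = false;
        long long quota_us;

        if (!strncmp(buf, "force:", 6)) {
                if (!capable(CAP_SYS_ADMIN))
                        return -EPERM;
                force = true;
                buf += 6;
        }

        if (strict_strtoll(buf, 10, &quota_us))
                return -EINVAL;

        /*
         * When forced, also walk the sub-tree (walk_tg_tree_from())
         * and clamp each child's quota so the invariant holds again.
         */
        return tg_set_cfs_quota_checked(cgroup_tg(cgrp), quota_us, force);
}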
^ permalink raw reply [flat|nested] 129+ messages in thread
* CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (14 preceding siblings ...)
2011-05-03 9:29 ` [patch 15/15] sched: add documentation for bandwidth control Paul Turner
@ 2011-06-07 15:45 ` Kamalesh Babulal
2011-06-08 3:09 ` Paul Turner
` (2 more replies)
2011-06-14 6:58 ` [patch 00/15] CFS Bandwidth Control V6 Hu Tao
16 siblings, 3 replies; 129+ messages in thread
From: Kamalesh Babulal @ 2011-06-07 15:45 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Ingo Molnar, Pavel Emelyanov
Hi All,
In our test environment, while testing the CFS Bandwidth V6 patch set
on top of 55922c9d1b84, we observed CPU idle time of between 30% and
40% while running a CPU bound test with the cgroup tasks not pinned to
the CPUs. Whereas in the inverse case, where the cgroup tasks are
pinned to the CPUs, the idle time seen is nearly zero.
Test Scenario
--------------
- 5 cgroups are created, with the groups assigned 2, 2, 4, 8 and 16 tasks respectively.
- Each cgroup has N sub-cgroups created under it, where N is the number of tasks
the cgroup is assigned. I.e., cgroup1 will create two sub-cgroups under it and
assign one task per sub-group.
------------
| cgroup 1 |
------------
/ \
/ \
-------------- --------------
|sub-cgroup 1| |sub-cgroup 2|
| (task 1) | | (task 2) |
-------------- --------------
- The top cgroups are given unlimited quota (cpu.cfs_quota_us = -1) and a period of
500ms (cpu.cfs_period_us = 500000), whereas the sub-cgroups are given 250ms of
quota (cpu.cfs_quota_us = 250000) and a period of 500ms. I.e. the top cgroups
have unlimited bandwidth, whereas the sub-cgroups are throttled every 250ms.
- Additionally, if required, proportional CPU shares can be assigned to cpu.shares
as NR_TASKS * 1024; i.e. cgroup1 with 2 tasks gets 2 * 1024 = 2048 worth of
cpu.shares. (In the test results published below, all cgroups and sub-cgroups
are given an equal share of 1024.)
- One CPU bound while(1) task is attached to each sub-cgroup.
- sum-exec time for each cgroup/sub-cgroup is captured from /proc/sched_debug after
60 seconds and analyzed for the run time of the tasks, i.e. of each sub-cgroup.
How is the idle CPU time measured ?
------------------------------------
- vmstat stats are logged every 2 seconds, from when the last while1 task is
attached to the 16th sub-cgroup of cgroup 5 until the 60 sec run is over. After
the run, the idle% of a CPU is calculated by summing the idle column from the
vmstat log and dividing it by the number of samples collected, after discarding
the first record from the log.
How are the tasks pinned to the CPU ?
-------------------------------------
- The cgroup hierarchy is mounted with the cpuset and cpu controllers, and for
every 2 sub-cgroups one physical CPU is allocated. I.e. CPU 1 is allocated
between 1/1 and 1/2 (Group 1, sub-cgroup 1 and sub-cgroup 2). Similarly CPUs 7
to 15 are allocated to 5/1 through 5/16 (Group 5, subgroups 1 to 16). Note
that the test machine used has 16 CPUs.
Result for the non-pinning case
-------------------------------
Only the hierarchy is created as stated above; cpusets are not assigned per cgroup.
Average CPU Idle percentage 34.8% (measured as explained above)
Bandwidth shared with remaining non-Idle 65.2%
* Note: for the sake of round-off, the values are multiplied by 100.
In the result below for cgroup1, 9.2500 corresponds to the sum-exec time
captured from /proc/sched_debug for cgroup 1 tasks (including sub-cgroups 1
and 2), which is in turn 6% of the non-Idle CPU time (derived as
9.2500 * 65.2 / 100).
Bandwidth of Group 1 = 9.2500 i.e = 6.0300% of non-Idle CPU time 65.2%
|...... subgroup 1/1 = 48.7800 i.e = 2.9400% of 6.0300% Groups non-Idle CPU time
|...... subgroup 1/2 = 51.2100 i.e = 3.0800% of 6.0300% Groups non-Idle CPU time
Bandwidth of Group 2 = 9.0400 i.e = 5.8900% of non-Idle CPU time 65.2%
|...... subgroup 2/1 = 51.0200 i.e = 3.0000% of 5.8900% Groups non-Idle CPU time
|...... subgroup 2/2 = 48.9700 i.e = 2.8800% of 5.8900% Groups non-Idle CPU time
Bandwidth of Group 3 = 16.9300 i.e = 11.0300% of non-Idle CPU time 65.2%
|...... subgroup 3/1 = 26.0300 i.e = 2.8700% of 11.0300% Groups non-Idle CPU time
|...... subgroup 3/2 = 25.8800 i.e = 2.8500% of 11.0300% Groups non-Idle CPU time
|...... subgroup 3/3 = 22.7800 i.e = 2.5100% of 11.0300% Groups non-Idle CPU time
|...... subgroup 3/4 = 25.2900 i.e = 2.7800% of 11.0300% Groups non-Idle CPU time
Bandwidth of Group 4 = 27.9300 i.e = 18.2100% of non-Idle CPU time 65.2%
|...... subgroup 4/1 = 16.6000 i.e = 3.0200% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/2 = 8.0000 i.e = 1.4500% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/3 = 9.0000 i.e = 1.6300% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/4 = 7.9600 i.e = 1.4400% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.3500 i.e = 2.2400% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/6 = 16.2500 i.e = 2.9500% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.6100 i.e = 2.2900% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/8 = 17.1900 i.e = 3.1300% of 18.2100% Groups non-Idle CPU time
Bandwidth of Group 5 = 36.8300 i.e = 24.0100% of non-Idle CPU time 65.2%
|...... subgroup 5/1 = 56.6900 i.e = 13.6100% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/2 = 8.8600 i.e = 2.1200% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/3 = 5.5100 i.e = 1.3200% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/4 = 4.5700 i.e = 1.0900% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/5 = 7.9500 i.e = 1.9000% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/6 = 2.1600 i.e = .5100% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/7 = 2.3400 i.e = .5600% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/8 = 2.1500 i.e = .5100% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/9 = 9.7200 i.e = 2.3300% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/10 = 5.0600 i.e = 1.2100% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/11 = 4.6900 i.e = 1.1200% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/12 = 8.9700 i.e = 2.1500% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/13 = 8.4600 i.e = 2.0300% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/14 = 11.8400 i.e = 2.8400% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.3400 i.e = 1.5200% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/16 = 5.1500 i.e = 1.2300% of 24.0100% Groups non-Idle CPU time
Pinned case
--------------
CPU hierarchy is created and cpusets are allocated.
Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%
Bandwidth of Group 1 = 6.3400 i.e = 6.3400% of non-Idle CPU time 100%
|...... subgroup 1/1 = 50.0400 i.e = 3.1700% of 6.3400% Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9500 i.e = 3.1600% of 6.3400% Groups non-Idle CPU time
Bandwidth of Group 2 = 6.3200 i.e = 6.3200% of non-Idle CPU time 100%
|...... subgroup 2/1 = 50.0400 i.e = 3.1600% of 6.3200% Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9500 i.e = 3.1500% of 6.3200% Groups non-Idle CPU time
Bandwidth of Group 3 = 12.6300 i.e = 12.6300% of non-Idle CPU time 100%
|...... subgroup 3/1 = 25.0300 i.e = 3.1600% of 12.6300% Groups non-Idle CPU time
|...... subgroup 3/2 = 25.0100 i.e = 3.1500% of 12.6300% Groups non-Idle CPU time
|...... subgroup 3/3 = 25.0000 i.e = 3.1500% of 12.6300% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9400 i.e = 3.1400% of 12.6300% Groups non-Idle CPU time
Bandwidth of Group 4 = 25.1000 i.e = 25.1000% of non-Idle CPU time 100%
|...... subgroup 4/1 = 12.5400 i.e = 3.1400% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5100 i.e = 3.1400% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/3 = 12.5300 i.e = 3.1400% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.5000 i.e = 3.1300% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.4900 i.e = 3.1300% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4700 i.e = 3.1200% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.4700 i.e = 3.1200% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4500 i.e = 3.1200% of 25.1000% Groups non-Idle CPU time
Bandwidth of Group 5 = 49.5700 i.e = 49.5700% of non-Idle CPU time 100%
|...... subgroup 5/1 = 49.8500 i.e = 24.7100% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.2900 i.e = 3.1100% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.2800 i.e = 3.1100% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.2700 i.e = 3.1000% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/5 = 6.2700 i.e = 3.1000% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/6 = 6.2600 i.e = 3.1000% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/7 = 6.2500 i.e = 3.0900% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/8 = 6.2400 i.e = 3.0900% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2400 i.e = 3.0900% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.2300 i.e = 3.0800% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.2300 i.e = 3.0800% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.2200 i.e = 3.0800% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.2100 i.e = 3.0700% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.2100 i.e = 3.0700% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.2100 i.e = 3.0700% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.2100 i.e = 3.0700% of 49.5700% Groups non-Idle CPU time
With equal cpu shares allocated to all the groups/sub-cgroups and CFS bandwidth
configured to allow 100% CPU utilization, we see the CPU idle time in the
un-pinned case.
The benchmark used to reproduce the issue is attached. Just executing the script
should report similar numbers.
#!/bin/bash
NR_TASKS1=2
NR_TASKS2=2
NR_TASKS3=4
NR_TASKS4=8
NR_TASKS5=16
BANDWIDTH=1
SUBGROUP=1
PRO_SHARES=0
MOUNT=/cgroup/
LOAD=/root/while1
usage()
{
echo "Usage $0: [-b 0|1] [-s 0|1] [-p 0|1]"
echo "-b 1|0 set/unset Cgroups bandwidth control (default set)"
echo "-s Create sub-groups for every task (default creates sub-group)"
echo "-p create propotional shares based on cpus"
exit
}
while getopts ":b:s:p:" arg
do
case $arg in
b)
BANDWIDTH=$OPTARG
shift
if [ $BANDWIDTH -gt 1 ] && [ $BANDWIDTH -lt 0 ]
then
usage
fi
;;
s)
SUBGROUP=$OPTARG
shift
if [ $SUBGROUP -gt 1 ] && [ $SUBGROUP -lt 0 ]
then
usage
fi
;;
p)
PRO_SHARES=$OPTARG
shift
if [ $PRO_SHARES -gt 1 ] && [ $PRO_SHARES -lt 0 ]
then
usage
fi
;;
*)
esac
done
if [ ! -d $MOUNT ]
then
mkdir -p $MOUNT
fi
test()
{
echo -n "[ "
if [ $1 -eq 0 ]
then
echo -ne '\E[42;40mOk'
else
echo -ne '\E[31;40mFailed'
tput sgr0
echo " ]"
exit
fi
tput sgr0
echo " ]"
}
mount_cgrp()
{
echo -n "Mounting root cgroup "
mount -t cgroup -ocpu,cpuset,cpuacct none $MOUNT &> /dev/null
test $?
}
umount_cgrp()
{
echo -n "Unmounting root cgroup "
cd /root/
umount $MOUNT
test $?
}
create_hierarchy()
{
mount_cgrp
cpuset_mem=`cat $MOUNT/cpuset.mems`
cpuset_cpu=`cat $MOUNT/cpuset.cpus`
echo -n "creating groups/sub-groups ..."
for (( i=1; i<=5; i++ ))
do
mkdir $MOUNT/$i
echo $cpuset_mem > $MOUNT/$i/cpuset.mems
echo $cpuset_cpu > $MOUNT/$i/cpuset.cpus
echo -n ".."
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
mkdir -p $MOUNT/$i/$j
echo $cpuset_mem > $MOUNT/$i/$j/cpuset.mems
echo $cpuset_cpu > $MOUNT/$i/$j/cpuset.cpus
echo -n ".."
done
fi
done
echo "."
}
cleanup()
{
pkill -9 while1 &> /dev/null
sleep 10
echo -n "Umount groups/sub-groups .."
for (( i=1; i<=5; i++ ))
do
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
rmdir $MOUNT/$i/$j
echo -n ".."
done
fi
rmdir $MOUNT/$i
echo -n ".."
done
echo " "
umount_cgrp
}
load_tasks()
{
for (( i=1; i<=5; i++ ))
do
jj=$(eval echo "\$NR_TASKS$i")
shares="1024"
if [ $PRO_SHARES -eq 1 ]
then
eval shares=$(echo "$jj * 1024" | bc)
fi
echo $hares > $MOUNT/$i/cpu.shares
for (( j=1; j<=$jj; j++ ))
do
echo "-1" > $MOUNT/$i/cpu.cfs_quota_us
echo "500000" > $MOUNT/$i/cpu.cfs_period_us
if [ $SUBGROUP -eq 1 ]
then
$LOAD &
echo $! > $MOUNT/$i/$j/tasks
echo "1024" > $MOUNT/$i/$j/cpu.shares
if [ $BANDWIDTH -eq 1 ]
then
echo "500000" > $MOUNT/$i/$j/cpu.cfs_period_us
echo "250000" > $MOUNT/$i/$j/cpu.cfs_quota_us
fi
else
$LOAD &
echo $! > $MOUNT/$i/tasks
echo $shares > $MOUNT/$i/cpu.shares
if [ $BANDWIDTH -eq 1 ]
then
echo "500000" > $MOUNT/$i/cpu.cfs_period_us
echo "250000" > $MOUNT/$i/cpu.cfs_quota_us
fi
fi
done
done
echo "Captuing idle cpu time with vmstat...."
vmstat 2 100 &> vmstat_log &
}
pin_tasks()
{
cpu=0
count=1
for (( i=1; i<=5; i++ ))
do
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
if [ $count -gt 2 ]
then
cpu=$((cpu+1))
count=1
fi
echo $cpu > $MOUNT/$i/$j/cpuset.cpus
count=$((count+1))
done
else
case $i in
1)
echo 0 > $MOUNT/$i/cpuset.cpus;;
2)
echo 1 > $MOUNT/$i/cpuset.cpus;;
3)
echo "2-3" > $MOUNT/$i/cpuset.cpus;;
4)
echo "4-6" > $MOUNT/$i/cpuset.cpus;;
5)
echo "7-15" > $MOUNT/$i/cpuset.cpus;;
esac
fi
done
}
print_results()
{
eval gtot=$(cat sched_log|grep -i while|sed 's/R//g'|awk '{gtot+=$7};END{printf "%f", gtot}')
for (( i=1; i<=5; i++ ))
do
eval temp=$(cat sched_log_$i|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
eval tavg=$(echo "scale=4;(($temp / $gtot) * $1)/100 " | bc)
eval avg=$(echo "scale=4;($temp / $gtot) * 100" | bc)
eval pretty_tavg=$( echo "scale=4; $tavg * 100"| bc) # F0r pretty format
echo "Bandwidth of Group $i = $avg i.e = $pretty_tavg% of non-Idle CPU time $1%"
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
eval tmp=$(cat sched_log_$i-$j|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
eval stavg=$(echo "scale=4;($tmp / $temp) * 100" | bc)
eval pretty_stavg=$(echo "scale=4;(($tmp / $temp) * $tavg) * 100" | bc)
echo -n "|"
echo -e "...... subgroup $i/$j\t= $stavg\ti.e = $pretty_stavg% of $pretty_tavg% Groups non-Idle CPU time"
done
fi
echo " "
echo " "
done
}
capture_results()
{
cat /proc/sched_debug > sched_log
pkill -9 vmstat -c
avg=$(cat vmstat_log |grep -iv "system"|grep -iv "swpd"|awk ' { if ( NR != 1) {id+=$15 }}END{print (id/NR)}')
rem=$(echo "scale=2; 100 - $avg" |bc)
echo "Average CPU Idle percentage $avg%"
echo "Bandwidth shared with remaining non-Idle $rem%"
for (( i=1; i<=5; i++ ))
do
cat sched_log |grep -i while1|grep -i " \/$i" > sched_log_$i
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
cat sched_log |grep -i while1|grep -i " \/$i\/$j" > sched_log_$i-$j
done
fi
done
print_results $rem
}
create_hierarchy
pin_tasks
load_tasks
sleep 60
capture_results
cleanup
exit
Thanks,
Kamalesh.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-07 15:45 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Kamalesh Babulal
@ 2011-06-08 3:09 ` Paul Turner
2011-06-08 10:46 ` Vladimir Davydov
2011-06-14 10:16 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Hidetoshi Seto
2 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-06-08 3:09 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: LKML, Peter Zijlstra, Bharata B Rao, Dhaval Giani, Balbir Singh,
Vaidyanathan Srinivasan, Srivatsa Vaddagiri, Ingo Molnar,
Pavel Emelyanov
[ Sorry for the delayed response, I was out on vacation for the second
half of May until last week -- I've now caught up on email and am
preparing the next posting ]
Thanks for the test-case Kamalesh -- my immediate suspicion is that
quota return may not be fine-grained enough (although the numbers
provided are large enough that it's possible there's also just a bug).
I have some tools from my own testing I can use to pull this apart,
let me run your work-load and get back to you.
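For reference, the sort of finer-grained quota return I mean looks
roughly like this (a sketch only; details such as keeping a minimum
local reserve still need working out):

static void return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
{
        struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
        s64 slack = cfs_rq->runtime_remaining;

        /* only return runtime that this (now idle) cfs_rq cannot use */
        if (slack <= 0 || cfs_rq->nr_running)
                return;

        raw_spin_lock(&cfs_b->lock);
        cfs_b->runtime += slack;
        raw_spin_unlock(&cfs_b->lock);

        cfs_rq->runtime_remaining = 0;
}

invoked when the last task dequeues from a cfs_rq, so quota cached on
an idling cpu can be redistributed rather than sitting stranded until
the period refresh.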
On Tue, Jun 7, 2011 at 8:45 AM, Kamalesh Babulal
<kamalesh@linux.vnet.ibm.com> wrote:
> Hi All,
>
> In our test environment, while testing the CFS Bandwidth V6 patch set
> on top of 55922c9d1b84, we observed CPU idle time of between 30% and
> 40% while running a CPU bound test with the cgroup tasks not pinned to
> the CPUs. Whereas in the inverse case, where the cgroup tasks are
> pinned to the CPUs, the idle time seen is nearly zero.
(snip)
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-07 15:45 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Kamalesh Babulal
2011-06-08 3:09 ` Paul Turner
@ 2011-06-08 10:46 ` Vladimir Davydov
2011-06-08 16:32 ` Kamalesh Babulal
2011-06-14 10:16 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Hidetoshi Seto
2 siblings, 1 reply; 129+ messages in thread
From: Vladimir Davydov @ 2011-06-08 10:46 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Ingo Molnar, Pavel Emelianov
On Tue, 2011-06-07 at 19:45 +0400, Kamalesh Babulal wrote:
> Hi All,
>
> In our test environment, while testing the CFS Bandwidth V6 patch set
> on top of 55922c9d1b84, we observed CPU idle time of between 30% and
> 40% while running a CPU bound test with the cgroup tasks not pinned to
> the CPUs. Whereas in the inverse case, where the cgroup tasks are
> pinned to the CPUs, the idle time seen is nearly zero.
(snip)
> load_tasks()
> {
> for (( i=1; i<=5; i++ ))
> do
> jj=$(eval echo "\$NR_TASKS$i")
> shares="1024"
> if [ $PRO_SHARES -eq 1 ]
> then
> eval shares=$(echo "$jj * 1024" | bc)
> fi
> echo $hares > $MOUNT/$i/cpu.shares
^^^^^
a fatal misprint? must be shares, I guess
(Setting cpu.shares to "", i.e. to the minimal possible value, will
definitely confuse the load balancer)
> for (( j=1; j<=$jj; j++ ))
> do
> echo "-1" > $MOUNT/$i/cpu.cfs_quota_us
> echo "500000" > $MOUNT/$i/cpu.cfs_period_us
> if [ $SUBGROUP -eq 1 ]
> then
>
> $LOAD &
> echo $! > $MOUNT/$i/$j/tasks
> echo "1024" > $MOUNT/$i/$j/cpu.shares
>
> if [ $BANDWIDTH -eq 1 ]
> then
> echo "500000" > $MOUNT/$i/$j/cpu.cfs_period_us
> echo "250000" > $MOUNT/$i/$j/cpu.cfs_quota_us
> fi
> else
> $LOAD &
> echo $! > $MOUNT/$i/tasks
> echo $shares > $MOUNT/$i/cpu.shares
>
> if [ $BANDWIDTH -eq 1 ]
> then
> echo "500000" > $MOUNT/$i/cpu.cfs_period_us
> echo "250000" > $MOUNT/$i/cpu.cfs_quota_us
> fi
> fi
> done
> done
> echo "Captuing idle cpu time with vmstat...."
> vmstat 2 100 &> vmstat_log &
> }
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-08 10:46 ` Vladimir Davydov
@ 2011-06-08 16:32 ` Kamalesh Babulal
2011-06-09 3:25 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Kamalesh Babulal @ 2011-06-08 16:32 UTC (permalink / raw)
To: Vladimir Davydov
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Ingo Molnar, Pavel Emelianov
* Vladimir Davydov <vdavydov@parallels.com> [2011-06-08 14:46:06]:
> On Tue, 2011-06-07 at 19:45 +0400, Kamalesh Babulal wrote:
> > Hi All,
> >
> > In our test environment, while testing the CFS Bandwidth V6 patch set
> > on top of 55922c9d1b84. We observed that the CPU's idle time is seen
> > between 30% to 40% while running CPU bound test, with the cgroups tasks
> > not pinned to the CPU's. Whereas in the inverse case, where the cgroups
> > tasks are pinned to the CPU's, the idle time seen is nearly zero.
>
> (snip)
>
> > load_tasks()
> > {
> > for (( i=1; i<=5; i++ ))
> > do
> > jj=$(eval echo "\$NR_TASKS$i")
> > shares="1024"
> > if [ $PRO_SHARES -eq 1 ]
> > then
> > eval shares=$(echo "$jj * 1024" | bc)
> > fi
> > echo $hares > $MOUNT/$i/cpu.shares
> ^^^^^
> a fatal misprint? must be shares, I guess
>
> (Setting cpu.shares to "", i.e. to the minimal possible value, will
> definitely confuse the load balancer)
My bad. It was a fatal typo, thanks for pointing it out. It made a big difference
in the idle time reported. After correcting it to $shares, the CPU idle time
reported is now 20% to 22%, which is 10% less than the previously reported number.
(snip)
There have been questions on how to interpret the results. Consider the
following test run without pinning of the cgroup tasks:
Average CPU Idle percentage 20%
Bandwidth shared with remaining non-Idle 80%
Bandwidth of Group 1 = 7.9700% i.e = 6.3700% of non-Idle CPU time 80%
|...... subgroup 1/1 = 50.0200 i.e = 3.1800% of 6.3700% Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9700 i.e = 3.1800% of 6.3700% Groups non-Idle CPU time
For example, consider cgroup1; sum_exec time is the 7th field captured
from /proc/sched_debug:
while1 27273 30665.912793 1988 120 30665.912793 30909.566767 0.021951 /1/2
while1 27272 30511.105690 1995 120 30511.105690 30942.998099 0.017369 /1/1
-----------------
61852.564866
-----------------
- The bandwidth for sub-cgroup1 of cgroup1 is calculated as (30942.998099 * 100) / 61852.564866
= ~50%
and for sub-cgroup2 of cgroup1 as (30909.566767 * 100) / 61852.564866
= ~50%
Similarly, if we add up the sum_exec of all the groups:
------------------------------------------------------------------------------------------------
Group1          Group2          Group3          Group4          Group5          sum_exec
------------------------------------------------------------------------------------------------
61852.564866 + 61686.604930 + 122840.294858 + 232576.303937 + 296166.889155 = 775122.657746
Again taking the example of cgroup1:
Total percentage of bandwidth allocated to cgroup1 = (61852.564866 * 100) / 775122.657746
= ~7.9% of the total bandwidth of all the cgroups
The non-idle time is calculated as
(total execution time * 100) / (number of cpus * 60000 ms) [the script runs for 60 seconds]
i.e. = (775122.657746 * 100) / (16 * 60000)
= ~80% non-idle time
The percentage of non-Idle bandwidth allocated to cgroup1 is derived as
= (cgroup bandwidth percentage * non-idle time) / 100
i.e. for cgroup1 = (7.9700 * 80) / 100
= 6.376% of non-Idle CPU time allocated.
Bandwidth of Group 2 = 7.9500% i.e = 6.3600% of non-Idle CPU time 80%
|...... subgroup 2/1 = 49.9900 i.e = 3.1700% of 6.3600% Groups non-Idle CPU time
|...... subgroup 2/2 = 50.0000 i.e = 3.1800% of 6.3600% Groups non-Idle CPU time
Bandwidth of Group 3 = 15.8400% i.e = 12.6700% of non-Idle CPU time 80%
|...... subgroup 3/1 = 24.9900 i.e = 3.1600% of 12.6700% Groups non-Idle CPU time
|...... subgroup 3/2 = 24.9900 i.e = 3.1600% of 12.6700% Groups non-Idle CPU time
|...... subgroup 3/3 = 25.0600 i.e = 3.1700% of 12.6700% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9400 i.e = 3.1500% of 12.6700% Groups non-Idle CPU time
Bandwidth of Group 4 = 30.0000% i.e = 24.0000% of non-Idle CPU time 80%
|...... subgroup 4/1 = 13.1600 i.e = 3.1500% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/2 = 11.3800 i.e = 2.7300% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/3 = 13.1100 i.e = 3.1400% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.3100 i.e = 2.9500% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.8200 i.e = 3.0700% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/6 = 11.0600 i.e = 2.6500% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/7 = 13.0600 i.e = 3.1300% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/8 = 13.0600 i.e = 3.1300% of 24.0000% Groups non-Idle CPU time
Bandwidth of Group 5 = 38.2000% i.e = 30.5600% of non-Idle CPU time 80%
|...... subgroup 5/1 = 48.1000 i.e = 14.6900% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.7900 i.e = 2.0700% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.3700 i.e = 1.9400% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/4 = 5.1800 i.e = 1.5800% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/5 = 5.0400 i.e = 1.5400% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/6 = 10.1400 i.e = 3.0900% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/7 = 5.0700 i.e = 1.5400% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/8 = 6.3900 i.e = 1.9500% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.8800 i.e = 2.1000% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.4700 i.e = 1.9700% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.5600 i.e = 2.0000% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/12 = 4.6400 i.e = 1.4100% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/13 = 7.4900 i.e = 2.2800% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/14 = 5.8200 i.e = 1.7700% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.5500 i.e = 2.0000% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/16 = 5.2700 i.e = 1.6100% of 30.5600% Groups non-Idle CPU time
Thanks,
Kamalesh.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-08 16:32 ` Kamalesh Babulal
@ 2011-06-09 3:25 ` Paul Turner
2011-06-10 18:17 ` Kamalesh Babulal
0 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-06-09 3:25 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Vladimir Davydov, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Ingo Molnar, Pavel Emelianov
Hi Kamalesh,
I'm unable to reproduce the results you describe. One possibility is
load-balancer interaction -- can you describe the topology of the
platform you are running this on?
On both a straight NUMA topology and a hyper-threaded platform I
observe a ~4% delta between the pinned and un-pinned cases.
Thanks -- results below,
- Paul
16 cores -- pinned:
Average CPU Idle percentage 4.77419%
Bandwidth shared with remaining non-Idle 95.22581%
Bandwidth of Group 1 = 6.6300 i.e = 6.3100% of non-Idle CPU time 95.22581%
|...... subgroup 1/1 = 50.0400 i.e = 3.1500% of 6.3100% Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9500 i.e = 3.1500% of 6.3100% Groups non-Idle CPU time
Bandwidth of Group 2 = 6.6300 i.e = 6.3100% of non-Idle CPU time 95.22581%
|...... subgroup 2/1 = 50.0300 i.e = 3.1500% of 6.3100% Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9600 i.e = 3.1500% of 6.3100% Groups non-Idle CPU time
Bandwidth of Group 3 = 13.2000 i.e = 12.5600% of non-Idle CPU time 95.22581%
|...... subgroup 3/1 = 25.0200 i.e = 3.1400% of 12.5600% Groups non-Idle CPU time
|...... subgroup 3/2 = 24.9500 i.e = 3.1300% of 12.5600% Groups non-Idle CPU time
|...... subgroup 3/3 = 25.0400 i.e = 3.1400% of 12.5600% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9700 i.e = 3.1300% of 12.5600% Groups non-Idle CPU time
Bandwidth of Group 4 = 26.1500 i.e = 24.9000% of non-Idle CPU time 95.22581%
|...... subgroup 4/1 = 12.4700 i.e = 3.1000% of 24.9000% Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5500 i.e = 3.1200% of 24.9000% Groups non-Idle CPU time
|...... subgroup 4/3 = 12.4600 i.e = 3.1000% of 24.9000% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.5000 i.e = 3.1100% of 24.9000% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.5400 i.e = 3.1200% of 24.9000% Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4700 i.e = 3.1000% of 24.9000% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.5200 i.e = 3.1100% of 24.9000% Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4600 i.e = 3.1000% of 24.9000% Groups non-Idle CPU time
Bandwidth of Group 5 = 47.3600 i.e = 45.0900% of non-Idle CPU time 95.22581%
|...... subgroup 5/1 = 49.9600 i.e = 22.5200% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.3600 i.e = 2.8600% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.2400 i.e = 2.8100% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.1900 i.e = 2.7900% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/5 = 6.2700 i.e = 2.8200% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/6 = 6.3400 i.e = 2.8500% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/7 = 6.1900 i.e = 2.7900% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/8 = 6.1500 i.e = 2.7700% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2600 i.e = 2.8200% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.2800 i.e = 2.8300% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.2800 i.e = 2.8300% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.1400 i.e = 2.7600% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.0900 i.e = 2.7400% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.3000 i.e = 2.8400% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.1600 i.e = 2.7700% of 45.0900% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.3400 i.e = 2.8500% of 45.0900% Groups non-Idle CPU time
AMD 16 core -- pinned:
Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%
Bandwidth of Group 1 = 6.2800 i.e = 6.2800% of non-Idle CPU time 100%
|...... subgroup 1/1 = 50.0000 i.e = 3.1400% of 6.2800% Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9900 i.e = 3.1300% of 6.2800% Groups non-Idle CPU time
Bandwidth of Group 2 = 6.2800 i.e = 6.2800% of non-Idle CPU time 100%
|...... subgroup 2/1 = 50.0000 i.e = 3.1400% of 6.2800% Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9900 i.e = 3.1300% of 6.2800% Groups non-Idle CPU time
Bandwidth of Group 3 = 12.5500 i.e = 12.5500% of non-Idle CPU time 100%
|...... subgroup 3/1 = 25.0100 i.e = 3.1300% of 12.5500% Groups non-Idle CPU time
|...... subgroup 3/2 = 25.0000 i.e = 3.1300% of 12.5500% Groups non-Idle CPU time
|...... subgroup 3/3 = 24.9900 i.e = 3.1300% of 12.5500% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9700 i.e = 3.1300% of 12.5500% Groups non-Idle CPU time
Bandwidth of Group 4 = 25.0400 i.e = 25.0400% of non-Idle CPU time 100%
|...... subgroup 4/1 = 12.5000 i.e = 3.1300% of 25.0400% Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5000 i.e = 3.1300% of 25.0400% Groups non-Idle CPU time
|...... subgroup 4/3 = 12.5000 i.e = 3.1300% of 25.0400% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.5000 i.e = 3.1300% of 25.0400% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.5000 i.e = 3.1300% of 25.0400% Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4900 i.e = 3.1200% of 25.0400% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.4900 i.e = 3.1200% of 25.0400% Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4800 i.e = 3.1200% of 25.0400% Groups non-Idle CPU time
Bandwidth of Group 5 = 49.8200 i.e = 49.8200% of non-Idle CPU time 100%
|...... subgroup 5/1 = 49.9400 i.e = 24.8800% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.2700 i.e = 3.1200% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.2400 i.e = 3.1000% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.2400 i.e = 3.1000% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/5 = 6.2500 i.e = 3.1100% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/6 = 6.2500 i.e = 3.1100% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/7 = 6.2600 i.e = 3.1100% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/8 = 6.2600 i.e = 3.1100% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2400 i.e = 3.1000% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.2400 i.e = 3.1000% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.2400 i.e = 3.1000% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.2400 i.e = 3.1000% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.2200 i.e = 3.0900% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.2200 i.e = 3.0900% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.2400 i.e = 3.1000% of 49.8200% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.2400 i.e = 3.1000% of 49.8200% Groups non-Idle CPU time
16 core hyper-threaded subset of 24 core machine (threads not pinned individually):
Average CPU Idle percentage 35.0645%
Bandwidth shared with remaining non-Idle 64.9355%
Bandwidth of Group 1 = 6.6000 i.e = 4.2800% of non-Idle CPU time 64.9355%
|...... subgroup 1/1 = 50.0600 i.e = 2.1400% of 4.2800% Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9300 i.e = 2.1300% of 4.2800% Groups non-Idle CPU time
Bandwidth of Group 2 = 6.6000 i.e = 4.2800% of non-Idle CPU time 64.9355%
|...... subgroup 2/1 = 50.0100 i.e = 2.1400% of 4.2800% Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9800 i.e = 2.1300% of 4.2800% Groups non-Idle CPU time
Bandwidth of Group 3 = 13.1600 i.e = 8.5400% of non-Idle CPU time 64.9355%
|...... subgroup 3/1 = 25.0200 i.e = 2.1300% of 8.5400% Groups non-Idle CPU time
|...... subgroup 3/2 = 24.9900 i.e = 2.1300% of 8.5400% Groups non-Idle CPU time
|...... subgroup 3/3 = 24.9900 i.e = 2.1300% of 8.5400% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9900 i.e = 2.1300% of 8.5400% Groups non-Idle CPU time
Bandwidth of Group 4 = 25.9700 i.e = 16.8600% of non-Idle CPU time 64.9355%
|...... subgroup 4/1 = 12.5000 i.e = 2.1000% of 16.8600% Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5100 i.e = 2.1000% of 16.8600% Groups non-Idle CPU time
|...... subgroup 4/3 = 12.6000 i.e = 2.1200% of 16.8600% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.3800 i.e = 2.0800% of 16.8600% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.4700 i.e = 2.1000% of 16.8600% Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4900 i.e = 2.1000% of 16.8600% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.5700 i.e = 2.1100% of 16.8600% Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4400 i.e = 2.0900% of 16.8600% Groups non-Idle CPU time
Bandwidth of Group 5 = 47.6500 i.e = 30.9400% of non-Idle CPU time 64.9355%
|...... subgroup 5/1 = 50.5400 i.e = 15.6300% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.0400 i.e = 1.8600% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.0600 i.e = 1.8700% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.4300 i.e = 1.9800% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/5 = 6.3100 i.e = 1.9500% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/6 = 6.0000 i.e = 1.8500% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/7 = 6.3100 i.e = 1.9500% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/8 = 5.9800 i.e = 1.8500% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2900 i.e = 1.9400% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.3300 i.e = 1.9500% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.5200 i.e = 2.0100% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.0500 i.e = 1.8700% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.3500 i.e = 1.9600% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.3500 i.e = 1.9600% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.3400 i.e = 1.9600% of 30.9400% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.4200 i.e = 1.9800% of 30.9400% Groups non-Idle CPU time
16 core hyper-threaded subset of 24 core machine (threads pinned individually):
Average CPU Idle percentage 31.7419%
Bandwidth shared with remaining non-Idle 68.2581%
Bandwidth of Group 1 = 6.2700 i.e = 4.2700% of non-Idle CPU time 68.2581%
|...... subgroup 1/1 = 50.0100 i.e = 2.1300% of 4.2700% Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9800 i.e = 2.1300% of 4.2700% Groups non-Idle CPU time
Bandwidth of Group 2 = 6.2700 i.e = 4.2700% of non-Idle CPU time 68.2581%
|...... subgroup 2/1 = 50.0100 i.e = 2.1300% of 4.2700% Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9800 i.e = 2.1300% of 4.2700% Groups non-Idle CPU time
Bandwidth of Group 3 = 12.5300 i.e = 8.5500% of non-Idle CPU time 68.2581%
|...... subgroup 3/1 = 25.0100 i.e = 2.1300% of 8.5500% Groups non-Idle CPU time
|...... subgroup 3/2 = 25.0000 i.e = 2.1300% of 8.5500% Groups non-Idle CPU time
|...... subgroup 3/3 = 24.9900 i.e = 2.1300% of 8.5500% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9800 i.e = 2.1300% of 8.5500% Groups non-Idle CPU time
Bandwidth of Group 4 = 25.0200 i.e = 17.0700% of non-Idle CPU time 68.2581%
|...... subgroup 4/1 = 12.5100 i.e = 2.1300% of 17.0700% Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5000 i.e = 2.1300% of 17.0700% Groups non-Idle CPU time
|...... subgroup 4/3 = 12.5000 i.e = 2.1300% of 17.0700% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.5000 i.e = 2.1300% of 17.0700% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.5000 i.e = 2.1300% of 17.0700% Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4900 i.e = 2.1300% of 17.0700% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.4900 i.e = 2.1300% of 17.0700% Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4800 i.e = 2.1300% of 17.0700% Groups non-Idle CPU time
Bandwidth of Group 5 = 49.8900 i.e = 34.0500% of non-Idle CPU time 68.2581%
|...... subgroup 5/1 = 49.9600 i.e = 17.0100% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.2600 i.e = 2.1300% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.2600 i.e = 2.1300% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.2500 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/5 = 6.2500 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/6 = 6.2500 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/7 = 6.2500 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/8 = 6.2500 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.2300 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.2300 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
On Wed, Jun 8, 2011 at 9:32 AM, Kamalesh Babulal
<kamalesh@linux.vnet.ibm.com> wrote:
> * Vladimir Davydov <vdavydov@parallels.com> [2011-06-08 14:46:06]:
>
>> On Tue, 2011-06-07 at 19:45 +0400, Kamalesh Babulal wrote:
>> > Hi All,
>> >
>> > In our test environment, while testing the CFS Bandwidth V6 patch set
>> > on top of 55922c9d1b84. We observed that the CPU's idle time is seen
>> > between 30% to 40% while running CPU bound test, with the cgroups tasks
>> > not pinned to the CPU's. Whereas in the inverse case, where the cgroups
>> > tasks are pinned to the CPU's, the idle time seen is nearly zero.
>>
>> (snip)
>>
>> > load_tasks()
>> > {
>> > for (( i=1; i<=5; i++ ))
>> > do
>> > jj=$(eval echo "\$NR_TASKS$i")
>> > shares="1024"
>> > if [ $PRO_SHARES -eq 1 ]
>> > then
>> > eval shares=$(echo "$jj * 1024" | bc)
>> > fi
>> > echo $hares > $MOUNT/$i/cpu.shares
>> ^^^^^
>> a fatal misprint? must be shares, I guess
>>
>> (Setting cpu.shares to "", i.e. to the minimal possible value, will
>> definitely confuse the load balancer)
>
> My bad. It was a fatal typo, thanks for pointing it out. It made a big difference
> in the idle time reported. After correcting it to $shares, the CPU idle time
> reported is now 20% to 22%, which is 10% less than the previously reported number.
>
> (snip)
>
> There have been questions on how to interpret the results. Consider the
> following test run without pinning the cgroup tasks:
>
> Average CPU Idle percentage 20%
> Bandwidth shared with remaining non-Idle 80%
>
> Bandwidth of Group 1 = 7.9700% i.e = 6.3700% of non-Idle CPU time 80%
> |...... subgroup 1/1 = 50.0200 i.e = 3.1800% of 6.3700% Groups non-Idle CPU time
> |...... subgroup 1/2 = 49.9700 i.e = 3.1800% of 6.3700% Groups non-Idle CPU time
>
> For example, let us consider cgroup1; sum_exec is the 7th field of each
> task line captured from /proc/sched_debug:
>
> while1 27273 30665.912793 1988 120 30665.912793 30909.566767 0.021951 /1/2
> while1 27272 30511.105690 1995 120 30511.105690 30942.998099 0.017369 /1/1
> -----------------
>
> 61852.564866
> -----------------
> - The bandwidth for sub-cgroup1 of cgroup1 is calculated as (30942.998099 * 100) / 61852.564866
> = ~50%
>
> and for sub-cgroup2 of cgroup1 as (30909.566767 * 100) / 61852.564866
> = ~50%
>
> In a similar way, if we add up the sum_exec of all the groups:
> ------------------------------------------------------------------------------------------------
> Group1 Group2 Group3 Group4 Group5 sum_exec
> ------------------------------------------------------------------------------------------------
> 61852.564866 + 61686.604930 + 122840.294858 + 232576.303937 +296166.889155 = 775122.657746
>
> Again taking the example of cgroup1, the total percentage of bandwidth
> allocated to cgroup1 = (61852.564866 * 100) / 775122.657746
> = ~7.9% of the total bandwidth of all the cgroups
>
>
> The non-idle time is calculated as
> (total execution time * 100) / (no. of cpus * 60000 ms) [the script runs for 60 seconds]
> i.e. = (775122.657746 * 100) / (16 * 60000)
> = ~80% non-idle time
>
> The percentage of non-idle bandwidth allocated to cgroup1 is derived as
> = (cgroup bandwidth percentage * non-idle time) / 100
> = for cgroup1 = (7.9700 * 80) / 100
> = 6.376% of non-Idle CPU time allocated.
>
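[Cross-check: the arithmetic above can be reproduced directly with bc,
the same tool the script itself uses, on the figures quoted:
  echo "scale=4; (61852.564866 * 100) / 775122.657746" | bc  # ~7.97, group1 share of all groups
  echo "scale=4; (775122.657746 * 100) / (16 * 60000)" | bc  # ~80.74, non-idle percentage
  echo "scale=4; (7.9700 * 80) / 100" | bc                   # ~6.37, group1 share of non-idle time
]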
>
> Bandwidth of Group 2 = 7.9500% i.e = 6.3600% of non-Idle CPU time 80%
> |...... subgroup 2/1 = 49.9900 i.e = 3.1700% of 6.3600% Groups non-Idle CPU time
> |...... subgroup 2/2 = 50.0000 i.e = 3.1800% of 6.3600% Groups non-Idle CPU time
>
>
> Bandwidth of Group 3 = 15.8400% i.e = 12.6700% of non-Idle CPU time 80%
> |...... subgroup 3/1 = 24.9900 i.e = 3.1600% of 12.6700% Groups non-Idle CPU time
> |...... subgroup 3/2 = 24.9900 i.e = 3.1600% of 12.6700% Groups non-Idle CPU time
> |...... subgroup 3/3 = 25.0600 i.e = 3.1700% of 12.6700% Groups non-Idle CPU time
> |...... subgroup 3/4 = 24.9400 i.e = 3.1500% of 12.6700% Groups non-Idle CPU time
>
>
> Bandwidth of Group 4 = 30.0000% i.e = 24.0000% of non-Idle CPU time 80%
> |...... subgroup 4/1 = 13.1600 i.e = 3.1500% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/2 = 11.3800 i.e = 2.7300% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/3 = 13.1100 i.e = 3.1400% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/4 = 12.3100 i.e = 2.9500% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/5 = 12.8200 i.e = 3.0700% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/6 = 11.0600 i.e = 2.6500% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/7 = 13.0600 i.e = 3.1300% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/8 = 13.0600 i.e = 3.1300% of 24.0000% Groups non-Idle CPU time
>
>
> Bandwidth of Group 5 = 38.2000% i.e = 30.5600% of non-Idle CPU time 80%
> |...... subgroup 5/1 = 48.1000 i.e = 14.6900% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/2 = 6.7900 i.e = 2.0700% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/3 = 6.3700 i.e = 1.9400% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/4 = 5.1800 i.e = 1.5800% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/5 = 5.0400 i.e = 1.5400% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/6 = 10.1400 i.e = 3.0900% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/7 = 5.0700 i.e = 1.5400% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/8 = 6.3900 i.e = 1.9500% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/9 = 6.8800 i.e = 2.1000% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/10 = 6.4700 i.e = 1.9700% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/11 = 6.5600 i.e = 2.0000% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/12 = 4.6400 i.e = 1.4100% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/13 = 7.4900 i.e = 2.2800% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/14 = 5.8200 i.e = 1.7700% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/15 = 6.5500 i.e = 2.0000% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/16 = 5.2700 i.e = 1.6100% of 30.5600% Groups non-Idle CPU time
>
> Thanks,
> Kamalesh.
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-09 3:25 ` Paul Turner
@ 2011-06-10 18:17 ` Kamalesh Babulal
2011-06-14 0:00 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Kamalesh Babulal @ 2011-06-10 18:17 UTC (permalink / raw)
To: Paul Turner
Cc: Vladimir Davydov, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Ingo Molnar, Pavel Emelianov
* Paul Turner <pjt@google.com> [2011-06-08 20:25:00]:
> Hi Kamalesh,
>
> I'm unable to reproduce the results you describe. One possibility is
> load-balancer interaction -- can you describe the topology of the
> platform you are running this on?
>
> On both a straight NUMA topology and a hyper-threaded platform I
> observe a ~4% delta between the pinned and un-pinned cases.
>
> Thanks -- results below,
>
> - Paul
>
>
(snip)
Hi Paul,
That box is down. I tried running the test on the 2-socket quad-core with
HT and was not able to reproduce the issue: the CPU idle time reported in
both the pinned and un-pinned cases was ~0. But if we create a cgroup
hierarchy of 3 levels above the 5 cgroups, instead of the current hierarchy
where all 5 cgroups are created directly under /cgroup, the idle time is
seen again on the 2-socket quad-core (HT) box.
-----------
| cgroups |
-----------
|
-----------
| level 1 |
-----------
|
-----------
| level 2 |
-----------
|
-----------
| level 3 |
-----------
/ / | \ \
/ / | \ \
cgrp1 cgrp2 cgrp3 cgrp4 cgrp5
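In cgroupfs terms the hierarchy above is built roughly as follows (a
sketch; the full modified script is at the end of this mail):
  mount -t cgroup -ocpu,cpuset,cpuacct none /cgroups
  mkdir -p /cgroups/level1/level2/level3          # three intermediate levels
  for i in 1 2 3 4 5; do
          mkdir /cgroups/level1/level2/level3/$i  # the five test groups at depth 4
  done
(The script additionally copies cpuset.cpus/cpuset.mems into every new
group and leaves the intermediate levels unthrottled with cfs_quota_us = -1.)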
Un-pinned run
--------------
Average CPU Idle percentage 24.8333%
Bandwidth shared with remaining non-Idle 75.1667%
Bandwidth of Group 1 = 8.3700 i.e = 6.2900% of non-Idle CPU time 75.1667%
|...... subgroup 1/1 = 49.9900 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
|...... subgroup 1/2 = 50.0000 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
Bandwidth of Group 2 = 8.3700 i.e = 6.2900% of non-Idle CPU time 75.1667%
|...... subgroup 2/1 = 49.9900 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
|...... subgroup 2/2 = 50.0000 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
Bandwidth of Group 3 = 16.6500 i.e = 12.5100% of non-Idle CPU time 75.1667%
|...... subgroup 3/1 = 25.0000 i.e = 3.1200% of 12.5100% Groups non-Idle CPU time
|...... subgroup 3/2 = 24.9100 i.e = 3.1100% of 12.5100% Groups non-Idle CPU time
|...... subgroup 3/3 = 25.0800 i.e = 3.1300% of 12.5100% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9900 i.e = 3.1200% of 12.5100% Groups non-Idle CPU time
Bandwidth of Group 4 = 29.3600 i.e = 22.0600% of non-Idle CPU time 75.1667%
|...... subgroup 4/1 = 12.0200 i.e = 2.6500% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/2 = 12.3800 i.e = 2.7300% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/3 = 13.6300 i.e = 3.0000% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.7000 i.e = 2.8000% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.8000 i.e = 2.8200% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/6 = 11.9600 i.e = 2.6300% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.7400 i.e = 2.8100% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/8 = 11.7300 i.e = 2.5800% of 22.0600% Groups non-Idle CPU time
Bandwidth of Group 5 = 37.2300 i.e = 27.9800% of non-Idle CPU time 75.1667%
|...... subgroup 5/1 = 47.7200 i.e = 13.3500% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/2 = 5.2000 i.e = 1.4500% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.3600 i.e = 1.7700% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.3600 i.e = 1.7700% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/5 = 7.9800 i.e = 2.2300% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/6 = 5.1800 i.e = 1.4400% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/7 = 7.4900 i.e = 2.0900% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/8 = 5.9200 i.e = 1.6500% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/9 = 7.7500 i.e = 2.1600% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/10 = 4.8100 i.e = 1.3400% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/11 = 4.9300 i.e = 1.3700% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.8900 i.e = 1.9200% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.0700 i.e = 1.6900% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.5200 i.e = 1.8200% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/15 = 5.9200 i.e = 1.6500% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.6400 i.e = 1.8500% of 27.9800% Groups non-Idle CPU time
Pinned Run
----------
Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%
Bandwidth of Group 1 = 6.2700 i.e = 6.2700% of non-Idle CPU time 100%
|...... subgroup 1/1 = 50.0100 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9800 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
Bandwidth of Group 2 = 6.2700 i.e = 6.2700% of non-Idle CPU time 100%
|...... subgroup 2/1 = 50.0000 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9900 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
Bandwidth of Group 3 = 12.5300 i.e = 12.5300% of non-Idle CPU time 100%
|...... subgroup 3/1 = 25.0100 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
|...... subgroup 3/2 = 25.0000 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
|...... subgroup 3/3 = 24.9900 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9900 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
Bandwidth of Group 4 = 25.0200 i.e = 25.0200% of non-Idle CPU time 100%
|...... subgroup 4/1 = 12.5100 i.e = 3.1300% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5000 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/3 = 12.5000 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.5000 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.4900 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4900 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.4900 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4800 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
Bandwidth of Group 5 = 49.8800 i.e = 49.8800% of non-Idle CPU time 100%
|...... subgroup 5/1 = 49.9600 i.e = 24.9200% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/5 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/6 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/7 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/8 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.2300 i.e = 3.1000% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
Modified script
---------------
#!/bin/bash
NR_TASKS1=2
NR_TASKS2=2
NR_TASKS3=4
NR_TASKS4=8
NR_TASKS5=16
BANDWIDTH=1
SUBGROUP=1
PRO_SHARES=0
MOUNT_POINT=/cgroups/
MOUNT=/cgroups/
LOAD=./while1
LEVELS=3
usage()
{
echo "Usage $0: [-b 0|1] [-s 0|1] [-p 0|1]"
echo "-b 1|0 set/unset Cgroups bandwidth control (default set)"
echo "-s Create sub-groups for every task (default creates sub-group)"
echo "-p create propotional shares based on cpus"
exit
}
while getopts ":b:s:p:" arg
do
case $arg in
b)
BANDWIDTH=$OPTARG
shift
if [ $BANDWIDTH -gt 1 ] || [ $BANDWIDTH -lt 0 ]
then
usage
fi
;;
s)
SUBGROUP=$OPTARG
shift
if [ $SUBGROUP -gt 1 ] || [ $SUBGROUP -lt 0 ]
then
usage
fi
;;
p)
PRO_SHARES=$OPTARG
shift
if [ $PRO_SHARES -gt 1 ] || [ $PRO_SHARES -lt 0 ]
then
usage
fi
;;
*)
esac
done
if [ ! -d $MOUNT ]
then
mkdir -p $MOUNT
fi
test()
{
echo -n "[ "
if [ $1 -eq 0 ]
then
echo -ne '\E[42;40mOk'
else
echo -ne '\E[31;40mFailed'
tput sgr0
echo " ]"
exit
fi
tput sgr0
echo " ]"
}
mount_cgrp()
{
echo -n "Mounting root cgroup "
mount -t cgroup -ocpu,cpuset,cpuacct none $MOUNT_POINT &> /dev/null
test $?
}
umount_cgrp()
{
echo -n "Unmounting root cgroup "
cd /root/
umount $MOUNT_POINT
test $?
}
create_hierarchy()
{
mount_cgrp
cpuset_mem=`cat $MOUNT/cpuset.mems`
cpuset_cpu=`cat $MOUNT/cpuset.cpus`
echo -n "creating hierarchy of levels $LEVELS "
for (( i=1; i<=$LEVELS; i++ ))
do
MOUNT="${MOUNT}/level${i}"
mkdir $MOUNT
echo $cpuset_mem > $MOUNT/cpuset.mems
echo $cpuset_cpu > $MOUNT/cpuset.cpus
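# cfs_quota_us = -1 leaves each intermediate level unthrottled; only the 500ms period is set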
echo "-1" > $MOUNT/cpu.cfs_quota_us
echo "500000" > $MOUNT/cpu.cfs_period_us
echo -n " .."
done
echo " "
echo $MOUNT
echo -n "creating groups/sub-groups ..."
for (( i=1; i<=5; i++ ))
do
mkdir $MOUNT/$i
echo $cpuset_mem > $MOUNT/$i/cpuset.mems
echo $cpuset_cpu > $MOUNT/$i/cpuset.cpus
echo -n ".."
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
mkdir -p $MOUNT/$i/$j
echo $cpuset_mem > $MOUNT/$i/$j/cpuset.mems
echo $cpuset_cpu > $MOUNT/$i/$j/cpuset.cpus
echo -n ".."
done
fi
done
echo "."
}
cleanup()
{
pkill -9 while1 &> /dev/null
sleep 10
echo -n "Umount groups/sub-groups .."
for (( i=1; i<=5; i++ ))
do
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
rmdir $MOUNT/$i/$j
echo -n ".."
done
fi
rmdir $MOUNT/$i
echo -n ".."
done
cd $MOUNT
cd ../
for (( i=$LEVELS; i>=1; i-- ))
do
rmdir level$i
cd ../
done
echo " "
umount_cgrp
}
load_tasks()
{
for (( i=1; i<=5; i++ ))
do
jj=$(eval echo "\$NR_TASKS$i")
shares="1024"
if [ $PRO_SHARES -eq 1 ]
then
eval shares=$(echo "$jj * 1024" | bc)
fi
echo $shares > $MOUNT/$i/cpu.shares
for (( j=1; j<=$jj; j++ ))
do
echo "-1" > $MOUNT/$i/cpu.cfs_quota_us
echo "500000" > $MOUNT/$i/cpu.cfs_period_us
if [ $SUBGROUP -eq 1 ]
then
$LOAD &
echo $! > $MOUNT/$i/$j/tasks
echo "1024" > $MOUNT/$i/$j/cpu.shares
if [ $BANDWIDTH -eq 1 ]
then
echo "500000" > $MOUNT/$i/$j/cpu.cfs_period_us
echo "250000" > $MOUNT/$i/$j/cpu.cfs_quota_us
fi
else
$LOAD &
echo $! > $MOUNT/$i/tasks
echo $shares > $MOUNT/$i/cpu.shares
if [ $BANDWIDTH -eq 1 ]
then
echo "500000" > $MOUNT/$i/cpu.cfs_period_us
echo "250000" > $MOUNT/$i/cpu.cfs_quota_us
fi
fi
done
done
echo "Capturing idle cpu time with vmstat...."
vmstat 2 100 &> vmstat_log &
}
pin_tasks()
{
cpu=0
count=1
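# pack two tasks per CPU: 2+2+4+8+16 = 32 while1 tasks across 16 CPUs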
for (( i=1; i<=5; i++ ))
do
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
if [ $count -gt 2 ]
then
cpu=$((cpu+1))
count=1
fi
echo $cpu > $MOUNT/$i/$j/cpuset.cpus
count=$((count+1))
done
else
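# whole-group pinning: cpuset sizes (1,1,2,3,9 CPUs) roughly track task counts (2,2,4,8,16)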
case $i in
1)
echo 0 > $MOUNT/$i/cpuset.cpus;;
2)
echo 1 > $MOUNT/$i/cpuset.cpus;;
3)
echo "2-3" > $MOUNT/$i/cpuset.cpus;;
4)
echo "4-6" > $MOUNT/$i/cpuset.cpus;;
5)
echo "7-15" > $MOUNT/$i/cpuset.cpus;;
esac
fi
done
}
print_results()
{
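# field 7 of each while1 line from /proc/sched_debug is its cumulative exec time (sum_exec), summed per group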
eval gtot=$(cat sched_log|grep -i while|sed 's/R//g'|awk '{gtot+=$7};END{printf "%f", gtot}')
for (( i=1; i<=5; i++ ))
do
eval temp=$(cat sched_log_$i|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
eval tavg=$(echo "scale=4;(($temp / $gtot) * $1)/100 " | bc)
eval avg=$(echo "scale=4;($temp / $gtot) * 100" | bc)
eval pretty_tavg=$( echo "scale=4; $tavg * 100"| bc) # For pretty format
echo "Bandwidth of Group $i = $avg i.e = $pretty_tavg% of non-Idle CPU time $1%"
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
eval tmp=$(cat sched_log_$i-$j|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
eval stavg=$(echo "scale=4;($tmp / $temp) * 100" | bc)
eval pretty_stavg=$(echo "scale=4;(($tmp / $temp) * $tavg) * 100" | bc)
echo -n "|"
echo -e "...... subgroup $i/$j\t= $stavg\ti.e = $pretty_stavg% of $pretty_tavg% Groups non-Idle CPU time"
done
fi
echo " "
echo " "
done
}
capture_results()
{
cat /proc/sched_debug > sched_log
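# build the escaped \/level1\/level2\/level3 prefix used to grep cgroup paths out of sched_log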
lev=""
for (( i=1; i<=$LEVELS; i++ ))
do
lev="$lev\/level${i}"
done
pkill -9 vmstat
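# average vmstat's 'id' (idle) column; NR != 1 skips the first sample, which reports since-boot averages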
avg=$(cat vmstat_log |grep -iv "system"|grep -iv "swpd"|awk ' { if ( NR != 1) {id+=$15 }}END{print (id/(NR-1))}')
rem=$(echo "scale=2; 100 - $avg" |bc)
echo "Average CPU Idle percentage $avg%"
echo "Bandwidth shared with remaining non-Idle $rem%"
for (( i=1; i<=5; i++ ))
do
cat sched_log |grep -i while1|grep -i "$lev\/$i" > sched_log_$i
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
cat sched_log |grep -i while1|grep -i "$lev\/$i\/$j" > sched_log_$i-$j
done
fi
done
print_results $rem
}
create_hierarchy
pin_tasks
load_tasks
sleep 60
capture_results
cleanup
exit
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-10 18:17 ` Kamalesh Babulal
@ 2011-06-14 0:00 ` Paul Turner
2011-06-15 5:37 ` Kamalesh Babulal
0 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-06-14 0:00 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Vladimir Davydov, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Ingo Molnar, Pavel Emelianov
Hi Kamalesh,
I tried on both Friday and again today to reproduce your results
without success. Results are attached below. The margin of error is
the same as in the previous (2-level deep) case, ~4%. One minor nit: in
your script's input parsing you're calling shift; you don't need to do
this with getopts, and it will actually lead to arguments being
dropped.
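I.e., the option loop only needs something like (a sketch):
  while getopts ":b:s:p:" arg
  do
          case $arg in
          b) BANDWIDTH=$OPTARG ;;   # getopts advances OPTIND itself; no shift
          s) SUBGROUP=$OPTARG ;;
          p) PRO_SHARES=$OPTARG ;;
          esac
  done
  shift $((OPTIND-1))   # only needed if positional arguments follow the options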
Are you testing on top of a clean -tip? Do you have any custom
load-balancer or scheduler settings?
Thanks,
- Paul
Hyper-threaded topology:
unpinned:
Average CPU Idle percentage 38.6333%
Bandwidth shared with remaining non-Idle 61.3667%
pinned:
Average CPU Idle percentage 35.2766%
Bandwidth shared with remaining non-Idle 64.7234%
(The mask in the "unpinned" case is 0-3,6-9,12-15,18-21 which should
mirror your 2 socket 8x2 configuration.)
4-way NUMA topology:
unpinned:
Average CPU Idle percentage 5.26667%
Bandwidth shared with remaining non-Idle 94.73333%
pinned:
Average CPU Idle percentage 0.242424%
Bandwidth shared with remaining non-Idle 99.757576%
On Fri, Jun 10, 2011 at 11:17 AM, Kamalesh Babulal
<kamalesh@linux.vnet.ibm.com> wrote:
> (snip)
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
` (15 preceding siblings ...)
2011-06-07 15:45 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Kamalesh Babulal
@ 2011-06-14 6:58 ` Hu Tao
2011-06-14 7:29 ` Hidetoshi Seto
16 siblings, 1 reply; 129+ messages in thread
From: Hu Tao @ 2011-06-14 6:58 UTC (permalink / raw)
To: Paul Turner
Cc: linux-kernel, Peter Zijlstra, Bharata B Rao, Dhaval Giani,
Balbir Singh, Vaidyanathan Srinivasan, Srivatsa Vaddagiri
[-- Attachment #1: Type: text/plain, Size: 496 bytes --]
Hi,
I've run several tests including hackbench, unixbench, massive-intr
and kernel building. The CPU is an Intel(R) Xeon(R) X3430 @ 2.40GHz,
4 cores, with 4G memory.
Most of the time the results differ little, but there are problems:
1. unixbench: execl throughput has about a 5% drop.
2. unixbench: process creation has about a 5% drop.
3. massive-intr: when running 200 processes for 5 minutes, the number
of loops each process completes varies more than before cfs-bandwidth-v6.
The results are attached.
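[One way to quantify the spread over the attached two-column pid/loop-count
files -- a sketch, not part of the original report:
  awk '{n++; s+=$2; ss+=$2*$2} END {m=s/n; printf "mean %.1f stddev %.1f\n", m, sqrt(ss/n-m*m)}' massive-intr-200-300-with-patch.txt
]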
[-- Attachment #2: massive-intr-200-300-without-patch.txt --]
[-- Type: text/plain, Size: 2784 bytes --]
004726 00000761
004723 00000763
004793 00000763
004776 00000736
004746 00000735
004731 00000754
004685 00000735
004835 00000754
004782 00000751
004747 00000736
004766 00000754
004663 00000735
004696 00000752
004737 00000760
004679 00000735
004727 00000751
004840 00000754
004720 00000767
004718 00000764
004788 00000761
004716 00000770
004791 00000758
004655 00000755
004838 00000757
004811 00000753
004659 00000768
004686 00000735
004740 00000759
004676 00000739
004849 00000748
004825 00000763
004808 00000748
004844 00000747
004702 00000755
004828 00000758
004829 00000758
004822 00000750
004820 00000753
004805 00000751
004764 00000748
004717 00000765
004794 00000761
004701 00000750
004792 00000766
004818 00000753
004842 00000752
004837 00000751
004697 00000750
004654 00000739
004763 00000754
004851 00000761
004671 00000738
004807 00000753
004734 00000760
004661 00000740
004743 00000737
004664 00000740
004682 00000737
004741 00000750
004817 00000750
004694 00000754
004779 00000753
004833 00000754
004758 00000757
004809 00000756
004815 00000752
004666 00000758
004770 00000750
004704 00000737
004709 00000753
004841 00000754
004732 00000753
004706 00000753
004675 00000739
004745 00000737
004719 00000765
004691 00000764
004777 00000756
004778 00000750
004780 00000759
004754 00000737
004799 00000755
004848 00000755
004752 00000737
004742 00000734
004773 00000752
004774 00000747
004673 00000736
004787 00000763
004781 00000756
004693 00000753
004692 00000751
004769 00000750
004728 00000763
004756 00000758
004749 00000737
004762 00000753
004687 00000739
004827 00000766
004683 00000734
004761 00000757
004678 00000739
004830 00000763
004803 00000763
004798 00000765
004850 00000760
004771 00000749
004674 00000737
004832 00000753
004821 00000757
004753 00000734
004843 00000752
004724 00000763
004759 00000752
004800 00000753
004700 00000753
004824 00000763
004767 00000755
004823 00000751
004789 00000768
004757 00000755
004852 00000765
004836 00000756
004839 00000757
004760 00000748
004834 00000758
004739 00000759
004786 00000768
004846 00000754
004711 00000761
004826 00000765
004695 00000755
004710 00000758
004783 00000761
004765 00000755
004684 00000731
004698 00000752
004785 00000768
004755 00000736
004813 00000754
004775 00000753
004795 00000765
004712 00000755
004768 00000755
004713 00000767
004816 00000752
004790 00000765
004744 00000731
004736 00000756
004672 00000741
004715 00000766
004667 00000754
004705 00000755
004810 00000755
004708 00000755
004707 00000752
004750 00000736
004688 00000736
004772 00000741
004703 00000736
004681 00000736
004748 00000737
004668 00000736
004690 00000739
004669 00000739
004733 00000743
004656 00000767
004812 00000749
004714 00000771
004677 00000741
004806 00000755
004665 00000736
004680 00000739
004670 00000739
[-- Attachment #3: massive-intr-200-300-with-patch.txt --]
[-- Type: text/plain, Size: 3200 bytes --]
004663 00000754
004634 00000694
004723 00000800
004746 00000751
004734 00000768
004633 00000689
004755 00000754
004722 00000797
004626 00000797
004689 00000765
004767 00000695
004813 00000765
004724 00000800
004621 00000769
004725 00000796
004714 00000799
004789 00000793
004631 00000758
004712 00000796
004744 00000748
004655 00000796
004783 00000751
004785 00000800
004790 00000796
004758 00000748
004816 00000772
004683 00000765
004636 00000694
004771 00000691
004619 00000695
004669 00000753
004623 00000696
004775 00000753
004752 00000748
004778 00000754
004784 00000751
004739 00000767
004807 00000762
004693 00000765
004691 00000770
004736 00000763
004709 00000768
004720 00000796
004628 00000695
004772 00000695
004696 00000695
004682 00000692
004675 00000748
004643 00000689
004637 00000695
004715 00000793
004787 00000796
004792 00000793
004797 00000796
004708 00000768
004651 00000796
004806 00000766
004679 00000766
004811 00000763
004699 00000695
004624 00000769
004638 00000695
004645 00000695
004635 00000692
004704 00000692
004742 00000764
004680 00000761
004800 00000796
004796 00000801
004802 00000798
004731 00000793
004677 00000770
004640 00000692
004657 00000692
004656 00000793
004730 00000790
004786 00000795
004817 00000766
004627 00000694
004727 00000793
004814 00000773
004658 00000798
004695 00000689
004791 00000792
004653 00000795
004798 00000792
004673 00000745
004666 00000753
004753 00000751
004664 00000753
004788 00000798
004801 00000753
004685 00000766
004810 00000770
004750 00000753
004754 00000755
004652 00000795
004668 00000753
004654 00000795
004648 00000695
004777 00000747
004765 00000694
004672 00000753
004665 00000750
004737 00000770
004757 00000747
004620 00000796
004780 00000750
004717 00000792
004773 00000751
004756 00000767
004760 00000746
004808 00000770
004776 00000753
004662 00000756
004670 00000750
004625 00000694
004647 00000694
004794 00000795
004738 00000767
004641 00000698
004735 00000767
004759 00000694
004799 00000790
004762 00000697
004629 00000694
004769 00000694
004705 00000694
004743 00000767
004781 00000750
004701 00000697
004661 00000749
004702 00000694
004710 00000770
004681 00000767
004700 00000691
004686 00000767
004642 00000694
004747 00000753
004644 00000694
004812 00000767
004748 00000750
004733 00000764
004721 00000797
004687 00000771
004690 00000771
004751 00000749
004632 00000694
004732 00000764
004728 00000798
004766 00000694
004706 00000764
004630 00000694
004688 00000764
004711 00000694
004622 00000753
004795 00000798
004815 00000770
004729 00000791
004763 00000747
004818 00000766
004674 00000749
004761 00000694
004749 00000752
004770 00000692
004718 00000795
004694 00000694
004782 00000755
004809 00000766
004740 00000770
004671 00000752
004716 00000762
004707 00000766
004692 00000801
004719 00000795
004713 00000800
004659 00000797
004764 00000749
004774 00000747
004698 00000688
004649 00000696
004779 00000752
004768 00000694
004676 00000752
004646 00000693
004805 00000755
004697 00000691
004703 00000692
004639 00000694
004804 00000693
004803 00000754
004678 00000769
004741 00000768
004684 00000761
004660 00000693
004793 00000797
004667 00000753
004726 00000795
004745 00000755
004650 00000691
[-- Attachment #4: massive-intr-200-60-without-patch.txt --]
[-- Type: text/plain, Size: 2608 bytes --]
004544 00000138
004411 00000152
004435 00000154
004408 00000152
004553 00000138
004450 00000138
004540 00000138
004534 00000138
004557 00000138
004545 00000138
004469 00000152
004467 00000152
004521 00000138
004396 00000152
004484 00000152
004556 00000138
004474 00000152
004537 00000138
004489 00000152
004481 00000152
004547 00000138
004587 00000138
004555 00000138
004393 00000152
004480 00000152
004405 00000152
004392 00000152
004475 00000152
004402 00000152
004563 00000135
004524 00000154
004427 00000140
004517 00000154
004431 00000154
004584 00000154
004432 00000154
004418 00000140
004442 00000154
004420 00000140
004443 00000154
004428 00000140
004549 00000154
004466 00000140
004525 00000154
004516 00000154
004423 00000140
004468 00000140
004532 00000154
004444 00000154
004531 00000154
004441 00000154
004577 00000154
004438 00000151
004518 00000151
004574 00000151
004513 00000155
004398 00000156
004588 00000153
004413 00000154
004403 00000154
004520 00000151
004512 00000140
004409 00000154
004430 00000151
004465 00000137
004482 00000154
004390 00000156
004546 00000140
004501 00000155
004404 00000154
004538 00000140
004487 00000154
004554 00000140
004471 00000154
004571 00000152
004406 00000154
004564 00000155
004499 00000155
004492 00000154
004558 00000140
004485 00000154
004536 00000140
004470 00000154
004541 00000140
004514 00000140
004551 00000140
004508 00000155
004559 00000140
004394 00000154
004542 00000140
004483 00000154
004479 00000154
004510 00000155
004410 00000154
004550 00000140
004490 00000154
004389 00000154
004502 00000155
004445 00000155
004562 00000155
004399 00000154
004494 00000154
004414 00000154
004533 00000140
004496 00000140
004395 00000151
004495 00000140
004462 00000155
004412 00000154
004407 00000151
004523 00000137
004535 00000137
004543 00000137
004575 00000153
004457 00000157
004528 00000153
004529 00000153
004515 00000153
004519 00000153
004455 00000157
004522 00000153
004472 00000151
004569 00000157
004433 00000153
004401 00000151
004417 00000139
004583 00000153
004526 00000153
004488 00000151
004434 00000153
004530 00000153
004552 00000153
004421 00000139
004425 00000154
004585 00000153
004580 00000153
004448 00000157
004452 00000157
004446 00000157
004565 00000157
004451 00000157
004436 00000153
004505 00000157
004461 00000157
004449 00000157
004497 00000157
004400 00000160
004566 00000157
004568 00000154
004570 00000157
004498 00000154
004573 00000157
004509 00000157
004453 00000157
004456 00000157
004504 00000157
004500 00000157
004511 00000157
004391 00000160
004454 00000154
004506 00000154
004572 00000157
004459 00000157
[-- Attachment #5: massive-intr-200-60-with-patch.txt --]
[-- Type: text/plain, Size: 3120 bytes --]
004434 00000156
004547 00000156
004543 00000156
004473 00000156
004399 00000156
004537 00000156
004477 00000138
004400 00000156
004444 00000152
004465 00000156
004496 00000147
004548 00000156
004372 00000159
004437 00000152
004566 00000152
004495 00000147
004489 00000141
004545 00000156
004552 00000156
004421 00000141
004461 00000141
004490 00000141
004525 00000147
004472 00000156
004412 00000141
004397 00000141
004450 00000147
004522 00000148
004425 00000147
004455 00000147
004459 00000147
004523 00000147
004530 00000147
004551 00000155
004475 00000156
004484 00000138
004439 00000147
004557 00000154
004387 00000141
004515 00000147
004494 00000147
004535 00000147
004558 00000147
004519 00000147
004449 00000147
004385 00000155
004454 00000147
004534 00000147
004395 00000141
004524 00000147
004417 00000141
004542 00000153
004423 00000141
004509 00000157
004415 00000141
004536 00000155
004532 00000147
004446 00000147
004497 00000147
004468 00000153
004393 00000141
004554 00000153
004485 00000141
004521 00000147
004375 00000141
004448 00000147
004392 00000141
004452 00000141
004493 00000147
004403 00000141
004411 00000141
004424 00000141
004481 00000141
004538 00000157
004483 00000141
004418 00000141
004384 00000146
004420 00000140
004469 00000155
004491 00000139
004391 00000140
004419 00000138
004456 00000144
004502 00000155
004386 00000146
004451 00000144
004526 00000146
004429 00000156
004371 00000146
004471 00000157
004427 00000146
004549 00000146
004369 00000141
004487 00000142
004402 00000155
004373 00000141
004533 00000146
004416 00000142
004520 00000146
004414 00000140
004381 00000148
004479 00000155
004544 00000157
004466 00000155
004370 00000156
004470 00000155
004406 00000155
004546 00000156
004376 00000140
004383 00000148
004458 00000146
004527 00000146
004453 00000146
004432 00000154
004528 00000146
004435 00000154
004447 00000147
004499 00000154
004476 00000157
004498 00000146
004504 00000154
004467 00000155
004561 00000154
004531 00000148
004463 00000155
004565 00000154
004541 00000155
004405 00000155
004492 00000146
004410 00000140
004457 00000146
004374 00000140
004430 00000154
004442 00000154
004445 00000156
004539 00000155
004486 00000140
004382 00000140
004505 00000156
004482 00000142
004390 00000140
004409 00000140
004553 00000158
004379 00000140
004540 00000158
004431 00000154
004413 00000140
004550 00000155
004380 00000155
004388 00000140
004408 00000140
004474 00000157
004433 00000156
004507 00000153
004426 00000143
004518 00000156
004460 00000149
004559 00000156
004480 00000146
004478 00000158
004529 00000146
004464 00000157
004462 00000157
004440 00000155
004398 00000139
004443 00000156
004438 00000156
004562 00000156
004404 00000157
004377 00000140
004513 00000156
004422 00000142
004394 00000142
004401 00000155
004378 00000142
004389 00000142
004501 00000156
004407 00000157
004508 00000156
004488 00000142
004396 00000142
004560 00000156
004516 00000156
004514 00000156
004556 00000153
004564 00000157
004567 00000156
004555 00000156
004568 00000155
004500 00000156
004517 00000156
004512 00000156
004436 00000156
004506 00000156
004510 00000153
[-- Attachment #6: massive-intr.png --]
[-- Type: image/png, Size: 79913 bytes --]
[-- Attachment #7: unixbench-cfs-bandwidth-v6 --]
[-- Type: text/plain, Size: 5624 bytes --]
BYTE UNIX Benchmarks (Version 5.1.3)
System: KERNEL-128: GNU/Linux
OS: GNU/Linux -- 2.6.39-rc3ht-cpu-bandwidth-test+ -- #1 SMP PREEMPT Fri May 27 11:20:19 CST 2011
Machine: x86_64 (x86_64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
CPU 0: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz (4787.8 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 1: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz (4786.6 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 2: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz (4786.6 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 3: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz (4786.6 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
14:27:45 up 42 min, 1 user, load average: 0.00, 0.04, 0.20; runlevel 3
------------------------------------------------------------------------
Benchmark Run: Mon May 30 2011 14:27:45 - 14:56:15
4 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 23560040.2 lps (10.0 s, 7 samples)
Double-Precision Whetstone 2854.8 MWIPS (10.0 s, 7 samples)
Execl Throughput 240.6 lps (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 32539.3 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 8147.0 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 124312.2 KBps (30.0 s, 2 samples)
Pipe Throughput 235002.6 lps (10.0 s, 7 samples)
Pipe-based Context Switching 21412.1 lps (10.0 s, 7 samples)
Process Creation 416.0 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 895.9 lpm (60.1 s, 2 samples)
Shell Scripts (8 concurrent) 352.1 lpm (60.2 s, 2 samples)
System Call Overhead 322619.9 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 23560040.2 2018.9
Double-Precision Whetstone 55.0 2854.8 519.1
Execl Throughput 43.0 240.6 56.0
File Copy 1024 bufsize 2000 maxblocks 3960.0 32539.3 82.2
File Copy 256 bufsize 500 maxblocks 1655.0 8147.0 49.2
File Copy 4096 bufsize 8000 maxblocks 5800.0 124312.2 214.3
Pipe Throughput 12440.0 235002.6 188.9
Pipe-based Context Switching 4000.0 21412.1 53.5
Process Creation 126.0 416.0 33.0
Shell Scripts (1 concurrent) 42.4 895.9 211.3
Shell Scripts (8 concurrent) 6.0 352.1 586.8
System Call Overhead 15000.0 322619.9 215.1
========
System Benchmarks Index Score 166.5
------------------------------------------------------------------------
Benchmark Run: Mon May 30 2011 14:56:15 - 15:24:49
4 CPUs in system; running 4 parallel copies of tests
Dhrystone 2 using register variables 94432034.3 lps (10.0 s, 7 samples)
Double-Precision Whetstone 11421.1 MWIPS (10.0 s, 7 samples)
Execl Throughput 1787.6 lps (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 25070.7 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 6025.9 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 70719.0 KBps (30.0 s, 2 samples)
Pipe Throughput 912685.3 lps (10.0 s, 7 samples)
Pipe-based Context Switching 168909.9 lps (10.0 s, 7 samples)
Process Creation 2796.7 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 3201.4 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 431.3 lpm (60.3 s, 2 samples)
System Call Overhead 1233097.5 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 94432034.3 8091.9
Double-Precision Whetstone 55.0 11421.1 2076.6
Execl Throughput 43.0 1787.6 415.7
File Copy 1024 bufsize 2000 maxblocks 3960.0 25070.7 63.3
File Copy 256 bufsize 500 maxblocks 1655.0 6025.9 36.4
File Copy 4096 bufsize 8000 maxblocks 5800.0 70719.0 121.9
Pipe Throughput 12440.0 912685.3 733.7
Pipe-based Context Switching 4000.0 168909.9 422.3
Process Creation 126.0 2796.7 222.0
Shell Scripts (1 concurrent) 42.4 3201.4 755.0
Shell Scripts (8 concurrent) 6.0 431.3 718.9
System Call Overhead 15000.0 1233097.5 822.1
========
System Benchmarks Index Score 445.0
[-- Attachment #8: unixbench-without-cfs-bandwidth-v6 --]
[-- Type: text/plain, Size: 5637 bytes --]
BYTE UNIX Benchmarks (Version 5.1.3)
System: KERNEL-128: GNU/Linux
OS: GNU/Linux -- 2.6.39-rc3ht-cpu-bandwidth-test-without-patch+ -- #2 SMP PREEMPT Fri May 27 16:00:22 CST 2011
Machine: x86_64 (x86_64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
CPU 0: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz (4788.0 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 1: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz (4786.6 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 2: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz (4786.6 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 3: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz (4786.6 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
15:34:21 up 1 min, 1 user, load average: 0.90, 0.33, 0.12; runlevel 3
------------------------------------------------------------------------
Benchmark Run: Mon May 30 2011 15:34:21 - 16:02:43
4 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 23570449.8 lps (10.0 s, 7 samples)
Double-Precision Whetstone 2856.0 MWIPS (10.0 s, 7 samples)
Execl Throughput 245.3 lps (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 32605.9 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 8211.5 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 126138.2 KBps (30.0 s, 2 samples)
Pipe Throughput 231883.3 lps (10.0 s, 7 samples)
Pipe-based Context Switching 22245.2 lps (10.0 s, 7 samples)
Process Creation 421.0 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 714.7 lpm (60.1 s, 2 samples)
Shell Scripts (8 concurrent) 355.1 lpm (60.1 s, 2 samples)
System Call Overhead 316964.5 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 23570449.8 2019.7
Double-Precision Whetstone 55.0 2856.0 519.3
Execl Throughput 43.0 245.3 57.0
File Copy 1024 bufsize 2000 maxblocks 3960.0 32605.9 82.3
File Copy 256 bufsize 500 maxblocks 1655.0 8211.5 49.6
File Copy 4096 bufsize 8000 maxblocks 5800.0 126138.2 217.5
Pipe Throughput 12440.0 231883.3 186.4
Pipe-based Context Switching 4000.0 22245.2 55.6
Process Creation 126.0 421.0 33.4
Shell Scripts (1 concurrent) 42.4 714.7 168.6
Shell Scripts (8 concurrent) 6.0 355.1 591.9
System Call Overhead 15000.0 316964.5 211.3
========
System Benchmarks Index Score 164.3
------------------------------------------------------------------------
Benchmark Run: Mon May 30 2011 16:02:43 - 16:31:14
4 CPUs in system; running 4 parallel copies of tests
Dhrystone 2 using register variables 94372189.3 lps (10.0 s, 7 samples)
Double-Precision Whetstone 11430.4 MWIPS (10.0 s, 7 samples)
Execl Throughput 1875.1 lps (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 22718.2 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 6067.2 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 62203.8 KBps (30.0 s, 2 samples)
Pipe Throughput 884763.1 lps (10.0 s, 7 samples)
Pipe-based Context Switching 172161.4 lps (10.0 s, 7 samples)
Process Creation 2920.9 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 3230.3 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 430.6 lpm (60.3 s, 2 samples)
System Call Overhead 1199897.3 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 94372189.3 8086.7
Double-Precision Whetstone 55.0 11430.4 2078.3
Execl Throughput 43.0 1875.1 436.1
File Copy 1024 bufsize 2000 maxblocks 3960.0 22718.2 57.4
File Copy 256 bufsize 500 maxblocks 1655.0 6067.2 36.7
File Copy 4096 bufsize 8000 maxblocks 5800.0 62203.8 107.2
Pipe Throughput 12440.0 884763.1 711.2
Pipe-based Context Switching 4000.0 172161.4 430.4
Process Creation 126.0 2920.9 231.8
Shell Scripts (1 concurrent) 42.4 3230.3 761.9
Shell Scripts (8 concurrent) 6.0 430.6 717.6
System Call Overhead 15000.0 1199897.3 799.9
========
System Benchmarks Index Score 439.0
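A note on reading the two unixbench summaries above, assuming the standard
unixbench scoring: each INDEX is 10 * RESULT / BASELINE, and the overall
System Benchmarks Index Score is the geometric mean of the per-test indexes.
A minimal sketch of that arithmetic, using the 1-copy INDEX column from the
patched run:

from math import prod

# INDEX column of the 1-copy run with the v6 patchset applied.
with_patch = [2018.9, 519.1, 56.0, 82.2, 49.2, 214.3,
              188.9, 53.5, 33.0, 211.3, 586.8, 215.1]

# The overall score is the geometric mean of the per-test indexes.
score = prod(with_patch) ** (1.0 / len(with_patch))
print(round(score, 1))  # ~166.5, matching the reported score

By the same arithmetic, the small 1-copy score difference between the two
kernels (166.5 vs 164.3) is mostly due to the Shell Scripts (1 concurrent)
result.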
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-06-14 6:58 ` [patch 00/15] CFS Bandwidth Control V6 Hu Tao
@ 2011-06-14 7:29 ` Hidetoshi Seto
2011-06-14 7:44 ` Hu Tao
2011-06-15 8:37 ` Hu Tao
0 siblings, 2 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-06-14 7:29 UTC (permalink / raw)
To: Hu Tao
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri
(2011/06/14 15:58), Hu Tao wrote:
> Hi,
>
> I've run several tests including hackbench, unixbench, massive-intr
> and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
> 4 cores, and 4G memory.
>
> Most of the time the results differ little, but there are problems:
>
> 1. unixbench: execl throughput has about a 5% drop.
> 2. unixbench: process creation has about a 5% drop.
> 3. massive-intr: when running 200 processes for 5 minutes, the number
> of loops each process runs differs more than before cfs-bandwidth-v6.
>
> The results are attached.
I know the unixbench scores are not so stable, so the problem might just
be noise ... but the massive-intr result is interesting.
Could you try to find which piece (xx/15) in the series causes
the problems?
Thanks,
H.Seto
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-06-14 7:29 ` Hidetoshi Seto
@ 2011-06-14 7:44 ` Hu Tao
2011-06-15 8:37 ` Hu Tao
1 sibling, 0 replies; 129+ messages in thread
From: Hu Tao @ 2011-06-14 7:44 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri
On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
> (2011/06/14 15:58), Hu Tao wrote:
> > Hi,
> >
> > I've run several tests including hackbench, unixbench, massive-intr
> > and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
> > 4 cores, and 4G memory.
> >
> > Most of the time the results differ little, but there are problems:
> >
> > 1. unixbench: execl throughput has about a 5% drop.
> > 2. unixbench: process creation has about a 5% drop.
> > 3. massive-intr: when running 200 processes for 5 minutes, the number
> > of loops each process runs differs more than before cfs-bandwidth-v6.
> >
> > The results are attached.
>
> I know the unixbench scores are not so stable, so the problem might just
> be noise ... but the massive-intr result is interesting.
> Could you try to find which piece (xx/15) in the series causes
> the problems?
OK. I'll do it.
>
> Thanks,
> H.Seto
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-07 15:45 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Kamalesh Babulal
2011-06-08 3:09 ` Paul Turner
2011-06-08 10:46 ` Vladimir Davydov
@ 2011-06-14 10:16 ` Hidetoshi Seto
2 siblings, 0 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-06-14 10:16 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri, Ingo Molnar, Pavel Emelyanov
(2011/06/08 0:45), Kamalesh Babulal wrote:
> Hi All,
>
> In our test environment, while testing the CFS Bandwidth V6 patch set
> on top of 55922c9d1b84, we observed that CPU idle time is
> between 30% and 40% while running a CPU-bound test with the cgroup tasks
> not pinned to the CPUs. Whereas in the inverse case, where the cgroup
> tasks are pinned to the CPUs, the idle time seen is nearly zero.
I've run some tests with your test script, but I'm not sure whether it is
really a significant problem. Am I missing the point?
I added a -c option to your script to toggle pinning (1: pinned, 0: not pinned).
In short, the results in my environment (16 cpus: 4 quad-core sockets) are:
# group's usage
-b 0 -p 0 -c 0 : Idle = 0% (12,12,25,25,25)
-b 0 -p 0 -c 1 : Idle = 0% (6,6,12,25,50)
-b 0 -p 1 -c * : Idle = 0% (6,6,12,25,50)
-b 1 -p 0 -c 0 : Idle = ~25% (6,6,12,25,25)
-b 1 -p 0 -c 1 : Idle = 0% (6,6,12,25,50)
-b 1 -p 1 -c * : Idle = 0% (6,6,12,25,50)
If my understanding is correct, with -p0 there are 5 groups (each with share=1024)
and the groups have 2,2,4,8,16 subgroups respectively, so a subgroup in /1 is
weighted 8 times higher than one in /5. And with -p1 the shares of the 5 parent
groups are promoted so that all subgroups are evenly weighted.
With -p0 the cpu usage of the 5 groups would be 20,20,20,20,20, but groups /1
and /2 have only 2 subgroups each, so even if /1 and /2 each fully use 2 cpus
the usage will be 12,12,25,25,25.
OTOH the bandwidth of a subgroup is 250000/500000 (= 0.5 cpu), so in the
Idle=0% case the cpu usage of the groups is likely to be 6,6,12,25,50%.
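For concreteness, here is a minimal sketch of this arithmetic (my own
illustration, not part of the original mail), assuming one CPU-bound task per
subgroup, equal shares for the 5 groups, the 16 cpus above, and a 0.5-cpu
quota per subgroup in the bandwidth-limited case:

CPUS = 16.0
subgroups = [2, 2, 4, 8, 16]  # subgroups per group /1../5

def expected_usage(bandwidth_limited):
    # Each subgroup runs one task, so it can use at most 1 cpu; with
    # bandwidth limits it is further capped at 250000/500000us = 0.5 cpu.
    per_task = 0.5 if bandwidth_limited else 1.0
    caps = [n * per_task for n in subgroups]
    alloc = [0.0] * len(subgroups)
    active = set(range(len(subgroups)))
    free = CPUS
    while active and free > 1e-9:
        # Equal shares: hand out the free cpu time evenly among the groups
        # that have not yet hit their cap.
        fair = free / len(active)
        give = min([fair] + [caps[i] - alloc[i] for i in active])
        for i in list(active):
            alloc[i] += give
            if caps[i] - alloc[i] < 1e-9:
                active.discard(i)
        free = CPUS - sum(alloc)
    return [round(100.0 * a / CPUS, 2) for a in alloc]

print(expected_usage(False))  # share-only:  [12.5, 12.5, 25.0, 25.0, 25.0]
print(expected_usage(True))   # + bandwidth: [6.25, 6.25, 12.5, 25.0, 50.0]

These match the 12,12,25,25,25 and 6,6,12,25,50 figures above.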
The question is what happens when both are mixed.
For example, in the case of your unpinned Idle=34.8%:
> Average CPU Idle percentage 34.8% (as explained above in the Idle time measured)
> Bandwidth shared with remaining non-Idle 65.2%
> Bandwidth of Group 1 = 9.2500 i.e = 6.0300% of non-Idle CPU time 65.2%
> Bandwidth of Group 2 = 9.0400 i.e = 5.8900% of non-Idle CPU time 65.2%
> Bandwidth of Group 3 = 16.9300 i.e = 11.0300% of non-Idle CPU time 65.2%
> Bandwidth of Group 4 = 27.9300 i.e = 18.2100% of non-Idle CPU time 65.2%
> Bandwidth of Group 5 = 36.8300 i.e = 24.0100% of non-Idle CPU time 65.2%
The usage is 6,6,11,18,24.
It looks like groups /1 to /3 are limited by bandwidth, while group /5 is
limited by share. (I have no idea about the noise on /4 here.)
BTW, since pinning in your script always pins a couple of subgroups from the
same group to a cpu, subgroups are weighted evenly everywhere, so as a result
shares don't matter in these cases.
Thanks,
H.Seto
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-14 0:00 ` Paul Turner
@ 2011-06-15 5:37 ` Kamalesh Babulal
2011-06-21 19:48 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Kamalesh Babulal @ 2011-06-15 5:37 UTC (permalink / raw)
To: Paul Turner
Cc: Vladimir Davydov, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Ingo Molnar, Pavel Emelianov
* Paul Turner <pjt@google.com> [2011-06-13 17:00:08]:
> Hi Kamalesh.
>
> I tried, both on Friday and again today, to reproduce your results
> without success. Results are attached below. The margin of error is
> the same as in the previous (2-level-deep) case, ~4%. One minor nit: in
> your script's input parsing you're calling shift; you don't need to do
> this with getopts and it will actually lead to arguments being
> dropped.
>
> Are you testing on top of a clean -tip? Do you have any custom
> load-balancer or scheduler settings?
>
> Thanks,
>
> - Paul
>
>
> Hyper-threaded topology:
> unpinned:
> Average CPU Idle percentage 38.6333%
> Bandwidth shared with remaining non-Idle 61.3667%
>
> pinned:
> Average CPU Idle percentage 35.2766%
> Bandwidth shared with remaining non-Idle 64.7234%
> (The mask in the "unpinned" case is 0-3,6-9,12-15,18-21 which should
> mirror your 2 socket 8x2 configuration.)
>
> 4-way NUMA topology:
> unpinned:
> Average CPU Idle percentage 5.26667%
> Bandwidth shared with remaining non-Idle 94.73333%
>
> pinned:
> Average CPU Idle percentage 0.242424%
> Bandwidth shared with remaining non-Idle 99.757576%
>
Hi Paul,
I tried tip 919c9baa9 + the V6 patchset on a 2-socket, quad-core machine with
HT, and the idle time seen is ~22% to ~23%. The kernel is not tuned with any
custom load-balancer/scheduler settings.
unpinned:
Average CPU Idle percentage 23.5333%
Bandwidth shared with remaining non-Idle 76.4667%
pinned:
Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%
Thanks,
Kamalesh
>
>
>
> On Fri, Jun 10, 2011 at 11:17 AM, Kamalesh Babulal
> <kamalesh@linux.vnet.ibm.com> wrote:
> > * Paul Turner <pjt@google.com> [2011-06-08 20:25:00]:
> >
> >> Hi Kamalesh,
> >>
> >> I'm unable to reproduce the results you describe. One possibility is
> >> load-balancer interaction -- can you describe the topology of the
> >> platform you are running this on?
> >>
> >> On both a straight NUMA topology and a hyper-threaded platform I
> >> observe a ~4% delta between the pinned and un-pinned cases.
> >>
> >> Thanks -- results below,
> >>
> >> - Paul
> >>
> >>
(snip)
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-06-14 7:29 ` Hidetoshi Seto
2011-06-14 7:44 ` Hu Tao
@ 2011-06-15 8:37 ` Hu Tao
2011-06-16 0:57 ` Hidetoshi Seto
1 sibling, 1 reply; 129+ messages in thread
From: Hu Tao @ 2011-06-15 8:37 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri
[-- Attachment #1: Type: text/plain, Size: 1092 bytes --]
On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
> (2011/06/14 15:58), Hu Tao wrote:
> > Hi,
> >
> > I've run several tests including hackbench, unixbench, massive-intr
> > and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
> > 4 cores, and 4G memory.
> >
> > Most of the time the results differ little, but there are problems:
> >
> > 1. unixbench: execl throughput has about a 5% drop.
> > 2. unixbench: process creation has about a 5% drop.
> > 3. massive-intr: when running 200 processes for 5 minutes, the number
> > of loops each process runs differs more than before cfs-bandwidth-v6.
> >
> > The results are attached.
>
> I know the unixbench scores are not so stable, so the problem might just
> be noise ... but the massive-intr result is interesting.
> Could you try to find which piece (xx/15) in the series causes
> the problems?
After more tests, I found the massive-intr data is not stable either. Results
are attached. The third number in the file name indicates which patches are
applied; 0 means no patch was applied. plot.sh makes it easy to generate the
png files.
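To quantify that instability, here is a minimal sketch (my own, not part of
the original mail, under the assumption that each attached file has one
"<pid> <loop count>" line per process) that summarizes how evenly the
processes progressed in each run:

import sys
from statistics import mean, pstdev

def spread(path):
    with open(path) as f:
        counts = [int(line.split()[1]) for line in f if line.strip()]
    m = mean(counts)
    # Relative spread: under perfectly fair scheduling both would be ~0.
    return m, pstdev(counts) / m, (max(counts) - min(counts)) / m

for path in sys.argv[1:]:
    m, rel_sd, rel_range = spread(path)
    print("%s: mean=%.0f stddev/mean=%.3f range/mean=%.3f"
          % (path, m, rel_sd, rel_range))

Run as e.g. "python3 spread.py massive-intr-200-300-*.txt"; a larger relative
spread means less even progress across the 200 processes.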
[-- Attachment #2: massive-intr-200-300-0-1.txt --]
[-- Type: text/plain, Size: 3200 bytes --]
004516 00000782
004522 00000778
004400 00000689
004420 00000699
004442 00000781
004459 00000729
004539 00000734
004413 00000689
004489 00000700
004499 00000699
004519 00000781
004543 00000734
004389 00000689
004561 00000737
004473 00000731
004457 00000736
004467 00000725
004557 00000794
004566 00000797
004440 00000778
004415 00000696
004531 00000794
004401 00000693
004552 00000743
004416 00000694
004422 00000695
004550 00000734
004497 00000701
004485 00000792
004451 00000789
004502 00000698
004507 00000780
004517 00000777
004536 00000792
004430 00000781
004505 00000780
004529 00000800
004534 00000789
004408 00000683
004456 00000734
004488 00000685
004527 00000803
004544 00000735
004546 00000737
004474 00000734
004564 00000789
004551 00000734
004392 00000793
004581 00000747
004445 00000785
004511 00000777
004395 00000691
004411 00000690
004576 00000694
004496 00000695
004409 00000691
004470 00000735
004426 00000780
004393 00000781
004460 00000737
004390 00000731
004483 00000796
004458 00000741
004465 00000735
004478 00000800
004433 00000778
004503 00000694
004514 00000784
004436 00000780
004435 00000783
004520 00000777
004386 00000783
004513 00000777
004521 00000782
004508 00000780
004427 00000776
004569 00000792
004573 00000794
004405 00000691
004476 00000789
004481 00000784
004548 00000731
004438 00000779
004472 00000731
004487 00000694
004549 00000727
004583 00000732
004575 00000693
004579 00000731
004397 00000784
004495 00000694
004542 00000738
004524 00000785
004580 00000741
004492 00000688
004463 00000739
004434 00000774
004449 00000797
004424 00000776
004504 00000784
004399 00000689
004437 00000784
004572 00000794
004452 00000790
004453 00000794
004563 00000796
004559 00000728
004446 00000794
004535 00000795
004444 00000779
004454 00000794
004560 00000734
004541 00000728
004494 00000695
004554 00000735
004419 00000690
004469 00000736
004447 00000796
004570 00000696
004471 00000733
004565 00000796
004403 00000688
004558 00000739
004532 00000797
004429 00000786
004475 00000793
004498 00000694
004417 00000698
004562 00000737
004506 00000781
004491 00000699
004448 00000795
004428 00000782
004404 00000692
004512 00000780
004509 00000781
004486 00000698
004479 00000802
004406 00000695
004398 00000775
004441 00000782
004423 00000696
004464 00000736
004510 00000782
004477 00000791
004462 00000796
004493 00000697
004410 00000702
004555 00000738
004384 00000696
004518 00000779
004425 00000742
004394 00000696
004443 00000780
004414 00000697
004388 00000690
004455 00000738
004482 00000791
004432 00000777
004582 00000734
004577 00000693
004439 00000779
004533 00000791
004578 00000692
004466 00000739
004418 00000690
004402 00000697
004391 00000798
004545 00000737
004500 00000696
004526 00000779
004568 00000799
004567 00000792
004450 00000795
004528 00000796
004480 00000794
004530 00000803
004387 00000739
004540 00000738
004538 00000793
004556 00000733
004490 00000693
004525 00000780
004547 00000743
004431 00000779
004484 00000794
004421 00000693
004412 00000699
004407 00000691
004385 00000800
004501 00000695
004537 00000796
004468 00000732
004515 00000736
004396 00000796
004571 00000799
004574 00000693
004461 00000744
004523 00000779
004553 00000746
[-- Attachment #3: massive-intr-200-300-10.txt --]
[-- Type: text/plain, Size: 2848 bytes --]
004687 00000706
004613 00000709
004591 00000702
004579 00000709
004685 00000709
004588 00000709
004669 00000811
004598 00000814
004699 00000758
004763 00000753
004750 00000735
004684 00000709
004756 00000753
004573 00000709
004577 00000754
004609 00000706
004657 00000820
004666 00000815
004633 00000753
004697 00000753
004608 00000703
004590 00000707
004681 00000706
004568 00000758
004736 00000819
004643 00000734
004566 00000813
004704 00000753
004595 00000813
004759 00000753
004709 00000753
004606 00000708
004661 00000814
004622 00000739
004675 00000816
004725 00000735
004663 00000818
004731 00000815
004596 00000818
004753 00000750
004713 00000750
004655 00000710
004627 00000750
004594 00000813
004667 00000816
004716 00000739
004722 00000740
004715 00000739
004700 00000750
004735 00000814
004674 00000815
004728 00000741
004762 00000751
004740 00000818
004576 00000713
004578 00000738
004723 00000740
004653 00000743
004647 00000737
004572 00000709
004584 00000706
004620 00000709
004619 00000710
004592 00000702
004597 00000739
004648 00000737
004733 00000814
004758 00000754
004659 00000820
004664 00000818
004747 00000820
004717 00000739
004701 00000752
004696 00000747
004760 00000740
004710 00000755
004712 00000752
004695 00000755
004623 00000734
004683 00000707
004587 00000711
004618 00000712
004605 00000710
004631 00000755
004603 00000710
004586 00000711
004706 00000734
004702 00000755
004644 00000739
004634 00000752
004635 00000752
004617 00000713
004738 00000815
004610 00000707
004732 00000818
004641 00000740
004691 00000734
004746 00000813
004601 00000822
004670 00000815
004628 00000752
004615 00000714
004703 00000762
004612 00000708
004698 00000752
004636 00000752
004632 00000752
004682 00000709
004629 00000752
004734 00000818
004714 00000750
004742 00000815
004708 00000754
004585 00000711
004743 00000814
004751 00000749
004574 00000711
004599 00000818
004639 00000756
004737 00000814
004651 00000739
004672 00000812
004671 00000815
004680 00000711
004668 00000817
004720 00000735
004761 00000754
004752 00000818
004678 00000714
004565 00000711
004638 00000757
004569 00000709
004665 00000814
004583 00000738
004688 00000739
004727 00000812
004575 00000712
004582 00000823
004581 00000740
004744 00000817
004614 00000709
004660 00000812
004580 00000735
004624 00000754
004642 00000733
004571 00000710
004705 00000758
004686 00000710
004741 00000820
004721 00000736
004593 00000817
004616 00000709
004677 00000713
004693 00000733
004650 00000736
004640 00000752
004719 00000739
004730 00000731
004745 00000736
004621 00000736
004645 00000736
004656 00000739
004689 00000741
004646 00000736
004748 00000817
004739 00000819
004676 00000708
004652 00000736
004694 00000739
004654 00000736
004649 00000739
004749 00000733
004726 00000731
004729 00000739
004724 00000737
004692 00000736
004718 00000737
004626 00000751
[-- Attachment #4: massive-intr-200-300-11.txt --]
[-- Type: text/plain, Size: 3072 bytes --]
004680 00000765
004812 00000759
004681 00000765
004705 00000762
004807 00000762
004786 00000775
004721 00000778
004783 00000762
004805 00000764
004648 00000775
004710 00000775
004809 00000762
004733 00000767
004724 00000778
004794 00000778
004796 00000779
004791 00000780
004718 00000777
004795 00000780
004687 00000708
004651 00000708
004633 00000708
004694 00000708
004701 00000708
004654 00000708
004673 00000767
004693 00000705
004620 00000700
004637 00000705
004702 00000708
004765 00000702
004621 00000708
004697 00000708
004649 00000778
004709 00000778
004641 00000708
004643 00000708
004614 00000774
004797 00000773
004698 00000705
004696 00000705
004692 00000705
004624 00000705
004776 00000705
004704 00000705
004808 00000767
004623 00000702
004639 00000705
004685 00000764
004663 00000750
004640 00000705
004766 00000702
004619 00000702
004629 00000702
004714 00000775
004772 00000750
004788 00000778
004792 00000780
004803 00000777
004646 00000775
004810 00000767
004745 00000750
004734 00000764
004798 00000767
004744 00000747
004631 00000751
004778 00000750
004769 00000750
004662 00000750
004764 00000747
004747 00000747
004755 00000744
004666 00000750
004804 00000767
004730 00000767
004753 00000750
004652 00000780
004667 00000750
004669 00000747
004773 00000747
004758 00000750
004751 00000750
004780 00000750
004754 00000747
004726 00000780
004689 00000767
004802 00000781
004739 00000761
004655 00000744
004671 00000747
004672 00000767
004779 00000741
004743 00000767
004708 00000767
004731 00000767
004749 00000747
004738 00000767
004771 00000744
004616 00000766
004750 00000741
004679 00000761
004789 00000772
004729 00000767
004727 00000780
004613 00000705
004728 00000777
004622 00000710
004799 00000777
004699 00000710
004759 00000744
004675 00000764
004691 00000707
004741 00000764
004627 00000710
004638 00000710
004642 00000710
004767 00000707
004723 00000777
004686 00000764
004768 00000707
004636 00000707
004740 00000764
004801 00000780
004625 00000707
004735 00000764
004793 00000777
004618 00000704
004716 00000772
004777 00000707
004690 00000707
004703 00000707
004628 00000707
004700 00000710
004678 00000758
004760 00000704
004761 00000707
004632 00000704
004683 00000769
004634 00000707
004806 00000704
004645 00000778
004684 00000761
004644 00000704
004676 00000766
004630 00000779
004647 00000777
004664 00000749
004617 00000702
004695 00000707
004650 00000780
004635 00000704
004785 00000782
004712 00000782
004717 00000779
004790 00000775
004811 00000766
004660 00000749
004763 00000749
004674 00000763
004656 00000749
004774 00000752
004787 00000779
004615 00000754
004770 00000749
004706 00000758
004707 00000766
004742 00000766
004713 00000774
004719 00000777
004715 00000776
004653 00000780
004746 00000752
004665 00000743
004658 00000743
004775 00000746
004782 00000746
004659 00000749
004722 00000776
004668 00000740
004736 00000760
004732 00000766
004677 00000763
004800 00000766
004737 00000763
004756 00000752
004657 00000752
004688 00000766
004626 00000770
004670 00000746
004781 00000749
004748 00000746
004752 00000746
[-- Attachment #5: massive-intr-200-300-12.txt --]
[-- Type: text/plain, Size: 3200 bytes --]
004457 00000758
004552 00000749
004581 00000776
004417 00000775
004556 00000749
004548 00000752
004555 00000749
004470 00000749
004465 00000761
004481 00000749
004418 00000749
004538 00000749
004549 00000749
004540 00000749
004546 00000749
004482 00000746
004592 00000749
004541 00000749
004409 00000749
004562 00000761
004584 00000779
004564 00000755
004438 00000776
004475 00000746
004578 00000776
004536 00000766
004408 00000761
004462 00000766
004463 00000763
004533 00000763
004489 00000707
004530 00000763
004566 00000760
004531 00000766
004529 00000710
004431 00000710
004430 00000710
004488 00000710
004429 00000710
004469 00000751
004413 00000710
004427 00000707
004412 00000707
004582 00000781
004447 00000707
004575 00000784
004423 00000707
004477 00000707
004449 00000763
004558 00000704
004450 00000763
004434 00000707
004411 00000767
004494 00000707
004495 00000707
004464 00000763
004497 00000707
004597 00000746
004452 00000766
004500 00000766
004573 00000763
004587 00000781
004419 00000707
004407 00000777
004441 00000784
004506 00000781
004560 00000763
004580 00000778
004588 00000781
004516 00000784
004598 00000754
004589 00000784
004520 00000781
004513 00000781
004583 00000781
004515 00000781
004433 00000704
004501 00000781
004504 00000781
004432 00000704
004539 00000751
004577 00000781
004428 00000704
004486 00000707
004547 00000751
004485 00000704
004551 00000754
004490 00000707
004557 00000751
004424 00000707
004491 00000707
004576 00000778
004511 00000781
004594 00000778
004445 00000778
004483 00000704
004436 00000701
004595 00000751
004554 00000751
004479 00000751
004519 00000778
004543 00000751
004572 00000763
004446 00000755
004518 00000763
004602 00000748
004454 00000760
004480 00000704
004459 00000763
004460 00000763
004535 00000775
004448 00000704
004605 00000745
004599 00000751
004600 00000751
004550 00000751
004528 00000760
004451 00000760
004456 00000763
004522 00000763
004596 00000748
004527 00000763
004473 00000748
004478 00000748
004505 00000775
004524 00000763
004601 00000751
004476 00000748
004537 00000757
004544 00000748
004545 00000751
004461 00000765
004571 00000757
004458 00000757
004455 00000762
004534 00000765
004507 00000783
004567 00000762
004503 00000783
004561 00000765
004406 00000707
004579 00000783
004569 00000763
004439 00000780
004532 00000765
004421 00000709
004512 00000783
004523 00000762
004499 00000709
004440 00000783
004474 00000753
004443 00000780
004508 00000783
004410 00000709
004563 00000762
004467 00000750
004509 00000783
004415 00000706
004468 00000753
004510 00000783
004604 00000753
004585 00000780
004487 00000706
004471 00000753
004492 00000706
004472 00000750
004444 00000777
004442 00000777
004425 00000706
004603 00000753
004586 00000783
004574 00000753
004542 00000750
004498 00000709
004414 00000703
004422 00000709
004565 00000757
004570 00000706
004502 00000780
004426 00000706
004591 00000780
004514 00000780
004517 00000780
004435 00000706
004420 00000703
004593 00000706
004437 00000706
004553 00000706
004493 00000706
004484 00000703
004525 00000762
004568 00000762
004521 00000765
004526 00000765
004416 00000706
004496 00000709
004590 00000780
004466 00000753
004453 00000759
004559 00000709
[-- Attachment #6: massive-intr-200-300-13.txt --]
[-- Type: text/plain, Size: 2624 bytes --]
004525 00000797
004491 00000733
004477 00000733
004527 00000799
004446 00000706
004469 00000765
004556 00000730
004595 00000802
004529 00000731
004512 00000717
004423 00000713
004475 00000771
004438 00000710
004501 00000714
004460 00000739
004435 00000713
004603 00000794
004465 00000736
004463 00000721
004588 00000797
004519 00000801
004597 00000797
004480 00000738
004476 00000770
004542 00000765
004503 00000768
004471 00000764
004443 00000715
004472 00000762
004434 00000737
004581 00000767
004496 00000767
004602 00000799
004579 00000765
004577 00000765
004508 00000736
004447 00000714
004584 00000791
004592 00000799
004553 00000801
004545 00000765
004598 00000792
004599 00000799
004445 00000713
004544 00000763
004451 00000711
004573 00000767
004607 00000738
004502 00000762
004567 00000738
004616 00000735
004490 00000730
004574 00000767
004429 00000769
004547 00000767
004449 00000714
004483 00000738
004522 00000797
004531 00000797
004593 00000796
004478 00000738
004561 00000738
004507 00000716
004419 00000799
004557 00000738
004485 00000738
004587 00000799
004497 00000767
004570 00000770
004576 00000770
004612 00000735
004601 00000735
004563 00000738
004430 00000736
004509 00000738
004608 00000738
004457 00000802
004426 00000716
004433 00000712
004461 00000709
004511 00000720
004528 00000794
004613 00000738
004546 00000767
004466 00000712
004617 00000737
004494 00000735
004489 00000738
004459 00000714
004464 00000713
004458 00000788
004454 00000805
004440 00000711
004427 00000723
004585 00000795
004565 00000732
004524 00000800
004572 00000762
004474 00000769
004594 00000798
004452 00000794
004504 00000709
004590 00000790
004568 00000770
004515 00000799
004455 00000804
004540 00000763
004425 00000766
004582 00000764
004575 00000764
004539 00000761
004523 00000785
004420 00000761
004418 00000735
004530 00000799
004456 00000801
004487 00000708
004462 00000717
004578 00000769
004583 00000715
004536 00000761
004609 00000729
004500 00000767
004473 00000764
004450 00000716
004467 00000769
004437 00000711
004498 00000769
004424 00000732
004482 00000740
004589 00000797
004564 00000769
004606 00000734
004432 00000708
004521 00000792
004615 00000740
004481 00000732
004562 00000737
004484 00000737
004566 00000737
004520 00000796
004486 00000737
004442 00000721
004510 00000717
004533 00000712
004534 00000714
004554 00000737
004468 00000769
004513 00000739
004560 00000731
004591 00000801
004552 00000737
004499 00000766
004586 00000800
004448 00000735
004444 00000713
004505 00000734
004436 00000716
004431 00000736
004551 00000737
004558 00000734
004495 00000769
004488 00000734
004605 00000796
[-- Attachment #7: massive-intr-200-300-14.txt --]
[-- Type: text/plain, Size: 3008 bytes --]
004446 00000734
004525 00000769
004604 00000749
004583 00000750
004578 00000753
004597 00000769
004534 00000753
004418 00000734
004444 00000734
004519 00000766
004507 00000731
004599 00000769
004547 00000749
004573 00000750
004522 00000766
004503 00000736
004496 00000732
004595 00000769
004448 00000729
004556 00000751
004491 00000754
004520 00000769
004479 00000751
004453 00000766
004559 00000751
004456 00000766
004437 00000734
004575 00000745
004579 00000750
004574 00000752
004473 00000753
004590 00000766
004424 00000753
004452 00000769
004602 00000766
004587 00000766
004471 00000750
004526 00000763
004543 00000750
004476 00000750
004572 00000750
004598 00000766
004580 00000755
004549 00000750
004472 00000751
004445 00000732
004529 00000766
004501 00000734
004550 00000754
004487 00000748
004486 00000750
004436 00000734
004617 00000753
004527 00000766
004511 00000729
004603 00000766
004565 00000756
004454 00000766
004523 00000763
004447 00000734
004541 00000755
004475 00000746
004569 00000751
004570 00000733
004451 00000769
004439 00000737
004428 00000733
004478 00000754
004542 00000750
004555 00000750
004477 00000756
004443 00000730
004566 00000751
004510 00000736
004465 00000747
004489 00000750
004508 00000734
004455 00000763
004467 00000753
004468 00000750
004546 00000753
004584 00000753
004535 00000751
004586 00000751
004540 00000750
004433 00000734
004464 00000747
004427 00000734
004560 00000751
004493 00000748
004571 00000737
004558 00000755
004552 00000754
004607 00000734
004608 00000751
004481 00000753
004585 00000752
004462 00000752
004528 00000768
004576 00000755
004517 00000771
004512 00000768
004614 00000752
004591 00000771
004563 00000753
004606 00000768
004431 00000757
004488 00000748
004485 00000756
004494 00000751
004440 00000733
004531 00000771
004589 00000768
004551 00000753
004423 00000733
004601 00000768
004509 00000736
004594 00000765
004422 00000735
004505 00000734
004600 00000768
004470 00000750
004548 00000755
004513 00000768
004421 00000757
004514 00000768
004480 00000755
004438 00000734
004611 00000756
004460 00000746
004524 00000768
004616 00000752
004483 00000749
004495 00000736
004554 00000750
004515 00000765
004530 00000747
004504 00000733
004545 00000752
004482 00000753
004457 00000753
004588 00000768
004613 00000750
004492 00000730
004434 00000735
004420 00000755
004426 00000733
004474 00000752
004516 00000765
004553 00000753
004593 00000768
004466 00000754
004568 00000765
004499 00000736
004449 00000768
004461 00000752
004562 00000755
004429 00000767
004425 00000743
004533 00000749
004612 00000754
004419 00000761
004502 00000733
004544 00000756
004500 00000735
004537 00000749
004430 00000733
004582 00000733
004596 00000765
004610 00000750
004458 00000737
004490 00000752
004484 00000755
004450 00000738
004521 00000770
004577 00000751
004615 00000757
004469 00000753
004592 00000765
004432 00000735
004459 00000732
004567 00000754
004463 00000751
004497 00000743
004506 00000732
004561 00000734
004605 00000768
004536 00000752
[-- Attachment #8: massive-intr-200-300-15-1.txt --]
[-- Type: text/plain, Size: 3088 bytes --]
004437 00000779
004583 00000782
004543 00000766
004568 00000753
004503 00000780
004526 00000742
004461 00000720
004455 00000718
004421 00000722
004563 00000744
004525 00000726
004591 00000781
004600 00000768
004428 00000720
004499 00000727
004490 00000745
004596 00000768
004593 00000777
004539 00000767
004482 00000748
004471 00000768
004555 00000762
004553 00000769
004491 00000751
004458 00000750
004431 00000720
004457 00000752
004497 00000725
004411 00000720
004575 00000781
004588 00000779
004452 00000720
004465 00000766
004467 00000768
004567 00000746
004562 00000744
004536 00000754
004435 00000720
004566 00000739
004470 00000767
004580 00000782
004560 00000742
004542 00000765
004406 00000783
004595 00000775
004576 00000776
004540 00000768
004493 00000723
004450 00000722
004514 00000784
004422 00000724
004448 00000724
004418 00000774
004544 00000771
004415 00000744
004534 00000742
004535 00000750
004507 00000778
004584 00000779
004498 00000717
004509 00000782
004459 00000751
004587 00000781
004532 00000746
004594 00000769
004541 00000776
004433 00000723
004599 00000768
004598 00000781
004554 00000765
004501 00000779
004424 00000725
004603 00000771
004440 00000782
004413 00000719
004572 00000746
004537 00000765
004559 00000744
004502 00000785
004570 00000752
004552 00000745
004495 00000722
004538 00000771
004474 00000765
004558 00000746
004472 00000747
004550 00000770
004478 00000770
004590 00000763
004530 00000742
004429 00000722
004442 00000785
004447 00000722
004488 00000755
004423 00000725
004466 00000769
004408 00000771
004592 00000783
004469 00000771
004517 00000778
004604 00000768
004585 00000781
004524 00000719
004577 00000780
004416 00000781
004516 00000781
004546 00000765
004519 00000781
004522 00000720
004480 00000765
004579 00000781
004405 00000722
004486 00000741
004523 00000724
004569 00000740
004589 00000781
004492 00000750
004515 00000781
004571 00000753
004586 00000780
004438 00000777
004414 00000722
004481 00000769
004487 00000743
004521 00000719
004496 00000725
004462 00000740
004531 00000746
004443 00000781
004561 00000747
004527 00000723
004439 00000783
004449 00000725
004483 00000744
004419 00000724
004601 00000777
004410 00000721
004425 00000722
004445 00000730
004453 00000722
004451 00000721
004551 00000764
004565 00000747
004417 00000748
004548 00000768
004484 00000744
004549 00000770
004434 00000719
004456 00000721
004473 00000768
004412 00000741
004510 00000778
004489 00000751
004533 00000747
004547 00000768
004581 00000785
004504 00000780
004520 00000727
004441 00000716
004573 00000742
004494 00000724
004528 00000747
004564 00000747
004582 00000780
004477 00000767
004500 00000778
004427 00000726
004430 00000721
004468 00000767
004506 00000783
004545 00000770
004556 00000769
004407 00000750
004505 00000782
004432 00000721
004426 00000723
004511 00000780
004460 00000743
004508 00000782
004574 00000777
004597 00000772
004476 00000748
004578 00000778
004512 00000775
004529 00000750
004513 00000781
004463 00000747
004454 00000725
004602 00000770
004475 00000771
004446 00000724
004485 00000749
004464 00000765
[-- Attachment #9: massive-intr-200-300-15-2.txt --]
[-- Type: text/plain, Size: 3184 bytes --]
004922 00000731
004819 00000751
004979 00000735
004931 00000766
004929 00000763
004810 00000746
004901 00000748
004797 00000768
004849 00000748
004832 00000736
004914 00000733
004806 00000729
004883 00000733
004866 00000762
004955 00000748
004816 00000748
004801 00000745
004812 00000777
004800 00000745
004820 00000748
004939 00000759
004993 00000764
004813 00000748
004956 00000743
004885 00000752
004918 00000736
004947 00000734
004920 00000732
004864 00000762
004974 00000754
004973 00000751
004966 00000748
004890 00000752
004894 00000751
004928 00000759
004836 00000734
004843 00000764
004902 00000751
004927 00000737
004821 00000752
004907 00000764
004975 00000750
004824 00000745
004829 00000752
004909 00000744
004908 00000750
004940 00000764
004906 00000750
004865 00000750
004912 00000749
004896 00000749
004987 00000775
004938 00000772
004848 00000759
004880 00000732
004949 00000735
004921 00000736
004911 00000752
004842 00000734
004982 00000748
004835 00000729
004830 00000752
004839 00000736
004867 00000764
004841 00000749
004854 00000752
004913 00000750
004874 00000735
004934 00000751
004983 00000762
004870 00000767
004991 00000763
004893 00000747
004850 00000747
004808 00000739
004834 00000745
004915 00000734
004855 00000767
004980 00000756
004936 00000764
004838 00000733
004961 00000728
004952 00000734
004951 00000731
004953 00000731
004989 00000761
004990 00000767
004860 00000764
004853 00000767
004978 00000767
004887 00000745
004817 00000750
004873 00000737
004844 00000767
004840 00000726
004958 00000744
004852 00000735
004857 00000764
004837 00000734
004957 00000764
004969 00000765
004803 00000744
004884 00000734
004892 00000754
004924 00000747
004971 00000751
004847 00000747
004891 00000751
004827 00000744
004882 00000731
004815 00000747
004859 00000746
004798 00000747
004954 00000750
004962 00000756
004845 00000764
004799 00000750
004846 00000760
004862 00000764
004986 00000764
004900 00000747
004946 00000734
004868 00000770
004903 00000752
004933 00000764
004981 00000761
004875 00000731
004881 00000731
004858 00000761
004795 00000753
004937 00000747
004988 00000728
004916 00000750
004897 00000747
004917 00000771
004863 00000763
004888 00000752
004926 00000731
004968 00000752
004895 00000752
004919 00000752
004930 00000767
004960 00000735
004941 00000764
004910 00000752
004963 00000748
004886 00000736
004945 00000747
004898 00000750
004967 00000753
004923 00000751
004856 00000768
004904 00000753
004818 00000751
004878 00000764
004802 00000750
004805 00000747
004876 00000746
004899 00000748
004977 00000744
004972 00000749
004889 00000749
004833 00000736
004944 00000758
004905 00000752
004985 00000764
004942 00000766
004823 00000752
004869 00000769
004822 00000752
004811 00000737
004807 00000735
004948 00000737
004796 00000738
004932 00000738
004877 00000736
004950 00000733
004970 00000749
004851 00000746
004825 00000764
004976 00000753
004828 00000762
004935 00000753
004804 00000752
004814 00000747
004964 00000746
004861 00000764
004809 00000748
004826 00000746
004984 00000755
004879 00000749
004794 00000750
004831 00000747
004872 00000765
004959 00000747
004992 00000772
004965 00000745
004925 00000741
004943 00000773
[-- Attachment #10: massive-intr-200-300-15.txt --]
[-- Type: text/plain, Size: 3056 bytes --]
004580 00000811
004556 00000696
004478 00000698
004638 00000773
004614 00000739
004509 00000815
004464 00000813
004521 00000764
004604 00000763
004596 00000770
004623 00000738
004595 00000770
004603 00000769
004574 00000813
004597 00000761
004620 00000738
004618 00000742
004547 00000733
004488 00000698
004568 00000696
004587 00000774
004454 00000813
004460 00000695
004499 00000815
004626 00000728
004465 00000808
004523 00000769
004554 00000694
004560 00000695
004551 00000736
004590 00000767
004516 00000766
004469 00000813
004619 00000739
004463 00000698
004562 00000692
004518 00000774
004458 00000700
004592 00000738
004624 00000739
004607 00000735
004533 00000733
004487 00000690
004448 00000695
004484 00000696
004514 00000766
004559 00000691
004450 00000767
004591 00000762
004635 00000768
004584 00000812
004525 00000765
004513 00000767
004629 00000730
004540 00000733
004459 00000695
004446 00000697
004549 00000808
004538 00000736
004519 00000770
004589 00000767
004625 00000693
004517 00000760
004616 00000736
004475 00000736
004531 00000813
004565 00000697
004534 00000736
004526 00000813
004471 00000738
004493 00000695
004467 00000812
004482 00000695
004451 00000694
004442 00000692
004447 00000697
004599 00000767
004532 00000736
004520 00000768
004577 00000810
004503 00000812
004606 00000812
004506 00000805
004561 00000696
004457 00000699
004641 00000769
004485 00000700
004572 00000815
004640 00000770
004570 00000813
004445 00000766
004468 00000810
004639 00000763
004530 00000814
004528 00000762
004588 00000761
004566 00000696
004529 00000768
004491 00000694
004449 00000743
004452 00000697
004573 00000811
004500 00000816
004598 00000768
004473 00000738
004495 00000696
004504 00000814
004502 00000810
004552 00000813
004455 00000694
004576 00000815
004555 00000696
004542 00000735
004522 00000767
004581 00000812
004615 00000737
004632 00000737
004550 00000736
004541 00000733
004497 00000700
004630 00000733
004558 00000699
004636 00000769
004476 00000732
004477 00000737
004461 00000697
004613 00000739
004609 00000737
004536 00000731
004479 00000693
004601 00000769
004444 00000811
004453 00000768
004443 00000738
004575 00000810
004611 00000737
004593 00000765
004571 00000809
004610 00000734
004557 00000694
004474 00000695
004545 00000740
004466 00000813
004633 00000777
004535 00000737
004456 00000695
004578 00000811
004510 00000767
004481 00000697
004627 00000811
004586 00000771
004617 00000740
004631 00000735
004524 00000768
004553 00000699
004486 00000694
004608 00000738
004505 00000810
004498 00000812
004483 00000692
004544 00000738
004621 00000735
004512 00000771
004634 00000762
004612 00000745
004472 00000738
004515 00000772
004602 00000767
004511 00000765
004582 00000813
004470 00000738
004507 00000815
004605 00000815
004579 00000812
004564 00000698
004489 00000695
004569 00000695
004628 00000695
004492 00000698
004480 00000696
004637 00000765
004527 00000766
004490 00000698
004583 00000815
004600 00000815
004594 00000766
004496 00000694
004548 00000817
004585 00000765
004539 00000735
004622 00000737
[-- Attachment #11: massive-intr-200-300-16-1.txt --]
[-- Type: text/plain, Size: 2624 bytes --]
004446 00000771
004390 00000725
004414 00000770
004456 00000724
004415 00000760
004465 00000727
004523 00000728
004514 00000725
004500 00000757
004434 00000773
004427 00000757
004393 00000769
004498 00000757
004401 00000770
004551 00000776
004391 00000728
004458 00000725
004515 00000722
004376 00000769
004473 00000727
004386 00000728
004528 00000727
004472 00000722
004410 00000767
004505 00000767
004365 00000776
004546 00000753
004421 00000754
004384 00000775
004371 00000755
004467 00000727
004471 00000724
004444 00000773
004538 00000757
004403 00000765
004448 00000771
004557 00000758
004422 00000758
004394 00000771
004517 00000722
004383 00000725
004539 00000775
004396 00000768
004389 00000725
004381 00000771
004400 00000766
004395 00000771
004367 00000770
004397 00000774
004439 00000771
004453 00000771
004423 00000759
004480 00000767
004413 00000774
004470 00000727
004468 00000727
004531 00000727
004463 00000727
004506 00000775
004489 00000727
004435 00000775
004385 00000728
004521 00000724
004486 00000762
004543 00000775
004481 00000772
004534 00000774
004424 00000759
004532 00000724
004508 00000775
004369 00000772
004379 00000768
004442 00000771
004450 00000773
004368 00000771
004554 00000756
004511 00000769
004503 00000772
004436 00000775
004510 00000775
004408 00000772
004420 00000756
004540 00000775
004443 00000768
004441 00000768
004457 00000727
004502 00000759
004513 00000727
004547 00000756
004485 00000756
004529 00000719
004553 00000753
004372 00000768
004419 00000759
004440 00000772
004559 00000753
004454 00000768
004494 00000759
004373 00000771
004488 00000758
004461 00000724
004544 00000775
004431 00000755
004399 00000771
004405 00000772
004452 00000774
004462 00000724
004438 00000772
004507 00000772
004482 00000772
004366 00000773
004407 00000775
004455 00000765
004550 00000756
004451 00000767
004428 00000758
004542 00000774
004492 00000756
004504 00000769
004404 00000770
004370 00000774
004361 00000776
004362 00000759
004363 00000778
004499 00000777
004520 00000726
004425 00000758
004409 00000774
004449 00000768
004530 00000726
004548 00000753
004479 00000774
004522 00000726
004495 00000758
004509 00000726
004484 00000759
004378 00000770
004527 00000720
004360 00000767
004411 00000769
004460 00000770
004545 00000761
004518 00000726
004387 00000726
004516 00000726
004459 00000721
004476 00000723
004549 00000762
004558 00000757
004426 00000760
004417 00000758
004437 00000777
004491 00000758
004364 00000771
004519 00000726
004555 00000757
004525 00000728
004466 00000726
004552 00000758
004497 00000761
004490 00000756
004535 00000772
004478 00000773
004533 00000729
[-- Attachment #12: massive-intr-200-300-1.txt --]
[-- Type: text/plain, Size: 3152 bytes --]
004404 00000754
004390 00000750
004453 00000740
004449 00000728
004526 00000742
004528 00000735
004508 00000780
004512 00000774
004507 00000777
004430 00000749
004393 00000750
004425 00000730
004552 00000777
004400 00000745
004579 00000774
004502 00000771
004569 00000734
004500 00000734
004450 00000733
004456 00000730
004548 00000774
004534 00000728
004398 00000749
004497 00000737
004490 00000777
004511 00000774
004551 00000777
004576 00000778
004553 00000774
004448 00000755
004542 00000777
004407 00000751
004544 00000774
004410 00000742
004514 00000776
004495 00000737
004506 00000779
004493 00000735
004538 00000777
004573 00000731
004478 00000731
004580 00000779
004519 00000768
004578 00000774
004395 00000734
004457 00000733
004532 00000731
004520 00000774
004541 00000772
004385 00000747
004444 00000749
004546 00000776
004521 00000771
004394 00000738
004411 00000747
004487 00000733
004555 00000778
004563 00000733
004535 00000730
004530 00000733
004559 00000736
004423 00000728
004489 00000734
004441 00000750
004414 00000750
004409 00000751
004464 00000739
004451 00000735
004505 00000776
004474 00000733
004518 00000773
004583 00000733
004477 00000737
004517 00000776
004504 00000773
004408 00000747
004527 00000736
004575 00000776
004445 00000747
004476 00000731
004499 00000735
004427 00000747
004424 00000733
004554 00000776
004564 00000734
004510 00000774
004460 00000735
004513 00000773
004388 00000750
004468 00000727
004389 00000747
004452 00000732
004557 00000736
004570 00000735
004523 00000733
004522 00000730
004492 00000733
004403 00000744
004515 00000776
004397 00000744
004584 00000732
004574 00000776
004571 00000776
004547 00000776
004387 00000737
004545 00000775
004484 00000735
004402 00000760
004565 00000725
004488 00000733
004558 00000733
004434 00000744
004412 00000747
004431 00000750
004562 00000730
004396 00000735
004419 00000749
004439 00000747
004429 00000733
004433 00000761
004422 00000731
004496 00000733
004466 00000737
004567 00000733
004440 00000755
004482 00000733
004524 00000731
004432 00000744
004549 00000770
004391 00000746
004543 00000776
004525 00000730
004509 00000778
004531 00000729
004418 00000749
004406 00000749
004480 00000776
004516 00000773
004529 00000730
004417 00000753
004413 00000748
004399 00000749
004392 00000749
004435 00000753
004415 00000756
004469 00000735
004503 00000776
004416 00000749
004485 00000733
004462 00000737
004566 00000735
004438 00000748
004481 00000778
004533 00000776
004442 00000746
004447 00000749
004436 00000752
004556 00000733
004428 00000736
004560 00000730
004401 00000749
004483 00000734
004581 00000735
004550 00000773
004467 00000730
004446 00000756
004437 00000746
004577 00000774
004443 00000748
004501 00000723
004473 00000734
004540 00000735
004461 00000735
004421 00000737
004582 00000735
004471 00000737
004498 00000730
004458 00000735
004465 00000732
004568 00000732
004539 00000738
004386 00000735
004459 00000732
004472 00000732
004470 00000732
004491 00000735
004420 00000732
004536 00000734
004454 00000732
004463 00000732
004475 00000734
004426 00000727
004455 00000732
004572 00000737
004537 00000734
004479 00000731
004486 00000729
[-- Attachment #13: massive-intr-200-300-2.txt --]
[-- Type: text/plain, Size: 3120 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #14: massive-intr-200-300-3.txt --]
[-- Type: text/plain, Size: 3056 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #15: massive-intr-200-300-4.txt --]
[-- Type: text/plain, Size: 3120 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #16: massive-intr-200-300-5.txt --]
[-- Type: text/plain, Size: 2768 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #17: massive-intr-200-300-6.txt --]
[-- Type: text/plain, Size: 3072 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #18: massive-intr-200-300-7.txt --]
[-- Type: text/plain, Size: 2992 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #19: massive-intr-200-300-8.txt --]
[-- Type: text/plain, Size: 3200 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #20: massive-intr-200-300-9.txt --]
[-- Type: text/plain, Size: 3184 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #21: massive-intr-200-300-without-patch.txt --]
[-- Type: text/plain, Size: 2784 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #22: massive-intr-200-300-with-patch.txt --]
[-- Type: text/plain, Size: 3200 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #23: plot.sh --]
[-- Type: application/x-sh, Size: 1242 bytes --]
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-06-15 8:37 ` Hu Tao
@ 2011-06-16 0:57 ` Hidetoshi Seto
2011-06-16 9:45 ` Hu Tao
0 siblings, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-06-16 0:57 UTC (permalink / raw)
To: Hu Tao
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri
(2011/06/15 17:37), Hu Tao wrote:
> On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
>> (2011/06/14 15:58), Hu Tao wrote:
>>> Hi,
>>>
>>> I've run several tests including hackbench, unixbench, massive-intr
>>> and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
>>> 4 cores, and 4G memory.
>>>
>>> Most of the time the results differ little, but there are problems:
>>>
>>> 1. unixbench: execl throughput has about a 5% drop.
>>> 2. unixbench: process creation has about a 5% drop.
>>> 3. massive-intr: when running 200 processes for 5mins, the number
>>> of loops each process runs differs more than before cfs-bandwidth-v6.
>>>
>>> The results are attached.
>>
>> I know the unixbench scores are not so stable, so the problem might
>> just be noise ... but the massive-intr result is interesting.
>> Could you give it a try and find which piece (xx/15) in the series
>> causes the problems?
>
> After more tests, I found the massive-intr data is not stable either.
> Results are attached. The third number in the file name indicates which
> patches are applied; 0 means no patch applied. plot.sh makes it easy to
> generate png files.
(Though I don't know what the 16th patch of this series is, anyway)
I see that the results of 15, 15-1 and 15-2 are very different and that
15-2 is similar to without-patch.
One concern is whether this instability in the data is really caused by
the nature of your test (hardware, massive-intr itself, something running
in the background, etc.) or by a hidden piece in the bandwidth patch set.
Did you see "not stable" data when none of the patches was applied?
If not, which patch makes it unstable?
Thanks,
H.Seto
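[A hedged editorial sketch, not part of the thread: Seto's question calls
for repeated runs per kernel. Assuming massive_intr takes <nproc>
<seconds> and prints one "<pid> <loops>" line per process, a driver like
the one below reproduces Hu Tao's five-run series; the binary path and
file names are illustrative.]

#!/usr/bin/env python3
# Run massive_intr five times (200 processes, 300 seconds each) on the
# current kernel and save each run, mirroring Hu Tao's file naming.
import subprocess

for run in range(1, 6):
    result = subprocess.run(["./massive_intr", "200", "300"],
                            capture_output=True, text=True, check=True)
    with open(f"massive-intr-200-300-0-{run}.txt", "w") as f:
        f.write(result.stdout)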
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-06-16 0:57 ` Hidetoshi Seto
@ 2011-06-16 9:45 ` Hu Tao
2011-06-17 1:22 ` Hidetoshi Seto
0 siblings, 1 reply; 129+ messages in thread
From: Hu Tao @ 2011-06-16 9:45 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri
[-- Attachment #1: Type: text/plain, Size: 1937 bytes --]
On Thu, Jun 16, 2011 at 09:57:09AM +0900, Hidetoshi Seto wrote:
> (2011/06/15 17:37), Hu Tao wrote:
> > On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
> >> (2011/06/14 15:58), Hu Tao wrote:
> >>> Hi,
> >>>
> >>> I've run several tests including hackbench, unixbench, massive-intr
> >>> and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
> >>> 4 cores, and 4G memory.
> >>>
> >>> Most of the time the results differ little, but there are problems:
> >>>
> >>> 1. unixbench: execl throughput has about a 5% drop.
> >>> 2. unixbench: process creation has about a 5% drop.
> >>> 3. massive-intr: when running 200 processes for 5mins, the number
> >>> of loops each process runs differs more than before cfs-bandwidth-v6.
> >>>
> >>> The results are attached.
> >>
> >> I know the unixbench scores are not so stable, so the problem might
> >> just be noise ... but the massive-intr result is interesting.
> >> Could you give it a try and find which piece (xx/15) in the series
> >> causes the problems?
> >
> > After more tests, I found the massive-intr data is not stable either.
> > Results are attached. The third number in the file name indicates which
> > patches are applied; 0 means no patch applied. plot.sh makes it easy to
> > generate png files.
>
> (Though I don't know what the 16th patch of this series is, anyway)
the 16th patch is this: https://lkml.org/lkml/2011/5/23/503
> I see that the results of 15, 15-1 and 15-2 are very different and that
> 15-2 is similar to without-patch.
>
> One concern is whether this instability in the data is really caused by
> the nature of your test (hardware, massive-intr itself, something running
> in the background, etc.) or by a hidden piece in the bandwidth patch set.
> Did you see "not stable" data when none of the patches was applied?
Yes.
But over five runs the results seem 'stable' (both before and after the
patches). I've also run the tests in single mode. Results are attached.
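[A hedged editorial sketch, not part of the thread: with the naming scheme
Hu Tao describes, the attached runs can be condensed to one spread figure
per patch level, which makes the stable/unstable comparison concrete.
Assumptions: the files sit in the current directory and each data line is
"<pid> <loop count>".]

#!/usr/bin/env python3
# Group result files by patch level (the third number in the file name,
# 0 = no patches) and report the average per-run spread of loop counts.
import glob
import statistics
from collections import defaultdict

spreads = defaultdict(list)
for path in glob.glob("massive-intr-200-300-*-*.txt"):
    # e.g. "massive-intr-200-300-16-2.txt" -> patch level "16", run "2"
    level = path[:-len(".txt")].split("-")[4]
    with open(path) as f:
        loops = [int(line.split()[1]) for line in f if line.strip()]
    spreads[level].append(statistics.pstdev(loops))

for level in sorted(spreads):
    runs = spreads[level]
    print(f"patches={level}: runs={len(runs)} "
          f"mean stdev of loop counts={statistics.mean(runs):.1f}")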
[-- Attachment #2: massive-intr-200-300-0-1.txt --]
[-- Type: text/plain, Size: 3200 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #3: massive-intr-200-300-0-2.txt --]
[-- Type: text/plain, Size: 3008 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #4: massive-intr-200-300-0-3.txt --]
[-- Type: text/plain, Size: 3056 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #5: massive-intr-200-300-0-4.txt --]
[-- Type: text/plain, Size: 3168 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #6: massive-intr-200-300-0-5.txt --]
[-- Type: text/plain, Size: 2848 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #7: massive-intr-200-300-16-1.txt --]
[-- Type: text/plain, Size: 3024 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #8: massive-intr-200-300-16-2.txt --]
[-- Type: text/plain, Size: 3184 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #9: massive-intr-200-300-16-3.txt --]
[-- Type: text/plain, Size: 2992 bytes --]
[... massive-intr raw data omitted: one "<pid> <loop count>" line per process ...]
[-- Attachment #10: massive-intr-200-300-16-4.txt --]
[-- Type: text/plain, Size: 3024 bytes --]
[-- 189 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #11: massive-intr-200-300-16-5.txt --]
[-- Type: text/plain, Size: 3200 bytes --]
[-- 200 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #12: massive-intr-200-300-single-0-1.txt --]
[-- Type: text/plain, Size: 3104 bytes --]
[-- 194 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #13: massive-intr-200-300-single-0-2.txt --]
[-- Type: text/plain, Size: 3184 bytes --]
[-- 199 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #14: massive-intr-200-300-single-0-3.txt --]
[-- Type: text/plain, Size: 2928 bytes --]
[-- 183 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #15: massive-intr-200-300-single-0-4.txt --]
[-- Type: text/plain, Size: 3184 bytes --]
[-- 199 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #16: massive-intr-200-300-single-0-5.txt --]
[-- Type: text/plain, Size: 3152 bytes --]
[-- 197 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #17: massive-intr-200-300-single-16-1.txt --]
[-- Type: text/plain, Size: 3088 bytes --]
[-- 193 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #18: massive-intr-200-300-single-16-2.txt --]
[-- Type: text/plain, Size: 3072 bytes --]
[-- 192 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #19: massive-intr-200-300-single-16-3.txt --]
[-- Type: text/plain, Size: 3200 bytes --]
[-- 200 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #20: massive-intr-200-300-single-16-4.txt --]
[-- Type: text/plain, Size: 3104 bytes --]
[-- 194 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #21: massive-intr-200-300-single-16-5.txt --]
[-- Type: text/plain, Size: 3120 bytes --]
[-- 195 lines of massive_intr per-process results, one "<pid> <loops>" pair per line; raw data elided --]
[-- Attachment #22: 0-1.png --]
[-- Type: image/png, Size: 12388 bytes --]
[-- Attachment #23: 16-1.png --]
[-- Type: image/png, Size: 12546 bytes --]
[-- Attachment #24: single-0-1.png --]
[-- Type: image/png, Size: 10008 bytes --]
[-- Attachment #25: single-16-1.png --]
[-- Type: image/png, Size: 11114 bytes --]
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-06-16 9:45 ` Hu Tao
@ 2011-06-17 1:22 ` Hidetoshi Seto
2011-06-17 6:05 ` Hu Tao
2011-06-17 6:25 ` Paul Turner
0 siblings, 2 replies; 129+ messages in thread
From: Hidetoshi Seto @ 2011-06-17 1:22 UTC (permalink / raw)
To: Hu Tao
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri
(2011/06/16 18:45), Hu Tao wrote:
> On Thu, Jun 16, 2011 at 09:57:09AM +0900, Hidetoshi Seto wrote:
>> (2011/06/15 17:37), Hu Tao wrote:
>>> On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
>>>> (2011/06/14 15:58), Hu Tao wrote:
>>>>> Hi,
>>>>>
>>>>> I've run several tests including hackbench, unixbench, massive-intr
>>>>> and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
>>>>> 4 cores, and 4G memory.
>>>>>
>>>>> Most of the time the results differ little, but there are problems:
>>>>>
>>>>> 1. unixbench: execl throughput has about a 5% drop.
>>>>> 2. unixbench: process creation has about a 5% drop.
>>>>> 3. massive-intr: when running 200 processes for 5 minutes, the number
>>>>> of loops each process runs differs more than before cfs-bandwidth-v6.
>>>>>
>>>>> The results are attached.
>>>>
>>>> I know the unixbench scores are not so stable, so the problems might
>>>> be noise ... but the massive-intr result is interesting.
>>>> Could you try to find which piece (xx/15) in the series causes
>>>> the problems?
>>>
>>> After more tests, I found the massive-intr data is not stable, either.
>>> Results are attached. The third number in the file name indicates which
>>> patches are applied; 0 means no patches applied. plot.sh makes it easy
>>> to generate the png files.
>>
>> (Though I don't know what the 16th patch of this series is, anyway)
I see. It will be replaced by Paul's update.
> the 16th patch is this: https://lkml.org/lkml/2011/5/23/503
>
>> I see that the results of 15, 15-1 and 15-2 are very different and that
>> 15-2 is similar to the without-patch case.
>>
>> One concern is whether this instability in the data is really caused by
>> the nature of your test (hardware, massive-intr itself, something running
>> in the background, etc.) or by a hidden piece in the bandwidth patch set.
>> Did you see "not stable" data when none of the patches is applied?
>
> Yes.
>
> But over five runs the results seem 'stable' (before patches and after
> patches). I've also run the tests in single mode; results are attached.
(It would be greatly appreciated if you could provide not only the raw
results but also your current observations/speculation.)
Well, (to wrap it up,) do you still see the following problem?
>>>>> 3. massive-intr: when running 200 processes for 5 minutes, the number
>>>>> of loops each process runs differs more than before cfs-bandwidth-v6.
I think that 5 samples are not enough to draw a conclusion, and that at the
moment it is negligible. What do you think?
Even though the pointed-out problems are gone, I have to say thank you for
taking your time to test this CFS bandwidth patch set.
I'd appreciate it if you could continue your test, possibly against V7.
(I'm waiting, Paul?)
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-06-17 1:22 ` Hidetoshi Seto
@ 2011-06-17 6:05 ` Hu Tao
2011-06-17 6:25 ` Paul Turner
1 sibling, 0 replies; 129+ messages in thread
From: Hu Tao @ 2011-06-17 6:05 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Paul Turner, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri
On Fri, Jun 17, 2011 at 10:22:51AM +0900, Hidetoshi Seto wrote:
> (2011/06/16 18:45), Hu Tao wrote:
> > On Thu, Jun 16, 2011 at 09:57:09AM +0900, Hidetoshi Seto wrote:
> >> (2011/06/15 17:37), Hu Tao wrote:
> >>> On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
> >>>> (2011/06/14 15:58), Hu Tao wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I've run several tests including hackbench, unixbench, massive-intr
> >>>>> and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
> >>>>> 4 cores, and 4G memory.
> >>>>>
> >>>>> Most of the time the results differ little, but there are problems:
> >>>>>
> >>>>> 1. unixbench: execl throughput has about a 5% drop.
> >>>>> 2. unixbench: process creation has about a 5% drop.
> >>>>> 3. massive-intr: when running 200 processes for 5 minutes, the number
> >>>>> of loops each process runs differs more than before cfs-bandwidth-v6.
> >>>>>
> >>>>> The results are attached.
> >>>>
> >>>> I know the unixbench scores are not so stable, so the problems might
> >>>> be noise ... but the massive-intr result is interesting.
> >>>> Could you try to find which piece (xx/15) in the series causes
> >>>> the problems?
> >>>
> >>> After more tests, I found the massive-intr data is not stable, either.
> >>> Results are attached. The third number in the file name indicates which
> >>> patches are applied; 0 means no patches applied. plot.sh makes it easy
> >>> to generate the png files.
> >>
> >> (Though I don't know what the 16th patch of this series is, anyway)
>
> I see. It will be replaced by Paul's update.
>
> > the 16th patch is this: https://lkml.org/lkml/2011/5/23/503
> >
> >> I see that the results of 15, 15-1 and 15-2 are very different and that
> >> 15-2 is similar to the without-patch case.
> >>
> >> One concern is whether this instability in the data is really caused by
> >> the nature of your test (hardware, massive-intr itself, something running
> >> in the background, etc.) or by a hidden piece in the bandwidth patch set.
> >> Did you see "not stable" data when none of the patches is applied?
> >
> > Yes.
> >
> > But over five runs the results seem 'stable' (before patches and after
> > patches). I've also run the tests in single mode; results are attached.
>
> (It would be greatly appreciated if you could provide not only the raw
> results but also your current observations/speculation.)
Sorry, I didn't make myself clear.
>
> Well, (to wrap it up,) do you still see the following problem?
>
> >>>>> 3. massive-intr: when running 200 processes for 5 minutes, the number
> >>>>> of loops each process runs differs more than before cfs-bandwidth-v6.
Even before applying the patches, the numbers differ a lot between
several runs of massive_intr; this is the reason I say the data is not
stable. But treating the results of five runs as a whole, they show some
stability. The results after the patches are similar, and the average
loop counts differ little compared to the results before the patches
(compare 0-1.png and 16-1.png in my last mail). So I would say the
patches don't have too much impact on interactive processes.
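(For reference, a minimal standalone C sketch of how the per-run spread
can be quantified from the attached result files. This is an illustrative
helper only -- not the tooling actually used for these runs -- and it
assumes nothing beyond the attached format of one "<pid> <loops>" pair
per line:)

/*
 * Illustrative helper: prints the mean and standard deviation of the
 * loop counts in one massive_intr result file ("<pid> <loops>" lines).
 * Build with: gcc -o spread spread.c -lm
 */
#include <math.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	unsigned long pid, loops;
	double sum = 0.0, sumsq = 0.0;
	long n = 0;
	FILE *f;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <result file>\n", argv[0]);
		return 1;
	}
	f = fopen(argv[1], "r");
	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fscanf(f, "%lu %lu", &pid, &loops) == 2) {
		sum += loops;
		sumsq += (double)loops * (double)loops;
		n++;
	}
	fclose(f);
	if (n) {
		double mean = sum / n;
		double var = sumsq / n - mean * mean;

		printf("%ld processes, mean %.1f loops, stddev %.2f\n",
		       n, mean, sqrt(var > 0.0 ? var : 0.0));
	}
	return 0;
}

Running it over each of the attached files gives one mean/stddev pair per
run, which makes the run-to-run spread easy to compare directly.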
>
> I think that 5 samples are not enough to draw a conclusion, and that at the
> moment it is negligible. What do you think?
At least 5 samples reveal something, but if you'd like I can take more
samples.
>
> Even though the pointed-out problems are gone, I have to say thank you for
> taking your time to test this CFS bandwidth patch set.
> I'd appreciate it if you could continue your test, possibly against V7.
> (I'm waiting, Paul?)
>
>
> Thanks,
> H.Seto
Thanks,
--
Hu Tao
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-06-17 1:22 ` Hidetoshi Seto
2011-06-17 6:05 ` Hu Tao
@ 2011-06-17 6:25 ` Paul Turner
2011-06-17 9:13 ` Hidetoshi Seto
1 sibling, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-06-17 6:25 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Hu Tao, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri
On Thu, Jun 16, 2011 at 6:22 PM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> (2011/06/16 18:45), Hu Tao wrote:
>> On Thu, Jun 16, 2011 at 09:57:09AM +0900, Hidetoshi Seto wrote:
>>> (2011/06/15 17:37), Hu Tao wrote:
>>>> On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
>>>>> (2011/06/14 15:58), Hu Tao wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've run several tests including hackbench, unixbench, massive-intr
>>>>>> and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
>>>>>> 4 cores, and 4G memory.
>>>>>>
>>>>>> Most of the time the results differ little, but there are problems:
>>>>>>
>>>>>> 1. unixbench: execl throughput has about a 5% drop.
>>>>>> 2. unixbench: process creation has about a 5% drop.
>>>>>> 3. massive-intr: when running 200 processes for 5 minutes, the number
>>>>>> of loops each process runs differs more than before cfs-bandwidth-v6.
>>>>>>
>>>>>> The results are attached.
>>>>>
>>>>> I know the unixbench scores are not so stable, so the problems might
>>>>> be noise ... but the massive-intr result is interesting.
>>>>> Could you try to find which piece (xx/15) in the series causes
>>>>> the problems?
>>>>
>>>> After more tests, I found the massive-intr data is not stable, either.
>>>> Results are attached. The third number in the file name indicates which
>>>> patches are applied; 0 means no patches applied. plot.sh makes it easy
>>>> to generate the png files.
>>>
>>> (Though I don't know what the 16th patch of this series is, anyway)
>
> I see. It will be replaced by Paul's update.
>
>> the 16th patch is this: https://lkml.org/lkml/2011/5/23/503
>>
>>> I see that the results of 15, 15-1 and 15-2 are very different and that
>>> 15-2 is similar to the without-patch case.
>>>
>>> One concern is whether this instability in the data is really caused by
>>> the nature of your test (hardware, massive-intr itself, something running
>>> in the background, etc.) or by a hidden piece in the bandwidth patch set.
>>> Did you see "not stable" data when none of the patches is applied?
>>
>> Yes.
>>
>> But over five runs the results seem 'stable' (before patches and after
>> patches). I've also run the tests in single mode; results are attached.
>
> (It would be greatly appreciated if you could provide not only the raw
> results but also your current observations/speculation.)
>
> Well, (to wrap it up,) do you still see the following problem?
>
>>>>>> 3. massive-intr: when running 200 processes for 5 minutes, the number
>>>>>> of loops each process runs differs more than before cfs-bandwidth-v6.
>
> I think that 5 samples are not enough to draw a conclusion, and that at the
> moment it is negligible. What do you think?
>
> Even though the pointed-out problems are gone, I have to say thank you for
> taking your time to test this CFS bandwidth patch set.
> I'd appreciate it if you could continue your test, possibly against V7.
> (I'm waiting, Paul?)
It should be out in a few hours. As I was preparing everything today I
realized a latent error existed in the quota expiration path;
specifically, on a wake-up from a sufficiently long sleep we will
see expired quota and have to wait for the timer to recharge bandwidth
before we're actually allowed to run. I'm currently munging the results
of fixing that and making sure everything else is correct in the wake
of those changes.
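(To make the shape of that error concrete, here is a standalone sketch of
the problematic validity check. The struct fields and function names are
invented for illustration; this is not the actual v6 code:)

/*
 * Sketch only: models a locally cached quota slice with an expiry
 * stamp. After a long enough sleep, 'now' passes runtime_expires, so
 * the check fails and the entity stalls until the period timer
 * recharges bandwidth, even though quota is otherwise available.
 */
#include <stdio.h>

typedef unsigned long long u64;

struct cfs_rq_sketch {
	long long runtime_remaining;	/* cached local quota (ns) */
	u64 runtime_expires;		/* end of the period it was drawn from */
};

static int runtime_usable(const struct cfs_rq_sketch *cfs_rq, u64 now)
{
	return cfs_rq->runtime_remaining > 0 && now < cfs_rq->runtime_expires;
}

int main(void)
{
	struct cfs_rq_sketch rq = { 5000000LL, 100000000ULL };

	/* Woken within the period the quota was drawn from: runs fine. */
	printf("short sleep: usable=%d\n", runtime_usable(&rq, 50000000ULL));

	/* Woken after a sufficiently long sleep: the cached quota looks
	 * expired, so the task waits for the timer -- the latent error. */
	printf("long sleep:  usable=%d\n", runtime_usable(&rq, 300000000ULL));
	return 0;
}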
>
>
> Thanks,
> H.Seto
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-06-17 6:25 ` Paul Turner
@ 2011-06-17 9:13 ` Hidetoshi Seto
2011-06-18 0:28 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Hidetoshi Seto @ 2011-06-17 9:13 UTC (permalink / raw)
To: Paul Turner
Cc: Hu Tao, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri
(2011/06/17 15:25), Paul Turner wrote:
> It should be out in a few hours. As I was preparing everything today I
> realized a latent error existed in the quota expiration path;
> specifically, on a wake-up from a sufficiently long sleep we will
> see expired quota and have to wait for the timer to recharge bandwidth
> before we're actually allowed to run. I'm currently munging the results
> of fixing that and making sure everything else is correct in the wake
> of those changes.
Thanks!
I'll check it some time early next week.
Thanks,
H.Seto
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: [patch 00/15] CFS Bandwidth Control V6
2011-06-17 9:13 ` Hidetoshi Seto
@ 2011-06-18 0:28 ` Paul Turner
0 siblings, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-06-18 0:28 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Hu Tao, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Balbir Singh, Vaidyanathan Srinivasan,
Srivatsa Vaddagiri
On Fri, Jun 17, 2011 at 2:13 AM, Hidetoshi Seto
<seto.hidetoshi@jp.fujitsu.com> wrote:
> (2011/06/17 15:25), Paul Turner wrote:
>> It should be out in a few hours. As I was preparing everything today I
>> realized a latent error existed in the quota expiration path;
>> specifically, on a wake-up from a sufficiently long sleep we will
>> see expired quota and have to wait for the timer to recharge bandwidth
>> before we're actually allowed to run. I'm currently munging the results
>> of fixing that and making sure everything else is correct in the wake
>> of those changes.
>
> Thanks!
> I'll check it some time early next week.
So it's been a long session of hunting races and implementing the
cleanups above.
Unfortunately, as my finger hovered over the send button I realized one
hurdle remains -- there's a narrow race in the period timer shutdown
path:
- Our period timer can decide that we're going idle as a result of no activity
- Right after it makes this decision a task sneaks in and runs on
another cpu. We can see the timer has chosen to go idle (it's
possible to synchronize on that state around the bandwidth lock) but
there's no good way to kick the period timer into an about-face since
it's already active.
- The timing is sufficiently rare and short that we could do something
awful like spin until the timer is complete, but I think it's probably
better to put a kick in one of our already existing recurring paths
such as update_shares.
I'll fix this after some sleep; I'm out of steam for now.
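(A compressed userspace sketch of that window, with invented names; the
real timer machinery is elided and only the ordering matters:)

/* Build with: gcc -o timer-race timer-race.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t bandwidth_lock = PTHREAD_MUTEX_INITIALIZER;
static int timer_active = 1;	/* is the period timer armed? */
static int cfs_rq_active;	/* is any bandwidth consumer runnable? */

/* Period timer body: decides to go idle when nothing has been running. */
static void *period_timer_fn(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&bandwidth_lock);
	if (!cfs_rq_active)
		timer_active = 0;	/* the decision to go idle */
	pthread_mutex_unlock(&bandwidth_lock);
	/*
	 * Window: a task waking right here can observe timer_active == 0
	 * under the lock, but cannot cleanly re-arm a timer that is still
	 * mid-callback -- hence the idea of kicking it from an existing
	 * recurring path such as update_shares() rather than spinning.
	 */
	return NULL;
}

/* Wake-up path on another cpu. */
static void task_wakes(void)
{
	pthread_mutex_lock(&bandwidth_lock);
	cfs_rq_active = 1;
	if (!timer_active)
		printf("hit the shutdown race window\n");
	pthread_mutex_unlock(&bandwidth_lock);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, period_timer_fn, NULL);
	pthread_join(t, NULL);	/* sequentialized here to expose the state */
	task_wakes();
	return 0;
}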
>
>
> Thanks,
> H.Seto
>
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-15 5:37 ` Kamalesh Babulal
@ 2011-06-21 19:48 ` Paul Turner
2011-06-24 15:05 ` Kamalesh Babulal
` (3 more replies)
0 siblings, 4 replies; 129+ messages in thread
From: Paul Turner @ 2011-06-21 19:48 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Vladimir Davydov, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Ingo Molnar, Pavel Emelianov
Hi Kamalesh,
Can you see what things look like under v7?
There have been a few improvements to quota re-distribution that should
hopefully help your test case.
The remaining idle% I see on my machines appears to be a product of
load-balancer inefficiency.
Thanks!
- Paul
On Tue, Jun 14, 2011 at 10:37 PM, Kamalesh Babulal
<kamalesh@linux.vnet.ibm.com> wrote:
> * Paul Turner <pjt@google.com> [2011-06-13 17:00:08]:
>
>> Hi Kamalesh.
>>
>> I tried both on Friday and again today to reproduce your results,
>> without success. Results are attached below. The margin of error is
>> the same as in the previous (2-level deep) case, ~4%. One minor nit: in
>> your script's input parsing you're calling shift; you don't need to do
>> this with getopts, and it will actually lead to arguments being
>> dropped.
>>
>> Are you testing on top of a clean -tip? Do you have any custom
>> load-balancer or scheduler settings?
>>
>> Thanks,
>>
>> - Paul
>>
>>
>> Hyper-threaded topology:
>> unpinned:
>> Average CPU Idle percentage 38.6333%
>> Bandwidth shared with remaining non-Idle 61.3667%
>>
>> pinned:
>> Average CPU Idle percentage 35.2766%
>> Bandwidth shared with remaining non-Idle 64.7234%
>> (The mask in the "unpinned" case is 0-3,6-9,12-15,18-21, which should
>> mirror your 2-socket 8x2 configuration.)
>>
>> 4-way NUMA topology:
>> unpinned:
>> Average CPU Idle percentage 5.26667%
>> Bandwidth shared with remaining non-Idle 94.73333%
>>
>> pinned:
>> Average CPU Idle percentage 0.242424%
>> Bandwidth shared with remaining non-Idle 99.757576%
>>
> Hi Paul,
>
> I tried tip 919c9baa9 + V6 patchset on a 2-socket, quad-core box with HT, and
> the idle time seen is ~22% to ~23%. The kernel is not tuned to any custom
> load-balancer/scheduler settings.
>
> unpinned:
> Average CPU Idle percentage 23.5333%
> Bandwidth shared with remaining non-Idle 76.4667%
>
> pinned:
> Average CPU Idle percentage 0%
> Bandwidth shared with remaining non-Idle 100%
>
> Thanks,
>
> Kamalesh
>>
>>
>>
>> On Fri, Jun 10, 2011 at 11:17 AM, Kamalesh Babulal
>> <kamalesh@linux.vnet.ibm.com> wrote:
>> > * Paul Turner <pjt@google.com> [2011-06-08 20:25:00]:
>> >
>> >> Hi Kamalesh,
>> >>
>> >> I'm unable to reproduce the results you describe. One possibility is
>> >> load-balancer interaction -- can you describe the topology of the
>> >> platform you are running this on?
>> >>
>> >> On both a straight NUMA topology and a hyper-threaded platform I
>> >> observe a ~4% delta between the pinned and un-pinned cases.
>> >>
>> >> Thanks -- results below,
>> >>
>> >> - Paul
>> >>
>> >>
> (snip)
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-21 19:48 ` Paul Turner
@ 2011-06-24 15:05 ` Kamalesh Babulal
2011-09-07 15:20 ` Srivatsa Vaddagiri
1 sibling, 0 replies; 129+ messages in thread
From: Kamalesh Babulal @ 2011-06-24 15:05 UTC (permalink / raw)
To: Paul Turner
Cc: Vladimir Davydov, linux-kernel, Peter Zijlstra, Bharata B Rao,
Dhaval Giani, Vaidyanathan Srinivasan, Srivatsa Vaddagiri,
Ingo Molnar, Pavel Emelianov
* Paul Turner <pjt@google.com> [2011-06-21 12:48:17]:
> Hi Kamalesh,
>
> Can you see what things look like under v7?
>
> There's been a few improvements to quota re-distribution that should
> hopefully help your test case.
>
> The remaining idle% I see on my machines appear to be a product of
> load-balancer inefficiency.
>
> Thanks!
>
> - Paul
(snip)
Hi Paul,
Sorry for the delay in the response. I tried the V7 patchset on
top of tip. The patchset passed build and boot tests in different
combinations.
I have re-run the tests with a couple of combinations on the same
2-socket, 4-core, HT box. The test data was collected over a
60-second run.
un-pinned and cpu shares of 1024
-------------------------------------------------
The top five cgroups and their sub-cgroups were assigned the default
cpu shares of 1024.
Average CPU Idle percentage 21.8333%
Bandwidth shared with remaining non-Idle 78.1667%
un-pinned and cpu shares are proportional
--------------------------------------------------
The top five cgroups were assigned cpu shares proportional to the
number of sub-cgroups under their hierarchy.
For example, cgroup1's share is (1024*2) = 2048 and each of its
sub-cgroups has shares of 1024.
Average CPU Idle percentage 14.2%
Bandwidth shared with remaining non-Idle 85.8%
pinned and cpu shares of 1024
--------------------------------------------------
Average CPU Idle percentage 0.0666667%
Bandwidth shared with remaining non-Idle 99.9333333%
pinned and cpu shares are proportional
--------------------------------------------------
Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%
I have captured the perf sched stats for every run. Let me
know if that will help. I can mail them to you privately.
Thanks,
Kamalesh.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-06-21 19:48 ` Paul Turner
2011-06-24 15:05 ` Kamalesh Babulal
@ 2011-09-07 15:20 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Srivatsa Vaddagiri
2011-09-07 19:22 ` Peter Zijlstra
2011-09-16 8:22 ` Paul Turner
1 sibling, 2 replies; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-07 15:20 UTC (permalink / raw)
To: Paul Turner
Cc: Kamalesh Babulal, Vladimir Davydov, linux-kernel, Peter Zijlstra,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
[Apologies if you get this email multiple times - there is some email
client config issue that I am fixing up]
* Paul Turner <pjt@google.com> [2011-06-21 12:48:17]:
> Hi Kamalesh,
>
> Can you see what things look like under v7?
>
> There have been a few improvements to quota re-distribution that should
> hopefully help your test case.
>
> The remaining idle% I see on my machines appears to be a product of
> load-balancer inefficiency.
which is quite a complex problem to solve! I am still surprised that
we can't handle 32 cpuhogs on a 16-cpu system very easily. The tasks seem to
hop around madly rather than settle down at 2 tasks/cpu. Kamalesh, can you post
the exact count of migrations we saw on the latest tip over a 20-sec window?
Anyway, here's a "hack" to minimize the idle time induced by load-balance
issues. It brings down idle time from 7+% to ~0%. I am not too happy about
this, but I don't see any simpler solution that solves the idle time issue
completely (other than making the load-balancer completely fair!).
--
Fix excessive idle time reported when cgroups are capped. The patch
introduces the notion of "steal" (or "grace") time which is the surplus
time/bandwidth each cgroup is allowed to consume, subject to a maximum
steal time (sched_cfs_max_steal_time_us). Cgroups are allowed this "steal"
or "grace" time when the lone task running on a cpu is about to be throttled.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Index: linux-3.1-rc4/include/linux/sched.h
===================================================================
--- linux-3.1-rc4.orig/include/linux/sched.h 2011-09-07 14:57:49.529602231 +0800
+++ linux-3.1-rc4/include/linux/sched.h 2011-09-07 14:58:49.952418107 +0800
@@ -2042,6 +2042,7 @@ static inline void sched_autogroup_exit(
#ifdef CONFIG_CFS_BANDWIDTH
extern unsigned int sysctl_sched_cfs_bandwidth_slice;
+extern unsigned int sysctl_sched_cfs_max_steal_time;
#endif
#ifdef CONFIG_RT_MUTEXES
Index: linux-3.1-rc4/kernel/sched.c
===================================================================
--- linux-3.1-rc4.orig/kernel/sched.c 2011-09-07 14:57:49.532854588 +0800
+++ linux-3.1-rc4/kernel/sched.c 2011-09-07 14:58:49.955453578 +0800
@@ -254,7 +254,7 @@ struct cfs_bandwidth {
#ifdef CONFIG_CFS_BANDWIDTH
raw_spinlock_t lock;
ktime_t period;
- u64 quota, runtime;
+ u64 quota, runtime, steal_time;
s64 hierarchal_quota;
u64 runtime_expires;
Index: linux-3.1-rc4/kernel/sched_fair.c
===================================================================
--- linux-3.1-rc4.orig/kernel/sched_fair.c 2011-09-07 14:57:49.533644483 +0800
+++ linux-3.1-rc4/kernel/sched_fair.c 2011-09-07 15:16:09.338824132 +0800
@@ -101,6 +101,18 @@ unsigned int __read_mostly sysctl_sched_
* default: 5 msec, units: microseconds
*/
unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL;
+
+/*
+ * "Surplus" quota given to a cgroup to prevent a CPU from becoming idle.
+ *
+ * This would have been unnecessary had the load-balancer been "ideal" in
+ * loading tasks uniformly across all CPUs, which would have allowed
+ * all cgroups to claim their "quota" completely. In the absence of an
+ * "ideal" load-balancer, cgroups are unable to utilize their quota, leading
+ * to unexpected idle time. This knob allows a CPU to keep running a
+ * task beyond its throttled point before becoming idle.
+ */
+unsigned int sysctl_sched_cfs_max_steal_time = 100000UL;
#endif
static const struct sched_class fair_sched_class;
@@ -1288,6 +1300,11 @@ static inline u64 sched_cfs_bandwidth_sl
return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
}
+static inline u64 sched_cfs_max_steal_time(void)
+{
+ return (u64)sysctl_sched_cfs_max_steal_time * NSEC_PER_USEC;
+}
+
/*
* Replenish runtime according to assigned quota and update expiration time.
* We use sched_clock_cpu directly instead of rq->clock to avoid adding
@@ -1303,6 +1320,7 @@ static void __refill_cfs_bandwidth_runti
return;
now = sched_clock_cpu(smp_processor_id());
+ cfs_b->steal_time = 0;
cfs_b->runtime = cfs_b->quota;
cfs_b->runtime_expires = now + ktime_to_ns(cfs_b->period);
}
@@ -1337,6 +1355,12 @@ static int assign_cfs_rq_runtime(struct
cfs_b->runtime -= amount;
cfs_b->idle = 0;
}
+
+ if (!amount && rq_of(cfs_rq)->nr_running == 1 &&
+ cfs_b->steal_time < sched_cfs_max_steal_time()) {
+ amount = min_amount;
+ cfs_b->steal_time += amount;
+ }
}
expires = cfs_b->runtime_expires;
raw_spin_unlock(&cfs_b->lock);
@@ -1378,7 +1402,8 @@ static void expire_cfs_rq_runtime(struct
* whether the global deadline has advanced.
*/
- if ((s64)(cfs_rq->runtime_expires - cfs_b->runtime_expires) >= 0) {
+ if ((s64)(cfs_rq->runtime_expires - cfs_b->runtime_expires) >= 0 ||
+ (rq_of(cfs_rq)->nr_running == 1 && cfs_b->steal_time < sched_cfs_max_steal_time())) {
/* extend local deadline, drift is bounded above by 2 ticks */
cfs_rq->runtime_expires += TICK_NSEC;
} else {
Index: linux-3.1-rc4/kernel/sysctl.c
===================================================================
--- linux-3.1-rc4.orig/kernel/sysctl.c 2011-09-07 14:57:49.534454409 +0800
+++ linux-3.1-rc4/kernel/sysctl.c 2011-09-07 14:58:49.958452846 +0800
@@ -388,6 +388,14 @@ static struct ctl_table kern_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = &one,
},
+ {
+ .procname = "sched_cfs_max_steal_time_us",
+ .data = &sysctl_sched_cfs_max_steal_time,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &one,
+ },
#endif
#ifdef CONFIG_PROVE_LOCKING
{
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-07 15:20 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Srivatsa Vaddagiri
@ 2011-09-07 19:22 ` Peter Zijlstra
2011-09-08 15:15 ` Srivatsa Vaddagiri
2011-09-16 8:22 ` Paul Turner
1 sibling, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-07 19:22 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Wed, 2011-09-07 at 20:50 +0530, Srivatsa Vaddagiri wrote:
>
> Fix excessive idle time reported when cgroups are capped.
Where from? The whole idea of bandwidth caps is to introduce idle time,
so what's excessive and where does it come from?
> The patch introduces the notion of "steal"
The virt folks already claimed steal-time and have it mean something
entirely different. You get to pick a new name.
> (or "grace") time which is the surplus
> time/bandwidth each cgroup is allowed to consume, subject to a maximum
> steal time (sched_cfs_max_steal_time_us). Cgroups are allowed this "steal"
> or "grace" time when the lone task running on a cpu is about to be throttled.
Ok, so this is a solution to an unstated problem. Why is it a good
solution?
Also, another tunable, yay!
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-07 19:22 ` Peter Zijlstra
@ 2011-09-08 15:15 ` Srivatsa Vaddagiri
2011-09-09 12:31 ` Peter Zijlstra
0 siblings, 1 reply; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-08 15:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-07 21:22:22]:
> On Wed, 2011-09-07 at 20:50 +0530, Srivatsa Vaddagiri wrote:
> >
> > Fix excessive idle time reported when cgroups are capped.
>
> Where from? The whole idea of bandwidth caps is to introduce idle time,
> so what's excessive and where does it come from?
We have set up cgroups and their hard limits so that in theory they should
consume the entire capacity available on the machine, leading to 0% idle time.
That's not what we see. A more detailed description of the setup and the problem
is here:
https://lkml.org/lkml/2011/6/7/352
but to quickly summarize, the machine and the test-case are as below:
Machine : 16-cpus (2 Quad-core w/ HT enabled)
Cgroups : 5 in number (C1-C5), each having {2, 2, 4, 8, 16} tasks respectively.
Further, each task is placed in its own (sub-)cgroup with
a capped usage of 50% CPU.
/C1/C1_1/Task1 -> capped at 50% cpu usage
/C1/C1_2/Task2 -> capped at 50% cpu usage
/C2/C2_1/Task3 -> capped at 50% cpu usage
/C2/C2_2/Task4 -> capped at 50% cpu usage
/C3/C3_1/Task5 -> capped at 50% cpu usage
/C3/C3_2/Task6 -> capped at 50% cpu usage
/C3/C3_3/Task7 -> capped at 50% cpu usage
/C3/C3_4/Task8 -> capped at 50% cpu usage
...
/C5/C5_16/Task32 -> capped at 50% cpu usage
So we have 32 tasks, each capped at 50% CPU usage, run on a 16-CPU
system. One can expect 0% idle time in this scenario, which was found
not to be the case. With early versions of cfs hard limits, up to ~20%
idle time was seen, though with the current version in tip, we see up to
~10% idle time (when cfs.period = 100ms), which goes down to ~5% when
cfs.period is set to 500ms.
From what I could find out, the "excess" idle time crops up because the
load-balancer is not perfect. For example, there are instances when a
CPU has just 1 task on its runqueue (rather than the ideal number of 2
tasks/cpu). When that lone task exceeds its 50% limit, the cpu is forced
to become idle.
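For reference, the 0% expectation is plain arithmetic: 32 tasks x 50% cap is
exactly 16 CPUs' worth of demand. A trivial standalone C check (ordinary
userspace code; nothing here depends on the kernel):

#include <stdio.h>

int main(void)
{
        int ntasks = 32, ncpus = 16;
        double cap = 0.50;                     /* 50% cap per task      */
        double demand = ntasks * cap;          /* CPUs' worth of demand */
        double idle_pct = demand >= ncpus ?
                0.0 : 100.0 * (ncpus - demand) / ncpus;

        /* prints: demand = 16.0 cpus, theoretical idle = 0.0% */
        printf("demand = %.1f cpus, theoretical idle = %.1f%%\n",
               demand, idle_pct);
        return 0;
}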
> > The patch introduces the notion of "steal"
>
> The virt folks already claimed steal-time and have it mean something
> entirely different. You get to pick a new name.
grace time?
> > (or "grace") time which is the surplus
> > time/bandwidth each cgroup is allowed to consume, subject to a maximum
> > steal time (sched_cfs_max_steal_time_us). Cgroups are allowed this "steal"
> > or "grace" time when the lone task running on a cpu is about to be throttled.
>
> Ok, so this is a solution to an unstated problem. Why is it a good
> solution?
I am not sure if there are any "good" solutions to this problem! One
possibility is to make the idle load balancer more aggressive in
pulling tasks across sched-domain boundaries, i.e., when a CPU becomes idle
(after a task got throttled) and invokes the idle load balancer, it
should try "harder" to pull a task from far-off cpus (across
package/node boundaries)?
> Also, another tunable, yay!
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-08 15:15 ` Srivatsa Vaddagiri
@ 2011-09-09 12:31 ` Peter Zijlstra
2011-09-09 13:26 ` Srivatsa Vaddagiri
2011-09-12 10:17 ` Srivatsa Vaddagiri
0 siblings, 2 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-09 12:31 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Thu, 2011-09-08 at 20:45 +0530, Srivatsa Vaddagiri wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-07 21:22:22]:
>
> > On Wed, 2011-09-07 at 20:50 +0530, Srivatsa Vaddagiri wrote:
> > >
> > > Fix excessive idle time reported when cgroups are capped.
> >
> > Where from? The whole idea of bandwidth caps is to introduce idle time,
> > so what's excessive and where does it come from?
>
> We have set up cgroups and their hard limits so that in theory they should
> consume the entire capacity available on the machine, leading to 0% idle time.
> That's not what we see. A more detailed description of the setup and the problem
> is here:
>
> https://lkml.org/lkml/2011/6/7/352
That's frigging irrelevant, isn't it? A patch should contain its own
justification.
> Machine : 16-cpus (2 Quad-core w/ HT enabled)
> Cgroups : 5 in number (C1-C5), each having {2, 2, 4, 8, 16} tasks respectively.
> Further, each task is placed in its own (sub-)cgroup with
> a capped usage of 50% CPU.
So that's loads: {512,512}, {512,512}, {256,256,256,256}, {128,..} and {64,..}
And you expect that to be balanced perfectly when a bandwidth cap is
introduced; I think you need some expectation adjustments.
> From what I could find out, the "excess" idle time crops up because the
> load-balancer is not perfect. For example, there are instances when a
> CPU has just 1 task on its runqueue (rather than the ideal number of 2
> tasks/cpu). When that lone task exceeds its 50% limit, the cpu is forced
> to become idle.
So try and cure that instead of frobbing crap like this.
> > > The patch introduces the notion of "steal"
> >
> > The virt folks already claimed steal-time and have it mean something
> > entirely different. You get to pick a new name.
>
> grace time?
Well, ideally this frobbing of symptoms instead of fixing of causes
isn't going to happen at all, it's just retarded. And it most certainly
shouldn't be the first approach to any problem.
> > Ok, so this is a solution to an unstated problem. Why is it a good
> > solution?
>
> I am not sure if there are any "good" solutions to this problem!
Good, so then we're not going to do it, full stop.
> One
> possibility is to make the idle load balancer more aggressive in
> pulling tasks across sched-domain boundaries, i.e., when a CPU becomes idle
> (after a task got throttled) and invokes the idle load balancer, it
> should try "harder" to pull a task from far-off cpus (across
> package/node boundaries)?
How about we just live with it? You set up a nearly impossible
(non-scalable) problem and then complain we don't do well. Tough fscking
luck, don't do that.
I mean, I'm all for improving things, but your frobbing here is just not
going to happen, most certainly not without very _very_ good
justification, and your patch frankly didn't have any.
Furthermore, your patch frobs the bandwidth accounting but doesn't spend
a single word explaining how, if at all, it keeps the accounting a 0-sum
game.
Seriously, you suck, your patch sucks and your method sucks. Go away.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-09 12:31 ` Peter Zijlstra
@ 2011-09-09 13:26 ` Srivatsa Vaddagiri
2011-09-12 10:17 ` Srivatsa Vaddagiri
1 sibling, 0 replies; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-09 13:26 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-09 14:31:02]:
> > We have set up cgroups and their hard limits so that in theory they should
> > consume the entire capacity available on the machine, leading to 0% idle time.
> > That's not what we see. A more detailed description of the setup and the problem
> > is here:
> >
> > https://lkml.org/lkml/2011/6/7/352
>
> That's frigging irrelevant, isn't it? A patch should contain its own
> justification.
Agreed, my bad. I was (wrongly) setting the problem context by posting
this in response to Paul's email where the problem was discussed.
> > One
> > possibility is to make the idle load balancer more aggressive in
> > pulling tasks across sched-domain boundaries, i.e., when a CPU becomes idle
> > (after a task got throttled) and invokes the idle load balancer, it
> > should try "harder" to pull a task from far-off cpus (across
> > package/node boundaries)?
>
> How about we just live with it?
I think we will, unless the load balancer can be improved (which seems unlikely
to me :-()
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-09 12:31 ` Peter Zijlstra
2011-09-09 13:26 ` Srivatsa Vaddagiri
@ 2011-09-12 10:17 ` Srivatsa Vaddagiri
2011-09-12 12:35 ` Peter Zijlstra
1 sibling, 1 reply; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-12 10:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-09 14:31:02]:
> > Machine : 16-cpus (2 Quad-core w/ HT enabled)
> > Cgroups : 5 in number (C1-C5), each having {2, 2, 4, 8, 16} tasks respectively.
> > Further, each task is placed in its own (sub-)cgroup with
> > a capped usage of 50% CPU.
>
> So that's loads: {512,512}, {512,512}, {256,256,256,256}, {128,..} and {64,..}
Yes, with the default shares of 1024 for each cgroup.
FWIW we did also try setting shares for each cgroup proportional to the number
of tasks it has. For ex: C1's shares = 1024 * 2 = 2048, C2 = 1024 * 2 = 2048,
C3 = 4 * 1024 = 4096 etc., while /C1/C1_1, /C1/C1_2, .../C5/C5_16/ shares were
left at the default of 1024 (as those sub-cgroups contain only one task).
That does help reduce idle time by almost 50% (from 15-20% -> 6-9%)
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-12 10:17 ` Srivatsa Vaddagiri
@ 2011-09-12 12:35 ` Peter Zijlstra
2011-09-13 4:15 ` Srivatsa Vaddagiri
0 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-12 12:35 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Mon, 2011-09-12 at 15:47 +0530, Srivatsa Vaddagiri wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-09 14:31:02]:
>
> > > Machine : 16-cpus (2 Quad-core w/ HT enabled)
> > > Cgroups : 5 in number (C1-C5), each having {2, 2, 4, 8, 16} tasks respectively.
> > > Further, each task is placed in its own (sub-)cgroup with
> > > a capped usage of 50% CPU.
> >
> > So that's loads: {512,512}, {512,512}, {256,256,256,256}, {128,..} and {64,..}
>
> Yes, with the default shares of 1024 for each cgroup.
>
> FWIW we did also try setting shares for each cgroup proportional to the number
> of tasks it has. For ex: C1's shares = 1024 * 2 = 2048, C2 = 1024 * 2 = 2048,
> C3 = 4 * 1024 = 4096 etc., while /C1/C1_1, /C1/C1_2, .../C5/C5_16/ shares were
> left at the default of 1024 (as those sub-cgroups contain only one task).
>
> That does help reduce idle time by almost 50% (from 15-20% -> 6-9%)
Of course it does.. and I bet you can improve that slightly if you
manage to fix some of the numerical nightmares that live in the cgroup
load-balancer (Paul, care to share your WIP?)
But the initial scenario is a complete and utter fail; it's impossible to
schedule that sanely. It's an infeasible weight scenario with more tasks
than cpus, and the added bandwidth constraints just keep changing the
set, requiring endless migrations to try and keep utilization from
tanking.
Really, classic fail.
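For the record, the infeasibility is easy to check: a task's fair share of
the machine is w_i/W * ncpus, and once that exceeds one CPU the weights
cannot be realized, since a task runs on at most one CPU at a time. A quick
standalone check with the per-task loads above (illustrative C, not kernel
code):

#include <stdio.h>

int main(void)
{
        /*
         * Per-task weights from five groups of {2, 2, 4, 8, 16} tasks,
         * each group at the default 1024 shares: 512, 512, 256, 128, 64.
         * The two 512-weight pairs are merged into one entry below.
         */
        int w[] = { 512, 256, 128, 64 }, n[] = { 4, 4, 8, 16 };
        int ncpus = 16;
        long W = 0;

        for (int i = 0; i < 4; i++)
                W += (long)w[i] * n[i];        /* total weight: 5120    */
        for (int i = 0; i < 4; i++)
                printf("w = %3d -> fair share %.2f cpus\n",
                       w[i], (double)w[i] / W * ncpus);
        /* the 512-weight tasks "deserve" 1.60 cpus each -- infeasible */
        return 0;
}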
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-12 12:35 ` Peter Zijlstra
@ 2011-09-13 4:15 ` Srivatsa Vaddagiri
2011-09-13 5:03 ` Srivatsa Vaddagiri
2011-09-13 14:19 ` Peter Zijlstra
0 siblings, 2 replies; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 4:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-12 14:35:43]:
> Of course it does.. and I bet you can improve that slightly if you
> manage to fix some of the numerical nightmares that live in the cgroup
> load-balancer (Paul, care to share your WIP?)
Booting with "nohz=off" also helps significantly.
With nohz=on, average idle time (over 1 min) is 10.3%
With nohz=off, average idle time (over 1 min) is 3.9%
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 4:15 ` Srivatsa Vaddagiri
@ 2011-09-13 5:03 ` Srivatsa Vaddagiri
2011-09-13 5:05 ` Srivatsa Vaddagiri
2011-09-13 9:39 ` Peter Zijlstra
2011-09-13 14:19 ` Peter Zijlstra
1 sibling, 2 replies; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 5:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> [2011-09-13 09:45:45]:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-12 14:35:43]:
>
> > Of course it does.. and I bet you can improve that slightly if you
> > manage to fix some of the numerical nightmares that live in the cgroup
> > load-balancer (Paul, care to share your WIP?)
>
> Booting with "nohz=off" also helps significantly.
>
> With nohz=on, average idle time (over 1 min) is 10.3%
> With nohz=off, average idle time (over 1 min) is 3.9%
Tuning min_interval and max_interval of various sched_domains to 1 [a]
and also setting sched_cfs_bandwidth_slice_us to 500 does cut down idle
time further to 2.7% ..
This is perhaps not optimal (as it may lead to more lock contention), but
something to note for those who care for both capping and utilization in
equal measure!
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 5:03 ` Srivatsa Vaddagiri
@ 2011-09-13 5:05 ` Srivatsa Vaddagiri
2011-09-13 9:39 ` Peter Zijlstra
1 sibling, 0 replies; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 5:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> [2011-09-13 10:33:06]:
> * Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> [2011-09-13 09:45:45]:
>
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-12 14:35:43]:
> >
> > > Of course it does.. and I bet you can improve that slightly if you
> > > manage to fix some of the numerical nightmares that live in the cgroup
> > > load-balancer (Paul, care to share your WIP?)
> >
> > Booting with "nohz=off" also helps significantly.
> >
> > With nohz=on, average idle time (over 1 min) is 10.3%
> > With nohz=off, average idle time (over 1 min) is 3.9%
>
> Tuning min_interval and max_interval of various sched_domains to 1 [a]
Forgot to add footnote (a) earlier. min and max_interval tuned as
below:
# cd /proc/sys/kernel/sched_domain
# for i in `find . -name min_interval`; do echo 1 > $i; done
# for i in `find . -name max_interval`; do echo 1 > $i; done
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 5:03 ` Srivatsa Vaddagiri
2011-09-13 5:05 ` Srivatsa Vaddagiri
@ 2011-09-13 9:39 ` Peter Zijlstra
2011-09-13 11:28 ` Srivatsa Vaddagiri
1 sibling, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-13 9:39 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Tue, 2011-09-13 at 10:33 +0530, Srivatsa Vaddagiri wrote:
>
> This is perhaps not optimal (as it may lead to more lock contention), but
> something to note for those who care for both capping and utilization in
> equal measure!
You meant lock inversion, which leads to more idle time :-)
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 9:39 ` Peter Zijlstra
@ 2011-09-13 11:28 ` Srivatsa Vaddagiri
2011-09-13 14:07 ` Peter Zijlstra
0 siblings, 1 reply; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 11:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 11:39:48]:
> On Tue, 2011-09-13 at 10:33 +0530, Srivatsa Vaddagiri wrote:
> >
> > This is perhaps not optimal (as it may lead to more lock contention), but
> > something to note for those who care for both capping and utilization in
> > equal measure!
>
> You meant lock inversion, which leads to more idle time :-)
I think 'cfs_b->lock' contention would go up significantly when reducing
sysctl_sched_cfs_bandwidth_slice, while for something like 'balancing' lock
(taken with SD_SERIALIZE set and more frequently when tuning down
max_interval?), yes it may increase idle time! Did you have any other
lock in mind when speaking of inversion?
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 11:28 ` Srivatsa Vaddagiri
@ 2011-09-13 14:07 ` Peter Zijlstra
2011-09-13 16:21 ` Srivatsa Vaddagiri
0 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-13 14:07 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Tue, 2011-09-13 at 16:58 +0530, Srivatsa Vaddagiri wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 11:39:48]:
>
> > On Tue, 2011-09-13 at 10:33 +0530, Srivatsa Vaddagiri wrote:
> > >
> > > This is perhaps not optimal (as it may lead to more lock contention), but
> > > something to note for those who care for both capping and utilization in
> > > equal measure!
> >
> > You meant lock inversion, which leads to more idle time :-)
>
> I think 'cfs_b->lock' contention would go up significantly when reducing
> sysctl_sched_cfs_bandwidth_slice, while for something like 'balancing' lock
> (taken with SD_SERIALIZE set and more frequently when tuning down
> max_interval?), yes it may increase idle time! Did you have any other
> lock in mind when speaking of inversion?
I can't read it seems.. I thought you were talking about increasing the
period, which increases the time you force a task to sleep that's
holding locks etc..
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 4:15 ` Srivatsa Vaddagiri
2011-09-13 5:03 ` Srivatsa Vaddagiri
@ 2011-09-13 14:19 ` Peter Zijlstra
2011-09-13 18:01 ` Srivatsa Vaddagiri
1 sibling, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-13 14:19 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov, Thomas Gleixner
On Tue, 2011-09-13 at 09:45 +0530, Srivatsa Vaddagiri wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-12 14:35:43]:
>
> > Of course it does.. and I bet you can improve that slightly if you
> > manage to fix some of the numerical nightmares that live in the cgroup
> > load-balancer (Paul, care to share your WIP?)
>
> Booting with "nohz=off" also helps significantly.
>
> With nohz=on, average idle time (over 1 min) is 10.3%
> With nohz=off, average idle time (over 1 min) is 3.9%
So we should put the cpufreq/idle governor into the nohz/idle path; it
already tries to predict the idle duration in order to pick a C state,
and that same prediction should be used to determine whether stopping the
tick is worth it.
This has come up previously, but I can't quite recollect in what
context.
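A toy version of that decision rule, just to make the idea concrete (the
cost numbers and names below are made up; the real predictor lives in the
idle governor):

#include <stdbool.h>
#include <stdio.h>

/*
 * Only stop the tick when the predicted idle residency beats one tick
 * period plus the (hypothetical) cost of stopping and restarting it.
 */
static bool worth_stopping_tick(long predicted_idle_us)
{
        const long tick_period_us = 1000;     /* HZ=1000                */
        const long stop_restart_cost_us = 50; /* invented overhead      */

        return predicted_idle_us > tick_period_us + stop_restart_cost_us;
}

int main(void)
{
        printf("200us idle: stop tick? %d\n", worth_stopping_tick(200));
        printf("5ms idle:   stop tick? %d\n", worth_stopping_tick(5000));
        return 0; /* prints 0, then 1 */
}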
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 14:07 ` Peter Zijlstra
@ 2011-09-13 16:21 ` Srivatsa Vaddagiri
2011-09-13 16:33 ` Peter Zijlstra
2011-09-13 16:36 ` Peter Zijlstra
0 siblings, 2 replies; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 16:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 16:07:28]:
> > > > This is perhaps not optimal (as it may lead to more lock contention), but
> > > > something to note for those who care for both capping and utilization in
> > > > equal measure!
> > >
> > > You meant lock inversion, which leads to more idle time :-)
> >
> > I think 'cfs_b->lock' contention would go up significantly when reducing
> > sysctl_sched_cfs_bandwidth_slice, while for something like 'balancing' lock
> > (taken with SD_SERIALIZE set and more frequently when tuning down
> > max_interval?), yes it may increase idle time! Did you have any other
> > lock in mind when speaking of inversion?
>
> I can't read it seems.. I thought you were talking about increasing the
> period,
Mm ..I brought up the increased lock contention with reference to this
experimental result that I posted earlier:
> Tuning min_interval and max_interval of various sched_domains to 1
> and also setting sched_cfs_bandwidth_slice_us to 500 does cut down idle
> time further to 2.7%
The value of sched_cfs_bandwidth_slice_us was reduced from the default of
5000us to 500us, which (along with the reduction of min/max interval) helped
cut down idle time further (3.9% -> 2.7%). I was commenting that this may not
necessarily be optimal (as, for example, a low 'sched_cfs_bandwidth_slice_us'
could result in all cpus contending for cfs_b->lock very frequently).
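To put rough numbers on that worry: with a 100ms period and a 50% cap, each
task has 50ms of quota per period, pulled from the global pool one slice at
a time under cfs_b->lock, so shrinking the slice multiplies the lock
round-trips. A quick standalone calculation (illustrative userspace C, not
kernel code):

#include <stdio.h>

int main(void)
{
        long quota_us = 50000;            /* 50% of a 100ms period      */
        long slices_us[] = { 5000, 500 }; /* default slice vs tuned one */

        for (int i = 0; i < 2; i++)
                printf("slice = %4ldus -> ~%3ld global-pool refills "
                       "per task per period\n",
                       slices_us[i], quota_us / slices_us[i]);
        /* prints 10 refills at the 5000us default, 100 at 500us */
        return 0;
}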
> which increases the time you force a task to sleep that's holding locks etc..
Ideally all tasks should get capped at the same time, given that there is
a global pool from which everyone pulls bandwidth? So while one vcpu/task
(holding a lock) gets capped, other vcpus/tasks (that may want the same lock)
should ideally not be running for long after that, avoiding lock inversion
related problems you point out.
I guess that we may still run into that with the current implementation ..
Basically the global pool may have zero runtime left for the current period,
forcing a vcpu/task to be throttled, while there is surplus runtime in
per-cpu pools, allowing some sibling vcpus/tasks to run for a wee bit
more, leading to lock-inversion related problems (more idling). That
makes me think we can improve the directed yield->capping interaction.
Essentially, when the target task of a directed yield is capped, can the
"yielding" task donate some of its bandwidth?
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 16:21 ` Srivatsa Vaddagiri
@ 2011-09-13 16:33 ` Peter Zijlstra
2011-09-13 17:41 ` Srivatsa Vaddagiri
2011-09-13 16:36 ` Peter Zijlstra
1 sibling, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-13 16:33 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Tue, 2011-09-13 at 21:51 +0530, Srivatsa Vaddagiri wrote:
> > which increases the time you force a task to sleep that's holding locks etc..
>
> Ideally all tasks should get capped at the same time, given that there is
> a global pool from which everyone pulls bandwidth? So while one vcpu/task
> (holding a lock) gets capped, other vcpus/tasks (that may want the same lock)
> should ideally not be running for long after that, avoiding lock inversion
> related problems you point out.
No, this simply cannot be true.. You force groups to sleep so that other
groups can run, right? Therefore shared kernel locks will cause
inversion.
You cannot put both groups to sleep and still expect a utilization of
100%.
Simple example, some task in group A owns the i_mutex of a file, group A
runs out of time and gets dequeued. Some other task in group B needs
that same i_mutex.
> I guess that we may still run into that with the current implementation ..
> Basically the global pool may have zero runtime left for the current period,
> forcing a vcpu/task to be throttled, while there is surplus runtime in
> per-cpu pools, allowing some sibling vcpus/tasks to run for a wee bit
> more, leading to lock-inversion related problems (more idling). That
> makes me think we can improve the directed yield->capping interaction.
> Essentially, when the target task of a directed yield is capped, can the
> "yielding" task donate some of its bandwidth?
What moron ever calls yield anyway? If you use yield you're doing it
wrong!
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 16:21 ` Srivatsa Vaddagiri
2011-09-13 16:33 ` Peter Zijlstra
@ 2011-09-13 16:36 ` Peter Zijlstra
2011-09-13 17:54 ` Srivatsa Vaddagiri
1 sibling, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-13 16:36 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Tue, 2011-09-13 at 21:51 +0530, Srivatsa Vaddagiri wrote:
> > I can't read it seems.. I thought you were talking about increasing the
> > period,
>
> Mm ..I brought up the increased lock contention with reference to this
> experimental result that I posted earlier:
>
> > Tuning min_interval and max_interval of various sched_domains to 1
> > and also setting sched_cfs_bandwidth_slice_us to 500 does cut down idle
> > time further to 2.7%
Yeah, that's the not being able to read part..
> The value of sched_cfs_bandwidth_slice_us was reduced from the default of
> 5000us to 500us, which (along with the reduction of min/max interval) helped
> cut down idle time further (3.9% -> 2.7%). I was commenting that this may not
> necessarily be optimal (as, for example, a low 'sched_cfs_bandwidth_slice_us'
> could result in all cpus contending for cfs_b->lock very frequently).
Right.. so this seems to suggest you're migrating a lot.
Also, what workload are we talking about? The insane one with 5 groups of
weight 1024?
Ramping up the frequency of the load-balancer and giving out smaller
slices is really anti-scalability.. I bet a lot of that 'reclaimed' idle
time is spent in system time.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 16:33 ` Peter Zijlstra
@ 2011-09-13 17:41 ` Srivatsa Vaddagiri
0 siblings, 0 replies; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 17:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 18:33:09]:
> On Tue, 2011-09-13 at 21:51 +0530, Srivatsa Vaddagiri wrote:
> > > which increases the time you force a task to sleep that's holding locks etc..
> >
> > Ideally all tasks should get capped at the same time, given that there is
> > a global pool from which everyone pulls bandwidth? So while one vcpu/task
> > (holding a lock) gets capped, other vcpus/tasks (that may want the same lock)
> > should ideally not be running for long after that, avoiding lock inversion
> > related problems you point out.
>
> No, this simply cannot be true.. You force groups to sleep so that other
> groups can run, right? Therefore shared kernel locks will cause
> inversion.
Ah .. shared locks of the "host" kernel .. true .. that can still cause
lock inversion, yes.
I had in mind user-space (or "guest" kernel) locks - which can't get inverted
that easily (one of a cgroup's tasks wanting a "userspace" lock which is held by
another "throttled" task of the same cgroup - causing an inversion problem of sorts).
My point was that once a task gets throttled, other sibling tasks should get
throttled almost immediately after that (given that bandwidth for a cgroup is
maintained in a global pool from which everyone draws in "small" increments) -
so a task that gets capped while holding a user-space lock should not
leave other sibling tasks going too hungry on held locks within the
same period?
> You cannot put both groups to sleep and still expect a utilization of
> 100%.
>
> Simple example, some task in group A owns the i_mutex of a file, group A
> runs out of time and gets dequeued. Some other task in group B needs
> that same i_mutex.
>
> > I guess that we may still run into that with the current implementation ..
> > Basically the global pool may have zero runtime left for the current period,
> > forcing a vcpu/task to be throttled, while there is surplus runtime in
> > per-cpu pools, allowing some sibling vcpus/tasks to run for a wee bit
> > more, leading to lock-inversion related problems (more idling). That
> > makes me think we can improve the directed yield->capping interaction.
> > Essentially, when the target task of a directed yield is capped, can the
> > "yielding" task donate some of its bandwidth?
>
> What moron ever calls yield anyway?
I meant directed yield (yield_to) .. which is used by KVM when it detects
pause-loops. Essentially, a vcpu spinning in guest-kernel context for too long
leads to a PLE (Pause-Loop Exit), which leads to the KVM driver doing a directed
yield to another sibling vcpu .. so the target of the directed yield may be a
capped vcpu task, in which case I was wondering if the directed yield can donate
a bit of bandwidth to the throttled task. Again, going by what I said earlier about
tasks getting capped more or less at the same time, this should occur very
infrequently ... something for me to test and find out nevertheless!
> If you use yield you're doing it wrong!
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 16:36 ` Peter Zijlstra
@ 2011-09-13 17:54 ` Srivatsa Vaddagiri
2011-09-13 18:03 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 17:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
[-- Attachment #1: Type: text/plain, Size: 2037 bytes --]
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 18:36:15]:
> > The value of sched_cfs_bandwidth_slice_us was reduced from the default of
> > 5000us to 500us, which (along with the reduction of min/max interval) helped
> > cut down idle time further (3.9% -> 2.7%). I was commenting that this may not
> > necessarily be optimal (as, for example, a low 'sched_cfs_bandwidth_slice_us'
> > could result in all cpus contending for cfs_b->lock very frequently).
>
> Right.. so this seems to suggest you're migrating a lot.
We did do some experiments (outside of capping) to see how badly tasks
migrate on the latest tip (compared to previous kernels). The test was to
spawn 32 cpuhogs on a 16-cpu system (placed in the default cgroup -
without any capping in place) and measure how much they bounce around.
The system had little load besides these cpu hogs.
We saw a considerably higher migration count on the latest tip compared to
previous kernels. Kamalesh, can you please post the migration count
data?
> Also, what workload are we talking about? The insane one with 5 groups of
> weight 1024?
We were never running the "insane" one .. we always run with proportional
shares, the "sane" one! I forgot to mention that bit in my first email
(about the shares setup). I am attaching the test script we are using
for your reference. FYI, we have added additional levels to the cgroup setup
(/Level1/Level2/C1/C1_1 etc.) to mimic the cgroup hierarchy for VMs as
created by libvirt.
> Ramping up the frequency of the load-balancer and giving out smaller
> slices is really anti-scalability.. I bet a lot of that 'reclaimed' idle
> time is spent in system time.
System time (in top and vmstat) does remain unchanged at 0% when
cranking up the load-balance frequency and slicing down
sched_cfs_bandwidth_slice_us .. I guess the additional "system" time
can't be accounted for easily by the tick-based accounting system we
have. I agree there could be other unobserved side effects of increased
load-balance frequency (like workload performance) that I haven't noticed.
- vatsa
[-- Attachment #2: hard_limit_test.sh --]
[-- Type: application/x-sh, Size: 7035 bytes --]
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 14:19 ` Peter Zijlstra
@ 2011-09-13 18:01 ` Srivatsa Vaddagiri
2011-09-13 18:23 ` Peter Zijlstra
0 siblings, 1 reply; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 18:01 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov, Thomas Gleixner
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 16:19:39]:
> > Booting with "nohz=off" also helps significantly.
> >
> > With nohz=on, average idle time (over 1 min) is 10.3%
> > With nohz=off, average idle time (over 1 min) is 3.9%
>
> So we should put the cpufreq/idle governor into the nohz/idle path, it
> already tries to predict the idle duration in order to pick a C state,
> that same prediction should be used to determine if stopping the tick is
> worth it.
Hmm ..I tried performance governor and found that it slightly increases
idle time.
With nohz=off && ondemand governor, idle time = 4%
With nohz=off && performance governor on all cpus, idle time = 6%
I can't see obvious reasons for that ..afaict bandwidth capping should
be independent of frequency (i.e. task gets capped by "used" time,
irrespective of frequency at which it was "using" the cpu)?
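The governor switch was along these lines -- a sketch using the standard
cpufreq sysfs interface:

# put every cpu on the performance governor
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done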
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 17:54 ` Srivatsa Vaddagiri
@ 2011-09-13 18:03 ` Peter Zijlstra
2011-09-13 18:12 ` Srivatsa Vaddagiri
2011-09-13 18:07 ` Peter Zijlstra
2011-09-13 18:19 ` Peter Zijlstra
2 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-13 18:03 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Tue, 2011-09-13 at 23:24 +0530, Srivatsa Vaddagiri wrote:
> Fyi, we have added additional levels to cgroup setup
> (/Level1/Level2/C1/C1_1 etc) to mimic cgroup hierarchy for VMs as
> created by libvirt.
The deeper you nest the bigger the numerical problems get..
Also, can you please stop using virt crap and focus on useful
things? :-) Start with simple cases of single depth groups.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 17:54 ` Srivatsa Vaddagiri
2011-09-13 18:03 ` Peter Zijlstra
@ 2011-09-13 18:07 ` Peter Zijlstra
2011-09-13 18:19 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-13 18:07 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Tue, 2011-09-13 at 23:24 +0530, Srivatsa Vaddagiri wrote:
> I guess the additional "system" time
> can't be accounted for easily by the tick-based accounting system we
> have. I agree there could be other un-observed side-effects of increased
> load-balance frequency (like workload performance) that I haven't noticed.
Yeah, very hard; it's the tick that starts the balancer, so it would have
to last longer than a tick to be noticed, which is very unlikely.
We should implement full-blown CONFIG_VIRT_CPU_ACCOUNTING.. except I
bet that once we do that people will want it enabled, and I'm pretty sure
people also don't want to pay the price for it... :-)
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 18:03 ` Peter Zijlstra
@ 2011-09-13 18:12 ` Srivatsa Vaddagiri
0 siblings, 0 replies; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 18:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 20:03:04]:
> On Tue, 2011-09-13 at 23:24 +0530, Srivatsa Vaddagiri wrote:
> > Fyi, we have added additional levels to cgroup setup
> > (/Level1/Level2/C1/C1_1 etc) to mimic cgroup hierarchy for VMs as
> > created by libvirt.
>
> The deeper you nest the bigger the numerical problems get..
>
> Also, can you please stop using virt crap and focus on useful
> things? :-)
That unfortunately is the target environment where we want this working
(want to cap VMs under KVM) :-) For simplicity, we have been playing
with a non-VM based testcase ..
> Start with simple cases of single depth groups.
We did try with a single level and "extra" large proportional
shares (10k * NR_TASKS if I recall) ..I don't think they made any
difference ..will re-check though.
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 17:54 ` Srivatsa Vaddagiri
2011-09-13 18:03 ` Peter Zijlstra
2011-09-13 18:07 ` Peter Zijlstra
@ 2011-09-13 18:19 ` Peter Zijlstra
2011-09-13 18:28 ` Srivatsa Vaddagiri
2 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-13 18:19 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Tue, 2011-09-13 at 23:24 +0530, Srivatsa Vaddagiri wrote:
> We saw considerably high migration count on latest tip compared to
> previous kernels. Kamalesh, can you please post the migration count
> data?
Hrmm, yes this looks horrid.. even without cgroup crap, something's
funny.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 18:01 ` Srivatsa Vaddagiri
@ 2011-09-13 18:23 ` Peter Zijlstra
2011-09-16 8:14 ` Paul Turner
0 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-13 18:23 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov, Thomas Gleixner
On Tue, 2011-09-13 at 23:31 +0530, Srivatsa Vaddagiri wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 16:19:39]:
>
> > > Booting with "nohz=off" also helps significantly.
> > >
> > > With nohz=on, average idle time (over 1 min) is 10.3%
> > > With nohz=off, average idle time (over 1 min) is 3.9%
> >
> > So we should put the cpufreq/idle governor into the nohz/idle path, it
> > already tries to predict the idle duration in order to pick a C state,
> > that same prediction should be used to determine if stopping the tick is
> > worth it.
>
> Hmm ..I tried performance governor and found that it slightly increases
> idle time.
>
> With nohz=off && ondemand governor, idle time = 4%
> With nohz=off && performance governor on all cpus, idle time = 6%
>
> I can't see obvious reasons for that ..afaict bandwidth capping should
> be independent of frequency (i.e. task gets capped by "used" time,
> irrespective of frequency at which it was "using" the cpu)?
That's not what I said.. what I said is that the nohz code should also
use the idle time prognosis.. disabling the tick is a costly operation,
doing it only to have to undo it costs time, and will be accounted to
idle time, hence your improvement with nohz=off.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 18:19 ` Peter Zijlstra
@ 2011-09-13 18:28 ` Srivatsa Vaddagiri
2011-09-13 18:30 ` Peter Zijlstra
0 siblings, 1 reply; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 18:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 20:19:55]:
> On Tue, 2011-09-13 at 23:24 +0530, Srivatsa Vaddagiri wrote:
> > We saw considerably high migration count on latest tip compared to
> > previous kernels. Kamalesh, can you please post the migration count
> > data?
>
> Hrmm, yes this looks horrid.. even without cgroup crap, something's funny.
Yes ..we could visualize that very much in top o/p .. A task's cpu would keep
changing *every* screen refresh (refreshed every 0.5 sec that too!).
We didn't see that with older kernels ..Kamalesh is planning to do a
git bisect and see which commit led to this "mad" hopping ..
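The hopping is also easy to capture outside of top -- a sketch, where psr is
the processor a task last ran on and "while" matches the hog binaries:

# show each hog's current cpu twice a second
watch -n 0.5 'ps -o pid,psr,comm -p "$(pgrep -d, while)"'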
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 18:28 ` Srivatsa Vaddagiri
@ 2011-09-13 18:30 ` Peter Zijlstra
2011-09-13 18:35 ` Srivatsa Vaddagiri
0 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-13 18:30 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Tue, 2011-09-13 at 23:58 +0530, Srivatsa Vaddagiri wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 20:19:55]:
>
> > On Tue, 2011-09-13 at 23:24 +0530, Srivatsa Vaddagiri wrote:
> > > We saw considerably high migration count on latest tip compared to
> > > previous kernels. Kamalesh, can you please post the migration count
> > > data?
> >
> > Hrmm, yes this looks horrid.. even without cgroup crap, something's funny.
>
> Yes ..we could visualize that very much in top o/p .. A task's cpu would keep
> changing *every* screen refresh (refreshed every 0.5 sec that too!).
>
> We didn't see that with older kernels ..Kamalesh is planning to do a
> git bisect and see which commit led to this "mad" hopping ..
Awesome, thanks! Btw, what is 'older'? 3.0?
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 18:30 ` Peter Zijlstra
@ 2011-09-13 18:35 ` Srivatsa Vaddagiri
2011-09-15 17:55 ` Kamalesh Babulal
0 siblings, 1 reply; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-13 18:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 20:30:46]:
> On Tue, 2011-09-13 at 23:58 +0530, Srivatsa Vaddagiri wrote:
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 20:19:55]:
> >
> > > On Tue, 2011-09-13 at 23:24 +0530, Srivatsa Vaddagiri wrote:
> > > > We saw considerably high migration count on latest tip compared to
> > > > previous kernels. Kamalesh, can you please post the migration count
> > > > data?
> > >
> > > Hrmm, yes this looks horrid.. even without cgroup crap, something's funny.
> >
> > Yes ..we could visualize that very much in top o/p .. A task's cpu would keep
> > changing *every* screen refresh (refreshed every 0.5 sec that too!).
> >
> > We didn't see that with older kernels ..Kamalesh is planning to do a
> > git bisect and see which commit led to this "mad" hopping ..
>
> Awesome, thanks! Btw, what is 'older'? 3.0?
We went back all the way up to 2.6.32! I think 2.6.38 and 2.6.39 were
pretty stable ..I don't have the migration count data with me readily. I
will let Kamalesh post that info soon.
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 18:35 ` Srivatsa Vaddagiri
@ 2011-09-15 17:55 ` Kamalesh Babulal
2011-09-15 21:48 ` Peter Zijlstra
2011-09-20 12:55 ` Peter Zijlstra
0 siblings, 2 replies; 129+ messages in thread
From: Kamalesh Babulal @ 2011-09-15 17:55 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Peter Zijlstra, Paul Turner, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> [2011-09-14 00:05:02]:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 20:30:46]:
>
> > On Tue, 2011-09-13 at 23:58 +0530, Srivatsa Vaddagiri wrote:
> > > * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 20:19:55]:
> > >
> > > > On Tue, 2011-09-13 at 23:24 +0530, Srivatsa Vaddagiri wrote:
> > > > > We saw considerably high migration count on latest tip compared to
> > > > > previous kernels. Kamalesh, can you please post the migration count
> > > > > data?
> > > >
> > > > Hrmm, yes this looks horrid.. even without cgroup crap, something's funny.
> > >
> > > Yes ..we could visualize that very much in top o/p .. A task's cpu would keep
> > > changing *every* screen refresh (refreshed every 0.5 sec that too!).
> > >
> > > We didn't see that with older kernels ..Kamalesh is planning to do a
> > > git bisect and see which commit led to this "mad" hopping ..
> >
> > Awesome, thanks! Btw, what is 'older'? 3.0?
>
> We went back all the way up to 2.6.32! I think 2.6.38 and 2.6.39 were
> pretty stable ..I don't have the migration count data with me readily. I
> will let Kamalesh post that info soon.
Test Setup :
-----------
The machine is a 2-socket quad-core Intel (x5570) box. The lb.sh
script was run in a loop to execute 5 times after the box
was brought up with the kernel.
The lb.sh script spawns 2x cpu hogs, where x is the
number of CPUs on the system. The script collects
se.nr_migrations before and after a 60-second sleep and computes
after_se.nr_migrations - before_se.nr_migrations for
all the spawned hogs.
----------------+-------+-------+-------+-------+-------+
Kernel | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
----------------+-------+-------+-------+-------+-------+
2.6.33 | 9604 | 101 | 66 | 2543 | 3488 |
----------------+-------+-------+-------+-------+-------+
2.6.34 | 28469 | 1514 | 1602 | 185 | 139 |
----------------+-------+-------+-------+-------+-------+
2.6.35 | 1052 | 12 | 4 | 11 | 6 |
----------------+-------+-------+-------+-------+-------+
2.6.36 | 1253 | 53 | 78 | 76 | 50 |
----------------+-------+-------+-------+-------+-------+
2.6.37 | 262 | 36 | 48 | 61 | 43 |
----------------+-------+-------+-------+-------+-------+
2.6.38 | 1551 | 48 | 62 | 47 | 50 |
----------------+-------+-------+-------+-------+-------+
2.6.39 | 3784 | 457 | 722 | 3209 | 1037 |
----------------+-------+-------+-------+-------+-------+
3.0 | 933 | 608 | 658 | 1424 | 1415 |
----------------+-------+-------+-------+-------+-------+
3.1.0-rc4-tip | | | | | |
(e467f18f945) | 1672 | 1643 | 1316 | 1577 | 61 |
----------------+-------+-------+-------+-------+-------+
lb.sh
------
#!/bin/bash
rm -rf test*
rm -rf t*
ITERATIONS=60 # seconds to let the hogs run before sampling
NUM_CPUS=$(cat /proc/cpuinfo |grep -i proces|wc -l)
NUM_HOGS=$((NUM_CPUS * 2)) # No of hog threads to invoke
echo "System has $NUM_CPUS cpus..... Spawning $NUM_HOGS cpu hogs ... for $ITERATIONS seconds.."
if [ ! -e while1.c ]
then
cat >> while1.c << EOF
int
main (int argc, char **argv)
{
while(1);
return (0);
}
EOF
fi
for i in $(seq 1 $NUM_HOGS)
do
gcc -o while$i while1.c
if [ $? -ne 0 ]
then
echo "Looks like gcc is not present ... aborting"
exit
fi
done
for i in $(seq 1 $NUM_HOGS)
do
./while$i &
pids[$i]=$!
pids_old[$i]=`cat /proc/$!/sched |grep -i nr_migr|grep -iv cold|cut -d ":" -f2|sed 's/ //g'`
done
sleep $ITERATIONS
j=1
old_nr_migrations=0
new_nr_migrations=0
echo -e " \t New \t Old"
for i in $(seq 1 $NUM_HOGS)
do
a=`echo ${pids[i]}`
new=`cat /proc/$a/sched |grep -i nr_migr|grep -iv cold|cut -d ":" -f2|sed 's/ //g'`
old=`echo ${pids_old[i]}`
old_nr_migrations=$((old_nr_migrations + old))
c=$(($new - $old))
new_nr_migrations=$((new_nr_migrations + c))
echo -e "while$i\t[$new]\t[$old]\t"
done
echo "*******************************************"
echo -e " $new_nr_migrations\t$old_nr_migrations"
echo "*******************************************"
pkill -9 while
exit
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-15 17:55 ` Kamalesh Babulal
@ 2011-09-15 21:48 ` Peter Zijlstra
2011-09-19 17:51 ` Kamalesh Babulal
2011-09-20 12:55 ` Peter Zijlstra
1 sibling, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-15 21:48 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Srivatsa Vaddagiri, Paul Turner, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Thu, 2011-09-15 at 23:25 +0530, Kamalesh Babulal wrote:
> 2.6.38 | 1551 | 48 | 62 | 47 | 50 |
> ----------------+-------+-------+-------+-------+-------+
> 2.6.39 | 3784 | 457 | 722 | 3209 | 1037 |
I'd say we wrecked it going from .38 to .39 and only made it worse after
that.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-13 18:23 ` Peter Zijlstra
@ 2011-09-16 8:14 ` Paul Turner
2011-09-16 8:28 ` Peter Zijlstra
0 siblings, 1 reply; 129+ messages in thread
From: Paul Turner @ 2011-09-16 8:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Srivatsa Vaddagiri, Kamalesh Babulal, Vladimir Davydov,
linux-kernel, Bharata B Rao, Dhaval Giani,
Vaidyanathan Srinivasan, Ingo Molnar, Pavel Emelianov,
Thomas Gleixner
On 09/13/11 11:23, Peter Zijlstra wrote:
> On Tue, 2011-09-13 at 23:31 +0530, Srivatsa Vaddagiri wrote:
>> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 16:19:39]:
>>
>>>> Booting with "nohz=off" also helps significantly.
>>>>
>>>> With nohz=on, average idle time (over 1 min) is 10.3%
>>>> With nohz=off, average idle time (over 1 min) is 3.9%
I think what's more compelling here is that it looks like nohz load-balance
needs more love.
>>>
>>> So we should put the cpufreq/idle governor into the nohz/idle path, it
>>> already tries to predict the idle duration in order to pick a C state,
>>> that same prediction should be used to determine if stopping the tick is
>>> worth it.
>>
>> Hmm ..I tried performance governor and found that it slightly increases
>> idle time.
>>
>> With nohz=off && ondemand governor, idle time = 4%
>> With nohz=off && performance governor on all cpus, idle time = 6%
>>
>> I can't see obvious reasons for that ..afaict bandwidth capping should
>> be independent of frequency (i.e. task gets capped by "used" time,
>> irrespective of frequency at which it was "using" the cpu)?
>
> That's not what I said.. what I said is that the nohz code should also
> use the idle time prognosis.. disabling the tick is a costly operation,
> doing it only to have to undo it costs time, and will be accounted to
> idle time, hence your improvement with nohz=off.
>
Enabling Venki's CONFIG_IRQ_TIME_ACCOUNTING=y would discount that time and
provide a definitive answer here, yes?
- Paul
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-07 15:20 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Srivatsa Vaddagiri
2011-09-07 19:22 ` Peter Zijlstra
@ 2011-09-16 8:22 ` Paul Turner
1 sibling, 0 replies; 129+ messages in thread
From: Paul Turner @ 2011-09-16 8:22 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Kamalesh Babulal, Vladimir Davydov, linux-kernel, Peter Zijlstra,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On 09/07/11 08:20, Srivatsa Vaddagiri wrote:
> [Apologies if you get this email multiple times - there is some email
> client config issue that I am fixing up]
>
* Paul Turner <pjt@google.com> [2011-06-21 12:48:17]:
>
>> Hi Kamalesh,
>>
>> Can you see what things look like under v7?
>>
>> There's been a few improvements to quota re-distribution that should
>> hopefully help your test case.
>>
>> The remaining idle% I see on my machines appear to be a product of
>> load-balancer inefficiency.
>
Hey Srivatsa,
Thanks for taking another look at this -- sorry for the delayed reply!
> which is quite a complex problem to solve! I am still surprised that
> we can't handle 32 cpuhogs on a 16-cpu system very easily. The tasks seem to
> hop around madly rather than settle down as 2 tasks/cpu. Kamalesh, can you post
> the exact count of migrations we saw on latest tip over a 20-sec window?
>
> Anyway, here's a "hack" to minimize the idle time induced due to load-balance
> issues. It brings down idle time from 7+% to ~0% ..I am not too happy about
> this, but I don't see any other simpler solutions to solve the idle time issue
> completely (other than making load-balancer completely fair!).
Hum,
So BWC returns bandwidth to the parent on voluntary sleep, which means the
most we can really lose is NR_CPUS * 1ms (the amount a cpu keeps in case the
entity re-wakes up quickly). Technically we could lose another few ms
if there's not enough BW left to bother distributing and we're near the
end of the period, but I think that works out to another 6ms or so at
worst.
As discussed in the long thread dangling off this, it's load-balance
that's at fault -- allowing steal time just hides this by instead
letting cpus run over quota within a period.
If, for example, you set up a deadline-oriented test that tried to
accomplish the same amount of work (without bandwidth limits) and threw
away the remaining work at period expiration (a benchmark
I've been meaning to write and publish as a more general load-balance
test, actually), then I suspect we'd see similar problems; and sadly,
this case is both more representative of real-world performance and not
fixable by something like steal-time.
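A crude shell sketch of such a deadline-style probe, just to illustrate the
idea -- fixed work per period, overruns discarded; the period and work chunk
are arbitrary:

PERIOD_MS=100
WORK=20000                 # work chunk that "should" fit within one period
while :; do
    start=$(date +%s%N)
    deadline=$((start + PERIOD_MS * 1000000))
    i=0
    # do the chunk of work, abandoning whatever is left at period expiry;
    # check the clock only every 1000 iterations to keep fork overhead down
    while [ $i -lt $WORK ]; do
        i=$((i + 1))
        if [ $((i % 1000)) -eq 0 ] && [ "$(date +%s%N)" -ge $deadline ]; then
            break
        fi
    done
    echo "completed $i/$WORK"
    # idle out the remainder of the period, if any
    now=$(date +%s%N)
    if [ "$now" -lt $deadline ]; then
        sleep $(awk -v ns=$((deadline - now)) 'BEGIN { printf "%.3f", ns / 1e9 }')
    fi
done

The fraction of periods that miss their work target is then a direct measure
of how evenly the load-balancer spreads the hogs.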
So... we're probably better off trying to improve LB; I raised it in
another reply on the chain but the NOHZ vs ticks ilb numbers look pretty
compelling as an area for improvement in this regard.
Thanks!
- Paul
>
> --
>
> Fix excessive idle time reported when cgroups are capped. The patch
> introduces the notion of "steal" (or "grace") time which is the surplus
> time/bandwidth each cgroup is allowed to consume, subject to a maximum
> steal time (sched_cfs_max_steal_time_us). Cgroups are allowed this "steal"
> or "grace" time when the lone task running on a cpu is about to be throttled.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-16 8:14 ` Paul Turner
@ 2011-09-16 8:28 ` Peter Zijlstra
2011-09-19 16:35 ` Srivatsa Vaddagiri
0 siblings, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-16 8:28 UTC (permalink / raw)
To: Paul Turner
Cc: Srivatsa Vaddagiri, Kamalesh Babulal, Vladimir Davydov,
linux-kernel, Bharata B Rao, Dhaval Giani,
Vaidyanathan Srinivasan, Ingo Molnar, Pavel Emelianov,
Thomas Gleixner
On Fri, 2011-09-16 at 01:14 -0700, Paul Turner wrote:
> On 09/13/11 11:23, Peter Zijlstra wrote:
> > On Tue, 2011-09-13 at 23:31 +0530, Srivatsa Vaddagiri wrote:
> >> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-13 16:19:39]:
> >>
> >>>> Booting with "nohz=off" also helps significantly.
> >>>>
> >>>> With nohz=on, average idle time (over 1 min) is 10.3%
> >>>> With nohz=off, average idle time (over 1 min) is 3.9%
>
> I think what's more compelling here is that it looks like nohz load-balance
> needs more love.
Quite probable, although I do know we tend to go overboard in going into
nohz state too.
> > That's not what I said.. what I said is that the nohz code should also
> > use the idle time prognosis.. disabling the tick is a costly operation,
> > doing it only to have to undo it costs time, and will be accounted to
> > idle time, hence your improvement with nohz=off.
> >
>
> Enabling Venki's CONFIG_IRQ_TIME_ACCOUNTING=y would discount that time and
> provide a definitive answer here, yes?
Ah, yes, it's all (soft)irq context anyway, no need to also account
system calls.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-16 8:28 ` Peter Zijlstra
@ 2011-09-19 16:35 ` Srivatsa Vaddagiri
0 siblings, 0 replies; 129+ messages in thread
From: Srivatsa Vaddagiri @ 2011-09-19 16:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Turner, Kamalesh Babulal, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov, Thomas Gleixner
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-16 10:28:40]:
> > I think more compelling here is that it looks like nohz load-balance
> > needs more love.
>
> Quite probable,
Staring at the nohz load-balancer for some time, I see a potential issue:
'first_pick_cpu' and 'second_pick_cpu' can be idle without stopping
ticks for quite a while. When that happens, they stop bothering to
kick the ilb cpu because of this snippet in nohz_kick_needed():
static inline int nohz_kick_needed(struct rq *rq, int cpu)
{
	..
	if (rq->idle_at_tick)
		return 0;
	..
}
?
- vatsa
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-15 21:48 ` Peter Zijlstra
@ 2011-09-19 17:51 ` Kamalesh Babulal
2011-09-20 0:38 ` Venki Pallipadi
` (2 more replies)
0 siblings, 3 replies; 129+ messages in thread
From: Kamalesh Babulal @ 2011-09-19 17:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Srivatsa Vaddagiri, Paul Turner, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-15 23:48:43]:
> On Thu, 2011-09-15 at 23:25 +0530, Kamalesh Babulal wrote:
> > 2.6.38 | 1551 | 48 | 62 | 47 | 50 |
> > ----------------+-------+-------+-------+-------+-------+
> > 2.6.39 | 3784 | 457 | 722 | 3209 | 1037 |
>
> I'd say we wrecked it going from .38 to .39 and only made it worse after
> that.
Reverting the commit 866ab43efd325fae8889ea, one of the patches that
went in between .38 and .39, reduces the ping-pong of the tasks.
------------------------+-------+-------+-------+-------+-------+
Kernel | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
------------------------+-------+-------+-------+-------+-------+
2.6.39 | 1542 | 2172 | 2727 | 120 | 3681 |
------------------------+-------+-------+-------+-------+-------+
2.6.39 (with | | | | | |
866ab43efd reverted) | 65 | 78 | 58 | 99 | 62 |
------------------------+-------+-------+-------+-------+-------+
3.1-rc4+tip | | | | | |
(e467f18f945c) | 1219 | 2037 | 1943 | 772 | 1701 |
------------------------+-------+-------+-------+-------+-------+
3.1-rc4+tip (e467f18f9) | | | | | |
(866ab43efd reverted) | 64 | 45 | 59 | 59 | 69 |
------------------------+-------+-------+-------+-------+-------+
Thanks,
Kamalesh.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-19 17:51 ` Kamalesh Babulal
@ 2011-09-20 0:38 ` Venki Pallipadi
2011-09-20 11:09 ` Kamalesh Babulal
2011-09-20 13:56 ` Peter Zijlstra
2011-09-20 14:04 ` Peter Zijlstra
2 siblings, 1 reply; 129+ messages in thread
From: Venki Pallipadi @ 2011-09-20 0:38 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Peter Zijlstra, Srivatsa Vaddagiri, Paul Turner,
Vladimir Davydov, linux-kernel, Bharata B Rao, Dhaval Giani,
Vaidyanathan Srinivasan, Ingo Molnar, Pavel Emelianov, Ken Chen
On Mon, Sep 19, 2011 at 10:51 AM, Kamalesh Babulal
<kamalesh@linux.vnet.ibm.com> wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-15 23:48:43]:
>
>> On Thu, 2011-09-15 at 23:25 +0530, Kamalesh Babulal wrote:
>> > 2.6.38 | 1551 | 48 | 62 | 47 | 50 |
>> > ----------------+-------+-------+-------+-------+-------+
>> > 2.6.39 | 3784 | 457 | 722 | 3209 | 1037 |
>>
>> I'd say we wrecked it going from .38 to .39 and only made it worse after
>> that.
>
> Reverting the commit 866ab43efd325fae8889ea, one of the patches that
> went in between .38 and .39, reduces the ping-pong of the tasks.
There was a side-effect from 866ab43efd325fae8889ea that Ken
identified and fixed later in commit
b0432d8f162c7d5d9537b4cb749d44076b76a783. I guess you are seeing the
same problem...
Thanks,
Venki
>
> ------------------------+-------+-------+-------+-------+-------+
> Kernel | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
> ------------------------+-------+-------+-------+-------+-------+
> 2.6.39 | 1542 | 2172 | 2727 | 120 | 3681 |
> ------------------------+-------+-------+-------+-------+-------+
> 2.6.39 (with | | | | | |
> 866ab43efd reverted) | 65 | 78 | 58 | 99 | 62 |
> ------------------------+-------+-------+-------+-------+-------+
> 3.1-rc4+tip | | | | | |
> (e467f18f945c) | 1219 | 2037 | 1943 | 772 | 1701 |
> ------------------------+-------+-------+-------+-------+-------+
> 3.1-rc4+tip (e467f18f9) | | | | | |
> (866ab43efd reverted) | 64 | 45 | 59 | 59 | 69 |
> ------------------------+-------+-------+-------+-------+-------+
>
> Thanks,
> Kamalesh.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-20 0:38 ` Venki Pallipadi
@ 2011-09-20 11:09 ` Kamalesh Babulal
0 siblings, 0 replies; 129+ messages in thread
From: Kamalesh Babulal @ 2011-09-20 11:09 UTC (permalink / raw)
To: Venki Pallipadi
Cc: Peter Zijlstra, Srivatsa Vaddagiri, Paul Turner,
Vladimir Davydov, linux-kernel, Bharata B Rao, Dhaval Giani,
Vaidyanathan Srinivasan, Ingo Molnar, Pavel Emelianov, Ken Chen
* Venki Pallipadi <venki@google.com> [2011-09-19 17:38:26]:
> On Mon, Sep 19, 2011 at 10:51 AM, Kamalesh Babulal
> <kamalesh@linux.vnet.ibm.com> wrote:
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-15 23:48:43]:
> >
> >> On Thu, 2011-09-15 at 23:25 +0530, Kamalesh Babulal wrote:
> >> > 2.6.38 | 1551 | 48 | 62 | 47 | 50 |
> >> > ----------------+-------+-------+-------+-------+-------+
> >> > 2.6.39 | 3784 | 457 | 722 | 3209 | 1037 |
> >>
> >> I'd say we wrecked it going from .38 to .39 and only made it worse after
> >> that.
> >
> > Reverting the commit 866ab43efd325fae8889ea, one of the patches that
> > went in between .38 and .39, reduces the ping-pong of the tasks.
>
> There was a side-effect from 866ab43efd325fae8889ea that Ken
> identified and fixed later in commit
> b0432d8f162c7d5d9537b4cb749d44076b76a783. I guess you are seeing the
> same problem...
(snip)
3.1-rc4+tip includes the commit b0432d8f162c7d5d. The number of task
ping-pongs has reduced with 3.1-rc4+tip in comparison to 2.6.39, as
seen in the table below. Reverting the commit 866ab43efd325fa on top of
3.1-rc4+tip reduces the task bouncing to a good extent.
> > ------------------------+-------+-------+-------+-------+-------+
> > Kernel | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
> > ------------------------+-------+-------+-------+-------+-------+
> > 2.6.39 | 1542 | 2172 | 2727 | 120 | 3681 |
> > ------------------------+-------+-------+-------+-------+-------+
> > 2.6.39 (with | | | | | |
> > 866ab43efd reverted) | 65 | 78 | 58 | 99 | 62 |
> > ------------------------+-------+-------+-------+-------+-------+
> > 3.1-rc4+tip | | | | | |
> > (e467f18f945c) | 1219 | 2037 | 1943 | 772 | 1701 |
> > ------------------------+-------+-------+-------+-------+-------+
> > 3.1-rc4+tip (e467f18f9) | | | | | |
> > (866ab43efd reverted) | 64 | 45 | 59 | 59 | 69 |
> > ------------------------+-------+-------+-------+-------+-------+
> >
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-15 17:55 ` Kamalesh Babulal
2011-09-15 21:48 ` Peter Zijlstra
@ 2011-09-20 12:55 ` Peter Zijlstra
2011-09-21 17:34 ` Kamalesh Babulal
1 sibling, 1 reply; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-20 12:55 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Srivatsa Vaddagiri, Paul Turner, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Thu, 2011-09-15 at 23:25 +0530, Kamalesh Babulal wrote:
> lb.sh
> ------
> #!/bin/bash
>
> rm -rf test*
> rm -rf t*
You're insane, right?
> ITERATIONS=60 # seconds to let the hogs run before sampling
> NUM_CPUS=$(cat /proc/cpuinfo |grep -i proces|wc -l)
>
> NUM_HOGS=$((NUM_CPUS * 2)) # No of hog threads to invoke
>
> echo "System has $NUM_CPUS cpus..... Spawning $NUM_HOGS cpu hogs ... for $ITERATIONS seconds.."
> if [ ! -e while1.c ]
> then
> cat >> while1.c << EOF
> int
> main (int argc, char **argv)
> {
> while(1);
> return (0);
> }
> EOF
> fi
>
> for i in $(seq 1 $NUM_HOGS)
> do
> gcc -o while$i while1.c
> if [ $? -ne 0 ]
> then
> echo "Looks like gcc is not present ... aborting"
> exit
> fi
> done
>
> for i in $(seq 1 $NUM_HOGS)
> do
> ./while$i &
You can kill the above two blocks by doing:
while :; do :; done &
> pids[$i]=$!
> pids_old[$i]=`cat /proc/$!/sched |grep -i nr_migr|grep -iv cold|cut -d ":" -f2|sed 's/ //g'`
> done
and a fixup of the pkill muck.
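Put together, the spawn/collect part then shrinks to something like this
sketch:

for i in $(seq 1 $NUM_HOGS); do
    while :; do :; done &          # pure-bash cpu hog, no gcc needed
    pids[$i]=$!
    pids_old[$i]=$(awk '$1 == "se.nr_migrations" {print $3}' /proc/$!/sched)
done
sleep $ITERATIONS
# ... per-pid delta collection as before ...
kill "${pids[@]}"                  # targeted, instead of pkill -9 while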
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-19 17:51 ` Kamalesh Babulal
2011-09-20 0:38 ` Venki Pallipadi
@ 2011-09-20 13:56 ` Peter Zijlstra
2011-09-20 14:04 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-20 13:56 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Srivatsa Vaddagiri, Paul Turner, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Mon, 2011-09-19 at 23:21 +0530, Kamalesh Babulal wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-15 23:48:43]:
>
> > On Thu, 2011-09-15 at 23:25 +0530, Kamalesh Babulal wrote:
> > > 2.6.38 | 1551 | 48 | 62 | 47 | 50 |
> > > ----------------+-------+-------+-------+-------+-------+
> > > 2.6.39 | 3784 | 457 | 722 | 3209 | 1037 |
> >
> > I'd say we wrecked it going from .38 to .39 and only made it worse after
> > that.
>
> Reverting the commit 866ab43efd325fae8889ea, one of the patches that
> went in between .38 and .39, reduces the ping-pong of the tasks.
>
> ------------------------+-------+-------+-------+-------+-------+
> Kernel | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
> ------------------------+-------+-------+-------+-------+-------+
> 2.6.39 | 1542 | 2172 | 2727 | 120 | 3681 |
> ------------------------+-------+-------+-------+-------+-------+
> 2.6.39 (with | | | | | |
> 866ab43efd reverted) | 65 | 78 | 58 | 99 | 62 |
> ------------------------+-------+-------+-------+-------+-------+
> 3.1-rc4+tip | | | | | |
> (e467f18f945c) | 1219 | 2037 | 1943 | 772 | 1701 |
> ------------------------+-------+-------+-------+-------+-------+
> 3.1-rc4+tip (e467f18f9) | | | | | |
> (866ab43efd reverted) | 64 | 45 | 59 | 59 | 69 |
> ------------------------+-------+-------+-------+-------+-------+
Right, so reverting that breaks the cpuset/cpuaffinity thing again :-(
Now I'm not quite sure why group_imb gets toggled in this use-case at
all; having put a trace_printk() in, we get:
<...>-1894 [006] 704.056250: find_busiest_group: max: 2048, min: 0, avg: 1024, nr: 2
kworker/1:1-101 [001] 706.305523: find_busiest_group: max: 3072, min: 0, avg: 1024, nr: 3
Which is of course a bad state to be in, but we also get:
migration/17-73 [017] 706.284191: find_busiest_group: max: 1024, min: 0, avg: 512, nr: 2
<idle>-0 [003] 706.325435: find_busiest_group: max: 1250, min: 440, avg: 1024, nr: 2
on a CGROUP=n kernel.. which I think we can attribute to races.
When I enable tracing I also get some good runs, so it smells like the
lb does one bad thing and, instead of correcting it, makes it worse.
It looks like its set-off by a mass-wakeup of random crap that really
shouldn't be waking at all, I mean who needs automount to wakeup, or
whatever the fuck rtkit-daemon is. I'm pretty sure my bash loops don't
do anything remotely related to those.
Anyway, once enough random crap wakes up, the load-balancer goes and shifts
stuff around; once we hit the group_imb conditions we seem to get stuck
in a bad state instead of getting out of it.
Bah!
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-19 17:51 ` Kamalesh Babulal
2011-09-20 0:38 ` Venki Pallipadi
2011-09-20 13:56 ` Peter Zijlstra
@ 2011-09-20 14:04 ` Peter Zijlstra
2 siblings, 0 replies; 129+ messages in thread
From: Peter Zijlstra @ 2011-09-20 14:04 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Srivatsa Vaddagiri, Paul Turner, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
On Tue, 2011-09-20 at 15:56 +0200, Peter Zijlstra wrote:
>
> Anyway, once enough random crap wakes up, the load-balancer goes shift
> stuff around, once we hit the group_imb conditions we seem to get stuck
> in a bad state instead of getting out of it.
I bet all that crap wakes on the same tick that sets off the
load-balancer, because none of those things runs long enough to register
otherwise.
Looks like we need proper time-weighted load averages for the regular lb
too.. pjt mentioned doing something like that as well, if only to reduce
the number of different load calculations we have.
^ permalink raw reply [flat|nested] 129+ messages in thread
* Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
2011-09-20 12:55 ` Peter Zijlstra
@ 2011-09-21 17:34 ` Kamalesh Babulal
0 siblings, 0 replies; 129+ messages in thread
From: Kamalesh Babulal @ 2011-09-21 17:34 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Srivatsa Vaddagiri, Paul Turner, Vladimir Davydov, linux-kernel,
Bharata B Rao, Dhaval Giani, Vaidyanathan Srinivasan,
Ingo Molnar, Pavel Emelianov
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2011-09-20 14:55:20]:
> On Thu, 2011-09-15 at 23:25 +0530, Kamalesh Babulal wrote:
>
(snip)
> > rm -rf test*
> > rm -rf t*
>
> You're insane, right?
Of course not :-). It's a typo; it should have been
rm -rf r*, to delete the temporary files created by
the original script (only the part which does the
se.nr_migrations calculation was posted).
> > ITERATIONS=60 # No of Iterations to capture the details
> > NUM_CPUS=$(cat /proc/cpuinfo |grep -i proces|wc -l)
> >
> > NUM_HOGS=$((NUM_CPUS * 2)) # No of hogs threads to invoke
> >
(snip)
> > for i in $(seq 1 $NUM_HOGS)
> > do
> > ./while$i &
>
> You can kill the above two blocks by doing:
>
> while :; do :; done &
Thanks. I got to know this from your commit 866ab43efd325fae88 previously.
Thanks,
Kamalesh.
^ permalink raw reply [flat|nested] 129+ messages in thread
end of thread
Thread overview: 129+ messages
2011-05-03 9:28 [patch 00/15] CFS Bandwidth Control V6 Paul Turner
2011-05-03 9:28 ` [patch 01/15] sched: (fixlet) dont update shares twice on on_rq parent Paul Turner
2011-05-10 7:14 ` Hidetoshi Seto
2011-05-10 8:32 ` Mike Galbraith
2011-05-11 7:55 ` Hidetoshi Seto
2011-05-11 8:13 ` Paul Turner
2011-05-11 8:45 ` Mike Galbraith
2011-05-11 8:59 ` Hidetoshi Seto
2011-05-03 9:28 ` [patch 02/15] sched: hierarchical task accounting for SCHED_OTHER Paul Turner
2011-05-10 7:17 ` Hidetoshi Seto
2011-05-03 9:28 ` [patch 03/15] sched: introduce primitives to account for CFS bandwidth tracking Paul Turner
2011-05-10 7:18 ` Hidetoshi Seto
2011-05-03 9:28 ` [patch 04/15] sched: validate CFS quota hierarchies Paul Turner
2011-05-10 7:20 ` Hidetoshi Seto
2011-05-11 9:37 ` Paul Turner
2011-05-16 9:30 ` Peter Zijlstra
2011-05-16 9:43 ` Peter Zijlstra
2011-05-16 12:32 ` Paul Turner
2011-05-17 15:26 ` Peter Zijlstra
2011-05-18 7:16 ` Paul Turner
2011-05-18 11:57 ` Peter Zijlstra
2011-05-03 9:28 ` [patch 05/15] sched: add a timer to handle CFS bandwidth refresh Paul Turner
2011-05-10 7:21 ` Hidetoshi Seto
2011-05-11 9:27 ` Paul Turner
2011-05-16 10:18 ` Peter Zijlstra
2011-05-16 12:56 ` Paul Turner
2011-05-03 9:28 ` [patch 06/15] sched: accumulate per-cfs_rq cpu usage and charge against bandwidth Paul Turner
2011-05-10 7:22 ` Hidetoshi Seto
2011-05-11 9:25 ` Paul Turner
2011-05-16 10:27 ` Peter Zijlstra
2011-05-16 12:59 ` Paul Turner
2011-05-17 15:28 ` Peter Zijlstra
2011-05-18 7:02 ` Paul Turner
2011-05-16 10:32 ` Peter Zijlstra
2011-05-03 9:28 ` [patch 07/15] sched: expire invalid runtime Paul Turner
2011-05-10 7:22 ` Hidetoshi Seto
2011-05-16 11:05 ` Peter Zijlstra
2011-05-16 11:07 ` Peter Zijlstra
2011-05-03 9:28 ` [patch 08/15] sched: throttle cfs_rq entities which exceed their local runtime Paul Turner
2011-05-10 7:23 ` Hidetoshi Seto
2011-05-16 15:58 ` Peter Zijlstra
2011-05-16 16:05 ` Peter Zijlstra
2011-05-03 9:28 ` [patch 09/15] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh Paul Turner
2011-05-10 7:24 ` Hidetoshi Seto
2011-05-11 9:24 ` Paul Turner
2011-05-03 9:28 ` [patch 10/15] sched: allow for positional tg_tree walks Paul Turner
2011-05-10 7:24 ` Hidetoshi Seto
2011-05-17 13:31 ` Peter Zijlstra
2011-05-18 7:18 ` Paul Turner
2011-05-03 9:28 ` [patch 11/15] sched: prevent interactions between throttled entities and load-balance Paul Turner
2011-05-10 7:26 ` Hidetoshi Seto
2011-05-11 9:11 ` Paul Turner
2011-05-03 9:28 ` [patch 12/15] sched: migrate throttled tasks on HOTPLUG Paul Turner
2011-05-10 7:27 ` Hidetoshi Seto
2011-05-11 9:10 ` Paul Turner
2011-05-03 9:28 ` [patch 13/15] sched: add exports tracking cfs bandwidth control statistics Paul Turner
2011-05-10 7:27 ` Hidetoshi Seto
2011-05-11 7:56 ` Hidetoshi Seto
2011-05-11 9:09 ` Paul Turner
2011-05-03 9:29 ` [patch 14/15] sched: return unused runtime on voluntary sleep Paul Turner
2011-05-10 7:28 ` Hidetoshi Seto
2011-05-03 9:29 ` [patch 15/15] sched: add documentation for bandwidth control Paul Turner
2011-05-10 7:29 ` Hidetoshi Seto
2011-05-11 9:09 ` Paul Turner
2011-06-07 15:45 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Kamalesh Babulal
2011-06-08 3:09 ` Paul Turner
2011-06-08 10:46 ` Vladimir Davydov
2011-06-08 16:32 ` Kamalesh Babulal
2011-06-09 3:25 ` Paul Turner
2011-06-10 18:17 ` Kamalesh Babulal
2011-06-14 0:00 ` Paul Turner
2011-06-15 5:37 ` Kamalesh Babulal
2011-06-21 19:48 ` Paul Turner
2011-06-24 15:05 ` Kamalesh Babulal
2011-09-07 11:00 ` Srivatsa Vaddagiri
2011-09-07 14:54 ` Srivatsa Vaddagiri
2011-09-07 15:20 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Srivatsa Vaddagiri
2011-09-07 19:22 ` Peter Zijlstra
2011-09-08 15:15 ` Srivatsa Vaddagiri
2011-09-09 12:31 ` Peter Zijlstra
2011-09-09 13:26 ` Srivatsa Vaddagiri
2011-09-12 10:17 ` Srivatsa Vaddagiri
2011-09-12 12:35 ` Peter Zijlstra
2011-09-13 4:15 ` Srivatsa Vaddagiri
2011-09-13 5:03 ` Srivatsa Vaddagiri
2011-09-13 5:05 ` Srivatsa Vaddagiri
2011-09-13 9:39 ` Peter Zijlstra
2011-09-13 11:28 ` Srivatsa Vaddagiri
2011-09-13 14:07 ` Peter Zijlstra
2011-09-13 16:21 ` Srivatsa Vaddagiri
2011-09-13 16:33 ` Peter Zijlstra
2011-09-13 17:41 ` Srivatsa Vaddagiri
2011-09-13 16:36 ` Peter Zijlstra
2011-09-13 17:54 ` Srivatsa Vaddagiri
2011-09-13 18:03 ` Peter Zijlstra
2011-09-13 18:12 ` Srivatsa Vaddagiri
2011-09-13 18:07 ` Peter Zijlstra
2011-09-13 18:19 ` Peter Zijlstra
2011-09-13 18:28 ` Srivatsa Vaddagiri
2011-09-13 18:30 ` Peter Zijlstra
2011-09-13 18:35 ` Srivatsa Vaddagiri
2011-09-15 17:55 ` Kamalesh Babulal
2011-09-15 21:48 ` Peter Zijlstra
2011-09-19 17:51 ` Kamalesh Babulal
2011-09-20 0:38 ` Venki Pallipadi
2011-09-20 11:09 ` Kamalesh Babulal
2011-09-20 13:56 ` Peter Zijlstra
2011-09-20 14:04 ` Peter Zijlstra
2011-09-20 12:55 ` Peter Zijlstra
2011-09-21 17:34 ` Kamalesh Babulal
2011-09-13 14:19 ` Peter Zijlstra
2011-09-13 18:01 ` Srivatsa Vaddagiri
2011-09-13 18:23 ` Peter Zijlstra
2011-09-16 8:14 ` Paul Turner
2011-09-16 8:28 ` Peter Zijlstra
2011-09-19 16:35 ` Srivatsa Vaddagiri
2011-09-16 8:22 ` Paul Turner
2011-06-14 10:16 ` CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned Hidetoshi Seto
2011-06-14 6:58 ` [patch 00/15] CFS Bandwidth Control V6 Hu Tao
2011-06-14 7:29 ` Hidetoshi Seto
2011-06-14 7:44 ` Hu Tao
2011-06-15 8:37 ` Hu Tao
2011-06-16 0:57 ` Hidetoshi Seto
2011-06-16 9:45 ` Hu Tao
2011-06-17 1:22 ` Hidetoshi Seto
2011-06-17 6:05 ` Hu Tao
2011-06-17 6:25 ` Paul Turner
2011-06-17 9:13 ` Hidetoshi Seto
2011-06-18 0:28 ` Paul Turner