* [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file
From: Kirill Tkhai @ 2017-11-06 14:40 UTC
To: mingo, peterz, linux-kernel, ktkhai
The patchset exports the number of a cpu cgroup's running tasks and tasks in iowait
to userspace. This may be useful to get statistics of a separate container and
to analyse its load.
The number of running tasks is accounted as the sum of the task group's cfs and rt
tasks. The number of tasks in iowait is accounted similarly to the global nr_iowait
(it's incremented on dequeue in __schedule() and decremented in try_to_wake_up()),
with the only exception that a change of cgroup is handled in sched_change_group().
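For illustration, a container's cpu.stat after the series might read as below
(the cgroup path and all values are hypothetical; the first three fields are
printed only with CONFIG_CFS_BANDWIDTH):

  # cat /sys/fs/cgroup/cpu/container1/cpu.stat
  nr_periods 0
  nr_throttled 0
  throttled_time 0
  nr_running 4
  nr_iowait 2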
---
Kirill Tkhai (4):
sched: Move manipulations with nr_iowait counter to separate functions
sched: Account per task_group nr_iowait
sched: Export per task_cgroup nr_iowait to userspace
sched: Export per task_cgroup nr_running to userspace
kernel/sched/core.c | 107 ++++++++++++++++++++++++++++++++++++++++----------
kernel/sched/sched.h | 5 ++
2 files changed, 90 insertions(+), 22 deletions(-)
--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
* [PATCH 1/4] sched: Move manipulations with nr_iowait counter to separate functions
From: Kirill Tkhai @ 2017-11-06 14:40 UTC
To: mingo, peterz, linux-kernel, ktkhai
Move the repeated code into new helpers to reduce its volume and
to improve its readability.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
kernel/sched/core.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 72d1ab9550c0..712ee54edaa1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -794,6 +794,18 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
dequeue_task(rq, p, flags);
}
+static void task_iowait_start(struct rq *rq, struct task_struct *p)
+{
+ atomic_inc(&rq->nr_iowait);
+ delayacct_blkio_start();
+}
+
+static void task_iowait_end(struct rq *rq, struct task_struct *p)
+{
+ delayacct_blkio_end();
+ atomic_dec(&rq->nr_iowait);
+}
+
/*
* __normal_prio - return the priority that is based on the static prio
*/
@@ -2051,10 +2063,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
p->sched_contributes_to_load = !!task_contributes_to_load(p);
p->state = TASK_WAKING;
- if (p->in_iowait) {
- delayacct_blkio_end();
- atomic_dec(&task_rq(p)->nr_iowait);
- }
+ if (p->in_iowait)
+ task_iowait_end(task_rq(p), p);
cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
if (task_cpu(p) != cpu) {
@@ -2064,10 +2074,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
#else /* CONFIG_SMP */
- if (p->in_iowait) {
- delayacct_blkio_end();
- atomic_dec(&task_rq(p)->nr_iowait);
- }
+ if (p->in_iowait)
+ task_iowait_end(task_rq(p), p);
#endif /* CONFIG_SMP */
@@ -2117,10 +2125,8 @@ static void try_to_wake_up_local(struct task_struct *p, struct rq_flags *rf)
trace_sched_waking(p);
if (!task_on_rq_queued(p)) {
- if (p->in_iowait) {
- delayacct_blkio_end();
- atomic_dec(&rq->nr_iowait);
- }
+ if (p->in_iowait)
+ task_iowait_end(rq, p);
ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK);
}
@@ -3320,10 +3326,8 @@ static void __sched notrace __schedule(bool preempt)
deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
prev->on_rq = 0;
- if (prev->in_iowait) {
- atomic_inc(&rq->nr_iowait);
- delayacct_blkio_start();
- }
+ if (prev->in_iowait)
+ task_iowait_start(rq, prev);
/*
* If a worker went to sleep, notify and ask workqueue
* [PATCH 2/4] sched: Account per task_group nr_iowait
From: Kirill Tkhai @ 2017-11-06 14:40 UTC
To: mingo, peterz, linux-kernel, ktkhai
The patch makes the number of a task_group's tasks in iowait state
be tracked separately. This may be useful for checking the nr_iowait
statistics of a single container.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
kernel/sched/core.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
kernel/sched/sched.h | 5 +++++
2 files changed, 50 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 712ee54edaa1..86d1ad5f49bd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -796,12 +796,32 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
static void task_iowait_start(struct rq *rq, struct task_struct *p)
{
+#ifdef CONFIG_CGROUP_SCHED
+ struct task_group *tg = task_group(p);
+
+ /* Task's sched_task_group is changed under both of the below locks */
+ BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));
+ while (task_group_is_autogroup(tg))
+ tg = tg->parent;
+ atomic_inc(&tg->stat[rq->cpu].nr_iowait);
+#endif
+
atomic_inc(&rq->nr_iowait);
delayacct_blkio_start();
}
static void task_iowait_end(struct rq *rq, struct task_struct *p)
{
+#ifdef CONFIG_CGROUP_SCHED
+ struct task_group *tg = task_group(p);
+
+ /* Task's sched_task_group is changed under both of the below locks */
+ BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));
+ while (task_group_is_autogroup(tg))
+ tg = tg->parent;
+ atomic_dec(&tg->stat[rq->cpu].nr_iowait);
+#endif
+
delayacct_blkio_end();
atomic_dec(&rq->nr_iowait);
}
@@ -5805,6 +5825,9 @@ void __init sched_init(void)
sched_clock_init();
wait_bit_init();
+#ifdef CONFIG_CGROUP_SCHED
+ alloc_size += nr_cpu_ids * sizeof(struct tg_stat);
+#endif
#ifdef CONFIG_FAIR_GROUP_SCHED
alloc_size += 2 * nr_cpu_ids * sizeof(void **);
#endif
@@ -5814,6 +5837,10 @@ void __init sched_init(void)
if (alloc_size) {
ptr = (unsigned long)kzalloc(alloc_size, GFP_NOWAIT);
+#ifdef CONFIG_CGROUP_SCHED
+ root_task_group.stat = (struct tg_stat *)ptr;
+ ptr += nr_cpu_ids * sizeof(struct tg_stat);
+#endif
#ifdef CONFIG_FAIR_GROUP_SCHED
root_task_group.se = (struct sched_entity **)ptr;
ptr += nr_cpu_ids * sizeof(void **);
@@ -6133,6 +6160,8 @@ static DEFINE_SPINLOCK(task_group_lock);
static void sched_free_group(struct task_group *tg)
{
+ if (tg->stat)
+ kfree(tg->stat);
free_fair_sched_group(tg);
free_rt_sched_group(tg);
autogroup_free(tg);
@@ -6154,6 +6183,10 @@ struct task_group *sched_create_group(struct task_group *parent)
if (!alloc_rt_sched_group(tg, parent))
goto err;
+ tg->stat = kzalloc(sizeof(struct tg_stat) * nr_cpu_ids, GFP_KERNEL);
+ if (!tg->stat)
+ goto err;
+
return tg;
err:
@@ -6207,8 +6240,16 @@ void sched_offline_group(struct task_group *tg)
static void sched_change_group(struct task_struct *tsk, int type)
{
+ int cpu = 0, queued = task_on_rq_queued(tsk);
struct task_group *tg;
+ if (!queued && tsk->in_iowait && type == TASK_MOVE_GROUP) {
+ cpu = task_cpu(tsk);
+ tg = task_group(tsk);
+ while (task_group_is_autogroup(tg))
+ tg = tg->parent;
+ atomic_dec(&tg->stat[cpu].nr_iowait);
+ }
/*
* All callers are synchronized by task_rq_lock(); we do not use RCU
* which is pointless here. Thus, we pass "true" to task_css_check()
@@ -6216,6 +6257,10 @@ static void sched_change_group(struct task_struct *tsk, int type)
*/
tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
struct task_group, css);
+
+ if (!queued && tsk->in_iowait && type == TASK_MOVE_GROUP)
+ atomic_inc(&tg->stat[cpu].nr_iowait);
+
tg = autogroup_task_group(tsk, tg);
tsk->sched_task_group = tg;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 58787e3631c7..5e7cb18e1340 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -290,6 +290,10 @@ struct cfs_bandwidth {
#endif
};
+struct tg_stat {
+ atomic_t nr_iowait;
+};
+
/* task group related information */
struct task_group {
struct cgroup_subsys_state css;
@@ -330,6 +334,7 @@ struct task_group {
#endif
struct cfs_bandwidth cfs_bandwidth;
+ struct tg_stat *stat;
};
#ifdef CONFIG_FAIR_GROUP_SCHED
* [PATCH 3/4] sched: Export per task_cgroup nr_iowait to userspace
From: Kirill Tkhai @ 2017-11-06 14:40 UTC
To: mingo, peterz, linux-kernel, ktkhai
Make "stat" file always exist in cpu cgroup directory
(not only in CONFIG_CFS_BANDWIDTH case), and show
task_cgroup's nr_iowait there.
This may be useful for containers to check a statistics
of a single container.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
kernel/sched/core.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 86d1ad5f49bd..1b54707f5344 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6665,20 +6665,27 @@ static int __cfs_schedulable(struct task_group *tg, u64 period, u64 quota)
return ret;
}
+#endif /* CONFIG_CFS_BANDWIDTH */
+#endif /* CONFIG_FAIR_GROUP_SCHED */
static int cpu_stats_show(struct seq_file *sf, void *v)
{
struct task_group *tg = css_tg(seq_css(sf));
+ int i, nr_iowait = 0;
+#ifdef CONFIG_CFS_BANDWIDTH
struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
seq_printf(sf, "nr_periods %d\n", cfs_b->nr_periods);
seq_printf(sf, "nr_throttled %d\n", cfs_b->nr_throttled);
seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
+#endif
+ for_each_possible_cpu(i) {
+ nr_iowait += atomic_read(&tg->stat[i].nr_iowait);
+ }
+ seq_printf(sf, "nr_iowait %d\n", nr_iowait);
return 0;
}
-#endif /* CONFIG_CFS_BANDWIDTH */
-#endif /* CONFIG_FAIR_GROUP_SCHED */
#ifdef CONFIG_RT_GROUP_SCHED
static int cpu_rt_runtime_write(struct cgroup_subsys_state *css,
@@ -6707,6 +6714,10 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css,
#endif /* CONFIG_RT_GROUP_SCHED */
static struct cftype cpu_files[] = {
+ {
+ .name = "stat",
+ .seq_show = cpu_stats_show,
+ },
#ifdef CONFIG_FAIR_GROUP_SCHED
{
.name = "shares",
@@ -6725,10 +6736,6 @@ static struct cftype cpu_files[] = {
.read_u64 = cpu_cfs_period_read_u64,
.write_u64 = cpu_cfs_period_write_u64,
},
- {
- .name = "stat",
- .seq_show = cpu_stats_show,
- },
#endif
#ifdef CONFIG_RT_GROUP_SCHED
{
* [PATCH 4/4] sched: Export per task_cgroup nr_running to userspace
From: Kirill Tkhai @ 2017-11-06 14:40 UTC
To: mingo, peterz, linux-kernel, ktkhai
Show the task_cgroup's nr_running in the cgroup's "cpu.stat" file.
This may be useful for checking the statistics
of a single container.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
kernel/sched/core.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1b54707f5344..96dd30e6d243 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6671,7 +6671,7 @@ static int __cfs_schedulable(struct task_group *tg, u64 period, u64 quota)
static int cpu_stats_show(struct seq_file *sf, void *v)
{
struct task_group *tg = css_tg(seq_css(sf));
- int i, nr_iowait = 0;
+ int i, nr_running = 0, nr_iowait = 0;
#ifdef CONFIG_CFS_BANDWIDTH
struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
@@ -6680,8 +6680,15 @@ static int cpu_stats_show(struct seq_file *sf, void *v)
seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
#endif
for_each_possible_cpu(i) {
+#ifdef CONFIG_FAIR_GROUP_SCHED
+ nr_running += tg->cfs_rq[i]->nr_running;
+#endif
+#ifdef CONFIG_RT_GROUP_SCHED
+ nr_running += tg->rt_rq[i]->rt_nr_running;
+#endif
nr_iowait += atomic_read(&tg->stat[i].nr_iowait);
}
+ seq_printf(sf, "nr_running %d\n", nr_running);
seq_printf(sf, "nr_iowait %d\n", nr_iowait);
return 0;
* Re: [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file
From: Peter Zijlstra @ 2017-11-06 16:04 UTC
To: Kirill Tkhai; +Cc: mingo, linux-kernel
On Mon, Nov 06, 2017 at 05:40:15PM +0300, Kirill Tkhai wrote:
> The patchset exports the number of a cpu cgroup's running tasks and tasks in iowait
> to userspace. This may be useful to get statistics of a separate container and
> to analyse its load.
NAK on the iowait thing. That should not be exposed ever.
Also read commit:
e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
From: Peter Zijlstra @ 2017-11-06 16:06 UTC
To: Kirill Tkhai; +Cc: mingo, linux-kernel
On Mon, Nov 06, 2017 at 05:40:32PM +0300, Kirill Tkhai wrote:
> The patch makes the number of a task_group's tasks in iowait state
> be tracked separately. This may be useful for checking the nr_iowait
> statistics of a single container.
>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
> kernel/sched/core.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> kernel/sched/sched.h | 5 +++++
> 2 files changed, 50 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 712ee54edaa1..86d1ad5f49bd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -796,12 +796,32 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
>
> static void task_iowait_start(struct rq *rq, struct task_struct *p)
> {
> +#ifdef CONFIG_CGROUP_SCHED
> + struct task_group *tg = task_group(p);
> +
> + /* Task's sched_task_group is changed under both of the below locks */
> + BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));
We have lockdep_assert_held for that.
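A sketch of that lockdep-based form (an assumption for illustration, not code
from this thread); it compiles out when CONFIG_LOCKDEP is disabled:

	#ifdef CONFIG_LOCKDEP
		/* ->sched_task_group changes only with both locks held,
		 * so the reader must hold at least one of them. */
		WARN_ON_ONCE(debug_locks &&
			     !lockdep_is_held(&p->pi_lock) &&
			     !lockdep_is_held(&rq->lock));
	#endif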
> + while (task_group_is_autogroup(tg))
> + tg = tg->parent;
> + atomic_inc(&tg->stat[rq->cpu].nr_iowait);
You're joking right, more atomic ops on the fast paths..
> +#endif
> +
> atomic_inc(&rq->nr_iowait);
> delayacct_blkio_start();
* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
From: Kirill Tkhai @ 2017-11-06 16:12 UTC
To: Peter Zijlstra; +Cc: mingo, linux-kernel
On 06.11.2017 19:06, Peter Zijlstra wrote:
> On Mon, Nov 06, 2017 at 05:40:32PM +0300, Kirill Tkhai wrote:
>> The patch makes the number of a task_group's tasks in iowait state
>> be tracked separately. This may be useful for checking the nr_iowait
>> statistics of a single container.
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>> kernel/sched/core.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>> kernel/sched/sched.h | 5 +++++
>> 2 files changed, 50 insertions(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 712ee54edaa1..86d1ad5f49bd 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -796,12 +796,32 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
>>
>> static void task_iowait_start(struct rq *rq, struct task_struct *p)
>> {
>> +#ifdef CONFIG_CGROUP_SCHED
>> + struct task_group *tg = task_group(p);
>> +
>> + /* Task's sched_task_group is changed under both of the below locks */
>> + BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));
>
> We have lockdep_assert_held for that.
>
>> + while (task_group_is_autogroup(tg))
>> + tg = tg->parent;
>> + atomic_inc(&tg->stat[rq->cpu].nr_iowait);
>
> You're joking right, more atomic ops on the fast paths..
There needs to be some synchronization... The counter is modified under rq->lock everywhere except try_to_wake_up().
Would it be better to take rq->lock once more in try_to_wake_up() instead of using an atomic?
>> +#endif
>> +
>> atomic_inc(&rq->nr_iowait);
>> delayacct_blkio_start();
* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
From: Peter Zijlstra @ 2017-11-06 16:24 UTC
To: Kirill Tkhai; +Cc: mingo, linux-kernel
On Mon, Nov 06, 2017 at 07:12:58PM +0300, Kirill Tkhai wrote:
> >> + atomic_inc(&tg->stat[rq->cpu].nr_iowait);
> >
> > You're joking right, more atomic ops on the fast paths..
>
> There needs to be some synchronization... The counter is modified under rq->lock everywhere except try_to_wake_up().
> Would it be better to take rq->lock once more in try_to_wake_up() instead of using an atomic?
No, of course not. We spend a lot of time getting of that rq->lock
there.
The better option is to not care about iowait, since it's a complete
garbage number to begin with -- read that commit I pointed you to.
But if you do manage to convince me iowait is a sane thing to export
(and it's not); then you should not use atomics -- nor is there any need
to. Since all you want to export is \Sum nr_iowait, you can inc/dec to
pure cpu local variables and the sum will make it all work.
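A minimal sketch of that per-cpu scheme (the percpu layout and the helper
names are assumptions, not code from this thread):

	/* Assumes task_group grows a percpu member:
	 *	struct tg_stat __percpu *stat;
	 * allocated with tg->stat = alloc_percpu(struct tg_stat);
	 */
	struct tg_stat {
		int nr_iowait;	/* plain int, no atomics needed */
	};

	static inline void tg_iowait_inc(struct task_group *tg, int cpu)
	{
		per_cpu_ptr(tg->stat, cpu)->nr_iowait++;
	}

	static inline void tg_iowait_dec(struct task_group *tg, int cpu)
	{
		per_cpu_ptr(tg->stat, cpu)->nr_iowait--;
	}

	static int tg_nr_iowait(struct task_group *tg)
	{
		int cpu, sum = 0;

		/* Individual slots can be skewed, since inc and dec may
		 * run on different cpus, but the sum comes out right. */
		for_each_possible_cpu(cpu)
			sum += per_cpu_ptr(tg->stat, cpu)->nr_iowait;
		return sum;
	}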
The extant iowait crap cannot do this because it thinks per-cpu IO-wait
is a thing -- it's not, it's a random number at best, but it's ABI so we
can't fix :-(
* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
From: Peter Zijlstra @ 2017-11-06 16:25 UTC
To: Kirill Tkhai; +Cc: mingo, linux-kernel
On Mon, Nov 06, 2017 at 05:24:36PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 06, 2017 at 07:12:58PM +0300, Kirill Tkhai wrote:
>
> > >> + atomic_inc(&tg->stat[rq->cpu].nr_iowait);
> > >
> > > You're joking right, more atomic ops on the fast paths..
> >
> > There needs to be some synchronization... The counter is modified under rq->lock everywhere except try_to_wake_up().
> > Would it be better to take rq->lock once more in try_to_wake_up() instead of using an atomic?
>
> No, of course not. We spend a lot of time getting of that rq->lock
^ rid
> there.
>
> The better option is to not care about iowait, since it's a complete
> garbage number to begin with -- read that commit I pointed you to.
>
> But if you do manage to convince me iowait is a sane thing to export
> (and it's not); then you should not use atomics -- nor is there any need
> to. Since all you want to export is \Sum nr_iowait, you can inc/dec to
> pure cpu local variables and the sum will make it all work.
>
> The extant iowait crap cannot do this because it thinks per-cpu IO-wait
> is a thing -- it's not, it's a random number at best, but it's ABI so we
> can't fix :-(