linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file
@ 2017-11-06 14:40 Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 1/4] sched: Move manipulations with nr_iowait counter to separate functions Kirill Tkhai
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 14:40 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel, ktkhai

The patchset exports the number of a cpu cgroup's running tasks and tasks in
iowait to userspace. This may be useful to get statistics of a separate
container and to analyse its load.

The number of running tasks is accounted as the sum of the task cgroup's cfs
and rt tasks. The number of iowait tasks is accounted similarly to the global
nr_iowait (it is incremented on dequeue in __schedule() and decremented in
try_to_wake_up()), with the only exception that a cgroup change is handled in
sched_change_group().
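
For illustration only (not part of the series), here is a minimal userspace
sketch of reading the new fields; the cgroup mount point and group name below
are assumptions that depend on the local setup:

#include <stdio.h>

int main(void)
{
	/* Assumed cgroup v1 mount and container group name; adjust as needed. */
	const char *path = "/sys/fs/cgroup/cpu/mycontainer/cpu.stat";
	char key[64];
	long long val;
	FILE *f = fopen(path, "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* Each line of cpu.stat is "<name> <value>"; this series adds the
	 * nr_running and nr_iowait lines. */
	while (fscanf(f, "%63s %lld", key, &val) == 2)
		printf("%s = %lld\n", key, val);

	fclose(f);
	return 0;
}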

---

Kirill Tkhai (4):
      sched: Move manipulations with nr_iowait counter to separate functions
      sched: Account per task_group nr_iowait
      sched: Export per task_cgroup nr_iowait to userspace
      sched: Export per task_cgroup nr_running to userspace


 kernel/sched/core.c  |  107 ++++++++++++++++++++++++++++++++++++++++----------
 kernel/sched/sched.h |    5 ++
 2 files changed, 90 insertions(+), 22 deletions(-)

--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>


* [PATCH 1/4] sched: Move manipulations with nr_iowait counter to separate functions
  2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
@ 2017-11-06 14:40 ` Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 2/4] sched: Account per task_group nr_iowait Kirill Tkhai
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 14:40 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel, ktkhai

Move the repeated code into new helpers to reduce duplication and
improve its readability.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/sched/core.c |   36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 72d1ab9550c0..712ee54edaa1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -794,6 +794,18 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 	dequeue_task(rq, p, flags);
 }
 
+static void task_iowait_start(struct rq *rq, struct task_struct *p)
+{
+	atomic_inc(&rq->nr_iowait);
+	delayacct_blkio_start();
+}
+
+static void task_iowait_end(struct rq *rq, struct task_struct *p)
+{
+	delayacct_blkio_end();
+	atomic_dec(&rq->nr_iowait);
+}
+
 /*
  * __normal_prio - return the priority that is based on the static prio
  */
@@ -2051,10 +2063,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
 
-	if (p->in_iowait) {
-		delayacct_blkio_end();
-		atomic_dec(&task_rq(p)->nr_iowait);
-	}
+	if (p->in_iowait)
+		task_iowait_end(task_rq(p), p);
 
 	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
 	if (task_cpu(p) != cpu) {
@@ -2064,10 +2074,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 
 #else /* CONFIG_SMP */
 
-	if (p->in_iowait) {
-		delayacct_blkio_end();
-		atomic_dec(&task_rq(p)->nr_iowait);
-	}
+	if (p->in_iowait)
+		task_iowait_end(task_rq(p), p);
 
 #endif /* CONFIG_SMP */
 
@@ -2117,10 +2125,8 @@ static void try_to_wake_up_local(struct task_struct *p, struct rq_flags *rf)
 	trace_sched_waking(p);
 
 	if (!task_on_rq_queued(p)) {
-		if (p->in_iowait) {
-			delayacct_blkio_end();
-			atomic_dec(&rq->nr_iowait);
-		}
+		if (p->in_iowait)
+			task_iowait_end(rq, p);
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK);
 	}
 
@@ -3320,10 +3326,8 @@ static void __sched notrace __schedule(bool preempt)
 			deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
 			prev->on_rq = 0;
 
-			if (prev->in_iowait) {
-				atomic_inc(&rq->nr_iowait);
-				delayacct_blkio_start();
-			}
+			if (prev->in_iowait)
+				task_iowait_start(rq, prev);
 
 			/*
 			 * If a worker went to sleep, notify and ask workqueue


* [PATCH 2/4] sched: Account per task_group nr_iowait
  2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 1/4] sched: Move manipulations with nr_iowait counter to separate functions Kirill Tkhai
@ 2017-11-06 14:40 ` Kirill Tkhai
  2017-11-06 16:06   ` Peter Zijlstra
  2017-11-06 14:40 ` [PATCH 3/4] sched: Export per task_cgroup nr_iowait to userspace Kirill Tkhai
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 14:40 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel, ktkhai

The patch makes the number of a task_group's tasks in the iowait state
be tracked separately. This may be useful for checking the nr_iowait
count of a single container.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/sched/core.c  |   45 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |    5 +++++
 2 files changed, 50 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 712ee54edaa1..86d1ad5f49bd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -796,12 +796,32 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 
 static void task_iowait_start(struct rq *rq, struct task_struct *p)
 {
+#ifdef CONFIG_CGROUP_SCHED
+	struct task_group *tg = task_group(p);
+
+	/* Task's sched_task_group is changed under both of the below locks */
+	BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));
+	while (task_group_is_autogroup(tg))
+		tg = tg->parent;
+	atomic_inc(&tg->stat[rq->cpu].nr_iowait);
+#endif
+
 	atomic_inc(&rq->nr_iowait);
 	delayacct_blkio_start();
 }
 
 static void task_iowait_end(struct rq *rq, struct task_struct *p)
 {
+#ifdef CONFIG_CGROUP_SCHED
+	struct task_group *tg = task_group(p);
+
+	/* Task's sched_task_group is changed under both of the below locks */
+	BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));
+	while (task_group_is_autogroup(tg))
+		tg = tg->parent;
+	atomic_dec(&tg->stat[rq->cpu].nr_iowait);
+#endif
+
 	delayacct_blkio_end();
 	atomic_dec(&rq->nr_iowait);
 }
@@ -5805,6 +5825,9 @@ void __init sched_init(void)
 	sched_clock_init();
 	wait_bit_init();
 
+#ifdef CONFIG_CGROUP_SCHED
+	alloc_size += nr_cpu_ids * sizeof(struct tg_stat);
+#endif
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	alloc_size += 2 * nr_cpu_ids * sizeof(void **);
 #endif
@@ -5814,6 +5837,10 @@ void __init sched_init(void)
 	if (alloc_size) {
 		ptr = (unsigned long)kzalloc(alloc_size, GFP_NOWAIT);
 
+#ifdef CONFIG_CGROUP_SCHED
+		root_task_group.stat = (struct tg_stat *)ptr;
+		ptr += nr_cpu_ids * sizeof(struct tg_stat);
+#endif
 #ifdef CONFIG_FAIR_GROUP_SCHED
 		root_task_group.se = (struct sched_entity **)ptr;
 		ptr += nr_cpu_ids * sizeof(void **);
@@ -6133,6 +6160,8 @@ static DEFINE_SPINLOCK(task_group_lock);
 
 static void sched_free_group(struct task_group *tg)
 {
+	if (tg->stat)
+		kfree(tg->stat);
 	free_fair_sched_group(tg);
 	free_rt_sched_group(tg);
 	autogroup_free(tg);
@@ -6154,6 +6183,10 @@ struct task_group *sched_create_group(struct task_group *parent)
 	if (!alloc_rt_sched_group(tg, parent))
 		goto err;
 
+	tg->stat = kzalloc(sizeof(struct tg_stat) * nr_cpu_ids, GFP_KERNEL);
+	if (!tg->stat)
+		goto err;
+
 	return tg;
 
 err:
@@ -6207,8 +6240,16 @@ void sched_offline_group(struct task_group *tg)
 
 static void sched_change_group(struct task_struct *tsk, int type)
 {
+	int cpu = 0, queued = task_on_rq_queued(tsk);
 	struct task_group *tg;
 
+	if (!queued && tsk->in_iowait && type == TASK_MOVE_GROUP) {
+		cpu = task_cpu(tsk);
+		tg = task_group(tsk);
+		while (task_group_is_autogroup(tg))
+			tg = tg->parent;
+		atomic_dec(&tg->stat[cpu].nr_iowait);
+	}
 	/*
 	 * All callers are synchronized by task_rq_lock(); we do not use RCU
 	 * which is pointless here. Thus, we pass "true" to task_css_check()
@@ -6216,6 +6257,10 @@ static void sched_change_group(struct task_struct *tsk, int type)
 	 */
 	tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
 			  struct task_group, css);
+
+	if (!queued && tsk->in_iowait && type == TASK_MOVE_GROUP)
+		atomic_inc(&tg->stat[cpu].nr_iowait);
+
 	tg = autogroup_task_group(tsk, tg);
 	tsk->sched_task_group = tg;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 58787e3631c7..5e7cb18e1340 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -290,6 +290,10 @@ struct cfs_bandwidth {
 #endif
 };
 
+struct tg_stat {
+	atomic_t nr_iowait;
+};
+
 /* task group related information */
 struct task_group {
 	struct cgroup_subsys_state css;
@@ -330,6 +334,7 @@ struct task_group {
 #endif
 
 	struct cfs_bandwidth cfs_bandwidth;
+	struct tg_stat *stat;
 };
 
 #ifdef CONFIG_FAIR_GROUP_SCHED


* [PATCH 3/4] sched: Export per task_cgroup nr_iowait to userspace
  2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 1/4] sched: Move manipulations with nr_iowait counter to separate functions Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 2/4] sched: Account per task_group nr_iowait Kirill Tkhai
@ 2017-11-06 14:40 ` Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 4/4] sched: Export per task_cgroup nr_running " Kirill Tkhai
  2017-11-06 16:04 ` [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Peter Zijlstra
  4 siblings, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 14:40 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel, ktkhai

Make the "stat" file always exist in the cpu cgroup directory
(not only in the CONFIG_CFS_BANDWIDTH case), and show the
task_cgroup's nr_iowait there.

This may be useful for checking the statistics of a single
container.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/sched/core.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 86d1ad5f49bd..1b54707f5344 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6665,20 +6665,27 @@ static int __cfs_schedulable(struct task_group *tg, u64 period, u64 quota)
 
 	return ret;
 }
+#endif /* CONFIG_CFS_BANDWIDTH */
+#endif /* CONFIG_FAIR_GROUP_SCHED */
 
 static int cpu_stats_show(struct seq_file *sf, void *v)
 {
 	struct task_group *tg = css_tg(seq_css(sf));
+	int i, nr_iowait = 0;
+#ifdef CONFIG_CFS_BANDWIDTH
 	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
 
 	seq_printf(sf, "nr_periods %d\n", cfs_b->nr_periods);
 	seq_printf(sf, "nr_throttled %d\n", cfs_b->nr_throttled);
 	seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
+#endif
+	for_each_possible_cpu(i) {
+		nr_iowait += atomic_read(&tg->stat[i].nr_iowait);
+	}
+	seq_printf(sf, "nr_iowait %d\n", nr_iowait);
 
 	return 0;
 }
-#endif /* CONFIG_CFS_BANDWIDTH */
-#endif /* CONFIG_FAIR_GROUP_SCHED */
 
 #ifdef CONFIG_RT_GROUP_SCHED
 static int cpu_rt_runtime_write(struct cgroup_subsys_state *css,
@@ -6707,6 +6714,10 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css,
 #endif /* CONFIG_RT_GROUP_SCHED */
 
 static struct cftype cpu_files[] = {
+	{
+		.name = "stat",
+		.seq_show = cpu_stats_show,
+	},
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	{
 		.name = "shares",
@@ -6725,10 +6736,6 @@ static struct cftype cpu_files[] = {
 		.read_u64 = cpu_cfs_period_read_u64,
 		.write_u64 = cpu_cfs_period_write_u64,
 	},
-	{
-		.name = "stat",
-		.seq_show = cpu_stats_show,
-	},
 #endif
 #ifdef CONFIG_RT_GROUP_SCHED
 	{


* [PATCH 4/4] sched: Export per task_cgroup nr_running to userspace
  2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
                   ` (2 preceding siblings ...)
  2017-11-06 14:40 ` [PATCH 3/4] sched: Export per task_cgroup nr_iowait to userspace Kirill Tkhai
@ 2017-11-06 14:40 ` Kirill Tkhai
  2017-11-06 16:04 ` [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Peter Zijlstra
  4 siblings, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 14:40 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel, ktkhai

Show the task_cgroup's nr_running in the cgroup's "cpu.stat" file.

This may be useful for checking the statistics of a single
container.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/sched/core.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1b54707f5344..96dd30e6d243 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6671,7 +6671,7 @@ static int __cfs_schedulable(struct task_group *tg, u64 period, u64 quota)
 static int cpu_stats_show(struct seq_file *sf, void *v)
 {
 	struct task_group *tg = css_tg(seq_css(sf));
-	int i, nr_iowait = 0;
+	int i, nr_running = 0, nr_iowait = 0;
 #ifdef CONFIG_CFS_BANDWIDTH
 	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
 
@@ -6680,8 +6680,15 @@ static int cpu_stats_show(struct seq_file *sf, void *v)
 	seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
 #endif
 	for_each_possible_cpu(i) {
+#ifdef CONFIG_FAIR_GROUP_SCHED
+		nr_running += tg->cfs_rq[i]->nr_running;
+#endif
+#ifdef CONFIG_RT_GROUP_SCHED
+		nr_running += tg->rt_rq[i]->rt_nr_running;
+#endif
 		nr_iowait += atomic_read(&tg->stat[i].nr_iowait);
 	}
+	seq_printf(sf, "nr_running %d\n", nr_running);
 	seq_printf(sf, "nr_iowait %d\n", nr_iowait);
 
 	return 0;


* Re: [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file
  2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
                   ` (3 preceding siblings ...)
  2017-11-06 14:40 ` [PATCH 4/4] sched: Export per task_cgroup nr_running " Kirill Tkhai
@ 2017-11-06 16:04 ` Peter Zijlstra
  4 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2017-11-06 16:04 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: mingo, linux-kernel

On Mon, Nov 06, 2017 at 05:40:15PM +0300, Kirill Tkhai wrote:
> The patchset exports the number of a cpu cgroup's running tasks and tasks in
> iowait to userspace. This may be useful to get statistics of a separate
> container and to analyse its load.

NAK on the iowait thing. That should not be exposed ever.

Also read commit:

  e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")


* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
  2017-11-06 14:40 ` [PATCH 2/4] sched: Account per task_group nr_iowait Kirill Tkhai
@ 2017-11-06 16:06   ` Peter Zijlstra
  2017-11-06 16:12     ` Kirill Tkhai
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2017-11-06 16:06 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: mingo, linux-kernel

On Mon, Nov 06, 2017 at 05:40:32PM +0300, Kirill Tkhai wrote:
> The patch makes the number of a task_group's tasks in the iowait state
> be tracked separately. This may be useful for checking the nr_iowait
> count of a single container.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  kernel/sched/core.c  |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/sched.h |    5 +++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 712ee54edaa1..86d1ad5f49bd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -796,12 +796,32 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
>  
>  static void task_iowait_start(struct rq *rq, struct task_struct *p)
>  {
> +#ifdef CONFIG_CGROUP_SCHED
> +	struct task_group *tg = task_group(p);
> +
> +	/* Task's sched_task_group is changed under both of the below locks */
> +	BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));

We have lockdep_assert_held for that.

> +	while (task_group_is_autogroup(tg))
> +		tg = tg->parent;
> +	atomic_inc(&tg->stat[rq->cpu].nr_iowait);

You're joking right, more atomic ops on the fast paths..

> +#endif
> +
>  	atomic_inc(&rq->nr_iowait);
>  	delayacct_blkio_start();


* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
  2017-11-06 16:06   ` Peter Zijlstra
@ 2017-11-06 16:12     ` Kirill Tkhai
  2017-11-06 16:24       ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 16:12 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel

On 06.11.2017 19:06, Peter Zijlstra wrote:
> On Mon, Nov 06, 2017 at 05:40:32PM +0300, Kirill Tkhai wrote:
>> The patch makes the number of a task_group's tasks in the iowait state
>> be tracked separately. This may be useful for checking the nr_iowait
>> count of a single container.
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>>  kernel/sched/core.c  |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/sched/sched.h |    5 +++++
>>  2 files changed, 50 insertions(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 712ee54edaa1..86d1ad5f49bd 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -796,12 +796,32 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
>>  
>>  static void task_iowait_start(struct rq *rq, struct task_struct *p)
>>  {
>> +#ifdef CONFIG_CGROUP_SCHED
>> +	struct task_group *tg = task_group(p);
>> +
>> +	/* Task's sched_task_group is changed under both of the below locks */
>> +	BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));
> 
> We have lockdep_assert_held for that.
> 
>> +	while (task_group_is_autogroup(tg))
>> +		tg = tg->parent;
>> +	atomic_inc(&tg->stat[rq->cpu].nr_iowait);
> 
> You're joking right, more atomic ops on the fast paths..

There should be some synchronization... It's modified under rq->lock everywhere except try_to_wake_up().
Would it be better to take rq->lock in try_to_wake_up() too, instead of the atomic?
 
>> +#endif
>> +
>>  	atomic_inc(&rq->nr_iowait);
>>  	delayacct_blkio_start();


* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
  2017-11-06 16:12     ` Kirill Tkhai
@ 2017-11-06 16:24       ` Peter Zijlstra
  2017-11-06 16:25         ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2017-11-06 16:24 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: mingo, linux-kernel

On Mon, Nov 06, 2017 at 07:12:58PM +0300, Kirill Tkhai wrote:

> >> +	atomic_inc(&tg->stat[rq->cpu].nr_iowait);
> > 
> > You're joking right, more atomic ops on the fast paths..
> 
> There should be some synchronization... It's modified under rq->lock everywhere except try_to_wake_up().
> Would it be better to take rq->lock in try_to_wake_up() too, instead of the atomic?

No, of course not. We spend a lot of time getting of that rq->lock
there.

The better option is to not care about iowait, since its a complete
garbage number to begin with -- read that commit I pointed you to.

But if you do manage to convince me iowait is a sane thing to export
(and its not); then you should not use atomics -- nor is there any need
to. Since all you want to export is \Sum nr_iowait, you can inc/dec to
pure cpu local variables and the sum will make it all work.

The extant iowait crap cannot do this because it thinks per-cpu IO-wait
is a thing -- its not, its a random number at best, but its ABI so we
can't fix :-(
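
As a minimal sketch of the per-CPU counter scheme described above (hypothetical
field and helper names, not the posted code; updates are assumed to happen in
contexts that cannot race on the same CPU, e.g. under rq->lock or with
interrupts disabled):

/* Assumed new field in struct task_group: int __percpu *nr_iowait_pcpu,
 * allocated with alloc_percpu(int) when the group is created. */

static void tg_iowait_inc(struct task_group *tg)
{
	/* Plain per-CPU increment; no atomic needed on the fast path. */
	this_cpu_inc(*tg->nr_iowait_pcpu);
}

static void tg_iowait_dec(struct task_group *tg)
{
	/* May run on a different CPU than the matching inc, so a single CPU's
	 * counter can go transiently negative; only the sum below is meaningful. */
	this_cpu_dec(*tg->nr_iowait_pcpu);
}

static int tg_nr_iowait(struct task_group *tg)
{
	int cpu, sum = 0;

	for_each_possible_cpu(cpu)
		sum += *per_cpu_ptr(tg->nr_iowait_pcpu, cpu);
	return sum;
}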


* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
  2017-11-06 16:24       ` Peter Zijlstra
@ 2017-11-06 16:25         ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2017-11-06 16:25 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: mingo, linux-kernel

On Mon, Nov 06, 2017 at 05:24:36PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 06, 2017 at 07:12:58PM +0300, Kirill Tkhai wrote:
> 
> > >> +	atomic_inc(&tg->stat[rq->cpu].nr_iowait);
> > > 
> > > You're joking right, more atomic ops on the fast paths..
> > 
> > There should be some synchronization... It's modified under rq->lock everywhere except try_to_wake_up().
> > Would it be better to take rq->lock in try_to_wake_up() too, instead of the atomic?
> 
> No, of course not. We spend a lot of time getting of that rq->lock
                                                   ^ rid
> there.
> 
> The better option is to not care about iowait, since its a complete
> garbage number to begin with -- read that commit I pointed you to.
> 
> But if you do manage to convince me iowait is a sane thing to export
> (and its not); then you should not use atomics -- nor is there any need
> to. Since all you want to export is \Sum nr_iowait, you can inc/dec to
> pure cpu local variables and the sum will make it all work.
> 
> The extant iowait crap cannot do this because it thinks per-cpu IO-wait
> is a thing -- its not, its a random number at best, but its ABI so we
> can't fix :-(


Thread overview: 10+ messages
2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
2017-11-06 14:40 ` [PATCH 1/4] sched: Move manipulations with nr_iowait counter to separate functions Kirill Tkhai
2017-11-06 14:40 ` [PATCH 2/4] sched: Account per task_group nr_iowait Kirill Tkhai
2017-11-06 16:06   ` Peter Zijlstra
2017-11-06 16:12     ` Kirill Tkhai
2017-11-06 16:24       ` Peter Zijlstra
2017-11-06 16:25         ` Peter Zijlstra
2017-11-06 14:40 ` [PATCH 3/4] sched: Export per task_cgroup nr_iowait to userspace Kirill Tkhai
2017-11-06 14:40 ` [PATCH 4/4] sched: Export per task_cgroup nr_running " Kirill Tkhai
2017-11-06 16:04 ` [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Peter Zijlstra
