* [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file
@ 2017-11-06 14:40 Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 1/4] sched: Move manipulations with nr_iowait counter to separate functions Kirill Tkhai
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 14:40 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel, ktkhai

The patchset exports numbers of cpu cgroup's running tasks and tasks in iowait
to userspace. This may be useful to get statistics of a separate container and
to analyse its load.

The number of running tasks is accounted as the sum of the cgroup's corresponding
cfs and rt tasks. The number of iowait tasks is accounted similarly to the global
nr_iowait (it is incremented on dequeue in __schedule() and decremented in
try_to_wake_up()), with the only exception that a change of cgroup is handled in
sched_change_group().
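
For illustration, with the series applied the per-cgroup file would look
roughly like this (a hypothetical cgroup v1 mount point and made-up numbers;
the first three fields are present only with CONFIG_CFS_BANDWIDTH):

	# cat /sys/fs/cgroup/cpu/test/cpu.stat
	nr_periods 0
	nr_throttled 0
	throttled_time 0
	nr_running 3
	nr_iowait 1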

---

Kirill Tkhai (4):
      sched: Move manipulations with nr_iowait counter to separate functions
      sched: Account per task_group nr_iowait
      sched: Export per task_cgroup nr_iowait to userspace
      sched: Export per task_cgroup nr_running to userspace


 kernel/sched/core.c  |  107 ++++++++++++++++++++++++++++++++++++++++----------
 kernel/sched/sched.h |    5 ++
 2 files changed, 90 insertions(+), 22 deletions(-)

--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>


* [PATCH 1/4] sched: Move manipulations with nr_iowait counter to separate functions
  2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
@ 2017-11-06 14:40 ` Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 2/4] sched: Account per task_group nr_iowait Kirill Tkhai
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 14:40 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel, ktkhai

Move the repeated code into new helpers to reduce its volume and
improve its readability.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/sched/core.c |   36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 72d1ab9550c0..712ee54edaa1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -794,6 +794,18 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 	dequeue_task(rq, p, flags);
 }
 
+static void task_iowait_start(struct rq *rq, struct task_struct *p)
+{
+	atomic_inc(&rq->nr_iowait);
+	delayacct_blkio_start();
+}
+
+static void task_iowait_end(struct rq *rq, struct task_struct *p)
+{
+	delayacct_blkio_end();
+	atomic_dec(&rq->nr_iowait);
+}
+
 /*
  * __normal_prio - return the priority that is based on the static prio
  */
@@ -2051,10 +2063,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
 
-	if (p->in_iowait) {
-		delayacct_blkio_end();
-		atomic_dec(&task_rq(p)->nr_iowait);
-	}
+	if (p->in_iowait)
+		task_iowait_end(task_rq(p), p);
 
 	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
 	if (task_cpu(p) != cpu) {
@@ -2064,10 +2074,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 
 #else /* CONFIG_SMP */
 
-	if (p->in_iowait) {
-		delayacct_blkio_end();
-		atomic_dec(&task_rq(p)->nr_iowait);
-	}
+	if (p->in_iowait)
+		task_iowait_end(task_rq(p), p);
 
 #endif /* CONFIG_SMP */
 
@@ -2117,10 +2125,8 @@ static void try_to_wake_up_local(struct task_struct *p, struct rq_flags *rf)
 	trace_sched_waking(p);
 
 	if (!task_on_rq_queued(p)) {
-		if (p->in_iowait) {
-			delayacct_blkio_end();
-			atomic_dec(&rq->nr_iowait);
-		}
+		if (p->in_iowait)
+			task_iowait_end(rq, p);
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK);
 	}
 
@@ -3320,10 +3326,8 @@ static void __sched notrace __schedule(bool preempt)
 			deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
 			prev->on_rq = 0;
 
-			if (prev->in_iowait) {
-				atomic_inc(&rq->nr_iowait);
-				delayacct_blkio_start();
-			}
+			if (prev->in_iowait)
+				task_iowait_start(rq, prev);
 
 			/*
 			 * If a worker went to sleep, notify and ask workqueue


* [PATCH 2/4] sched: Account per task_group nr_iowait
  2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 1/4] sched: Move manipulations with nr_iowait counter to separate functions Kirill Tkhai
@ 2017-11-06 14:40 ` Kirill Tkhai
  2017-11-06 16:06   ` Peter Zijlstra
  2017-11-06 14:40 ` [PATCH 3/4] sched: Export per task_cgroup nr_iowait to userspace Kirill Tkhai
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 14:40 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel, ktkhai

The patch makes the number of a task_group's tasks in iowait state
tracked separately. This may be useful for containers to check the
nr_iowait state of a single one.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/sched/core.c  |   45 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |    5 +++++
 2 files changed, 50 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 712ee54edaa1..86d1ad5f49bd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -796,12 +796,32 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 
 static void task_iowait_start(struct rq *rq, struct task_struct *p)
 {
+#ifdef CONFIG_CGROUP_SCHED
+	struct task_group *tg = task_group(p);
+
+	/* Task's sched_task_group is changed under both of the below locks */
+	BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));
+	while (task_group_is_autogroup(tg))
+		tg = tg->parent;
+	atomic_inc(&tg->stat[rq->cpu].nr_iowait);
+#endif
+
 	atomic_inc(&rq->nr_iowait);
 	delayacct_blkio_start();
 }
 
 static void task_iowait_end(struct rq *rq, struct task_struct *p)
 {
+#ifdef CONFIG_CGROUP_SCHED
+	struct task_group *tg = task_group(p);
+
+	/* Task's sched_task_group is changed under both of the below locks */
+	BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));
+	while (task_group_is_autogroup(tg))
+		tg = tg->parent;
+	atomic_dec(&tg->stat[rq->cpu].nr_iowait);
+#endif
+
 	delayacct_blkio_end();
 	atomic_dec(&rq->nr_iowait);
 }
@@ -5805,6 +5825,9 @@ void __init sched_init(void)
 	sched_clock_init();
 	wait_bit_init();
 
+#ifdef CONFIG_CGROUP_SCHED
+	alloc_size += nr_cpu_ids * sizeof(struct tg_stat);
+#endif
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	alloc_size += 2 * nr_cpu_ids * sizeof(void **);
 #endif
@@ -5814,6 +5837,10 @@ void __init sched_init(void)
 	if (alloc_size) {
 		ptr = (unsigned long)kzalloc(alloc_size, GFP_NOWAIT);
 
+#ifdef CONFIG_CGROUP_SCHED
+		root_task_group.stat = (struct tg_stat *)ptr;
+		ptr += nr_cpu_ids * sizeof(struct tg_stat);
+#endif
 #ifdef CONFIG_FAIR_GROUP_SCHED
 		root_task_group.se = (struct sched_entity **)ptr;
 		ptr += nr_cpu_ids * sizeof(void **);
@@ -6133,6 +6160,8 @@ static DEFINE_SPINLOCK(task_group_lock);
 
 static void sched_free_group(struct task_group *tg)
 {
+	if (tg->stat)
+		kfree(tg->stat);
 	free_fair_sched_group(tg);
 	free_rt_sched_group(tg);
 	autogroup_free(tg);
@@ -6154,6 +6183,10 @@ struct task_group *sched_create_group(struct task_group *parent)
 	if (!alloc_rt_sched_group(tg, parent))
 		goto err;
 
+	tg->stat = kzalloc(sizeof(struct tg_stat) * nr_cpu_ids, 0);
+	if (!tg->stat)
+		goto err;
+
 	return tg;
 
 err:
@@ -6207,8 +6240,16 @@ void sched_offline_group(struct task_group *tg)
 
 static void sched_change_group(struct task_struct *tsk, int type)
 {
+	int cpu = 0, queued = task_on_rq_queued(tsk);
 	struct task_group *tg;
 
+	if (!queued && tsk->in_iowait && type == TASK_MOVE_GROUP) {
+		cpu = task_cpu(tsk);
+		tg = task_group(tsk);
+		while (task_group_is_autogroup(tg))
+			tg = tg->parent;
+		atomic_dec(&tg->stat[cpu].nr_iowait);
+	}
 	/*
 	 * All callers are synchronized by task_rq_lock(); we do not use RCU
 	 * which is pointless here. Thus, we pass "true" to task_css_check()
@@ -6216,6 +6257,10 @@ static void sched_change_group(struct task_struct *tsk, int type)
 	 */
 	tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
 			  struct task_group, css);
+
+	if (!queued && tsk->in_iowait && type == TASK_MOVE_GROUP)
+		atomic_inc(&tg->stat[cpu].nr_iowait);
+
 	tg = autogroup_task_group(tsk, tg);
 	tsk->sched_task_group = tg;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 58787e3631c7..5e7cb18e1340 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -290,6 +290,10 @@ struct cfs_bandwidth {
 #endif
 };
 
+struct tg_stat {
+	atomic_t nr_iowait;
+};
+
 /* task group related information */
 struct task_group {
 	struct cgroup_subsys_state css;
@@ -330,6 +334,7 @@ struct task_group {
 #endif
 
 	struct cfs_bandwidth cfs_bandwidth;
+	struct tg_stat *stat;
 };
 
 #ifdef CONFIG_FAIR_GROUP_SCHED


* [PATCH 3/4] sched: Export per task_cgroup nr_iowait to userspace
  2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 1/4] sched: Move manipulations with nr_iowait counter to separate functions Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 2/4] sched: Account per task_group nr_iowait Kirill Tkhai
@ 2017-11-06 14:40 ` Kirill Tkhai
  2017-11-06 14:40 ` [PATCH 4/4] sched: Export per task_cgroup nr_running " Kirill Tkhai
  2017-11-06 16:04 ` [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Peter Zijlstra
  4 siblings, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 14:40 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel, ktkhai

Make "stat" file always exist in cpu cgroup directory
(not only in CONFIG_CFS_BANDWIDTH case), and show
task_cgroup's nr_iowait there.

This may be useful for containers to check a statistics
of a single container.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/sched/core.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 86d1ad5f49bd..1b54707f5344 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6665,20 +6665,27 @@ static int __cfs_schedulable(struct task_group *tg, u64 period, u64 quota)
 
 	return ret;
 }
+#endif /* CONFIG_CFS_BANDWIDTH */
+#endif /* CONFIG_FAIR_GROUP_SCHED */
 
 static int cpu_stats_show(struct seq_file *sf, void *v)
 {
 	struct task_group *tg = css_tg(seq_css(sf));
+	int i, nr_iowait = 0;
+#ifdef CONFIG_CFS_BANDWIDTH
 	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
 
 	seq_printf(sf, "nr_periods %d\n", cfs_b->nr_periods);
 	seq_printf(sf, "nr_throttled %d\n", cfs_b->nr_throttled);
 	seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
+#endif
+	for_each_possible_cpu(i) {
+		nr_iowait += atomic_read(&tg->stat[i].nr_iowait);
+	}
+	seq_printf(sf, "nr_iowait %d\n", nr_iowait);
 
 	return 0;
 }
-#endif /* CONFIG_CFS_BANDWIDTH */
-#endif /* CONFIG_FAIR_GROUP_SCHED */
 
 #ifdef CONFIG_RT_GROUP_SCHED
 static int cpu_rt_runtime_write(struct cgroup_subsys_state *css,
@@ -6707,6 +6714,10 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css,
 #endif /* CONFIG_RT_GROUP_SCHED */
 
 static struct cftype cpu_files[] = {
+	{
+		.name = "stat",
+		.seq_show = cpu_stats_show,
+	},
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	{
 		.name = "shares",
@@ -6725,10 +6736,6 @@ static struct cftype cpu_files[] = {
 		.read_u64 = cpu_cfs_period_read_u64,
 		.write_u64 = cpu_cfs_period_write_u64,
 	},
-	{
-		.name = "stat",
-		.seq_show = cpu_stats_show,
-	},
 #endif
 #ifdef CONFIG_RT_GROUP_SCHED
 	{


* [PATCH 4/4] sched: Export per task_cgroup nr_running to userspace
  2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
                   ` (2 preceding siblings ...)
  2017-11-06 14:40 ` [PATCH 3/4] sched: Export per task_cgroup nr_iowait to userspace Kirill Tkhai
@ 2017-11-06 14:40 ` Kirill Tkhai
  2017-11-06 16:04 ` [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Peter Zijlstra
  4 siblings, 0 replies; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 14:40 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel, ktkhai

Show the task_cgroup's nr_running in the cgroup's "cpu.stat" file.

This may be useful for containers to check the statistics
of a single container.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/sched/core.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1b54707f5344..96dd30e6d243 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6671,7 +6671,7 @@ static int __cfs_schedulable(struct task_group *tg, u64 period, u64 quota)
 static int cpu_stats_show(struct seq_file *sf, void *v)
 {
 	struct task_group *tg = css_tg(seq_css(sf));
-	int i, nr_iowait = 0;
+	int i, nr_running = 0, nr_iowait = 0;
 #ifdef CONFIG_CFS_BANDWIDTH
 	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
 
@@ -6680,8 +6680,15 @@ static int cpu_stats_show(struct seq_file *sf, void *v)
 	seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
 #endif
 	for_each_possible_cpu(i) {
+#ifdef CONFIG_FAIR_GROUP_SCHED
+		nr_running += tg->cfs_rq[i]->nr_running;
+#endif
+#ifdef CONFIG_RT_GROUP_SCHED
+		nr_running += tg->rt_rq[i]->rt_nr_running;
+#endif
 		nr_iowait += atomic_read(&tg->stat[i].nr_iowait);
 	}
+	seq_printf(sf, "nr_running %d\n", nr_running);
 	seq_printf(sf, "nr_iowait %d\n", nr_iowait);
 
 	return 0;


* Re: [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file
  2017-11-06 14:40 [PATCH 0/4] Show cpu cgroup's nr_running and nr_iowait in cpu.stat file Kirill Tkhai
                   ` (3 preceding siblings ...)
  2017-11-06 14:40 ` [PATCH 4/4] sched: Export per task_cgroup nr_running " Kirill Tkhai
@ 2017-11-06 16:04 ` Peter Zijlstra
  4 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2017-11-06 16:04 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: mingo, linux-kernel

On Mon, Nov 06, 2017 at 05:40:15PM +0300, Kirill Tkhai wrote:
> The patchset exports numbers of cpu cgroup's running tasks and tasks in iowait
> to userspace. This may be useful to get statistics of a separate container and
> to analyse its load.

NAK on the iowait thing. That should not be exposed ever.

Also read commit:

  e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")


* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
  2017-11-06 14:40 ` [PATCH 2/4] sched: Account per task_group nr_iowait Kirill Tkhai
@ 2017-11-06 16:06   ` Peter Zijlstra
  2017-11-06 16:12     ` Kirill Tkhai
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2017-11-06 16:06 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: mingo, linux-kernel

On Mon, Nov 06, 2017 at 05:40:32PM +0300, Kirill Tkhai wrote:
> The patch makes the number of a task_group's tasks in iowait state
> tracked separately. This may be useful for containers to check the
> nr_iowait state of a single one.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  kernel/sched/core.c  |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/sched.h |    5 +++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 712ee54edaa1..86d1ad5f49bd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -796,12 +796,32 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
>  
>  static void task_iowait_start(struct rq *rq, struct task_struct *p)
>  {
> +#ifdef CONFIG_CGROUP_SCHED
> +	struct task_group *tg = task_group(p);
> +
> +	/* Task's sched_task_group is changed under both of the below locks */
> +	BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));

We have lockdep_assert_held for that.

> +	while (task_group_is_autogroup(tg))
> +		tg = tg->parent;
> +	atomic_inc(&tg->stat[rq->cpu].nr_iowait);

You're joking right, more atomic ops on the fast paths..

> +#endif
> +
>  	atomic_inc(&rq->nr_iowait);
>  	delayacct_blkio_start();


* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
  2017-11-06 16:06   ` Peter Zijlstra
@ 2017-11-06 16:12     ` Kirill Tkhai
  2017-11-06 16:24       ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Kirill Tkhai @ 2017-11-06 16:12 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel

On 06.11.2017 19:06, Peter Zijlstra wrote:
> On Mon, Nov 06, 2017 at 05:40:32PM +0300, Kirill Tkhai wrote:
>> The patch makes the number of a task_group's tasks in iowait state
>> tracked separately. This may be useful for containers to check the
>> nr_iowait state of a single one.
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>>  kernel/sched/core.c  |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/sched/sched.h |    5 +++++
>>  2 files changed, 50 insertions(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 712ee54edaa1..86d1ad5f49bd 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -796,12 +796,32 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
>>  
>>  static void task_iowait_start(struct rq *rq, struct task_struct *p)
>>  {
>> +#ifdef CONFIG_CGROUP_SCHED
>> +	struct task_group *tg = task_group(p);
>> +
>> +	/* Task's sched_task_group is changed under both of the below locks */
>> +	BUG_ON(!raw_spin_is_locked(&p->pi_lock) && !raw_spin_is_locked(&rq->lock));
> 
> We have lockdep_assert_held for that.
> 
>> +	while (task_group_is_autogroup(tg))
>> +		tg = tg->parent;
>> +	atomic_inc(&tg->stat[rq->cpu].nr_iowait);
> 
> You're joking right, more atomic ops on the fast paths..

There should be some synchronization... It's modified under rq->lock everywhere except try_to_wake_up().
Would it be better to take rq->lock once more in try_to_wake_up() instead of an atomic?
 
>> +#endif
>> +
>>  	atomic_inc(&rq->nr_iowait);
>>  	delayacct_blkio_start();


* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
  2017-11-06 16:12     ` Kirill Tkhai
@ 2017-11-06 16:24       ` Peter Zijlstra
  2017-11-06 16:25         ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2017-11-06 16:24 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: mingo, linux-kernel

On Mon, Nov 06, 2017 at 07:12:58PM +0300, Kirill Tkhai wrote:

> >> +	atomic_inc(&tg->stat[rq->cpu].nr_iowait);
> > 
> > You're joking right, more atomic ops on the fast paths..
> 
> There should be some synchronization... It's modified under rq->lock everywhere except try_to_wake_up().
> Would it be better to take rq->lock once more in try_to_wake_up() instead of an atomic?

No, of course not. We spend a lot of time getting of that rq->lock
there.

The better option is to not care about iowait, since its a complete
garbage number to begin with -- read that commit I pointed you to.

But if you do manage to convince me iowait is a sane thing to export
(and its not); then you should not use atomics -- nor is there any need
to. Since all you want to export is \Sum nr_iowait, you can inc/dec to
pure cpu local variables and the sum will make it all work.

The extant iowait crap cannot do this because it thinks per-cpu IO-wait
is a thing -- its not, its a random number at best, but its ABI so we
can't fix :-(
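
A minimal sketch of the per-cpu scheme suggested above (hypothetical helper
names; it assumes tg->nr_iowait were a plain int __percpu * allocated with
alloc_percpu(int) instead of the patch's atomic_t array):

	/* Update side: purely CPU-local, no atomics, no remote rq->lock. */
	static inline void tg_iowait_inc(struct task_group *tg)
	{
		this_cpu_inc(*tg->nr_iowait);
	}

	static inline void tg_iowait_dec(struct task_group *tg)
	{
		this_cpu_dec(*tg->nr_iowait);
	}

	/*
	 * Read side: a single per-cpu value may be negative when the sleep
	 * and the wakeup are accounted on different CPUs; only the sum over
	 * all CPUs is meaningful, much like the per-rq nr_uninterruptible
	 * counter.
	 */
	static int tg_nr_iowait(struct task_group *tg)
	{
		int cpu, sum = 0;

		for_each_possible_cpu(cpu)
			sum += *per_cpu_ptr(tg->nr_iowait, cpu);

		return sum;
	}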


* Re: [PATCH 2/4] sched: Account per task_group nr_iowait
  2017-11-06 16:24       ` Peter Zijlstra
@ 2017-11-06 16:25         ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2017-11-06 16:25 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: mingo, linux-kernel

On Mon, Nov 06, 2017 at 05:24:36PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 06, 2017 at 07:12:58PM +0300, Kirill Tkhai wrote:
> 
> > >> +	atomic_inc(&tg->stat[rq->cpu].nr_iowait);
> > > 
> > > You're joking right, more atomic ops on the fast paths..
> > 
> > There should be some synchronization... It's modified under rq->lock everywhere except try_to_wake_up().
> > Would it be better to take rq->lock once more in try_to_wake_up() instead of an atomic?
> 
> No, of course not. We spend a lot of time getting of that rq->lock
                                                   ^ rid
> there.
> 
> The better option is to not care about iowait, since its a complete
> garbage number to begin with -- read that commit I pointed you to.
> 
> But if you do manage to convince me iowait is a sane thing to export
> (and its not); then you should not use atomics -- nor is there any need
> to. Since all you want to export is \Sum nr_iowait, you can inc/dec to
> pure cpu local variables and the sum will make it all work.
> 
> The extant iowait crap cannot do this because it thinks per-cpu IO-wait
> is a thing -- its not, its a random number at best, but its ABI so we
> can't fix :-(

