From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: Juri Lelli <juri.lelli@gmail.com>,
	mingo@redhat.com, peterz@infradead.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2] sched/rt: Rework for_each_process_thread() iterations in tg_has_rt_tasks()
Date: Fri, 20 Apr 2018 13:06:31 +0300
Message-ID: <854a5fb1-a9c1-023f-55ec-17fa14ad07d5@virtuozzo.com>
In-Reply-To: <0d7fbdab-b972-7f86-4090-b49f9315c868@virtuozzo.com>

From: Kirill Tkhai <ktkhai@virtuozzo.com>

tg_rt_schedulable() iterates over all child task groups,
while tg_has_rt_tasks() iterates over all tasks in the
system. On systems with a large number of tasks, this may
take a lot of time.

I observed a hard LOCKUP on a machine with 20000+ processes
after a write to "cpu.rt_period_us" of a cpu cgroup with
39 children. The problem occurred because tasklist_lock
was held for a long time and other processes couldn't fork().

PID: 1036268  TASK: ffff88766c310000  CPU: 36  COMMAND: "criu"
 #0 [ffff887f7f408e48] crash_nmi_callback at ffffffff81050601
 #1 [ffff887f7f408e58] nmi_handle at ffffffff816e0cc7
 #2 [ffff887f7f408eb0] do_nmi at ffffffff816e0fb0
 #3 [ffff887f7f408ef0] end_repeat_nmi at ffffffff816e00b9
    [exception RIP: tg_rt_schedulable+463]
    RIP: ffffffff810bf49f  RSP: ffff886537ad7d50  RFLAGS: 00000202
    RAX: 0000000000000000  RBX: 000000003b9aca00  RCX: ffff883e9cb4b1b0
    RDX: ffff887d0be43608  RSI: ffff886537ad7dd8  RDI: ffff8840a6ad0000
    RBP: ffff886537ad7d68   R8: ffff887d0be431b0   R9: 00000000000e7ef0
    R10: ffff88164fc39400  R11: 0000000000023380  R12: ffffffff81ef8d00
    R13: ffffffff810bea40  R14: 0000000000000000  R15: ffff8840a6ad0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff886537ad7d50] tg_rt_schedulable at ffffffff810bf49f
 #5 [ffff886537ad7d70] walk_tg_tree_from at ffffffff810c6c91
 #6 [ffff886537ad7dc0] tg_set_rt_bandwidth at ffffffff810c6dd0
 #7 [ffff886537ad7e28] cpu_rt_period_write_uint at ffffffff810c6eea
 #8 [ffff886537ad7e38] cgroup_file_write at ffffffff8111cfd3
 #9 [ffff886537ad7ec8] vfs_write at ffffffff8121eced
#10 [ffff886537ad7f08] sys_write at ffffffff8121faff
#11 [ffff886537ad7f50] system_call_fastpath at ffffffff816e8a7d

The patch reworks tg_has_rt_tasks() to iterate over the task
group's own task list instead of the list of all tasks.
This makes the function scale well and reduces its execution
time.

Note that since tasklist_lock doesn't protect a task against
a sched_class change, we don't introduce any new races compared
to what we had before.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/sched/rt.c |   22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7aef6b4e885a..20be796d4e63 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2395,10 +2395,11 @@ const struct sched_class rt_sched_class = {
  */
 static DEFINE_MUTEX(rt_constraints_mutex);
 
-/* Must be called with tasklist_lock held */
 static inline int tg_has_rt_tasks(struct task_group *tg)
 {
-	struct task_struct *g, *p;
+	struct css_task_iter it;
+	struct task_struct *task;
+	int ret = 0;
 
 	/*
 	 * Autogroups do not have RT tasks; see autogroup_create().
@@ -2406,12 +2407,15 @@ static inline int tg_has_rt_tasks(struct task_group *tg)
 	if (task_group_is_autogroup(tg))
 		return 0;
 
-	for_each_process_thread(g, p) {
-		if (rt_task(p) && task_group(p) == tg)
-			return 1;
-	}
+	css_task_iter_start(&tg->css, 0, &it);
+	while ((task = css_task_iter_next(&it)))
+		if (rt_task(task)) {
+			ret = 1;
+			break;
+		}
+	css_task_iter_end(&it);
 
-	return 0;
+	return ret;
 }
 
 struct rt_schedulable_data {
@@ -2510,7 +2514,6 @@ static int tg_set_rt_bandwidth(struct task_group *tg,
 		return -EINVAL;
 
 	mutex_lock(&rt_constraints_mutex);
-	read_lock(&tasklist_lock);
 	err = __rt_schedulable(tg, rt_period, rt_runtime);
 	if (err)
 		goto unlock;
@@ -2528,7 +2531,6 @@ static int tg_set_rt_bandwidth(struct task_group *tg,
 	}
 	raw_spin_unlock_irq(&tg->rt_bandwidth.rt_runtime_lock);
 unlock:
-	read_unlock(&tasklist_lock);
 	mutex_unlock(&rt_constraints_mutex);
 
 	return err;
@@ -2582,9 +2584,7 @@ static int sched_rt_global_constraints(void)
 	int ret = 0;
 
 	mutex_lock(&rt_constraints_mutex);
-	read_lock(&tasklist_lock);
 	ret = __rt_schedulable(NULL, 0, 0);
-	read_unlock(&tasklist_lock);
 	mutex_unlock(&rt_constraints_mutex);
 
 	return ret;
