linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kthread: run kthreadd with max priority SCHED_FIFO
@ 2007-12-17 22:43 Michal Schmidt
  2007-12-17 23:00 ` Jon Masters
  2007-12-22  9:30 ` Andrew Morton
  0 siblings, 2 replies; 21+ messages in thread
From: Michal Schmidt @ 2007-12-17 22:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Eric W. Biederman, Jon Masters, Andrew Morton, Satoru Takeuchi

kthreadd, the creator of other kernel threads, runs as a normal
priority task. This is a potential for priority inversion when a task
wants to spawn a high-priority kernel thread. A middle priority
SCHED_FIFO task can block kthreadd's execution indefinitely and thus
prevent the timely creation of the high-priority kernel thread.
    
This causes a practical problem. When a runaway real-time task is
eating 100% CPU and we attempt to put the CPU offline, sometimes we
block while waiting for the creation of the highest-priority
"kstopmachine" thread. 

The fix is to run kthreadd with the highest possible SCHED_FIFO
priority. Its children must still run as slightly negatively reniced
SCHED_NORMAL tasks.
    
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

diff --git a/kernel/kthread.c b/kernel/kthread.c
index dcfe724..a7ce932 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -94,10 +94,17 @@ static void create_kthread(struct kthread_create_info *create)
 	if (pid < 0) {
 		create->result = ERR_PTR(pid);
 	} else {
+		struct sched_param param = { .sched_priority = 0 };
 		wait_for_completion(&create->started);
 		read_lock(&tasklist_lock);
 		create->result = find_task_by_pid(pid);
 		read_unlock(&tasklist_lock);
+		/*
+		 * We (kthreadd) run with SCHED_FIFO, but we don't want
+		 * the kthreads we create to have it too by default.
+		 */
+		sched_setscheduler(create->result, SCHED_NORMAL, &param);
+		set_user_nice(create->result, -5);
 	}
 	complete(&create->done);
 }
@@ -217,11 +224,12 @@ EXPORT_SYMBOL(kthread_stop);
 int kthreadd(void *unused)
 {
 	struct task_struct *tsk = current;
+	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
 
 	/* Setup a clean context for our children to inherit. */
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
-	set_user_nice(tsk, -5);
+	sched_setscheduler(tsk, SCHED_FIFO, &param);
 	set_cpus_allowed(tsk, CPU_MASK_ALL);
 
 	current->flags |= PF_NOFREEZE;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: run kthreadd with max priority SCHED_FIFO
  2007-12-17 22:43 [PATCH] kthread: run kthreadd with max priority SCHED_FIFO Michal Schmidt
@ 2007-12-17 23:00 ` Jon Masters
  2007-12-22  9:30 ` Andrew Morton
  1 sibling, 0 replies; 21+ messages in thread
From: Jon Masters @ 2007-12-17 23:00 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: linux-kernel, Eric W. Biederman, Andrew Morton, Satoru Takeuchi

On Mon, 2007-12-17 at 23:43 +0100, Michal Schmidt wrote:

> kthreadd, the creator of other kernel threads, runs as a normal
> priority task. This is a potential for priority inversion when a task
> wants to spawn a high-priority kernel thread. A middle priority
> SCHED_FIFO task can block kthreadd's execution indefinitely and thus
> prevent the timely creation of the high-priority kernel thread.
>     
> This causes a practical problem. When a runaway real-time task is
> eating 100% CPU and we attempt to put the CPU offline, sometimes we
> block while waiting for the creation of the highest-priority
> "kstopmachine" thread. 
> 
> The fix is to run kthreadd with the highest possible SCHED_FIFO
> priority. Its children must still run as slightly negatively reniced
> SCHED_NORMAL tasks.
>     
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index dcfe724..a7ce932 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -94,10 +94,17 @@ static void create_kthread(struct kthread_create_info *create)
>  	if (pid < 0) {
>  		create->result = ERR_PTR(pid);
>  	} else {
> +		struct sched_param param = { .sched_priority = 0 };
>  		wait_for_completion(&create->started);
>  		read_lock(&tasklist_lock);
>  		create->result = find_task_by_pid(pid);
>  		read_unlock(&tasklist_lock);
> +		/*
> +		 * We (kthreadd) run with SCHED_FIFO, but we don't want
> +		 * the kthreads we create to have it too by default.
> +		 */
> +		sched_setscheduler(create->result, SCHED_NORMAL, &param);
> +		set_user_nice(create->result, -5);
>  	}
>  	complete(&create->done);
>  }
> @@ -217,11 +224,12 @@ EXPORT_SYMBOL(kthread_stop);
>  int kthreadd(void *unused)
>  {
>  	struct task_struct *tsk = current;
> +	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
>  
>  	/* Setup a clean context for our children to inherit. */
>  	set_task_comm(tsk, "kthreadd");
>  	ignore_signals(tsk);
> -	set_user_nice(tsk, -5);
> +	sched_setscheduler(tsk, SCHED_FIFO, &param);
>  	set_cpus_allowed(tsk, CPU_MASK_ALL);
>  
>  	current->flags |= PF_NOFREEZE;

I looked at this internally over the weekend.

Acked-by: Jon Masters <jcm@redhat.com>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: run kthreadd with max priority SCHED_FIFO
  2007-12-17 22:43 [PATCH] kthread: run kthreadd with max priority SCHED_FIFO Michal Schmidt
  2007-12-17 23:00 ` Jon Masters
@ 2007-12-22  9:30 ` Andrew Morton
  2007-12-22  9:52   ` Jon Masters
  2008-01-07 10:06   ` [PATCH] kthread: always create the kernel threads with normal priority Michal Schmidt
  1 sibling, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2007-12-22  9:30 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: linux-kernel, Eric W. Biederman, Jon Masters, Satoru Takeuchi

On Mon, 17 Dec 2007 23:43:14 +0100 Michal Schmidt <mschmidt@redhat.com> wrote:

> kthreadd, the creator of other kernel threads, runs as a normal
> priority task. This is a potential for priority inversion when a task
> wants to spawn a high-priority kernel thread. A middle priority
> SCHED_FIFO task can block kthreadd's execution indefinitely and thus
> prevent the timely creation of the high-priority kernel thread.
>     
> This causes a practical problem. When a runaway real-time task is
> eating 100% CPU and we attempt to put the CPU offline, sometimes we
> block while waiting for the creation of the highest-priority
> "kstopmachine" thread. 
> 
> The fix is to run kthreadd with the highest possible SCHED_FIFO
> priority. Its children must still run as slightly negatively reniced
> SCHED_NORMAL tasks.

Did you hit this problem with the stock kernel, or have you been working on
other stuff?

A locked-up SCHED_FIFO process will cause kernel threads all sorts of
problems.  You've hit one instance, but there will be others.  (pdflush
stops working, for one).

The general approach we've taken to this is "don't do that".  Yes, we could
boost lots of kernel threads in the way which this patch does but this
actually takes control *away* from userspace.  Userspace no longer has the
ability to guarantee itself minimum possible latency without getting
preempted by kernel threads.

And yes, giving userspace this minimum-latency capability does imply that
userspace has a responsibility to not 100% starve kernel threads.  It's a
reasonable compromise, I think?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: run kthreadd with max priority SCHED_FIFO
  2007-12-22  9:30 ` Andrew Morton
@ 2007-12-22  9:52   ` Jon Masters
  2007-12-22 10:11     ` Andrew Morton
  2007-12-22 10:39     ` Mike Galbraith
  2008-01-07 10:06   ` [PATCH] kthread: always create the kernel threads with normal priority Michal Schmidt
  1 sibling, 2 replies; 21+ messages in thread
From: Jon Masters @ 2007-12-22  9:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Schmidt, linux-kernel, Eric W. Biederman, Satoru Takeuchi


On Sat, 2007-12-22 at 01:30 -0800, Andrew Morton wrote:
> On Mon, 17 Dec 2007 23:43:14 +0100 Michal Schmidt <mschmidt@redhat.com> wrote:
> 
> > kthreadd, the creator of other kernel threads, runs as a normal
> > priority task. This is a potential for priority inversion when a task
> > wants to spawn a high-priority kernel thread. A middle priority
> > SCHED_FIFO task can block kthreadd's execution indefinitely and thus
> > prevent the timely creation of the high-priority kernel thread.
> >     
> > This causes a practical problem. When a runaway real-time task is
> > eating 100% CPU and we attempt to put the CPU offline, sometimes we
> > block while waiting for the creation of the highest-priority
> > "kstopmachine" thread. 
> > 
> > The fix is to run kthreadd with the highest possible SCHED_FIFO
> > priority. Its children must still run as slightly negatively reniced
> > SCHED_NORMAL tasks.
> 
> Did you hit this problem with the stock kernel, or have you been working on
> other stuff?

This kind of problem is *far* more likely to happen on the -RT kernel
(more example cases), but it's also a general problem too.

> A locked-up SCHED_FIFO process will cause kernel threads all sorts of
> problems.  You've hit one instance, but there will be others.  (pdflush
> stops working, for one).

Right. Agreed that this is just one "fix" out of many possibly needed,
if upstream wants to address this kind of problem.

> The general approach we've taken to this is "don't do that".  Yes, we could
> boost lots of kernel threads in the way which this patch does but this
> actually takes control *away* from userspace.  Userspace no longer has the
> ability to guarantee itself minimum possible latency without getting
> preempted by kernel threads.
> 
> And yes, giving userspace this minimum-latency capability does imply that
> userspace has a responsibility to not 100% starve kernel threads.  It's a
> reasonable compromise, I think?

So, user tasks running with SCHED_FIFO should be able to lock a system?
I guess I see both sides of this argument - yes, it's userspace at
fault, but in other cases when userspace is at fault, we take action
(OOM, segfault, others). Isn't this situation just another case where
the kernel needs to avoid the evils of userland going awry?

Jon.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: run kthreadd with max priority SCHED_FIFO
  2007-12-22  9:52   ` Jon Masters
@ 2007-12-22 10:11     ` Andrew Morton
  2007-12-22 10:18       ` Jon Masters
  2007-12-22 10:39     ` Mike Galbraith
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2007-12-22 10:11 UTC (permalink / raw)
  To: Jon Masters
  Cc: Michal Schmidt, linux-kernel, Eric W. Biederman, Satoru Takeuchi

On Sat, 22 Dec 2007 04:52:50 -0500 Jon Masters <jcm@redhat.com> wrote:

> > The general approach we've taken to this is "don't do that".  Yes, we could
> > boost lots of kernel threads in the way which this patch does but this
> > actually takes control *away* from userspace.  Userspace no longer has the
> > ability to guarantee itself minimum possible latency without getting
> > preempted by kernel threads.
> > 
> > And yes, giving userspace this minimum-latency capability does imply that
> > userspace has a responsibility to not 100% starve kernel threads.  It's a
> > reasonable compromise, I think?
> 
> So, user tasks running with SCHED_FIFO should be able to lock a system?

yup.  root can damage the system in all sorts of ways.

> I guess I see both sides of this argument - yes, it's userspace at
> fault, but in other cases when userspace is at fault, we take action
> (OOM, segfault, others). Isn't this situation just another case where
> the kernel needs to avoid the evils of userland going awry?

Well...  the problem is that if we add a safety net to catch run-away
SCHED_FIFO processes, we've permanently degraded the service which we
provide to well-behaved programs.

Should there be a watchdog which checks for a process which has run
realtime for a certain period and which then takes some action?  Such as
descheduling it for a while, generating warnings, demoting its policy,
killing it etc?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: run kthreadd with max priority SCHED_FIFO
  2007-12-22 10:11     ` Andrew Morton
@ 2007-12-22 10:18       ` Jon Masters
  0 siblings, 0 replies; 21+ messages in thread
From: Jon Masters @ 2007-12-22 10:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Schmidt, linux-kernel, Eric W. Biederman, Satoru Takeuchi

On Sat, 2007-12-22 at 02:11 -0800, Andrew Morton wrote:

> Should there be a watchdog which checks for a process which has run
> realtime for a certain period and which then takes some action?  Such as
> descheduling it for a while, generating warnings, demoting its policy,
> killing it etc?

Using the analogy of the OOM killer being called, this wouldn't be a bad
idea IMO - especially if it were configurable, and off by default. This
couldn't be done in a kernel thread (unless some new priority level were
created for it), but I think the details can be worked out later.

Michal - what do you think is the best upstream solution?

Jon.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: run kthreadd with max priority SCHED_FIFO
  2007-12-22  9:52   ` Jon Masters
  2007-12-22 10:11     ` Andrew Morton
@ 2007-12-22 10:39     ` Mike Galbraith
  2007-12-22 10:52       ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: Mike Galbraith @ 2007-12-22 10:39 UTC (permalink / raw)
  To: Jon Masters
  Cc: Andrew Morton, Michal Schmidt, linux-kernel, Eric W. Biederman,
	Satoru Takeuchi


On Sat, 2007-12-22 at 04:52 -0500, Jon Masters wrote:

> So, user tasks running with SCHED_FIFO should be able to lock a system?
> I guess I see both sides of this argument - yes, it's userspace at
> fault, but in other cases when userspace is at fault, we take action
> (OOM, segfault, others). Isn't this situation just another case where
> the kernel needs to avoid the evils of userland going awry?

FYI, Ingo queued the below.

http://lkml.org/lkml/2007/10/31/344

	-Mike





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: run kthreadd with max priority SCHED_FIFO
  2007-12-22 10:39     ` Mike Galbraith
@ 2007-12-22 10:52       ` Andrew Morton
  2007-12-22 11:21         ` Jon Masters
  2007-12-23  8:50         ` Mike Galbraith
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2007-12-22 10:52 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Jon Masters, Michal Schmidt, linux-kernel, Eric W. Biederman,
	Satoru Takeuchi

On Sat, 22 Dec 2007 11:39:30 +0100 Mike Galbraith <efault@gmx.de> wrote:

> 
> On Sat, 2007-12-22 at 04:52 -0500, Jon Masters wrote:
> 
> > So, user tasks running with SCHED_FIFO should be able to lock a system?
> > I guess I see both sides of this argument - yes, it's userspace at
> > fault, but in other cases when userspace is at fault, we take action
> > (OOM, segfault, others). Isn't this situation just another case where
> > the kernel needs to avoid the evils of userland going awry?
> 
> FYI, Ingo queued the below.
> 
> http://lkml.org/lkml/2007/10/31/344
> 

That's pretty different of course, but rlimit might be a suitable interface
for implementing RLIMIT_MAX_CONTINUOUS_RT_MILLISECONDS.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: run kthreadd with max priority SCHED_FIFO
  2007-12-22 10:52       ` Andrew Morton
@ 2007-12-22 11:21         ` Jon Masters
  2007-12-23  8:50         ` Mike Galbraith
  1 sibling, 0 replies; 21+ messages in thread
From: Jon Masters @ 2007-12-22 11:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Galbraith, Michal Schmidt, linux-kernel, Eric W. Biederman,
	Satoru Takeuchi


On Sat, 2007-12-22 at 02:52 -0800, Andrew Morton wrote:
> On Sat, 22 Dec 2007 11:39:30 +0100 Mike Galbraith <efault@gmx.de> wrote:
> 
> > 
> > On Sat, 2007-12-22 at 04:52 -0500, Jon Masters wrote:
> > 
> > > So, user tasks running with SCHED_FIFO should be able to lock a system?
> > > I guess I see both sides of this argument - yes, it's userspace at
> > > fault, but in other cases when userspace is at fault, we take action
> > > (OOM, segfault, others). Isn't this situation just another case where
> > > the kernel needs to avoid the evils of userland going awry?
> > 
> > FYI, Ingo queued the below.
> > 
> > http://lkml.org/lkml/2007/10/31/344
> > 
> 
> That's pretty different of course, but rlimit might be a suitable interface
> for implementing RLIMIT_MAX_CONTINUOUS_RT_MILLISECONDS.

Right. Just when I thought there was no point in continuing (Ingo having
invented everything already, as usual), this specific problem remains.

I'll try bouncing some ideas off Ingo later.

Jon.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: run kthreadd with max priority SCHED_FIFO
  2007-12-22 10:52       ` Andrew Morton
  2007-12-22 11:21         ` Jon Masters
@ 2007-12-23  8:50         ` Mike Galbraith
  1 sibling, 0 replies; 21+ messages in thread
From: Mike Galbraith @ 2007-12-23  8:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jon Masters, Michal Schmidt, linux-kernel, Eric W. Biederman,
	Satoru Takeuchi


On Sat, 2007-12-22 at 02:52 -0800, Andrew Morton wrote:
> On Sat, 22 Dec 2007 11:39:30 +0100 Mike Galbraith <efault@gmx.de> wrote:
> 
> > 
> > On Sat, 2007-12-22 at 04:52 -0500, Jon Masters wrote:
> > 
> > > So, user tasks running with SCHED_FIFO should be able to lock a system?
> > > I guess I see both sides of this argument - yes, it's userspace at
> > > fault, but in other cases when userspace is at fault, we take action
> > > (OOM, segfault, others). Isn't this situation just another case where
> > > the kernel needs to avoid the evils of userland going awry?
> > 
> > FYI, Ingo queued the below.
> > 
> > http://lkml.org/lkml/2007/10/31/344
> > 
> 
> That's pretty different of course, but rlimit might be a suitable interface
> for implementing RLIMIT_MAX_CONTINUOUS_RT_MILLISECONDS.

I'd extend Peter's rt safety net instead: mark for forced requeue when
the soft limit is hit, or add that as an intermediate stage.  Possibly
add a demotion stage as well.  I wouldn't try to select lower priority
tasks, which RLIMIT_MAX_CONTINUOUS_RT_MILLISECONDS implies to me.

	-Mike


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] kthread: always create the kernel threads with normal priority
  2007-12-22  9:30 ` Andrew Morton
  2007-12-22  9:52   ` Jon Masters
@ 2008-01-07 10:06   ` Michal Schmidt
  2008-01-07 10:25     ` Andrew Morton
  2008-01-07 11:22     ` Remy Bohmer
  1 sibling, 2 replies; 21+ messages in thread
From: Michal Schmidt @ 2008-01-07 10:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Eric W. Biederman, Jon Masters, Satoru Takeuchi

On Sat, 22 Dec 2007 01:30:21 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 17 Dec 2007 23:43:14 +0100 Michal Schmidt
> <mschmidt@redhat.com> wrote:
> 
> > kthreadd, the creator of other kernel threads, runs as a normal
> > priority task. This is a potential for priority inversion when a
> > task wants to spawn a high-priority kernel thread. A middle priority
> > SCHED_FIFO task can block kthreadd's execution indefinitely and thus
> > prevent the timely creation of the high-priority kernel thread.
> >     
> > This causes a practical problem. When a runaway real-time task is
> > eating 100% CPU and we attempt to put the CPU offline, sometimes we
> > block while waiting for the creation of the highest-priority
> > "kstopmachine" thread. 
> > 
> > The fix is to run kthreadd with the highest possible SCHED_FIFO
> > priority. Its children must still run as slightly negatively reniced
> > SCHED_NORMAL tasks.
> 
> Did you hit this problem with the stock kernel, or have you been
> working on other stuff?

This was with RHEL5 and with current Fedora kernels.

> A locked-up SCHED_FIFO process will cause kernel threads all sorts of
> problems.  You've hit one instance, but there will be others.
> (pdflush stops working, for one).
> 
> The general approach we've taken to this is "don't do that".  Yes, we
> could boost lots of kernel threads in the way which this patch does
> but this actually takes control *away* from userspace.  Userspace no
> longer has the ability to guarantee itself minimum possible latency
> without getting preempted by kernel threads.
> 
> And yes, giving userspace this minimum-latency capability does imply
> that userspace has a responsibility to not 100% starve kernel
> threads.  It's a reasonable compromise, I think?

You're right. We should not run kthreadd with SCHED_FIFO by default.
But the user should be able to change it using chrt if he wants to
avoid this particular problem. So how about this instead?:



kthreadd, the creator of other kernel threads, runs as a normal priority task.
This is a potential for priority inversion when a task wants to spawn a
high-priority kernel thread. A middle priority SCHED_FIFO task can block
kthreadd's execution indefinitely and thus prevent the timely creation of the
high-priority kernel thread.

This causes a practical problem. When a runaway real-time task is eating 100%
CPU and we attempt to put the CPU offline, sometimes we block while waiting for
the creation of the highest-priority "kstopmachine" thread.

This could be solved by always running kthreadd with the highest possible
SCHED_FIFO priority, but that would be undesirable policy decision in the
kernel. kthreadd would cause unwanted latencies even for the realtime users who
know what they're doing.

Let's not make the decision for the user. Just allow the administrator to
change kthreadd's priority safely if he chooses to do it. Ensure that the
kernel threads are created with the usual nice level even if kthreadd's
priority is changed from the default.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
---
 kernel/kthread.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index dcfe724..e832a85 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -94,10 +94,21 @@ static void create_kthread(struct kthread_create_info *create)
 	if (pid < 0) {
 		create->result = ERR_PTR(pid);
 	} else {
+		struct sched_param param = { .sched_priority = 0 };
 		wait_for_completion(&create->started);
 		read_lock(&tasklist_lock);
 		create->result = find_task_by_pid(pid);
 		read_unlock(&tasklist_lock);
+		/*
+		 * root may want to change our (kthreadd's) priority to
+		 * realtime to solve a corner case priority inversion problem
+		 * (a realtime task consuming 100% CPU blocking the creation of
+		 * kernel threads). The kernel thread should not inherit the
+		 * higher priority. Let's always create it with the usual nice
+		 * level.
+		 */
+		sched_setscheduler(create->result, SCHED_NORMAL, &param);
+		set_user_nice(create->result, -5);
 	}
 	complete(&create->done);
 }
-- 
1.5.3.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: always create the kernel threads with normal priority
  2008-01-07 10:06   ` [PATCH] kthread: always create the kernel threads with normal priority Michal Schmidt
@ 2008-01-07 10:25     ` Andrew Morton
  2008-01-07 11:09       ` Ingo Molnar
  2008-01-07 13:18       ` Michal Schmidt
  2008-01-07 11:22     ` Remy Bohmer
  1 sibling, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2008-01-07 10:25 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: linux-kernel, Eric W. Biederman, Jon Masters, Satoru Takeuchi,
	Ingo Molnar

On Mon, 7 Jan 2008 11:06:03 +0100 Michal Schmidt <mschmidt@redhat.com> wrote:

> On Sat, 22 Dec 2007 01:30:21 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Mon, 17 Dec 2007 23:43:14 +0100 Michal Schmidt
> > <mschmidt@redhat.com> wrote:
> > 
> > > kthreadd, the creator of other kernel threads, runs as a normal
> > > priority task. This is a potential for priority inversion when a
> > > task wants to spawn a high-priority kernel thread. A middle priority
> > > SCHED_FIFO task can block kthreadd's execution indefinitely and thus
> > > prevent the timely creation of the high-priority kernel thread.
> > >     
> > > This causes a practical problem. When a runaway real-time task is
> > > eating 100% CPU and we attempt to put the CPU offline, sometimes we
> > > block while waiting for the creation of the highest-priority
> > > "kstopmachine" thread. 
> > > 
> > > The fix is to run kthreadd with the highest possible SCHED_FIFO
> > > priority. Its children must still run as slightly negatively reniced
> > > SCHED_NORMAL tasks.
> > 
> > Did you hit this problem with the stock kernel, or have you been
> > working on other stuff?
> 
> This was with RHEL5 and with current Fedora kernels.
> 
> > A locked-up SCHED_FIFO process will cause kernel threads all sorts of
> > problems.  You've hit one instance, but there will be others.
> > (pdflush stops working, for one).
> > 
> > The general approach we've taken to this is "don't do that".  Yes, we
> > could boost lots of kernel threads in the way which this patch does
> > but this actually takes control *away* from userspace.  Userspace no
> > longer has the ability to guarantee itself minimum possible latency
> > without getting preempted by kernel threads.
> > 
> > And yes, giving userspace this minimum-latency capability does imply
> > that userspace has a responsibility to not 100% starve kernel
> > threads.  It's a reasonable compromise, I think?
> 
> You're right. We should not run kthreadd with SCHED_FIFO by default.
> But the user should be able to change it using chrt if he wants to
> avoid this particular problem. So how about this instead?:
> 
> 
> 
> kthreadd, the creator of other kernel threads, runs as a normal priority task.
> This is a potential for priority inversion when a task wants to spawn a
> high-priority kernel thread. A middle priority SCHED_FIFO task can block
> kthreadd's execution indefinitely and thus prevent the timely creation of the
> high-priority kernel thread.
> 
> This causes a practical problem. When a runaway real-time task is eating 100%
> CPU and we attempt to put the CPU offline, sometimes we block while waiting for
> the creation of the highest-priority "kstopmachine" thread.
> 
> This could be solved by always running kthreadd with the highest possible
> SCHED_FIFO priority, but that would be undesirable policy decision in the
> kernel. kthreadd would cause unwanted latencies even for the realtime users who
> know what they're doing.
> 
> Let's not make the decision for the user. Just allow the administrator to
> change kthreadd's priority safely if he chooses to do it. Ensure that the
> kernel threads are created with the usual nice level even if kthreadd's
> priority is changed from the default.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> ---
>  kernel/kthread.c |   11 +++++++++++
>  1 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index dcfe724..e832a85 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -94,10 +94,21 @@ static void create_kthread(struct kthread_create_info *create)
>  	if (pid < 0) {
>  		create->result = ERR_PTR(pid);
>  	} else {
> +		struct sched_param param = { .sched_priority = 0 };
>  		wait_for_completion(&create->started);
>  		read_lock(&tasklist_lock);
>  		create->result = find_task_by_pid(pid);
>  		read_unlock(&tasklist_lock);
> +		/*
> +		 * root may want to change our (kthreadd's) priority to
> +		 * realtime to solve a corner case priority inversion problem
> +		 * (a realtime task consuming 100% CPU blocking the creation of
> +		 * kernel threads). The kernel thread should not inherit the
> +		 * higher priority. Let's always create it with the usual nice
> +		 * level.
> +		 */
> +		sched_setscheduler(create->result, SCHED_NORMAL, &param);
> +		set_user_nice(create->result, -5);
>  	}
>  	complete(&create->done);
>  }

Seems reasonable.

As a followup thing, we now have two hard-coded magical -5's in kthread.c. 
It'd be nice to add a #define for this.

It'd be nicer to work out where on earth that -5 came from too ;)

Readers might wonder why kthreadd children disinherit kthreadd's policy and
priority, but retain its cpus_allowed (and whatever other stuff root could have
altered?)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: always create the kernel threads with normal priority
  2008-01-07 10:25     ` Andrew Morton
@ 2008-01-07 11:09       ` Ingo Molnar
  2008-01-07 17:29         ` Andrew Morton
  2008-01-07 13:18       ` Michal Schmidt
  1 sibling, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2008-01-07 11:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Schmidt, linux-kernel, Eric W. Biederman, Jon Masters,
	Satoru Takeuchi, Peter Zijlstra


> > This causes a practical problem. When a runaway real-time task is 
> > eating 100% CPU and we attempt to put the CPU offline, sometimes we 
> > block while waiting for the creation of the highest-priority 
> > "kstopmachine" thread.

sched-devel.git has new mechanisms against runaway RT tasks. There's a 
new RLIMIT_RTTIME rlimit - if an RT task exceeds that rlimit then it is 
sent SIGXCPU.

there's also a new group scheduling extension that is driven via a 
sysctl:

  /proc/sys/kernel/sched_rt_ratio

this way if a user has a runaway RT task, other users (and root) will 
still have some CPU time left. (in Peter's latest patchset that is 
replaced via rt_runtime_ns - but this is a detail)

so instead of the never-ending arms race of kernel thread priorities 
against RT task priorities, we are going towards making RT tasks safer 
on a policy level.

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: always create the kernel threads with normal priority
  2008-01-07 10:06   ` [PATCH] kthread: always create the kernel threads with normal priority Michal Schmidt
  2008-01-07 10:25     ` Andrew Morton
@ 2008-01-07 11:22     ` Remy Bohmer
  2008-01-07 13:10       ` Michal Schmidt
  1 sibling, 1 reply; 21+ messages in thread
From: Remy Bohmer @ 2008-01-07 11:22 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Andrew Morton, linux-kernel, Eric W. Biederman, Jon Masters,
	Satoru Takeuchi, Steven Rostedt

Hello Michal and Andrew,

> Let's not make the decision for the user. Just allow the administrator to
> change kthreadd's priority safely if he chooses to do it. Ensure that the
> kernel threads are created with the usual nice level even if kthreadd's
> priority is changed from the default.

Last year, I posted a patchset (that was meant for Preempt-RT at that
time) to be able to prioritise the interrupt-handler-threads (which
are kthreads) and softirq-threads from the kernel commandline. See
http://lkml.org/lkml/2007/12/19/208

Maybe we can find a way to use a similar mechanism as I used in my
patchset for the priorities of the remaining kthreads.
I do not like the way of forcing userland to change the priorities,
because that would require a userland with the chrt tool installed,
and that is not that practical for embedded systems (in which there
could be cases that there is no userland at all, or the init-process
is the whole embedded application). In that case an option to do it on
the kernel commandline is more practical.

I propose this kernel cmd-line option:
kthread_pmap=somethread:50,otherthread:12,34

Then threads can be started as SCHED_NORMAL, and when overruled inside
the kthread-sources itself, or by the kernel commandline, the user can
set them to something else.

What do you think of this?

(notice that I am reworking the review comments I received on this
patch-series right now, and that I can take such change into account
immediately)

Kind Regards,

Remy

2008/1/7, Michal Schmidt <mschmidt@redhat.com>:
> On Sat, 22 Dec 2007 01:30:21 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Mon, 17 Dec 2007 23:43:14 +0100 Michal Schmidt
> > <mschmidt@redhat.com> wrote:
> >
> > > kthreadd, the creator of other kernel threads, runs as a normal
> > > priority task. This is a potential for priority inversion when a
> > > task wants to spawn a high-priority kernel thread. A middle priority
> > > SCHED_FIFO task can block kthreadd's execution indefinitely and thus
> > > prevent the timely creation of the high-priority kernel thread.
> > >
> > > This causes a practical problem. When a runaway real-time task is
> > > eating 100% CPU and we attempt to put the CPU offline, sometimes we
> > > block while waiting for the creation of the highest-priority
> > > "kstopmachine" thread.
> > >
> > > The fix is to run kthreadd with the highest possible SCHED_FIFO
> > > priority. Its children must still run as slightly negatively reniced
> > > SCHED_NORMAL tasks.
> >
> > Did you hit this problem with the stock kernel, or have you been
> > working on other stuff?
>
> This was with RHEL5 and with current Fedora kernels.
>
> > A locked-up SCHED_FIFO process will cause kernel threads all sorts of
> > problems.  You've hit one instance, but there will be others.
> > (pdflush stops working, for one).
> >
> > The general approach we've taken to this is "don't do that".  Yes, we
> > could boost lots of kernel threads in the way which this patch does
> > but this actually takes control *away* from userspace.  Userspace no
> > longer has the ability to guarantee itself minimum possible latency
> > without getting preempted by kernel threads.
> >
> > And yes, giving userspace this minimum-latency capability does imply
> > that userspace has a responsibility to not 100% starve kernel
> > threads.  It's a reasonable compromise, I think?
>
> You're right. We should not run kthreadd with SCHED_FIFO by default.
> But the user should be able to change it using chrt if he wants to
> avoid this particular problem. So how about this instead?:
>
>
>
> kthreadd, the creator of other kernel threads, runs as a normal priority task.
> This is a potential for priority inversion when a task wants to spawn a
> high-priority kernel thread. A middle priority SCHED_FIFO task can block
> kthreadd's execution indefinitely and thus prevent the timely creation of the
> high-priority kernel thread.
>
> This causes a practical problem. When a runaway real-time task is eating 100%
> CPU and we attempt to put the CPU offline, sometimes we block while waiting for
> the creation of the highest-priority "kstopmachine" thread.
>
> This could be solved by always running kthreadd with the highest possible
> SCHED_FIFO priority, but that would be undesirable policy decision in the
> kernel. kthreadd would cause unwanted latencies even for the realtime users who
> know what they're doing.
>
> Let's not make the decision for the user. Just allow the administrator to
> change kthreadd's priority safely if he chooses to do it. Ensure that the
> kernel threads are created with the usual nice level even if kthreadd's
> priority is changed from the default.
>
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> ---
>  kernel/kthread.c |   11 +++++++++++
>  1 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index dcfe724..e832a85 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -94,10 +94,21 @@ static void create_kthread(struct kthread_create_info *create)
>         if (pid < 0) {
>                 create->result = ERR_PTR(pid);
>         } else {
> +               struct sched_param param = { .sched_priority = 0 };
>                 wait_for_completion(&create->started);
>                 read_lock(&tasklist_lock);
>                 create->result = find_task_by_pid(pid);
>                 read_unlock(&tasklist_lock);
> +               /*
> +                * root may want to change our (kthreadd's) priority to
> +                * realtime to solve a corner case priority inversion problem
> +                * (a realtime task consuming 100% CPU blocking the creation of
> +                * kernel threads). The kernel thread should not inherit the
> +                * higher priority. Let's always create it with the usual nice
> +                * level.
> +                */
> +               sched_setscheduler(create->result, SCHED_NORMAL, &param);
> +               set_user_nice(create->result, -5);
>         }
>         complete(&create->done);
>  }
> --
> 1.5.3.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: always create the kernel threads with normal priority
  2008-01-07 11:22     ` Remy Bohmer
@ 2008-01-07 13:10       ` Michal Schmidt
  2008-01-07 15:53         ` Remy Bohmer
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Schmidt @ 2008-01-07 13:10 UTC (permalink / raw)
  To: Remy Bohmer
  Cc: Andrew Morton, linux-kernel, Eric W. Biederman, Jon Masters,
	Satoru Takeuchi, Steven Rostedt

On Mon, 7 Jan 2008 12:22:51 +0100
"Remy Bohmer" <linux@bohmer.net> wrote:

> Hello Michal and Andrew,
> 
> > Let's not make the decision for the user. Just allow the
> > administrator to change kthreadd's priority safely if he chooses to
> > do it. Ensure that the kernel threads are created with the usual
> > nice level even if kthreadd's priority is changed from the default.
> 
> Last year, I posted a patchset (that was meant for Preempt-RT at that
> time) to be able to prioritise the interrupt-handler-threads (which
> are kthreads) and softirq-threads from the kernel commandline. See
> http://lkml.org/lkml/2007/12/19/208
> 
> Maybe we can find a way to use a similar mechanism as I used in my
> patchset for the priorities of the remaining kthreads.
> I do not like the way of forcing userland to change the priorities,
> because that would require a userland with the chrt tool installed,
> and that is not that practical for embedded systems (in which there
> could be cases that there is no userland at all, or the init-process
> is the whole embedded application). In that case an option to do it on
> the kernel commandline is more practical.
> 
> I propose this kernel cmd-line option:
> kthread_pmap=somethread:50,otherthread:12,34

I see. kthreadd would look up the priority for itself and
kthread_create would consult the map for all other kernel threads.
That should work.
Your sirq_pmap would not be needed anymore, as kthread_pmap could be
used for softirq threads too, right?

Michal

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: always create the kernel threads with normal priority
  2008-01-07 10:25     ` Andrew Morton
  2008-01-07 11:09       ` Ingo Molnar
@ 2008-01-07 13:18       ` Michal Schmidt
  2008-01-08 16:22         ` Ingo Molnar
  1 sibling, 1 reply; 21+ messages in thread
From: Michal Schmidt @ 2008-01-07 13:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Eric W. Biederman, Jon Masters, Satoru Takeuchi,
	Ingo Molnar

On Mon, 7 Jan 2008 02:25:13 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 7 Jan 2008 11:06:03 +0100 Michal Schmidt
> <mschmidt@redhat.com> wrote:
> 
> > On Sat, 22 Dec 2007 01:30:21 -0800
> > Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > On Mon, 17 Dec 2007 23:43:14 +0100 Michal Schmidt
> > > <mschmidt@redhat.com> wrote:
> > > 
> > > > kthreadd, the creator of other kernel threads, runs as a normal
> > > > priority task. This is a potential for priority inversion when a
> > > > task wants to spawn a high-priority kernel thread. A middle
> > > > priority SCHED_FIFO task can block kthreadd's execution
> > > > indefinitely and thus prevent the timely creation of the
> > > > high-priority kernel thread. 
> > > > This causes a practical problem. When a runaway real-time task
> > > > is eating 100% CPU and we attempt to put the CPU offline,
> > > > sometimes we block while waiting for the creation of the
> > > > highest-priority "kstopmachine" thread. 
> > > > 
> > > > The fix is to run kthreadd with the highest possible SCHED_FIFO
> > > > priority. Its children must still run as slightly negatively
> > > > reniced SCHED_NORMAL tasks.
> > > 
> > > Did you hit this problem with the stock kernel, or have you been
> > > working on other stuff?
> > 
> > This was with RHEL5 and with current Fedora kernels.
> > 
> > > A locked-up SCHED_FIFO process will cause kernel threads all
> > > sorts of problems.  You've hit one instance, but there will be
> > > others. (pdflush stops working, for one).
> > > 
> > > The general approach we've taken to this is "don't do that".
> > > Yes, we could boost lots of kernel threads in the way which this
> > > patch does but this actually takes control *away* from
> > > userspace.  Userspace no longer has the ability to guarantee
> > > itself minimum possible latency without getting preempted by
> > > kernel threads.
> > > 
> > > And yes, giving userspace this minimum-latency capability does
> > > imply that userspace has a responsibility to not 100% starve
> > > kernel threads.  It's a reasonable compromise, I think?
> > 
> > You're right. We should not run kthreadd with SCHED_FIFO by default.
> > But the user should be able to change it using chrt if he wants to
> > avoid this particular problem. So how about this instead?:
> > 
> > 
> > 
> > kthreadd, the creator of other kernel threads, runs as a normal
> > priority task. This is a potential for priority inversion when a
> > task wants to spawn a high-priority kernel thread. A middle
> > priority SCHED_FIFO task can block kthreadd's execution
> > indefinitely and thus prevent the timely creation of the
> > high-priority kernel thread.
> > 
> > This causes a practical problem. When a runaway real-time task is
> > eating 100% CPU and we attempt to put the CPU offline, sometimes we
> > block while waiting for the creation of the highest-priority
> > "kstopmachine" thread.
> > 
> > This could be solved by always running kthreadd with the highest
> > possible SCHED_FIFO priority, but that would be undesirable policy
> > decision in the kernel. kthreadd would cause unwanted latencies
> > even for the realtime users who know what they're doing.
> > 
> > Let's not make the decision for the user. Just allow the
> > administrator to change kthreadd's priority safely if he chooses to
> > do it. Ensure that the kernel threads are created with the usual
> > nice level even if kthreadd's priority is changed from the default.
> > 
> > Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> > ---
> >  kernel/kthread.c |   11 +++++++++++
> >  1 files changed, 11 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index dcfe724..e832a85 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -94,10 +94,21 @@ static void create_kthread(struct
> > kthread_create_info *create) if (pid < 0) {
> >  		create->result = ERR_PTR(pid);
> >  	} else {
> > +		struct sched_param param = { .sched_priority = 0 };
> >  		wait_for_completion(&create->started);
> >  		read_lock(&tasklist_lock);
> >  		create->result = find_task_by_pid(pid);
> >  		read_unlock(&tasklist_lock);
> > +		/*
> > +		 * root may want to change our (kthreadd's)
> > priority to
> > +		 * realtime to solve a corner case priority
> > inversion problem
> > +		 * (a realtime task consuming 100% CPU blocking
> > the creation of
> > +		 * kernel threads). The kernel thread should not
> > inherit the
> > +		 * higher priority. Let's always create it with
> > the usual nice
> > +		 * level.
> > +		 */
> > +		sched_setscheduler(create->result, SCHED_NORMAL,
> > &param);
> > +		set_user_nice(create->result, -5);
> >  	}
> >  	complete(&create->done);
> >  }
> 
> Seems reasonable.
> 
> As a followup thing, we now have two hard-coded magical -5's in
> kthread.c. It'd be nice to add a #define for this.

Done.

> It'd be nicer to work out where on earth that -5 came from too ;)
> 
> Readers might wonder why kthreadd children disinherit kthreadd's
> policy and priority, but retain its cpus_allowed (and whatever other
> stuff root could have altered?)

It's a good idea to reset the CPU mask too.


Allow the administrator to change kthreadd's priority and affinity.
Ensure that the kernel threads are created with the usual nice level
and affinity even if kthreadd's properties were changed from the
default.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

diff --git a/kernel/kthread.c b/kernel/kthread.c
index dcfe724..0ac8878 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -15,6 +15,8 @@
 #include <linux/mutex.h>
 #include <asm/semaphore.h>
 
+#define KTHREAD_NICE_LEVEL (-5)
+
 static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;
@@ -94,10 +96,18 @@ static void create_kthread(struct kthread_create_info *create)
 	if (pid < 0) {
 		create->result = ERR_PTR(pid);
 	} else {
+		struct sched_param param = { .sched_priority = 0 };
 		wait_for_completion(&create->started);
 		read_lock(&tasklist_lock);
 		create->result = find_task_by_pid(pid);
 		read_unlock(&tasklist_lock);
+		/*
+		 * root may have changed our (kthreadd's) priority or CPU mask.
+		 * The kernel thread should not inherit these properties.
+		 */
+		sched_setscheduler(create->result, SCHED_NORMAL, &param);
+		set_user_nice(create->result, KTHREAD_NICE_LEVEL);
+		set_cpus_allowed(create->result, CPU_MASK_ALL);
 	}
 	complete(&create->done);
 }
@@ -221,7 +231,7 @@ int kthreadd(void *unused)
 	/* Setup a clean context for our children to inherit. */
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
-	set_user_nice(tsk, -5);
+	set_user_nice(tsk, KTHREAD_NICE_LEVEL);
 	set_cpus_allowed(tsk, CPU_MASK_ALL);
 
 	current->flags |= PF_NOFREEZE;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: always create the kernel threads with normal priority
  2008-01-07 13:10       ` Michal Schmidt
@ 2008-01-07 15:53         ` Remy Bohmer
  0 siblings, 0 replies; 21+ messages in thread
From: Remy Bohmer @ 2008-01-07 15:53 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Andrew Morton, linux-kernel, Eric W. Biederman, Jon Masters,
	Satoru Takeuchi, Steven Rostedt

Hello Michal,

> > Maybe we can find a way to use a similar mechanism as I used in my
> > patchset for the priorities of the remaining kthreads.
> > I do not like the way of forcing userland to change the priorities,
> > because that would require a userland with the chrt tool installed,
> > and that is not that practical for embedded systems (in which there
> > could be cases that there is no userland at all, or the init-process
> > is the whole embedded application). In that case an option to do it on
> > the kernel commandline is more practical.
> >
> > I propose this kernel cmd-line option:
> > kthread_pmap=somethread:50,otherthread:12,34
>
> I see. kthreadd would look up the priority for itself and
> kthread_create would consult the map for all other kernel threads.
> That should work.
> Your sirq_pmap would not be needed anymore, as kthread_pmap could be
> used for softirq threads too, right?

That is correct. The soft-irqs are just ordinary kernel-threads, but
irq_pmap is still needed, to set the priority of a certain interrupt
handler.
In this case it also possible to set the prio of the IRQ-kthreads as
well as the prio of a certain interrupt handler. This might give some
conflicts, and I have to check how to resolve these.

Kind Regards,

Remy


2008/1/7, Michal Schmidt <mschmidt@redhat.com>:
> On Mon, 7 Jan 2008 12:22:51 +0100
> "Remy Bohmer" <linux@bohmer.net> wrote:
>
> > Hello Michal and Andrew,
> >
> > > Let's not make the decision for the user. Just allow the
> > > administrator to change kthreadd's priority safely if he chooses to
> > > do it. Ensure that the kernel threads are created with the usual
> > > nice level even if kthreadd's priority is changed from the default.
> >
> > Last year, I posted a patchset (that was meant for Preempt-RT at that
> > time) to be able to prioritise the interrupt-handler-threads (which
> > are kthreads) and softirq-threads from the kernel commandline. See
> > http://lkml.org/lkml/2007/12/19/208
> >
> > Maybe we can find a way to use a similar mechanism as I used in my
> > patchset for the priorities of the remaining kthreads.
> > I do not like the way of forcing userland to change the priorities,
> > because that would require a userland with the chrt tool installed,
> > and that is not that practical for embedded systems (in which there
> > could be cases that there is no userland at all, or the init-process
> > is the whole embedded application). In that case an option to do it on
> > the kernel commandline is more practical.
> >
> > I propose this kernel cmd-line option:
> > kthread_pmap=somethread:50,otherthread:12,34
>
> I see. kthreadd would look up the priority for itself and
> kthread_create would consult the map for all other kernel threads.
> That should work.
> Your sirq_pmap would not be needed anymore, as kthread_pmap could be
> used for softirq threads too, right?
>
> Michal
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: always create the kernel threads with normal priority
  2008-01-07 11:09       ` Ingo Molnar
@ 2008-01-07 17:29         ` Andrew Morton
  2008-01-07 17:47           ` Peter Zijlstra
  2008-01-08  9:54           ` Michal Schmidt
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2008-01-07 17:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michal Schmidt, linux-kernel, Eric W. Biederman, Jon Masters,
	Satoru Takeuchi, Peter Zijlstra

On Mon, 7 Jan 2008 12:09:04 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> 
> > > This causes a practical problem. When a runaway real-time task is 
> > > eating 100% CPU and we attempt to put the CPU offline, sometimes we 
> > > block while waiting for the creation of the highest-priority 
> > > "kstopmachine" thread.
> 
> sched-devel.git has new mechanisms against runaway RT tasks. There's a 
> new RLIMIT_RTTIME rlimit - if an RT task exceeds that rlimit then it is 
> sent SIGXCPU.

Is that "total RT CPU time" or "elapsed time since last schedule()"?

If the former, it is not useful for this problem.

> there's also a new group scheduling extension that is driven via a 
> sysctl:
> 
>   /proc/sys/kernel/sched_rt_ratio
> 
> this way if a user has a runaway RT task, other users (and root) will 
> still have some CPU time left. (in Peter's latest patchset that is 
> replaced via rt_runtime_ns - but this is a detail)

Doesn't this make the RT task non-RT?  Would need to understand more
details to tell.

> so instead of the never-ending arms race of kernel thread priorities 
> against RT task priorities, we are going towards making RT tasks safer 
> on a policy level.
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: always create the kernel threads with normal priority
  2008-01-07 17:29         ` Andrew Morton
@ 2008-01-07 17:47           ` Peter Zijlstra
  2008-01-08  9:54           ` Michal Schmidt
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2008-01-07 17:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Michal Schmidt, linux-kernel, Eric W. Biederman,
	Jon Masters, Satoru Takeuchi

[-- Attachment #1: Type: text/plain, Size: 2176 bytes --]


On Mon, 2008-01-07 at 09:29 -0800, Andrew Morton wrote:
> On Mon, 7 Jan 2008 12:09:04 +0100 Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > > > This causes a practical problem. When a runaway real-time task is 
> > > > eating 100% CPU and we attempt to put the CPU offline, sometimes we 
> > > > block while waiting for the creation of the highest-priority 
> > > > "kstopmachine" thread.
> > 
> > sched-devel.git has new mechanisms against runaway RT tasks. There's a 
> > new RLIMIT_RTTIME rlimit - if an RT task exceeds that rlimit then it is 
> > sent SIGXCPU.
> 
> Is that "total RT CPU time" or "elapsed time since last schedule()"?
> 
> If the former, it is not useful for this problem.
> 
> > there's also a new group scheduling extension that is driven via a 
> > sysctl:
> > 
> >   /proc/sys/kernel/sched_rt_ratio
> > 
> > this way if a user has a runaway RT task, other users (and root) will 
> > still have some CPU time left. (in Peter's latest patchset that is 
> > replaced via rt_runtime_ns - but this is a detail)
> 
> Doesn't this make the RT task non-RT?  Would need to understand more
> details to tell.

Its an artifact of rt group scheduling. Each group will have to specify
a period and runtime limit therein (and the normalized sum thereof must
not exceed the total time available - otherwise the set is not
schedulable).

So say we have two groups A and B. A has a period of 2 seconds and a
runtime limit of 1, that gives him an avg of 50% cpu time. If B then has
a period of 1 second with a runtime limit of .25s (avg 25%) the total
time required to schedule the realtime groups would be 75% on average.

Without group scheduling everything is considered one group but we still
have the period and runtime limits.

So as long as the realtime cpu usage fits within the given limits it
acts as before. Once it exceeds its limit it will be capped hard - which
is ok, since it exceeded its hard limit, and realtime applications are
supposed to be deterministic and thus be able to tell how much time
they'd require. [ If only this model were true, but its a model
frequently used and quite accepted ]



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: always create the kernel threads with normal priority
  2008-01-07 17:29         ` Andrew Morton
  2008-01-07 17:47           ` Peter Zijlstra
@ 2008-01-08  9:54           ` Michal Schmidt
  1 sibling, 0 replies; 21+ messages in thread
From: Michal Schmidt @ 2008-01-08  9:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, linux-kernel, Eric W. Biederman, Jon Masters,
	Satoru Takeuchi, Peter Zijlstra

On Mon, 7 Jan 2008 09:29:56 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 7 Jan 2008 12:09:04 +0100 Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > > > This causes a practical problem. When a runaway real-time task
> > > > is eating 100% CPU and we attempt to put the CPU offline,
> > > > sometimes we block while waiting for the creation of the
> > > > highest-priority "kstopmachine" thread.
> > 
> > sched-devel.git has new mechanisms against runaway RT tasks.
> > There's a new RLIMIT_RTTIME rlimit - if an RT task exceeds that
> > rlimit then it is sent SIGXCPU.
> 
> Is that "total RT CPU time" or "elapsed time since last schedule()"?
> 
> If the former, it is not useful for this problem.

It's "runtime since last sleep" so it is useful.

I still think the kthread patch is good to have anyway. The user can
have other reasons to change kthreadd's priority/cpumask.

Michal

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] kthread: always create the kernel threads with normal priority
  2008-01-07 13:18       ` Michal Schmidt
@ 2008-01-08 16:22         ` Ingo Molnar
  0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2008-01-08 16:22 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Andrew Morton, linux-kernel, Eric W. Biederman, Jon Masters,
	Satoru Takeuchi


* Michal Schmidt <mschmidt@redhat.com> wrote:

> Allow the administrator to change kthreadd's priority and affinity. 
> Ensure that the kernel threads are created with the usual nice level 
> and affinity even if kthreadd's properties were changed from the 
> default.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

thanks Michal, applied your patch to sched-devel.git.

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2008-01-08 16:23 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-17 22:43 [PATCH] kthread: run kthreadd with max priority SCHED_FIFO Michal Schmidt
2007-12-17 23:00 ` Jon Masters
2007-12-22  9:30 ` Andrew Morton
2007-12-22  9:52   ` Jon Masters
2007-12-22 10:11     ` Andrew Morton
2007-12-22 10:18       ` Jon Masters
2007-12-22 10:39     ` Mike Galbraith
2007-12-22 10:52       ` Andrew Morton
2007-12-22 11:21         ` Jon Masters
2007-12-23  8:50         ` Mike Galbraith
2008-01-07 10:06   ` [PATCH] kthread: always create the kernel threads with normal priority Michal Schmidt
2008-01-07 10:25     ` Andrew Morton
2008-01-07 11:09       ` Ingo Molnar
2008-01-07 17:29         ` Andrew Morton
2008-01-07 17:47           ` Peter Zijlstra
2008-01-08  9:54           ` Michal Schmidt
2008-01-07 13:18       ` Michal Schmidt
2008-01-08 16:22         ` Ingo Molnar
2008-01-07 11:22     ` Remy Bohmer
2008-01-07 13:10       ` Michal Schmidt
2008-01-07 15:53         ` Remy Bohmer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).