linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] delay rq_lock acquisition in setscheduler
@ 2004-10-21  1:32 Chris Wright
  2004-10-21  2:00 ` Andrea Arcangeli
  2004-10-21  7:56 ` Ingo Molnar
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Wright @ 2004-10-21  1:32 UTC (permalink / raw)
  To: mingo; +Cc: johansen, Stephen Smalley, Thomas Bleher, linux-kernel

Hi Ingo,

Doing access control checks with rq_lock held can cause deadlock
when audit messages are created (via printk or audit infrastructure),
as noted by both SELinux and SubDomain folks.  This patch will let the
security checks happen w/out lock held, then re-sample the p->policy in
case it was raced.  Original patch from John Johansen, with some updates
from me.  What do you think?

From: John Johansen <johansen@immunix.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>

===== kernel/sched.c 1.367 vs edited =====
--- 1.367/kernel/sched.c	2004-10-18 22:26:52 -07:00
+++ edited/kernel/sched.c	2004-10-20 15:55:12 -07:00
@@ -3038,7 +3038,7 @@
 {
 	struct sched_param lp;
 	int retval = -EINVAL;
-	int oldprio;
+	int oldprio, oldpolicy = -1;
 	prio_array_t *array;
 	unsigned long flags;
 	runqueue_t *rq;
@@ -3060,23 +3060,17 @@
 
 	retval = -ESRCH;
 	if (!p)
-		goto out_unlock_tasklist;
-
-	/*
-	 * To be able to change p->policy safely, the apropriate
-	 * runqueue lock must be held.
-	 */
-	rq = task_rq_lock(p, &flags);
+		goto out_unlock;
 
+	/* double check policy once rq lock held */
 	if (policy < 0)
-		policy = p->policy;
+		policy = oldpolicy = p->policy;
 	else {
 		retval = -EINVAL;
 		if (policy != SCHED_FIFO && policy != SCHED_RR &&
 				policy != SCHED_NORMAL)
 			goto out_unlock;
 	}
-
 	/*
 	 * Valid priorities for SCHED_FIFO and SCHED_RR are
 	 * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL is 0.
@@ -3098,7 +3092,15 @@
 	retval = security_task_setscheduler(p, policy, &lp);
 	if (retval)
 		goto out_unlock;
-
+	/*
+	 * To be able to change p->policy safely, the apropriate
+	 * runqueue lock must be held.
+	 */
+	rq = task_rq_lock(p, &flags);
+	/* recheck policy now with rq lock held */
+	retval = -EPERM;
+	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy))
+		goto out_unlock_rq;
 	array = p->array;
 	if (array)
 		deactivate_task(p, task_rq(p));
@@ -3118,12 +3120,10 @@
 		} else if (TASK_PREEMPTS_CURR(p, rq))
 			resched_task(rq->curr);
 	}
-
-out_unlock:
+out_unlock_rq:
 	task_rq_unlock(rq, &flags);
-out_unlock_tasklist:
+out_unlock:
 	read_unlock_irq(&tasklist_lock);
-
 out_nounlock:
 	return retval;
 }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] delay rq_lock acquisition in setscheduler
  2004-10-21  1:32 [RFC][PATCH] delay rq_lock acquisition in setscheduler Chris Wright
@ 2004-10-21  2:00 ` Andrea Arcangeli
  2004-10-21  5:16   ` Chris Wright
  2004-10-21  7:56 ` Ingo Molnar
  1 sibling, 1 reply; 6+ messages in thread
From: Andrea Arcangeli @ 2004-10-21  2:00 UTC (permalink / raw)
  To: Chris Wright
  Cc: mingo, johansen, Stephen Smalley, Thomas Bleher, linux-kernel

On Wed, Oct 20, 2004 at 06:32:38PM -0700, Chris Wright wrote:
> +	rq = task_rq_lock(p, &flags);
> +	/* recheck policy now with rq lock held */
> +	retval = -EPERM;
> +	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy))
> +		goto out_unlock_rq;

to be really backwards compatible you should return 0 methinks, the only
case when this race can trigger is with non deterministic usage, and the
current kernel would never return -EPERM in such a non deterministic
usage. However the -EPERM will signal the non deterministic usage, but I
doubt it worth to return -EPERM there, since it makes it looks like the
other side that didn't get EPERM is safe while it's not, since the other
side isn't deterministic either.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] delay rq_lock acquisition in setscheduler
  2004-10-21  2:00 ` Andrea Arcangeli
@ 2004-10-21  5:16   ` Chris Wright
  2004-10-21 12:53     ` Andrea Arcangeli
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Wright @ 2004-10-21  5:16 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Wright, mingo, johansen, Stephen Smalley, Thomas Bleher,
	linux-kernel

* Andrea Arcangeli (andrea@novell.com) wrote:
> On Wed, Oct 20, 2004 at 06:32:38PM -0700, Chris Wright wrote:
> > +	rq = task_rq_lock(p, &flags);
> > +	/* recheck policy now with rq lock held */
> > +	retval = -EPERM;
> > +	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy))
> > +		goto out_unlock_rq;
> 
> to be really backwards compatible you should return 0 methinks, the only
> case when this race can trigger is with non deterministic usage, and the
> current kernel would never return -EPERM in such a non deterministic
> usage. However the -EPERM will signal the non deterministic usage, but I
> doubt it worth to return -EPERM there, since it makes it looks like the
> other side that didn't get EPERM is safe while it's not, since the other
> side isn't deterministic either.

true.  another alternative is to drop rq_lock and do the checks over.
i didn't convince myself yet that there's no chance for livelock,
although it seems unlikely.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] delay rq_lock acquisition in setscheduler
  2004-10-21  1:32 [RFC][PATCH] delay rq_lock acquisition in setscheduler Chris Wright
  2004-10-21  2:00 ` Andrea Arcangeli
@ 2004-10-21  7:56 ` Ingo Molnar
  2004-10-21 17:25   ` Chris Wright
  1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2004-10-21  7:56 UTC (permalink / raw)
  To: Chris Wright; +Cc: johansen, Stephen Smalley, Thomas Bleher, linux-kernel


* Chris Wright <chrisw@osdl.org> wrote:

> Hi Ingo,
> 
> Doing access control checks with rq_lock held can cause deadlock when
> audit messages are created (via printk or audit infrastructure), as
> noted by both SELinux and SubDomain folks.  This patch will let the
> security checks happen w/out lock held, then re-sample the p->policy
> in case it was raced.  Original patch from John Johansen, with some
> updates from me.  What do you think?

i dont this this patch is correct, because it changes semantics by
pushing a security-subsystem failure back into userspace. There's
nothing wrong with two tasks trying to change a third task's policy in
parallel - our API allows that.

I agree that this is a very special case of lock dependency and that the
security subsystem should not be burdened with double-buffering messages
just to avoid the runqueue lock on syslogd wakeup. Only this single
scheduling-related system-call is affected by this problem.

i think the right solution would be to retry the permission check if the
policy has changed (an unlikely event). It is livelockable the same way
seqlocks are livelockable so i'd not worry about it too much. It is also
preemptible with PREEMPT so not a big issue. Also, the check & repeat
code should possibly be #ifdef CONFIG_SECURITY.

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] delay rq_lock acquisition in setscheduler
  2004-10-21  5:16   ` Chris Wright
@ 2004-10-21 12:53     ` Andrea Arcangeli
  0 siblings, 0 replies; 6+ messages in thread
From: Andrea Arcangeli @ 2004-10-21 12:53 UTC (permalink / raw)
  To: Chris Wright
  Cc: mingo, johansen, Stephen Smalley, Thomas Bleher, linux-kernel

On Wed, Oct 20, 2004 at 10:16:32PM -0700, Chris Wright wrote:
> true.  another alternative is to drop rq_lock and do the checks over.
> i didn't convince myself yet that there's no chance for livelock,
> although it seems unlikely.

yep, since the workload isn't deterministic if the race triggers I got
convinced the retry loop wasn't strictly needed. There should be no
livelock, however with the loop just like with the spinlocks there's no
fariness guarantee on the numa (especially old numa). (and fixing the
spinlocks is easier for the architecture by implementing a fair version
transparently). That's probably the only issue with the loops.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] delay rq_lock acquisition in setscheduler
  2004-10-21  7:56 ` Ingo Molnar
@ 2004-10-21 17:25   ` Chris Wright
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Wright @ 2004-10-21 17:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrea Arcangeli, Chris Wright, johansen, Stephen Smalley,
	Thomas Bleher, linux-kernel

* Ingo Molnar (mingo@elte.hu) wrote:
> i dont this this patch is correct, because it changes semantics by
> pushing a security-subsystem failure back into userspace. There's
> nothing wrong with two tasks trying to change a third task's policy in
> parallel - our API allows that.

Andrea and John had similar concern.

> I agree that this is a very special case of lock dependency and that the
> security subsystem should not be burdened with double-buffering messages
> just to avoid the runqueue lock on syslogd wakeup. Only this single
> scheduling-related system-call is affected by this problem.
> 
> i think the right solution would be to retry the permission check if the
> policy has changed (an unlikely event). It is livelockable the same way
> seqlocks are livelockable so i'd not worry about it too much. It is also
> preemptible with PREEMPT so not a big issue. Also, the check & repeat
> code should possibly be #ifdef CONFIG_SECURITY.

I think ifdef would start to look messy in that function.  This one
simply rechecks permissions if the policy has changed.  Look OK?

Doing access control checks with rq_lock held can cause deadlock when
audit messages are created (via printk or audit infrastructure) which
trigger a wakeup and deadlock, as noted by both SELinux and SubDomain
folks.  This patch will let the security checks happen w/out lock held,
then re-sample the p->policy in case it was raced.  Originally from
John Johansen <johansen@immunix.com>, bits redone by me.

From: John Johansen <johansen@immunix.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>

===== kernel/sched.c 1.367 vs edited =====
--- 1.367/kernel/sched.c	2004-10-18 22:26:52 -07:00
+++ edited/kernel/sched.c	2004-10-21 09:41:23 -07:00
@@ -3038,7 +3038,7 @@
 {
 	struct sched_param lp;
 	int retval = -EINVAL;
-	int oldprio;
+	int oldprio, oldpolicy = -1;
 	prio_array_t *array;
 	unsigned long flags;
 	runqueue_t *rq;
@@ -3060,23 +3060,17 @@
 
 	retval = -ESRCH;
 	if (!p)
-		goto out_unlock_tasklist;
-
-	/*
-	 * To be able to change p->policy safely, the apropriate
-	 * runqueue lock must be held.
-	 */
-	rq = task_rq_lock(p, &flags);
-
+		goto out_unlock;
+recheck:
+	/* double check policy once rq lock held */
 	if (policy < 0)
-		policy = p->policy;
+		policy = oldpolicy = p->policy;
 	else {
 		retval = -EINVAL;
 		if (policy != SCHED_FIFO && policy != SCHED_RR &&
 				policy != SCHED_NORMAL)
 			goto out_unlock;
 	}
-
 	/*
 	 * Valid priorities for SCHED_FIFO and SCHED_RR are
 	 * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL is 0.
@@ -3098,7 +3092,17 @@
 	retval = security_task_setscheduler(p, policy, &lp);
 	if (retval)
 		goto out_unlock;
-
+	/*
+	 * To be able to change p->policy safely, the apropriate
+	 * runqueue lock must be held.
+	 */
+	rq = task_rq_lock(p, &flags);
+	/* recheck policy now with rq lock held */
+	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
+		policy = oldpolicy = -1;
+		task_rq_unlock(rq, &flags);
+		goto recheck;
+	}
 	array = p->array;
 	if (array)
 		deactivate_task(p, task_rq(p));
@@ -3118,12 +3122,9 @@
 		} else if (TASK_PREEMPTS_CURR(p, rq))
 			resched_task(rq->curr);
 	}
-
-out_unlock:
 	task_rq_unlock(rq, &flags);
-out_unlock_tasklist:
+out_unlock:
 	read_unlock_irq(&tasklist_lock);
-
 out_nounlock:
 	return retval;
 }

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2004-10-21 17:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-21  1:32 [RFC][PATCH] delay rq_lock acquisition in setscheduler Chris Wright
2004-10-21  2:00 ` Andrea Arcangeli
2004-10-21  5:16   ` Chris Wright
2004-10-21 12:53     ` Andrea Arcangeli
2004-10-21  7:56 ` Ingo Molnar
2004-10-21 17:25   ` Chris Wright

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).