linux-kernel.vger.kernel.org archive mirror
* rwlock_t unfairness and tasklist_lock
@ 2013-01-09  4:03 Michel Lespinasse
  2013-01-09 17:49 ` Oleg Nesterov
  2013-01-11 14:34 ` Thomas Gleixner
  0 siblings, 2 replies; 8+ messages in thread
From: Michel Lespinasse @ 2013-01-09  4:03 UTC
  To: David Howells, Thomas Gleixner, Salman Qazi, Oleg Nesterov, LKML

Like others before me, I have discovered how easy it is to DOS a
system by abusing the rwlock_t unfairness and causing the
tasklist_lock read side to be continuously held (my abuse code makes
use of the getpriority syscall, but there are plenty of other ways
anyway).

My understanding is that the issue of rwlock_t fairness has come up
several times over the last 10 years (I first saw a fair rwlock_t
proposal by David Howells 10 years ago,
https://lkml.org/lkml/2002/11/8/102), and every time the answer has
been that we can't easily change this because tasklist_lock makes use
of the read-side reentrancy and interruptibility properties of
rwlock_t, and that we should really find something smart to do about
tasklist_lock. Yet that last part never gets done, and the problem is
still with us.

I am wondering:

- Does anyone know of any current work towards removing the
tasklist_lock use of rwlock_t ? Thomas Gleixner mentioned 3 years ago
that he'd give it a shot (https://lwn.net/Articles/364601/); did he
encounter some unforeseen difficulty that we should learn from ?

- Would there be any fundamental objection to implementing a fair
rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
proposal there would be along the lines of:

1- implement a fair rwlock_t - the ticket based idea from David
Howells seems quite appropriate to me (a rough sketch follows this
list)

2- if any places use reader side reentrancy within the same context,
adjust the code as needed to get rid of that reentrancy

3- a simple way to deal with reentrancy between contexts (as in, we
take the tasklist_lock read side in process context, get interrupted,
and we now need to take it again in interrupt or softirq context)
would be to have different locks depending on context. tasklist_lock
read side in process context would work as usual, but in irq or softirq
contexts we'd take tasklist_irq_lock instead (and, if there are any
irq handlers taking tasklist_lock read side, we'd have to disable
interrupt handling when tasklist_irq_lock is held to avoid further
nesting). tasklist_lock write side - that is, mainly fork() and exec()
- would have to take both tasklist_lock and tasklist_irq_lock, in that
order.
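
To make point 1 concrete, here is a rough sketch of the kind of
ticket-based fair rwlock I have in mind. This is untested and not
necessarily what David proposed - the names and details are mine:

typedef struct {
	atomic_t ticket;	/* next ticket to hand out */
	atomic_t read_grant;	/* readers spin until this reaches their ticket */
	atomic_t write_grant;	/* writers spin until this reaches their ticket */
} fair_rwlock_t;

static inline void fair_read_lock(fair_rwlock_t *l)
{
	int t = atomic_inc_return(&l->ticket) - 1;	/* our ticket */
	while (atomic_read(&l->read_grant) != t)
		cpu_relax();
	atomic_inc(&l->read_grant);	/* chain the next reader in right away */
}

static inline void fair_read_unlock(fair_rwlock_t *l)
{
	/* atomic, as any number of readers may release concurrently */
	atomic_inc(&l->write_grant);
}

static inline void fair_write_lock(fair_rwlock_t *l)
{
	int t = atomic_inc_return(&l->ticket) - 1;
	while (atomic_read(&l->write_grant) != t)
		cpu_relax();
}

static inline void fair_write_unlock(fair_rwlock_t *l)
{
	/* we are the only lock holder here, so these could even be
	 * plain stores in a real implementation */
	atomic_inc(&l->read_grant);
	atomic_inc(&l->write_grant);
}

read_grant counts readers that have entered plus writers that have
exited, so a reader only proceeds once every earlier writer is gone;
write_grant counts completed releases, so a writer only proceeds once
all earlier lock holders are gone. Memory barriers are omitted for
brevity.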

While it might seem to be a downside that tasklist_lock write side
would now have to take both tasklist_lock and tasklist_irq_lock, I
must note that this wouldn't increase the number of atomic operations:
the current rwlock_t implementation uses atomics on both lock and
unlock, while the ticket based one would only need atomics on the lock
side (unlock is just a regular mov instruction), so the total cost
should be comparable to what we have now.
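
As a sketch of what the write side would then look like (with
tasklist_irq_lock being the hypothetical second lock from point 3;
interrupts must be off while we hold it, so that an irq read-locker
on this CPU cannot deadlock against us):

static void tasklist_write_lock_all(void)
{
	fair_write_lock(&tasklist_lock);	/* shut out process context readers */
	local_irq_disable();			/* keep irq read-lockers off this CPU */
	fair_write_lock(&tasklist_irq_lock);	/* shut out irq/softirq readers */
}

static void tasklist_write_unlock_all(void)
{
	fair_write_unlock(&tasklist_irq_lock);
	local_irq_enable();
	fair_write_unlock(&tasklist_lock);
}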

Any comments about this proposal ?

(I should note that I haven't given much thought to tasklist_lock
before, and I'm not quite sure just from code inspection which read
locks are run in which context...)

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


* Re: rwlock_t unfairness and tasklist_lock
  2013-01-09  4:03 rwlock_t unfairness and tasklist_lock Michel Lespinasse
@ 2013-01-09 17:49 ` Oleg Nesterov
  2013-01-09 23:20   ` Michel Lespinasse
  2013-01-11 14:34 ` Thomas Gleixner
  1 sibling, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2013-01-09 17:49 UTC
  To: Michel Lespinasse; +Cc: David Howells, Thomas Gleixner, Salman Qazi, LKML

On 01/08, Michel Lespinasse wrote:
>
> Like others before me, I have discovered how easy it is to DOS a
> system by abusing the rwlock_t unfairness and causing the
> tasklist_lock read side to be continuously held

Yes. Plus it has performance problems.

It should die. We still need the global lock to protect, say, the
init_task.tasks list, but otherwise we need per-process locking.

> - Would there be any fundamental objection to implementing a fair
> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
> proposal there would be along the lines of:

I don't really understand your proposal in detail, but until we kill
tasklist_lock, perhaps it makes sense to implement something simple, say,
write-biased rwlock and add "int task_struct->tasklist_read_lock_counter"
to avoid the read-write-read deadlock.
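
Something along these lines, just to illustrate the idea (ignoring
for the moment the question of readers in irq context):

static inline void tasklist_read_lock(void)
{
	if (current->tasklist_read_lock_counter == 0)
		read_lock(&tasklist_lock);	/* the write-biased variant */
	current->tasklist_read_lock_counter++;
}

static inline void tasklist_read_unlock(void)
{
	if (--current->tasklist_read_lock_counter == 0)
		read_unlock(&tasklist_lock);
}

The nested read_lock is simply never taken, so a pending writer
cannot deadlock the recursive reader.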

Oleg.



* Re: rwlock_t unfairness and tasklist_lock
  2013-01-09 17:49 ` Oleg Nesterov
@ 2013-01-09 23:20   ` Michel Lespinasse
  2013-01-12 17:31     ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Michel Lespinasse @ 2013-01-09 23:20 UTC
  To: Oleg Nesterov; +Cc: David Howells, Thomas Gleixner, Salman Qazi, LKML

On Wed, Jan 9, 2013 at 9:49 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 01/08, Michel Lespinasse wrote:
>> Like others before me, I have discovered how easy it is to DOS a
>> system by abusing the rwlock_t unfairness and causing the
>> tasklist_lock read side to be continuously held
>
> Yes. Plus it has performance problems.
>
> It should die. We still need the global lock to protect, say, the
> init_task.tasks list, but otherwise we need per-process locking.

To be clear: I'm not trying to defend tasklist_lock here. However,
given how long this has been a known issue, I think we should consider
attacking the problem from the lock fairness perspective first and
stop waiting for an eventual tasklist_lock death.

>> - Would there be any fundamental objection to implementing a fair
>> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
>> proposal there would be along the lines of:
>
> I don't really understand your proposal in detail, but until we kill
> tasklist_lock, perhaps it makes sense to implement something simple, say,
> write-biased rwlock and add "int task_struct->tasklist_read_lock_counter"
> to avoid the read-write-read deadlock.

Right. But one complexity that has to be dealt with is how to handle
reentrant uses of the tasklist_lock read side, when such uses come
from a different context (say, the lock was first taken in process
context and the reentrant use is in irq or softirq context).

If in process context we take the tasklist_lock read side, and *then*
increment the tasklist_read_lock_counter, there is still the
possibility of an irq coming up before the counter is incremented.
So to deal with that, I think we have to explicitly detect the
tasklist_lock uses that are in irq/softirq context and deal with these
differently from those in process context - we would have to either
ignore the tasklist_lock write bias when in irq/softirq context, or
deal with it by taking a separate lock in that case (as in my proposal).
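
Concretely, I am thinking of something like this - with
tasklist_irq_lock being the separate lock from my earlier proposal,
and the counter being your suggestion (sketch only, untested):

static inline void tasklist_read_lock_any(void)
{
	if (in_interrupt()) {
		/* irq/softirq readers use their own lock and never look
		 * at the per-task counter, so the window between taking
		 * tasklist_lock and updating the counter is harmless */
		read_lock(&tasklist_irq_lock);
	} else if (current->tasklist_read_lock_counter++ == 0) {
		read_lock(&tasklist_lock);	/* fair / write-biased */
	}
}

static inline void tasklist_read_unlock_any(void)
{
	if (in_interrupt())
		read_unlock(&tasklist_irq_lock);
	else if (--current->tasklist_read_lock_counter == 0)
		read_unlock(&tasklist_lock);
}

(hardirq callers would additionally need to keep interrupts disabled
while holding tasklist_irq_lock, as mentioned before)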

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


* Re: rwlock_t unfairness and tasklist_lock
  2013-01-09  4:03 rwlock_t unfairness and tasklist_lock Michel Lespinasse
  2013-01-09 17:49 ` Oleg Nesterov
@ 2013-01-11 14:34 ` Thomas Gleixner
  2013-01-12  3:33   ` [PATCH] " Michel Lespinasse
  1 sibling, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2013-01-11 14:34 UTC
  To: Michel Lespinasse; +Cc: David Howells, Salman Qazi, Oleg Nesterov, LKML

On Tue, 8 Jan 2013, Michel Lespinasse wrote:
> - Does anyone know of any current work towards removing the
> tasklist_lock use of rwlock_t ? Thomas Gleixner mentioned 3 years ago
> that he'd give it a shot (https://lwn.net/Articles/364601/); did he
> encounter some unforeseen difficulty that we should learn from ?

I converted quite a bunch of the read side instances to rcu
protection, but got distracted. There was no fundamental difficulty,
just lack of time.
 
> - Would there be any fundamental objection to implementing a fair
> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
> proposal there would be along the lines of:
> 
> 1- implement a fair rwlock_t - the ticket based idea from David
> Howells seems quite appropriate to me

Nah. Let's get it killed. Most of the stuff can be converted to RCU and
the remaining bits and pieces are the write lock sides which then can
be converted to a big standard spinlock. There might be a few more
complex ones, but Oleg said back then that those should be solved by
locking the process instead of locking the whole tasklist.
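
The typical read side conversion is mechanical. Schematically, for a
plain task iteration (do_something() standing in for whatever the
call site does):

	struct task_struct *p;

	/* before */
	read_lock(&tasklist_lock);
	for_each_process(p)
		do_something(p);
	read_unlock(&tasklist_lock);

	/* after - the task list is safe to walk under RCU, as long as
	 * we do not sleep and do not assume p stays alive afterwards */
	rcu_read_lock();
	for_each_process(p)
		do_something(p);
	rcu_read_unlock();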

Thanks,

	tglx




* Re: [PATCH] rwlock_t unfairness and tasklist_lock
  2013-01-11 14:34 ` Thomas Gleixner
@ 2013-01-12  3:33   ` Michel Lespinasse
  2013-01-12 17:46     ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Michel Lespinasse @ 2013-01-12  3:33 UTC
  To: Thomas Gleixner; +Cc: David Howells, Salman Qazi, Oleg Nesterov, LKML

On Fri, Jan 11, 2013 at 03:34:41PM +0100, Thomas Gleixner wrote:
> On Tue, 8 Jan 2013, Michel Lespinasse wrote:
> > - Does anyone know of any current work towards removing the
> > tasklist_lock use of rwlock_t ? Thomas Gleixner mentioned 3 years ago
> > that he'd give it a shot (https://lwn.net/Articles/364601/); did he
> > encounter some unforeseen difficulty that we should learn from ?
> 
> I converted quite a bunch of the read side instances to rcu
> protection, but got distracted. There was no fundamental difficulty,
> just lack of time.

All right. Thanks for explaining here and offline; it looks like the
problem is not as intractable as I had thought initially.

> > - Would there be any fundamental objection to implementing a fair
> > rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
> > proposal there would be along the lines of:
> > 
> > 1- implement a fair rwlock_t - the ticket based idea from David
> > Howells seems quite appropriate to me
> 
> Nah. Let's get it killed. Most of the stuff can be converted to RCU and
> the remaining bits and pieces are the write lock sides which then can
> be converted to a big standard spinlock. There might be a few more
> complex ones, but Oleg said back then that those should be solved by
> locking the process instead of locking the whole tasklist.

So I looked again at getpriority() since that's what I had used for my
DOS test code, and it looks like everything there is already protected
by RCU or smaller granularity locks and refcounts. Patch attached to
remove this tasklist_lock usage.

Since I'm new to this, I would like someone to double check me.
Also, what is the proper tree to send such patches to so they'll get
some testing before making it into Linus's tree ?

--------------------------------8<-----------------------------

remove use of tasklist_lock in getpriority / setpriority syscalls

I can't see anything in these syscalls that isn't already protected
by RCU (for the task/thread iterations and for mapping pids to tasks)
or by smaller granularity locks (for set_one_prio()) or refcounts
(for find_user()). So, it looks like we can just remove the use of
tasklist_lock...

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 kernel/sys.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 265b37690421..5df66d4b118f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -189,7 +189,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 		niceval = 19;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 		case PRIO_PROCESS:
 			if (who)
@@ -226,7 +225,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 			break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 out:
 	return error;
@@ -251,7 +249,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		return -EINVAL;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 		case PRIO_PROCESS:
 			if (who)
@@ -296,7 +293,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 			break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 
 	return retval;

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


* Re: rwlock_t unfairness and tasklist_lock
  2013-01-09 23:20   ` Michel Lespinasse
@ 2013-01-12 17:31     ` Oleg Nesterov
  2013-01-25  0:33       ` Michel Lespinasse
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2013-01-12 17:31 UTC
  To: Michel Lespinasse; +Cc: David Howells, Thomas Gleixner, Salman Qazi, LKML

On 01/09, Michel Lespinasse wrote:
>
> On Wed, Jan 9, 2013 at 9:49 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 01/08, Michel Lespinasse wrote:
> >> Like others before me, I have discovered how easy it is to DOS a
> >> system by abusing the rwlock_t unfairness and causing the
> >> tasklist_lock read side to be continuously held
> >
> > Yes. Plus it has performance problems.
> >
> > It should die. We still need the global lock to protect, say, the
> > init_task.tasks list, but otherwise we need per-process locking.
>
> To be clear: I'm not trying to defend tasklist_lock here.

I understand,

> However,
> given how long this has been a known issue, I think we should consider
> attacking the problem from the lock fairness perspective first and
> stop waiting for an eventual tasklist_lock death.

And probably you are right,

> >> - Would there be any fundamental objection to implementing a fair
> >> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
> >> proposal there would be along the lines of:
> >
> > I don't really understand your proposal in detail, but until we kill
> > tasklist_lock, perhaps it makes sense to implement something simple, say,
> > write-biased rwlock and add "int task_struct->tasklist_read_lock_counter"
> > to avoid the read-write-read deadlock.
>
> Right. But one complexity that has to be dealt with is how to handle
> reentrant uses of the tasklist_lock read side,
> ...
>
> there is still the
> possibility of an irq coming up before the counter is incremented.

Sure, I didn't try to say that it is trivial to implement
read_lock_tasklist(); we should prevent this race.

> So to deal with that, I think we have to explicitly detect the
> tasklist_lock uses that are in irq/softirq context and deal with these
> differently from those in process context

I disagree. In the long term, I think that tasklist (or whatever we use
instead) should never be used in irq/atomic context. And probably the
per-process lock should be an rw_semaphore (although it is not recursive).

But until then, if we try to improve things somehow, we should not
complicate the code; we need something simple.

But actually I am not sure; you may be right.

Oleg.



* Re: [PATCH] rwlock_t unfairness and tasklist_lock
  2013-01-12  3:33   ` [PATCH] " Michel Lespinasse
@ 2013-01-12 17:46     ` Oleg Nesterov
  0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2013-01-12 17:46 UTC
  To: Michel Lespinasse; +Cc: Thomas Gleixner, David Howells, Salman Qazi, LKML

On 01/11, Michel Lespinasse wrote:
>
> So I looked again at getpriority() since that's what I had used for my
> DOS test code, and it looks like everything there is already protected
> by RCU or smaller granularity locks and refcounts. Patch attached to
> remove this tasklist_lock usage.

And probably the change in getpriority() is fine, but ...

> @@ -189,7 +189,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
>  		niceval = 19;
>
>  	rcu_read_lock();
> -	read_lock(&tasklist_lock);
>  	switch (which) {
>  		case PRIO_PROCESS:
>  			if (who)
> @@ -226,7 +225,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
>  			break;
>  	}
>  out_unlock:
> -	read_unlock(&tasklist_lock);

you also changed setpriority(); this should be documented at least ;)

OK. Even without this change, say, sys_setpriority(PRIO_PGRP) can obviously
race with fork(), so this change is probably not bad.

Oleg.



* Re: rwlock_t unfairness and tasklist_lock
  2013-01-12 17:31     ` Oleg Nesterov
@ 2013-01-25  0:33       ` Michel Lespinasse
  0 siblings, 0 replies; 8+ messages in thread
From: Michel Lespinasse @ 2013-01-25  0:33 UTC
  To: Oleg Nesterov; +Cc: David Howells, Thomas Gleixner, Salman Qazi, LKML

On Sat, Jan 12, 2013 at 9:31 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 01/09, Michel Lespinasse wrote:
>> >> - Would there be any fundamental objection to implementing a fair
>> >> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
>> >> proposal there would be along the lines of:
>> >
> > I don't really understand your proposal in detail, but until we kill
>> > tasklist_lock, perhaps it makes sense to implement something simple, say,
>> > write-biased rwlock and add "int task_struct->tasklist_read_lock_counter"
>> > to avoid the read-write-read deadlock.
>>
>> Right. But one complexity that has to be dealt with is how to handle
>> reentrant uses of the tasklist_lock read side,
>> ...
>>
>> there is still the
>> possibility of an irq coming up before the counter is incremented.
>
> Sure, I didn't try to say that it is trivial to implement
> read_lock_tasklist(); we should prevent this race.
>
>> So to deal with that, I think we have to explicitly detect the
>> tasklist_lock uses that are in irq/softirq context and deal with these
>> differently from those in process context
>
> I disagree. In the long term, I think that tasklist (or whatever we use
> instead) should never be used in irq/atomic context. And probably the
> per-process lock should be an rw_semaphore (although it is not recursive).

All right. So I went through all tasklist_lock call sites and
converted them to use helper functions:

The first 4 are for the sites I know are only used in process context:
tasklist_write_lock() / tasklist_write_unlock() / tasklist_read_lock()
/ tasklist_read_unlock()

The remaining ones are for the sites that can be called from irq/softirq
context as well:
tasklist_read_lock_any() / tasklist_read_trylock_any() /
tasklist_read_unlock_any()
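
These would be thin wrappers for now, roughly:

static inline void tasklist_read_lock(void)
{
	WARN_ON_ONCE(in_interrupt());	/* process context only */
	read_lock(&tasklist_lock);
}

static inline void tasklist_read_lock_any(void)
{
	read_lock(&tasklist_lock);	/* irq/softirq callers allowed */
}

static inline int tasklist_read_trylock_any(void)
{
	return read_trylock(&tasklist_lock);
}

(the WARN_ON_ONCE is just a debugging aid to catch mislabeled call
sites; the unlock variants are the obvious counterparts)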

I'm not sure if upstream would take these ? IMO, they help to identify
the irq context call sites. If I didn't get it wrong, they are:

- send_sigio(): may be called in any context from kill_fasync()
  (examples: process context from pipe_read(), softirq context from
   n_tty_receive_buf(), hardirq context from rtc_interrupt())

- send_sigurg(): may be called from process or softirq context
  (network receive is typically processed in softirq, but packets can be
   queued up while sockets are being locked and the backlog processed from
   process context as the socket gets unlocked)

- kill_pgrp(): called in process context from job_control(), pty_resize()
  or in softirq context through n_tty_receive_buf()
  or even in hardirq context from arch/um/drivers/line.c winch_interrupt()

- posix_cpu_timer_schedule(): called through cpu_timer_fire(),
  in process context (from posix_cpu_timer_set()) or
  in hardirq context (from run_posix_cpu_timers())

- sysrq debugging features: handle_sysrq() runs in multiple contexts and
  calls into send_sig_all(), debug_show_all_locks(), normalize_rt_tasks(),
  print_rq().

- arch/blackfin/kernel/trace.c decode_address(): another debugging feature,
  may be called from any context.

I suppose we could do away with the sysrq stuff (or run it in a work
queue or whatever). This leaves us with signal delivery and posix cpu
timers. To be honest, I looked at signal delivery and couldn't tell
why it needs to take the tasklist_lock, but I also couldn't remove it
in any way that I would feel safe about :)

> But until then, if we try to improve things somehow, we should not
> complicate the code; we need something simple.
>
> But actually I am not sure; you may be right.

I actually also have an implementation that makes tasklist_lock fair
for process context call sites. It's easy enough if you can use a
split lock for the process vs irq context uses. But I agree it'd be
much nicer / more upstreamable if we could just get rid of the irq
context uses first, at which point there is no remaining obstacle to
replacing rwlock_t with a fair lock.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

