* [RFC][PATCH 0/5] Improve signal delivery scalability @ 2011-04-05 19:21 Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' Matt Fleming ` (4 more replies) 0 siblings, 5 replies; 29+ messages in thread From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw) To: Tejun Heo, Oleg Nesterov Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming From: Matt Fleming <matt.fleming@linux.intel.com> This series is most definitely RFC material. At this early stage I'm just asking for feedback on whether the concepts/ideas are sound. For example, I'm fairly sure that these patches break ia64 because it accesses tsk->pending in asm and I haven't had time to look closely at that yet. This patch series was motivated by the signal1_threads testcase from the Will It Scale benchmark suite. Currently, signal generation and delivery in a thread group is serialised by the thread group's siglock, tsk->sighand->siglock. This lock is a huge point of contention in multithreaded applications, even when sending a signal to one thread in a thread group. In light of this, this patch series tries to improve scalability by, - introducing a per-thread siglock to protect the per-thread pending signal queue - avoiding acquiring the shared siglock wherever possible - using a rwlock to protect signal handlers This series is based on the assumption that it is OK to acquire and release the shared siglock across function calls and do things like execute PENDING() without holding any locks. The improvement on the "signal delivery" testcase can be seen here, http://userweb.kernel.org/~mfleming/will-it-scale/signals-per-thread-siglock/ Tejun, these patches are based on your ptrace branch as both you and Oleg have touched alot of the same code paths lately. All comments appreciated! Matt Fleming (5): signals: Always place SIGCONT and SIGSTOP on 'shared_pending' signals: Introduce per-thread siglock and action rwlock ia64: Catch up with new sighand action spinlock signals: Introduce __dequeue_private_signal helper function signals: Don't hold shared siglock across signal delivery arch/ia64/kernel/signal.c | 13 +-- fs/autofs4/waitq.c | 2 + fs/exec.c | 2 + fs/signalfd.c | 7 +- include/linux/init_task.h | 2 + include/linux/sched.h | 4 + include/linux/signal.h | 5 + include/linux/tracehook.h | 3 +- kernel/compat.c | 7 +- kernel/exit.c | 12 +- kernel/fork.c | 2 + kernel/kmod.c | 8 +- kernel/ptrace.c | 4 +- kernel/signal.c | 429 +++++++++++++++++++++++++++++++++------------ security/selinux/hooks.c | 6 +- 15 files changed, 360 insertions(+), 146 deletions(-) -- 1.7.4 ^ permalink raw reply [flat|nested] 29+ messages in thread
* [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' 2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming @ 2011-04-05 19:21 ` Matt Fleming 2011-04-05 20:19 ` Oleg Nesterov 2011-04-05 19:21 ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming ` (3 subsequent siblings) 4 siblings, 1 reply; 29+ messages in thread From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw) To: Tejun Heo, Oleg Nesterov Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming From: Matt Fleming <matt.fleming@linux.intel.com> Because SIGCONT and SIGSTOP affect an entire thread group, we can place them on the 'shared_pending' queue. This means that we no longer have to iterate over every thread in a thread group to remove SIGCONT/SIGSTOP signals from their private 'pending' queues. SIGCONT and SIGSTOP signals are now only placed on the shared queue and not on the private so queue we need to prevent them from being starved of service. We need to prioritize signals on the shared queue over signals on the private queue. For example, if we don't do this there's potential for a task to keep handling private signals even though they were delivered _after_ a SIGSTOP. Note that POSIX does not mandate any kind of priority between signals to a thread group or a specific thread, so this change in behaviour should be safe. This is an optimization to reduce the amount of code that we execute while holding ->siglock, and is preparation for introducing a per-thread siglock for modifying the private signal queue. Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com> --- kernel/signal.c | 123 +++++++++++++++++++++++++++++++++++-------------------- 1 files changed, 79 insertions(+), 44 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 4f7312b..9595d86 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -545,51 +545,41 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask, } /* - * Dequeue a signal and return the element to the caller, which is - * expected to free it. - * - * All callers have to hold the siglock. + * All callers must hold tsk->sighand->siglock. */ -int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) +static int __dequeue_shared_signal(struct task_struct *tsk, + sigset_t *mask, siginfo_t *info) { int signr; - /* We only dequeue private signals from ourselves, we don't let - * signalfd steal them + assert_spin_locked(&tsk->sighand->siglock); + + signr = __dequeue_signal(&tsk->signal->shared_pending, + mask, info); + /* + * itimer signal ? + * + * itimers are process shared and we restart periodic + * itimers in the signal delivery path to prevent DoS + * attacks in the high resolution timer case. This is + * compliant with the old way of self restarting + * itimers, as the SIGALRM is a legacy signal and only + * queued once. Changing the restart behaviour to + * restart the timer in the signal dequeue path is + * reducing the timer noise on heavy loaded !highres + * systems too. */ - signr = __dequeue_signal(&tsk->pending, mask, info); - if (!signr) { - signr = __dequeue_signal(&tsk->signal->shared_pending, - mask, info); - /* - * itimer signal ? - * - * itimers are process shared and we restart periodic - * itimers in the signal delivery path to prevent DoS - * attacks in the high resolution timer case. This is - * compliant with the old way of self restarting - * itimers, as the SIGALRM is a legacy signal and only - * queued once. Changing the restart behaviour to - * restart the timer in the signal dequeue path is - * reducing the timer noise on heavy loaded !highres - * systems too. - */ - if (unlikely(signr == SIGALRM)) { - struct hrtimer *tmr = &tsk->signal->real_timer; - - if (!hrtimer_is_queued(tmr) && - tsk->signal->it_real_incr.tv64 != 0) { - hrtimer_forward(tmr, tmr->base->get_time(), - tsk->signal->it_real_incr); - hrtimer_restart(tmr); - } + if (unlikely(signr == SIGALRM)) { + struct hrtimer *tmr = &tsk->signal->real_timer; + + if (!hrtimer_is_queued(tmr) && + tsk->signal->it_real_incr.tv64 != 0) { + hrtimer_forward(tmr, tmr->base->get_time(), + tsk->signal->it_real_incr); + hrtimer_restart(tmr); } } - recalc_sigpending(); - if (!signr) - return 0; - if (unlikely(sig_kernel_stop(signr))) { /* * Set a marker that we have dequeued a stop signal. Our @@ -605,6 +595,40 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) */ current->group_stop |= GROUP_STOP_DEQUEUED; } + + return signr; +} + +/* + * Dequeue a signal and return the element to the caller, which is + * expected to free it. + * + * All callers have to hold the siglock. + */ +int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) +{ + int signr; + + /* + * Dequeue shared signals first since this is where SIGSTOP + * and SIGCONTs will be. If we don't dequeue these first + * there's a chance that we could keep handling private + * signals even when there are SIGSTOPs or SIGCONTs pending in + * the shared queue, e.g. we'd starve shared signals from + * being serviced. + */ + signr = __dequeue_shared_signal(tsk, mask, info); + + /* We only dequeue private signals from ourselves, we don't let + * signalfd steal them + */ + if (!signr) + signr = __dequeue_signal(&tsk->pending, mask, info); + + recalc_sigpending(); + if (!signr) + return 0; + if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private) { /* * Release the siglock to ensure proper locking order @@ -779,13 +803,10 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns) */ } else if (sig_kernel_stop(sig)) { /* - * This is a stop signal. Remove SIGCONT from all queues. + * This is a stop signal. Remove SIGCONT from the + * shared queue. */ rm_from_queue(sigmask(SIGCONT), &signal->shared_pending); - t = p; - do { - rm_from_queue(sigmask(SIGCONT), &t->pending); - } while_each_thread(p, t); } else if (sig == SIGCONT) { unsigned int why; /* @@ -795,7 +816,6 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns) t = p; do { task_clear_group_stop_pending(t); - rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending); wake_up_state(t, __TASK_STOPPED); } while_each_thread(p, t); @@ -945,7 +965,22 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t, if (!prepare_signal(sig, t, from_ancestor_ns)) return 0; - pending = group ? &t->signal->shared_pending : &t->pending; + /* + * We always enqueue SIGSTOP or SIGCONT signals on the shared + * queue. This means that a SIGSTOP or SIGCONT signal _cannot_ + * be present on a thread's private pending queue. + * + * This makes prepare_signal() more optimal as we do not have + * to remove signals from each thread's pending queue and so + * can avoid iterating over all threads in the thread group + * (and therefore avoid the locking that would be necessary to + * do that safely). + */ + if (group || sig_kernel_stop(sig) || sig == SIGCONT) + pending = &t->signal->shared_pending; + else + pending = &t->pending; + /* * Short-circuit ignored signals and support queuing * exactly one non-rt signal, so that we can get more -- 1.7.4 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' 2011-04-05 19:21 ` [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' Matt Fleming @ 2011-04-05 20:19 ` Oleg Nesterov 2011-04-05 20:50 ` Matt Fleming 0 siblings, 1 reply; 29+ messages in thread From: Oleg Nesterov @ 2011-04-05 20:19 UTC (permalink / raw) To: Matt Fleming Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming Hi Matt, I'll try to study this series, but not before Friday, sorry. Only one thing, On 04/05, Matt Fleming wrote: > > Because SIGCONT and SIGSTOP affect an entire thread group, Yes, the effect is global, but > we can > place them on the 'shared_pending' queue. I don't think we can. - pending = group ? &t->signal->shared_pending : &t->pending; > + /* > + * We always enqueue SIGSTOP or SIGCONT signals on the shared > + * queue. This means that a SIGSTOP or SIGCONT signal _cannot_ > + * be present on a thread's private pending queue. > + * > + * This makes prepare_signal() more optimal as we do not have > + * to remove signals from each thread's pending queue and so > + * can avoid iterating over all threads in the thread group > + * (and therefore avoid the locking that would be necessary to > + * do that safely). > + */ > + if (group || sig_kernel_stop(sig) || sig == SIGCONT) > + pending = &t->signal->shared_pending; > + else > + pending = &t->pending; How so? Suppose the process has a handler for SIGCONT. Suppose this process is not stopped. tkill(SIGCONT) should deliver the signal to the right thread. SIGSTOP can't have the handler, still we shouldn't place it on the shared list, debuggers won't be happy. Also. This code was changed very much, please do these changes on top of http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=refs/heads/ptrace Oleg. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' 2011-04-05 20:19 ` Oleg Nesterov @ 2011-04-05 20:50 ` Matt Fleming 2011-04-06 12:57 ` Oleg Nesterov 0 siblings, 1 reply; 29+ messages in thread From: Matt Fleming @ 2011-04-05 20:50 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On Tue, 5 Apr 2011 22:19:58 +0200 Oleg Nesterov <oleg@redhat.com> wrote: > Hi Matt, > > I'll try to study this series, but not before Friday, sorry. No problem! > Only one thing, > > On 04/05, Matt Fleming wrote: > > > > Because SIGCONT and SIGSTOP affect an entire thread group, > > Yes, the effect is global, but > > > we can > > place them on the 'shared_pending' queue. > > I don't think we can. > > - pending = group ? &t->signal->shared_pending : &t->pending; > > + /* > > + * We always enqueue SIGSTOP or SIGCONT signals on the > > shared > > + * queue. This means that a SIGSTOP or SIGCONT signal > > _cannot_ > > + * be present on a thread's private pending queue. > > + * > > + * This makes prepare_signal() more optimal as we do not > > have > > + * to remove signals from each thread's pending queue and > > so > > + * can avoid iterating over all threads in the thread group > > + * (and therefore avoid the locking that would be > > necessary to > > + * do that safely). > > + */ > > + if (group || sig_kernel_stop(sig) || sig == SIGCONT) > > + pending = &t->signal->shared_pending; > > + else > > + pending = &t->pending; > > How so? Suppose the process has a handler for SIGCONT. Suppose this > process is not stopped. tkill(SIGCONT) should deliver the signal to > the right thread. D'oh, yes. I think I got confused here. You're right, this won't work. > SIGSTOP can't have the handler, still we shouldn't place it on the > shared list, debuggers won't be happy. Urgh, debuggers actually peek at shared_pending and pending? > Also. This code was changed very much, please do these changes on > top of > http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=refs/heads/ptrace My patches are already based on that tree. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' 2011-04-05 20:50 ` Matt Fleming @ 2011-04-06 12:57 ` Oleg Nesterov 2011-04-06 13:09 ` Tejun Heo 2011-04-06 13:15 ` Matt Fleming 0 siblings, 2 replies; 29+ messages in thread From: Oleg Nesterov @ 2011-04-06 12:57 UTC (permalink / raw) To: Matt Fleming Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On 04/05, Matt Fleming wrote: > > On Tue, 5 Apr 2011 22:19:58 +0200 > > > SIGSTOP can't have the handler, still we shouldn't place it on the > > shared list, debuggers won't be happy. > > Urgh, debuggers actually peek at shared_pending and pending? Argh, sorry, I was not clear. First of all, only SIGSTOP is never delivered to user-space, other sig_kernel_stop() signals can have a handler and in this case, say, SIGTTIN doesn't stop but acts like the normal signal. This means you can't put it into shared_pending. But even SIGSTOP should be routed properly. If the process is ptraced, the tracee reports SIGSTOP to the debugger first. This means that tkill(SIGSTOP) should be delivered to the right target. > > Also. This code was changed very much, please do these changes on > > top of > > http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=refs/heads/ptrace > > My patches are already based on that tree. Ah, good. Oleg. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' 2011-04-06 12:57 ` Oleg Nesterov @ 2011-04-06 13:09 ` Tejun Heo 2011-04-06 13:30 ` Matt Fleming 2011-04-06 13:15 ` Matt Fleming 1 sibling, 1 reply; 29+ messages in thread From: Tejun Heo @ 2011-04-06 13:09 UTC (permalink / raw) To: Oleg Nesterov Cc: Matt Fleming, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming Hey, guys. On Wed, Apr 06, 2011 at 02:57:57PM +0200, Oleg Nesterov wrote: > But even SIGSTOP should be routed properly. If the process is ptraced, > the tracee reports SIGSTOP to the debugger first. This means that > tkill(SIGSTOP) should be delivered to the right target. I think the more important part is that there really isn't much point in optimizing SIGSTOP/CONT. They inherently involve heavy, walk-every-thread operations of putting them to sleep and reversing it and there isn't much point in optimizing sending SIGSTOP to stopped processes or CONT to running ones. In addition, STOP/CONT interaction is already scary enough so I'd like to avoid adding complexities there if at all possible. I think it would be better to concentrate on more usual signals. Thank you. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' 2011-04-06 13:09 ` Tejun Heo @ 2011-04-06 13:30 ` Matt Fleming 0 siblings, 0 replies; 29+ messages in thread From: Matt Fleming @ 2011-04-06 13:30 UTC (permalink / raw) To: Tejun Heo Cc: Oleg Nesterov, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On Wed, 6 Apr 2011 06:09:28 -0700 Tejun Heo <tj@kernel.org> wrote: > Hey, guys. > > On Wed, Apr 06, 2011 at 02:57:57PM +0200, Oleg Nesterov wrote: > > But even SIGSTOP should be routed properly. If the process is > > ptraced, the tracee reports SIGSTOP to the debugger first. This > > means that tkill(SIGSTOP) should be delivered to the right target. > > I think the more important part is that there really isn't much point > in optimizing SIGSTOP/CONT. They inherently involve heavy, > walk-every-thread operations of putting them to sleep and reversing it > and there isn't much point in optimizing sending SIGSTOP to stopped > processes or CONT to running ones. In addition, STOP/CONT interaction > is already scary enough so I'd like to avoid adding complexities there > if at all possible. > > I think it would be better to concentrate on more usual signals. Fair point. Note that none of the other patches try to optimize SIGSTOP/CONT paths. This patch was also my attempt to make my brain not explode while figuring out the locking order. This was the first patch I wrote in the series and it was before I'd decided on the order. In other words, I was trying to eliminate any code where we'd do, tsk->sighand->action_lock tsk->siglock [Dequeue STOP signal from tsk->pending] tsk->sighand->siglock because that complicates the locking order and this patch seemed like a worthwhile cleanup. As it turns out, it's not a worthwhile/correct cleanup so I'll have to think how I can handle those paths safely with a different locking order. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' 2011-04-06 12:57 ` Oleg Nesterov 2011-04-06 13:09 ` Tejun Heo @ 2011-04-06 13:15 ` Matt Fleming 2011-04-11 18:50 ` Oleg Nesterov 1 sibling, 1 reply; 29+ messages in thread From: Matt Fleming @ 2011-04-06 13:15 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On Wed, 6 Apr 2011 14:57:57 +0200 Oleg Nesterov <oleg@redhat.com> wrote: > Argh, sorry, I was not clear. > > First of all, only SIGSTOP is never delivered to user-space, other > sig_kernel_stop() signals can have a handler and in this case, say, > SIGTTIN doesn't stop but acts like the normal signal. This means you > can't put it into shared_pending. Yeah, good point. This patch definitely needs dropping from the series. > But even SIGSTOP should be routed properly. If the process is ptraced, > the tracee reports SIGSTOP to the debugger first. This means that > tkill(SIGSTOP) should be delivered to the right target. Right, thanks. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' 2011-04-06 13:15 ` Matt Fleming @ 2011-04-11 18:50 ` Oleg Nesterov 2011-04-11 19:24 ` Matt Fleming 0 siblings, 1 reply; 29+ messages in thread From: Oleg Nesterov @ 2011-04-11 18:50 UTC (permalink / raw) To: Matt Fleming Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On 04/06, Matt Fleming wrote: > > Yeah, good point. This patch definitely needs dropping from the series. Yes. Sorry for delay again. I'll try to read the rest of this series tomorrow, I can no longer look at signal.c to today. Oh, these changes do not look trivial. I must admit, I am a bit sceptical, but this is only my first impression based on amount on changes and more complex locking. Oleg. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' 2011-04-11 18:50 ` Oleg Nesterov @ 2011-04-11 19:24 ` Matt Fleming 0 siblings, 0 replies; 29+ messages in thread From: Matt Fleming @ 2011-04-11 19:24 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On Mon, 11 Apr 2011 20:50:05 +0200 Oleg Nesterov <oleg@redhat.com> wrote: > Sorry for delay again. I'll try to read the rest of this series > tomorrow, I can no longer look at signal.c to today. No problem. > Oh, these changes do not look trivial. I must admit, I am a bit > sceptical, but this is only my first impression based on amount on > changes and more complex locking. Yeah, the locking is more complex and I suspect that because I'm dropping this patch the next iteration might become even more complex. However, I definitely think the complexity is worth it because of how much better signal delivery scales with these patches. If you find anything that you think is overly complex I'm sure I could clean it up with either comments or refactoring the code. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 29+ messages in thread
* [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock 2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' Matt Fleming @ 2011-04-05 19:21 ` Matt Fleming 2011-04-13 19:42 ` Oleg Nesterov 2011-04-05 19:21 ` [RFC][PATCH 3/5] ia64: Catch up with new sighand action spinlock Matt Fleming ` (2 subsequent siblings) 4 siblings, 1 reply; 29+ messages in thread From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw) To: Tejun Heo, Oleg Nesterov Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming From: Matt Fleming <matt.fleming@linux.intel.com> Try to reduce the amount of time we hold the shared siglock by introducing a per-thread siglock that protects each thread's 'pending' queue. Currently, the shared siglock protects both the shared and private signal queues. As such, it is a huge point of contention when generating and delivering signals to single threads in a multithreaded process. For example, even if a thread is delivering a signal to itself (thereby putting a signal on its private signal queue) it must acquire the shared siglock. To improve signal generation scalability we only acquire the shared siglock when generating a shared signal. If we're generating a private signal which will be delivered to a specific thread (and that signal does not affect an entire thread group) then we can optimise that case by only taking the per-thread siglock. However, the case where we're sending a fatal signal to a specific thread is special because we need to modify tsk->signal->flags, so we need to be holding the shared siglock. Note that we now hold both the shared and per-thread siglock when delivering a private signal. That will be fixed in the next patch so that signal delivery scales with signal generation. Also, because we now run signal code paths without holding the shared siglock at all, it can no longer be used to protect tsk->sighand->action. So we introduce a rwlock for protecting tsk->sighand->action. A rwlock is better than a spinlock in this case because there are many more readers than writers and a rwlock allows us to scale much better than a spinlock. Move the the majority of the shared signal code into a helper function to make the code in dequeue_signal() easier to read. Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com> --- fs/autofs4/waitq.c | 2 + fs/exec.c | 2 + fs/signalfd.c | 2 + include/linux/init_task.h | 2 + include/linux/sched.h | 4 + include/linux/signal.h | 5 + kernel/exit.c | 12 ++- kernel/fork.c | 2 + kernel/kmod.c | 8 +- kernel/ptrace.c | 4 +- kernel/signal.c | 189 ++++++++++++++++++++++++++++++++++++++------- security/selinux/hooks.c | 6 +- 12 files changed, 199 insertions(+), 39 deletions(-) diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c index 5601005..a7adbf8 100644 --- a/fs/autofs4/waitq.c +++ b/fs/autofs4/waitq.c @@ -83,7 +83,9 @@ static int autofs4_write(struct file *file, const void *addr, int bytes) SIGPIPE unless it was already supposed to get one */ if (wr == -EPIPE && !sigpipe) { spin_lock_irqsave(¤t->sighand->siglock, flags); + spin_lock(¤t->siglock); sigdelset(¤t->pending.signal, SIGPIPE); + spin_unlock(¤t->siglock); recalc_sigpending(); spin_unlock_irqrestore(¤t->sighand->siglock, flags); } diff --git a/fs/exec.c b/fs/exec.c index 8328beb..5cc4a6c 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1661,8 +1661,10 @@ static int zap_process(struct task_struct *start, int exit_code) do { task_clear_group_stop_pending(t); if (t != current && t->mm) { + spin_lock(&t->siglock); sigaddset(&t->pending.signal, SIGKILL); signal_wake_up(t, 1); + spin_unlock(&t->siglock); nr++; } } while_each_thread(start, t); diff --git a/fs/signalfd.c b/fs/signalfd.c index 492465b..ed49c40 100644 --- a/fs/signalfd.c +++ b/fs/signalfd.c @@ -48,10 +48,12 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait) poll_wait(file, ¤t->sighand->signalfd_wqh, wait); spin_lock_irq(¤t->sighand->siglock); + spin_lock(¤t->siglock); if (next_signal(¤t->pending, &ctx->sigmask) || next_signal(¤t->signal->shared_pending, &ctx->sigmask)) events |= POLLIN; + spin_unlock(¤t->siglock); spin_unlock_irq(¤t->sighand->siglock); return events; diff --git a/include/linux/init_task.h b/include/linux/init_task.h index caa151f..a948d24 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -47,6 +47,7 @@ extern struct nsproxy init_nsproxy; .action = { { { .sa_handler = SIG_DFL, } }, }, \ .siglock = __SPIN_LOCK_UNLOCKED(sighand.siglock), \ .signalfd_wqh = __WAIT_QUEUE_HEAD_INITIALIZER(sighand.signalfd_wqh), \ + .action_lock = __RW_LOCK_UNLOCKED(sighand.action_lock), \ } extern struct group_info init_groups; @@ -168,6 +169,7 @@ extern struct cred init_cred; .signal = &init_signals, \ .sighand = &init_sighand, \ .nsproxy = &init_nsproxy, \ + .siglock = __SPIN_LOCK_UNLOCKED(tsk.siglock), \ .pending = { \ .list = LIST_HEAD_INIT(tsk.pending.list), \ .signal = {{0}}}, \ diff --git a/include/linux/sched.h b/include/linux/sched.h index 8cef82d..7b2ea86 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -443,6 +443,9 @@ struct sighand_struct { struct k_sigaction action[_NSIG]; spinlock_t siglock; wait_queue_head_t signalfd_wqh; + + /* protects action */ + rwlock_t action_lock; }; struct pacct_struct { @@ -1358,6 +1361,7 @@ struct task_struct { sigset_t blocked, real_blocked; sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */ + spinlock_t siglock; /* protects pending */ struct sigpending pending; unsigned long sas_ss_sp; diff --git a/include/linux/signal.h b/include/linux/signal.h index fcd2b14..9aad2b0 100644 --- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -376,6 +376,11 @@ int unhandled_signal(struct task_struct *tsk, int sig); (!siginmask(signr, SIG_KERNEL_IGNORE_MASK|SIG_KERNEL_STOP_MASK) && \ (t)->sighand->action[(signr)-1].sa.sa_handler == SIG_DFL) +/* Should the signal be placed on shared_pending? */ +#define sig_shared(sig) \ + (((sig) < SIGRTMIN) && \ + siginmask(sig, SIG_KERNEL_STOP_MASK | rt_sigmask(SIGCONT))) + void signals_init(void); #endif /* __KERNEL__ */ diff --git a/kernel/exit.c b/kernel/exit.c index 1a0f10f..7774126 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -142,7 +142,9 @@ static void __exit_signal(struct task_struct *tsk) * Do this under ->siglock, we can race with another thread * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals. */ + spin_lock(&tsk->siglock); flush_sigqueue(&tsk->pending); + spin_unlock(&tsk->siglock); tsk->sighand = NULL; spin_unlock(&sighand->siglock); @@ -386,7 +388,8 @@ int allow_signal(int sig) if (!valid_signal(sig) || sig < 1) return -EINVAL; - spin_lock_irq(¤t->sighand->siglock); + write_lock_irq(¤t->sighand->action_lock); + spin_lock(¤t->sighand->siglock); /* This is only needed for daemonize()'ed kthreads */ sigdelset(¤t->blocked, sig); /* @@ -396,7 +399,8 @@ int allow_signal(int sig) */ current->sighand->action[(sig)-1].sa.sa_handler = (void __user *)2; recalc_sigpending(); - spin_unlock_irq(¤t->sighand->siglock); + spin_unlock(¤t->sighand->siglock); + write_unlock_irq(¤t->sighand->action_lock); return 0; } @@ -407,10 +411,10 @@ int disallow_signal(int sig) if (!valid_signal(sig) || sig < 1) return -EINVAL; - spin_lock_irq(¤t->sighand->siglock); + write_lock_irq(¤t->sighand->action_lock); current->sighand->action[(sig)-1].sa.sa_handler = SIG_IGN; recalc_sigpending(); - spin_unlock_irq(¤t->sighand->siglock); + write_unlock_irq(¤t->sighand->action_lock); return 0; } diff --git a/kernel/fork.c b/kernel/fork.c index f2b494d..50fa84f 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1081,6 +1081,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, p->vfork_done = NULL; spin_lock_init(&p->alloc_lock); + spin_lock_init(&p->siglock); init_sigpending(&p->pending); p->utime = cputime_zero; @@ -1494,6 +1495,7 @@ static void sighand_ctor(void *data) spin_lock_init(&sighand->siglock); init_waitqueue_head(&sighand->signalfd_wqh); + rwlock_init(&sighand->action_lock); } void __init proc_caches_init(void) diff --git a/kernel/kmod.c b/kernel/kmod.c index 9cd0591..551a6e9 100644 --- a/kernel/kmod.c +++ b/kernel/kmod.c @@ -134,9 +134,9 @@ static int ____call_usermodehelper(void *data) struct subprocess_info *sub_info = data; int retval; - spin_lock_irq(¤t->sighand->siglock); + write_lock_irq(¤t->sighand->action_lock); flush_signal_handlers(current, 1); - spin_unlock_irq(¤t->sighand->siglock); + write_unlock_irq(¤t->sighand->action_lock); /* We can run anywhere, unlike our parent keventd(). */ set_cpus_allowed_ptr(current, cpu_all_mask); @@ -178,9 +178,9 @@ static int wait_for_helper(void *data) pid_t pid; /* If SIGCLD is ignored sys_wait4 won't populate the status. */ - spin_lock_irq(¤t->sighand->siglock); + write_lock_irq(¤t->sighand->action_lock); current->sighand->action[SIGCHLD-1].sa.sa_handler = SIG_DFL; - spin_unlock_irq(¤t->sighand->siglock); + write_unlock_irq(¤t->sighand->action_lock); pid = kernel_thread(____call_usermodehelper, sub_info, SIGCHLD); if (pid < 0) { diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 20d5efd..c72d40d 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -294,10 +294,10 @@ static int ptrace_traceme(void) static int ignoring_children(struct sighand_struct *sigh) { int ret; - spin_lock(&sigh->siglock); + read_lock(&sigh->action_lock); ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) || (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT); - spin_unlock(&sigh->siglock); + read_unlock(&sigh->action_lock); return ret; } diff --git a/kernel/signal.c b/kernel/signal.c index 9595d86..c67f27a 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -38,6 +38,45 @@ #include "audit.h" /* audit_signal_info() */ /* + * Signal locking. + * + * There are three locks to consider when manipulating signal queues, + * + * - tsk->sighand->action_lock (sighand action rwlock) + * - tsk->sighand->siglock (shared siglock) + * - tsk->siglock (per-thread siglock) + * + * The sighand action rwlock needs to be acquired when modifying + * tsk->sighand->action or when you need to prevent modifications. + * + * When modifying the shared signal queue (signal->shared_pending), + * modifying tsk->blocked or handling stop signals (see + * do_signal_stop), you need to be holding the shared siglock. + * + * When modifying the per-thread pending signal queue (tsk->pending) + * you need to be holding the per-thread siglock. + * + * You MUST NOT acquire the shared siglock if already holding the + * per-thread siglock because various places iterate over the threads + * in a thread group and acquire their per-thread siglocks while + * holding the shared siglock - see zap_process(). + * + * You MUST NOT acquire the sighand action lock if already holding the + * per-thread siglock. + * + * So the locking order is... + * + * tsk->sighand->action_lock + * tsk->sighand->siglock + * tsk->siglock + * + * OR.. + * + * tsk->sighand->action_lock + * tsk->siglock + */ + +/* * SLAB caches for signal bits. */ @@ -367,7 +406,9 @@ void flush_sigqueue(struct sigpending *queue) void __flush_signals(struct task_struct *t) { clear_tsk_thread_flag(t, TIF_SIGPENDING); + spin_lock(&t->siglock); flush_sigqueue(&t->pending); + spin_unlock(&t->siglock); flush_sigqueue(&t->signal->shared_pending); } @@ -409,7 +450,9 @@ void flush_itimer_signals(void) unsigned long flags; spin_lock_irqsave(&tsk->sighand->siglock, flags); + spin_lock(&tsk->siglock); __flush_itimer_signals(&tsk->pending); + spin_unlock(&tsk->siglock); __flush_itimer_signals(&tsk->signal->shared_pending); spin_unlock_irqrestore(&tsk->sighand->siglock, flags); } @@ -622,8 +665,11 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) /* We only dequeue private signals from ourselves, we don't let * signalfd steal them */ - if (!signr) + if (!signr) { + spin_lock(&tsk->siglock); signr = __dequeue_signal(&tsk->pending, mask, info); + spin_unlock(&tsk->siglock); + } recalc_sigpending(); if (!signr) @@ -931,7 +977,9 @@ static void complete_signal(int sig, struct task_struct *p, int group) t = p; do { task_clear_group_stop_pending(t); + spin_lock(&t->siglock); sigaddset(&t->pending.signal, SIGKILL); + spin_unlock(&t->siglock); signal_wake_up(t, 1); } while_each_thread(p, t); return; @@ -960,8 +1008,6 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t, trace_signal_generate(sig, info, t); - assert_spin_locked(&t->sighand->siglock); - if (!prepare_signal(sig, t, from_ancestor_ns)) return 0; @@ -976,7 +1022,12 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t, * (and therefore avoid the locking that would be necessary to * do that safely). */ - if (group || sig_kernel_stop(sig) || sig == SIGCONT) + if (group || sig_shared(sig) || sig_fatal(t, sig)) + assert_spin_locked(&t->sighand->siglock); + else + assert_spin_locked(&t->siglock); + + if (group || sig_shared(sig)) pending = &t->signal->shared_pending; else pending = &t->pending; @@ -988,6 +1039,7 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t, */ if (legacy_queue(pending, sig)) return 0; + /* * fast-pathed signals for kernel-internal things like SIGSTOP * or SIGKILL. @@ -1123,11 +1175,34 @@ int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p, unsigned long flags; int ret = -ESRCH; - if (lock_task_sighand(p, &flags)) { + read_lock_irqsave(&p->sighand->action_lock, flags); + + /* + * We can optimize for the case where we're sending a signal + * to a specific thread. If we're only sending a signal to one + * thread we only need to acquire that thread's siglock and + * not the global siglock. + * + * Fatal signals have the potential to tear down an entire + * thread group, even if we're only sending the signal to a + * specific thread! If we're killing an entire thread group we + * need the shared siglock. Note that we can't check if the + * signal is fatal until we're holding 'action_lock', because + * someone could modify the handler for the signal (some + * signals are only fatal when the handler is SIG_DFL). + */ + if (!group && !sig_shared(sig) && !sig_fatal(p, sig)) { + spin_lock(&p->siglock); ret = send_signal(sig, info, p, group); - unlock_task_sighand(p, &flags); + spin_unlock(&p->siglock); + } else { + if (lock_task_sighand(p, &flags)) { + ret = send_signal(sig, info, p, group); + unlock_task_sighand(p, &flags); + } } + read_unlock_irqrestore(&p->sighand->action_lock, flags); return ret; } @@ -1148,8 +1223,10 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t) unsigned long int flags; int ret, blocked, ignored; struct k_sigaction *action; + int shared_siglock; - spin_lock_irqsave(&t->sighand->siglock, flags); + write_lock_irqsave(&t->sighand->action_lock, flags); + spin_lock(&t->sighand->siglock); action = &t->sighand->action[sig-1]; ignored = action->sa.sa_handler == SIG_IGN; blocked = sigismember(&t->blocked, sig); @@ -1162,9 +1239,21 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t) } if (action->sa.sa_handler == SIG_DFL) t->signal->flags &= ~SIGNAL_UNKILLABLE; + + shared_siglock = sig_shared(sig) || sig_fatal(t, sig); + if (!shared_siglock) { + spin_unlock(&t->sighand->siglock); + spin_lock(&t->siglock); + } + ret = specific_send_sig_info(sig, info, t); - spin_unlock_irqrestore(&t->sighand->siglock, flags); + if (!shared_siglock) + spin_unlock(&t->siglock); + else + spin_unlock(&t->sighand->siglock); + + write_unlock_irqrestore(&t->sighand->action_lock, flags); return ret; } @@ -1185,7 +1274,9 @@ int zap_other_threads(struct task_struct *p) /* Don't bother with already dead threads */ if (t->exit_state) continue; + spin_lock(&t->siglock); sigaddset(&t->pending.signal, SIGKILL); + spin_unlock(&t->siglock); signal_wake_up(t, 1); } @@ -1314,11 +1405,15 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid, goto out_unlock; if (sig) { - if (lock_task_sighand(p, &flags)) { + unsigned long _flags; + + read_lock_irqsave(&p->sighand->action_lock, flags); + if (lock_task_sighand(p, &_flags)) { ret = __send_signal(sig, info, p, 1, 0); - unlock_task_sighand(p, &flags); + unlock_task_sighand(p, &_flags); } else ret = -ESRCH; + read_unlock_irqrestore(&p->sighand->action_lock, flags); } out_unlock: rcu_read_unlock(); @@ -1411,9 +1506,9 @@ force_sigsegv(int sig, struct task_struct *p) { if (sig == SIGSEGV) { unsigned long flags; - spin_lock_irqsave(&p->sighand->siglock, flags); + write_lock_irqsave(&p->sighand->action_lock, flags); p->sighand->action[sig - 1].sa.sa_handler = SIG_DFL; - spin_unlock_irqrestore(&p->sighand->siglock, flags); + write_unlock_irqrestore(&p->sighand->action_lock, flags); } force_sig(SIGSEGV, p); return 0; @@ -1485,14 +1580,24 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group) { int sig = q->info.si_signo; struct sigpending *pending; - unsigned long flags; + unsigned long flags, _flags; + int shared_siglock; int ret; BUG_ON(!(q->flags & SIGQUEUE_PREALLOC)); ret = -1; - if (!likely(lock_task_sighand(t, &flags))) - goto ret; + read_lock_irqsave(&t->sighand->action_lock, flags); + shared_siglock = group || sig_shared(sig) || sig_fatal(t, sig); + if (!shared_siglock) { + spin_lock(&t->siglock); + pending = &t->pending; + } else { + if (!likely(lock_task_sighand(t, &_flags))) + goto ret; + + pending = &t->signal->shared_pending; + } ret = 1; /* the signal is ignored */ if (!prepare_signal(sig, t, 0)) @@ -1511,12 +1616,18 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group) q->info.si_overrun = 0; signalfd_notify(t, sig); - pending = group ? &t->signal->shared_pending : &t->pending; + list_add_tail(&q->list, &pending->list); sigaddset(&pending->signal, sig); complete_signal(sig, t, group); + out: - unlock_task_sighand(t, &flags); + if (shared_siglock) + unlock_task_sighand(t, &_flags); + else + spin_unlock(&t->siglock); + + read_unlock_irqrestore(&t->sighand->action_lock, flags); ret: return ret; } @@ -1578,7 +1689,8 @@ int do_notify_parent(struct task_struct *tsk, int sig) } psig = tsk->parent->sighand; - spin_lock_irqsave(&psig->siglock, flags); + read_lock_irqsave(&psig->action_lock, flags); + spin_lock(&psig->siglock); if (!task_ptrace(tsk) && sig == SIGCHLD && (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN || (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) { @@ -1604,7 +1716,8 @@ int do_notify_parent(struct task_struct *tsk, int sig) if (valid_signal(sig) && sig > 0) __group_send_sig_info(sig, &info, tsk->parent); __wake_up_parent(tsk, tsk->parent); - spin_unlock_irqrestore(&psig->siglock, flags); + spin_unlock(&psig->siglock); + read_unlock_irqrestore(&psig->action_lock, flags); return ret; } @@ -1666,7 +1779,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, } sighand = parent->sighand; - spin_lock_irqsave(&sighand->siglock, flags); + read_lock_irqsave(&sighand->action_lock, flags); + spin_lock(&sighand->siglock); if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN && !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP)) __group_send_sig_info(SIGCHLD, &info, parent); @@ -1674,7 +1788,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, * Even if SIGCHLD is not generated, we must wake up wait4 calls. */ __wake_up_parent(tsk, parent); - spin_unlock_irqrestore(&sighand->siglock, flags); + spin_unlock(&sighand->siglock); + read_unlock_irqrestore(&sighand->action_lock, flags); } static inline int may_ptrace_stop(void) @@ -2021,7 +2136,19 @@ static int ptrace_signal(int signr, siginfo_t *info, /* If the (new) signal is now blocked, requeue it. */ if (sigismember(¤t->blocked, signr)) { + /* + * We're called with the shared siglock held but if + * we're going to modify the private signal queue we + * also need to acquire the per-thread siglock. + */ + if (!sig_shared(signr) && !sig_fatal(current, signr)) + spin_lock(¤t->siglock); + specific_send_sig_info(signr, info, current); + + if (!sig_shared(signr) && !sig_fatal(current, signr)) + spin_unlock(¤t->siglock); + signr = 0; } @@ -2126,8 +2253,11 @@ relock: /* Run the handler. */ *return_ka = *ka; - if (ka->sa.sa_flags & SA_ONESHOT) + if (ka->sa.sa_flags & SA_ONESHOT) { + write_lock(&sighand->action_lock); ka->sa.sa_handler = SIG_DFL; + write_unlock(&sighand->action_lock); + } break; /* will return non-zero "signr" value */ } @@ -2157,9 +2287,8 @@ relock: * The default action is to stop all threads in * the thread group. The job control signals * do nothing in an orphaned pgrp, but SIGSTOP - * always works. Note that siglock needs to be - * dropped during the call to is_orphaned_pgrp() - * because of lock ordering with tasklist_lock. + * always works. Note that we are not holding + * siglock during the call to is_orphaned_pgrp(). * This allows an intervening SIGCONT to be posted. * We need to check for that and bail out if necessary. */ @@ -2374,8 +2503,10 @@ long do_sigpending(void __user *set, unsigned long sigsetsize) goto out; spin_lock_irq(¤t->sighand->siglock); + spin_lock(¤t->siglock); sigorsets(&pending, ¤t->pending.signal, ¤t->signal->shared_pending.signal); + spin_unlock(¤t->siglock); spin_unlock_irq(¤t->sighand->siglock); /* Outside the lock because only this thread touches it. */ @@ -2691,9 +2822,10 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact) if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig))) return -EINVAL; + read_lock_irq(&t->sighand->action_lock); k = &t->sighand->action[sig-1]; - spin_lock_irq(¤t->sighand->siglock); + spin_lock(¤t->sighand->siglock); if (oact) *oact = *k; @@ -2717,13 +2849,16 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact) sigaddset(&mask, sig); rm_from_queue_full(&mask, &t->signal->shared_pending); do { + spin_lock(&t->siglock); rm_from_queue_full(&mask, &t->pending); + spin_unlock(&t->siglock); t = next_thread(t); } while (t != current); } } - spin_unlock_irq(¤t->sighand->siglock); + spin_unlock(¤t->sighand->siglock); + read_unlock_irq(&t->sighand->action_lock); return 0; } diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 6475e1f..91e2450 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2244,13 +2244,15 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm) memset(&itimer, 0, sizeof itimer); for (i = 0; i < 3; i++) do_setitimer(i, &itimer, NULL); - spin_lock_irq(¤t->sighand->siglock); + write_lock_irq(¤t->sighand->action_lock); + spin_lock(¤t->sighand->siglock); if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) { __flush_signals(current); flush_signal_handlers(current, 1); sigemptyset(¤t->blocked); } - spin_unlock_irq(¤t->sighand->siglock); + spin_unlock(¤t->sighand->siglock); + write_unlock_irq(¤t->sighand->action_lock); } /* Wake up the parent if it is waiting so that it can recheck -- 1.7.4 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock 2011-04-05 19:21 ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming @ 2011-04-13 19:42 ` Oleg Nesterov 2011-04-14 10:34 ` Matt Fleming 0 siblings, 1 reply; 29+ messages in thread From: Oleg Nesterov @ 2011-04-13 19:42 UTC (permalink / raw) To: Matt Fleming Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming Matt, sorry for huge delay. On 04/05, Matt Fleming wrote: > > Try to reduce the amount of time we hold the shared siglock by > introducing a per-thread siglock that protects each thread's 'pending' > queue. Currently, the shared siglock protects both the shared and > private signal queues. As such, it is a huge point of contention when > generating and delivering signals to single threads in a multithreaded > process. For example, even if a thread is delivering a signal to > itself (thereby putting a signal on its private signal queue) it must > acquire the shared siglock. > > To improve signal generation scalability we only acquire the shared > siglock when generating a shared signal. If we're generating a private > signal which will be delivered to a specific thread (and that signal > does not affect an entire thread group) then we can optimise that case > by only taking the per-thread siglock. However, the case where we're > sending a fatal signal to a specific thread is special because we need > to modify tsk->signal->flags, so we need to be holding the shared > siglock. > > Note that we now hold both the shared and per-thread siglock when > delivering a private signal. That will be fixed in the next patch so > that signal delivery scales with signal generation. > > Also, because we now run signal code paths without holding the shared > siglock at all, it can no longer be used to protect tsk->sighand->action. > So we introduce a rwlock for protecting tsk->sighand->action. A rwlock > is better than a spinlock in this case because there are many more > readers than writers and a rwlock allows us to scale much better than > a spinlock. So. This complicates the locking enormously. I must admit, I dislike this very much. Yes, the signals are slow... But this returns us to the original question: do we really care? Of course, it is always nice to makes the things faster, but I am not sure the added complexity worth the trouble. Random example, int force_sig_info(int sig, struct siginfo *info, struct task_struct *t) { unsigned long int flags; int ret, blocked, ignored; struct k_sigaction *action; int shared_siglock; write_lock_irqsave(&t->sighand->action_lock, flags); spin_lock(&t->sighand->siglock); action = &t->sighand->action[sig-1]; ignored = action->sa.sa_handler == SIG_IGN; blocked = sigismember(&t->blocked, sig); if (blocked || ignored) { action->sa.sa_handler = SIG_DFL; if (blocked) { sigdelset(&t->blocked, sig); recalc_sigpending_and_wake(t); } } if (action->sa.sa_handler == SIG_DFL) t->signal->flags &= ~SIGNAL_UNKILLABLE; shared_siglock = sig_shared(sig) || sig_fatal(t, sig); if (!shared_siglock) { spin_unlock(&t->sighand->siglock); spin_lock(&t->siglock); } ret = specific_send_sig_info(sig, info, t); if (!shared_siglock) spin_unlock(&t->siglock); else spin_unlock(&t->sighand->siglock); write_unlock_irqrestore(&t->sighand->action_lock, flags); return ret; } This doesn't look very nice ;) To me personally, the only "real" performance problem is kill-from-irq (posix timers is the worst case), this should be solved somehow but these changes can't help... Anyway, the patches do not look right. > +/* Should the signal be placed on shared_pending? */ > +#define sig_shared(sig) \ > + (((sig) < SIGRTMIN) && \ > + siginmask(sig, SIG_KERNEL_STOP_MASK | rt_sigmask(SIGCONT))) OK, this is wrong but we already discussed this. > @@ -142,7 +142,9 @@ static void __exit_signal(struct task_struct *tsk) > * Do this under ->siglock, we can race with another thread > * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals. > */ > + spin_lock(&tsk->siglock); > flush_sigqueue(&tsk->pending); > + spin_unlock(&tsk->siglock); This only protects flush_sigqueue(), but this is not enough. tkill() can run without ->siglock held, how can it ensure the target can't exit before we take task->siglock? > @@ -976,7 +1022,12 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t, > * (and therefore avoid the locking that would be necessary to > * do that safely). > */ > - if (group || sig_kernel_stop(sig) || sig == SIGCONT) > + if (group || sig_shared(sig) || sig_fatal(t, sig)) > + assert_spin_locked(&t->sighand->siglock); > + else > + assert_spin_locked(&t->siglock); > + > + if (group || sig_shared(sig)) > pending = &t->signal->shared_pending; > else > pending = &t->pending; Well. But later then we call complete_signal()->signal_wake_up(). And this needs ->siglock. Now I recall why it is needed. Obviously, to serialize with recalc_sigpending(). Note also that if you use task->siglock for sigaddset(t->pending), then recalc_sigpending() should take this lock as well. > @@ -1123,11 +1175,34 @@ int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p, > unsigned long flags; > int ret = -ESRCH; > > - if (lock_task_sighand(p, &flags)) { > + read_lock_irqsave(&p->sighand->action_lock, flags); How so? This is unsafe, p can exit, ->sighand can be NULL. Or exec can change ->sighand, we can take the wrong lock. And there are more wrong changes like this... > @@ -1485,14 +1580,24 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group) > { > int sig = q->info.si_signo; > struct sigpending *pending; > - unsigned long flags; > + unsigned long flags, _flags; > + int shared_siglock; > int ret; > > BUG_ON(!(q->flags & SIGQUEUE_PREALLOC)); > > ret = -1; > - if (!likely(lock_task_sighand(t, &flags))) > - goto ret; > + read_lock_irqsave(&t->sighand->action_lock, flags); > + shared_siglock = group || sig_shared(sig) || sig_fatal(t, sig); > + if (!shared_siglock) { > + spin_lock(&t->siglock); > + pending = &t->pending; > + } else { > + if (!likely(lock_task_sighand(t, &_flags))) > + goto ret; > + > + pending = &t->signal->shared_pending; This looks wrong wrong. We shouldn't do s/pending/shared_pending/ if sig_fatal(). The signal can be blocked the this task can be ptraced. > @@ -1666,7 +1779,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, > } > > sighand = parent->sighand; > - spin_lock_irqsave(&sighand->siglock, flags); > + read_lock_irqsave(&sighand->action_lock, flags); > + spin_lock(&sighand->siglock); Why do we need both? (the same for do_notify_parent) Oleg. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock 2011-04-13 19:42 ` Oleg Nesterov @ 2011-04-14 10:34 ` Matt Fleming 2011-04-14 19:00 ` Oleg Nesterov 0 siblings, 1 reply; 29+ messages in thread From: Matt Fleming @ 2011-04-14 10:34 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On Wed, 13 Apr 2011 21:42:31 +0200 Oleg Nesterov <oleg@redhat.com> wrote: > Matt, sorry for huge delay. No problem. Thanks for the review! > So. This complicates the locking enormously. I must admit, I dislike > this very much. Yes, the signals are slow... But this returns us to the > original question: do we really care? Of course, it is always nice to > makes the things faster, but I am not sure the added complexity worth > the trouble. Well, it's not really that signals are slow (though that might be true!), it's more that they don't scale. So, this patch series was not designed to speed up signals in a single-threaded app, but rather to make signals scale better in multithreaded apps. We do that by reducing the contention on the shared siglock. Signal delivery isn't getting any faster with these patches, they just try to stop it getting slower when you add more threads ;-) As for the complexity... OK, point taken. Since we've already established that the patch series doesn't work as-is because the first patch is bogus, I'll treat the rest of the review as an academic exercise in helping to highlight any of my misunderstandings about the signal code. If/when I respin this series I'll certainly try to keep it simpler. > Random example, > > int > force_sig_info(int sig, struct siginfo *info, struct task_struct *t) > { > unsigned long int flags; > int ret, blocked, ignored; > struct k_sigaction *action; > int shared_siglock; > > write_lock_irqsave(&t->sighand->action_lock, flags); > spin_lock(&t->sighand->siglock); > action = &t->sighand->action[sig-1]; > ignored = action->sa.sa_handler == SIG_IGN; > blocked = sigismember(&t->blocked, sig); > if (blocked || ignored) { > action->sa.sa_handler = SIG_DFL; > if (blocked) { > sigdelset(&t->blocked, sig); > recalc_sigpending_and_wake(t); > } > } > if (action->sa.sa_handler == SIG_DFL) > t->signal->flags &= ~SIGNAL_UNKILLABLE; > > shared_siglock = sig_shared(sig) || sig_fatal(t, sig); > if (!shared_siglock) { > spin_unlock(&t->sighand->siglock); > spin_lock(&t->siglock); > } > > ret = specific_send_sig_info(sig, info, t); > > if (!shared_siglock) > spin_unlock(&t->siglock); > else > spin_unlock(&t->sighand->siglock); > > write_unlock_irqrestore(&t->sighand->action_lock, flags); > return ret; > } > > This doesn't look very nice ;) Yeah, not my finest work ;-) But seriously, you're not wrong. This code was written this way as a result of the locking design, which suggests I need to go back to the drawing board and come up with something simpler. > To me personally, the only "real" performance problem is kill-from-irq > (posix timers is the worst case), this should be solved somehow but > these changes can't help... True. > > @@ -142,7 +142,9 @@ static void __exit_signal(struct task_struct *tsk) > > * Do this under ->siglock, we can race with another thread > > * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals. > > */ > > + spin_lock(&tsk->siglock); > > flush_sigqueue(&tsk->pending); > > + spin_unlock(&tsk->siglock); > > This only protects flush_sigqueue(), but this is not enough. > > tkill() can run without ->siglock held, how can it ensure the target > can't exit before we take task->siglock? And by "target can't exit" you mean, how can we ensure that the target doesn't execute __exit_signal() and set tsk->signal to NULL if we're not holding the shared siglock in the tkill() path? Good point, that's totally a bug. While we're discussing the technique for stopping tasks from exiting... Is there a reason that a short-term reference counter isn't used to prevent this, instead of taking the siglock? Obviously, the siglock already exists and we would have to add a new atomic_t to sighand for the short-term ref, but is there any other fundamental reason why the siglock acquiring method is better? Maybe a short-term ref would make things more complicated? I dunno. siglock does seem to be used for a few different things at the moment though, so currently things are already quite complicated. Of course, the reason I'm asking is that this entire series relies on avoiding acquiring the shared siglock wherever possible. As you've pointed out above, by not acquiring the shared siglock in the tkill() path we can race with the task exiting, and somehow I need to fix that. Though note that I don't think it's just the tkill() path that is the problem here, it's signal generation in general, i.e. anybody who calls send_signal() without the shared siglock held. > > @@ -976,7 +1022,12 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t, > > * (and therefore avoid the locking that would be necessary to > > * do that safely). > > */ > > - if (group || sig_kernel_stop(sig) || sig == SIGCONT) > > + if (group || sig_shared(sig) || sig_fatal(t, sig)) > > + assert_spin_locked(&t->sighand->siglock); > > + else > > + assert_spin_locked(&t->siglock); > > + > > + if (group || sig_shared(sig)) > > pending = &t->signal->shared_pending; > > else > > pending = &t->pending; > > Well. But later then we call complete_signal()->signal_wake_up(). And this > needs ->siglock. Now I recall why it is needed. Obviously, to serialize with > recalc_sigpending(). Ah, it's the set_tsk_thread_flag(t, TIF_SIGPENDING) that we need to do atomically. Right, damn, that's a race. > Note also that if you use task->siglock for sigaddset(t->pending), then > recalc_sigpending() should take this lock as well. Right. > > @@ -1123,11 +1175,34 @@ int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p, > > unsigned long flags; > > int ret = -ESRCH; > > > > - if (lock_task_sighand(p, &flags)) { > > + read_lock_irqsave(&p->sighand->action_lock, flags); > > How so? This is unsafe, p can exit, ->sighand can be NULL. Or exec can > change ->sighand, we can take the wrong lock. > > And there are more wrong changes like this... Yeah, this is a fundamental problem with this series. > > @@ -1485,14 +1580,24 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group) > > { > > int sig = q->info.si_signo; > > struct sigpending *pending; > > - unsigned long flags; > > + unsigned long flags, _flags; > > + int shared_siglock; > > int ret; > > > > BUG_ON(!(q->flags & SIGQUEUE_PREALLOC)); > > > > ret = -1; > > - if (!likely(lock_task_sighand(t, &flags))) > > - goto ret; > > + read_lock_irqsave(&t->sighand->action_lock, flags); > > + shared_siglock = group || sig_shared(sig) || sig_fatal(t, sig); > > + if (!shared_siglock) { > > + spin_lock(&t->siglock); > > + pending = &t->pending; > > + } else { > > + if (!likely(lock_task_sighand(t, &_flags))) > > + goto ret; > > + > > + pending = &t->signal->shared_pending; > > This looks wrong wrong. We shouldn't do s/pending/shared_pending/ if > sig_fatal(). The signal can be blocked the this task can be ptraced. D'oh, yes, my bad. I've compounded the case of picking the pending queue with picking the siglock, that's wrong. OK, maybe this new locking is complicated ;-) > > @@ -1666,7 +1779,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, > > } > > > > sighand = parent->sighand; > > - spin_lock_irqsave(&sighand->siglock, flags); > > + read_lock_irqsave(&sighand->action_lock, flags); > > + spin_lock(&sighand->siglock); > > Why do we need both? (the same for do_notify_parent) We need action_lock because we're reading sighand->action and making decisions based upon its value, so we need it to not change. Also, __send_signal() needs us to have the action_lock at least for reading because we call sig_fatal() in complete_signal(), which checks tsk's action handlers, so again, we need to prevent the handlers from being modified. We need sighand->siglock because we're sending a group signal, and therefore modifying signal->shared_pending. The answer also applies to do_notify_parent(). Thanks for the review! -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock 2011-04-14 10:34 ` Matt Fleming @ 2011-04-14 19:00 ` Oleg Nesterov 2011-04-16 13:08 ` Matt Fleming 0 siblings, 1 reply; 29+ messages in thread From: Oleg Nesterov @ 2011-04-14 19:00 UTC (permalink / raw) To: Matt Fleming Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On 04/14, Matt Fleming wrote: > > Well, it's not really that signals are slow (though that might be > true!), it's more that they don't scale. So, this patch series was not > designed to speed up signals in a single-threaded app, but rather to > make signals scale better in multithreaded apps. We do that by > reducing the contention on the shared siglock. Signal delivery isn't > getting any faster with these patches, they just try to stop it getting > slower when you add more threads ;-) Yes, I understand. Even the private signal needs the "global" per-process lock bit it is rwlock. > > > @@ -142,7 +142,9 @@ static void __exit_signal(struct task_struct *tsk) > > > * Do this under ->siglock, we can race with another thread > > > * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals. > > > */ > > > + spin_lock(&tsk->siglock); > > > flush_sigqueue(&tsk->pending); > > > + spin_unlock(&tsk->siglock); > > > > This only protects flush_sigqueue(), but this is not enough. > > > > tkill() can run without ->siglock held, how can it ensure the target > > can't exit before we take task->siglock? > > And by "target can't exit" you mean, how can we ensure that the target > doesn't execute __exit_signal() and set tsk->signal to NULL No, tsk->signal can't go away, ->sighand can. This means that prepare_signal()->sig_ignored() is not safe. Another problem is, __send_signal() shouldn't add the new sigqueue to tsk->pending if this tsk has already passed __exit_signal()->flush_sigqueue(). > While we're discussing the technique for stopping tasks from exiting... > > Is there a reason that a short-term reference counter isn't used to > prevent this, instead of taking the siglock? Well, sighand->count is the reference counter. The problem is, ->sighand is not per-process, we can share it with abother CLONE_SIGHAND process and de_thread() can change ->sighand during exec. Also. We have the code which checks ->sighand != NULL to check if this thread was released or not. I was going to try to add some cleanups here after the scope of ->signal was changed, but right now I can't recall what I had in mind. Anyway, everything in sighand_struct needs ->siglock anyway, a lockless access doesn't buy too much currently. > > > @@ -1666,7 +1779,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, > > > } > > > > > > sighand = parent->sighand; > > > - spin_lock_irqsave(&sighand->siglock, flags); > > > + read_lock_irqsave(&sighand->action_lock, flags); > > > + spin_lock(&sighand->siglock); > > > > Why do we need both? (the same for do_notify_parent) > > We need action_lock because we're reading sighand->action and making > decisions based upon its value, so we need it to not change. Also, > __send_signal() Ah, indeed, thanks. Somehow I misread the code as if it takes task->siglock, not sighand->siglock. But anyway I was wrong, I forgot we are going to send the signal. Oleg. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock 2011-04-14 19:00 ` Oleg Nesterov @ 2011-04-16 13:08 ` Matt Fleming 2011-04-18 16:45 ` Oleg Nesterov 0 siblings, 1 reply; 29+ messages in thread From: Matt Fleming @ 2011-04-16 13:08 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming, Chris Metcalf On Thu, 14 Apr 2011 21:00:12 +0200 Oleg Nesterov <oleg@redhat.com> wrote: [Added Chris to Cc. Chris, check out point 3 below] > No, tsk->signal can't go away, ->sighand can. This means that > prepare_signal()->sig_ignored() is not safe. Sorry, that was a typo - I meant tsk->sighand. > Another problem is, __send_signal() shouldn't add the new sigqueue to > tsk->pending if this tsk has already passed __exit_signal()->flush_sigqueue(). Right. > > While we're discussing the technique for stopping tasks from exiting... > > > > Is there a reason that a short-term reference counter isn't used to > > prevent this, instead of taking the siglock? > > Well, sighand->count is the reference counter. The problem is, ->sighand > is not per-process, we can share it with abother CLONE_SIGHAND process > and de_thread() can change ->sighand during exec. What I meant was, a reference to say "You can't change/free ->sighand because I'm reading/modifying it". So you'd have two new functions, get_sighand() and put_sighand(), which would protect the sighand from changing while you were looking at it. Obviously, you'd still need to see if sighand = NULL, but you wouldn't need to grab the shared siglock. Note how this is different from sighand->count. sighand->count is a much longer term reference which stops it being freed while a task is using it, kinda like a "Don't free _MY_ sighand" reference, whereas what I'm talking about is a "I'm touching YOUR sighand, so don't change/free it" reference, e.g. a short term ref for when we're operating on a target task. It could be that the two references can really be just one atomic_t, I would have to write the code to figure that out. (Note I haven't thought this through all the way yet. There may be a reason that the above idea is the most stupid thing anyone's ever heard) Now, at the moment that suggestion just seems like needless overhead because siglock already provides the features we want. But, my problem with siglock is, 1. It needs to be acquired to stop a task passing through __exit_signal(). 2. It protects bits of signal_struct and that struct is getting pretty bloated and siglock is being used to protect lots of different things. Do we really need to completely prevent signals being generated/delivered to a thread group while we modify signal->it[] in kernel/itimer.c? 3. I suspect most people find the rules of ->sighand pretty confusing. Just look at arch/tile/kernel/hardwall.c:do_hardwall_trap() the use of siglock there looks buggy to me. What if 'p' passes through __exit_signal()? 'p' might not have reached free_task() yet and so won't have called hardwall_deactivate(), which means 'p' could still be on rect->task_head. Chris? 4. It's a heavily contended lock in thread groups (which was the reason for this patch series). > I was going to try to add some cleanups here after the scope of ->signal > was changed, but right now I can't recall what I had in mind. Anyway, > everything in sighand_struct needs ->siglock anyway, a lockless access > doesn't buy too much currently. Well, as I think my patch series shows, (in theory) not needing to acquire ->siglock gets us decent scalability improvements (in practice it was buggy as hell but that should be fixable ;-). Do you have any recollection of the cleanups? signal_struct needs to be put on a diet for sure. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock 2011-04-16 13:08 ` Matt Fleming @ 2011-04-18 16:45 ` Oleg Nesterov 2011-04-21 19:03 ` arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand Oleg Nesterov 2011-04-26 9:46 ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming 0 siblings, 2 replies; 29+ messages in thread From: Oleg Nesterov @ 2011-04-18 16:45 UTC (permalink / raw) To: Matt Fleming Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming, Chris Metcalf On 04/16, Matt Fleming wrote: > > On Thu, 14 Apr 2011 21:00:12 +0200 > Oleg Nesterov <oleg@redhat.com> wrote: > > > > Is there a reason that a short-term reference counter isn't used to > > > prevent this, instead of taking the siglock? > > > > Well, sighand->count is the reference counter. The problem is, ->sighand > > is not per-process, we can share it with abother CLONE_SIGHAND process > > and de_thread() can change ->sighand during exec. > > What I meant was, a reference to say "You can't change/free ->sighand > because I'm reading/modifying it". So you'd have two new functions, > get_sighand() and put_sighand(), which would protect the sighand from > changing while you were looking at it. Obviously, you'd still need to > see if sighand = NULL, but you wouldn't need to grab the shared > siglock. > > Note how this is different from sighand->count. sighand->count is a > much longer term reference which stops it being freed while a task is > using it, kinda like a "Don't free _MY_ sighand" reference, whereas > what I'm talking about is a "I'm touching YOUR sighand, so don't > change/free it" reference, e.g. a short term ref for when we're > operating on a target task. It could be that the two references can > really be just one atomic_t, I would have to write the code to figure > that out. Can't understand... OK, someone does get_sighand(). Now, what de_thread() should do if it wants to change ->sighand? And I don't really understand the point. You can read *sighand lockless. But you need some per-CLONE_SIGHAND lock if you want to modify it anyway. > Now, at the moment that suggestion just seems like needless overhead > because siglock already provides the features we want. But, my problem > with siglock is, > > 1. It needs to be acquired to stop a task passing through > __exit_signal(). > > 2. It protects bits of signal_struct and that struct is getting > pretty bloated and siglock is being used to protect lots of > different things. Yes, this is the main problem: it is overused. We need the better locking. Honestly, _personally_ I do not really care about scalability (but perhaps I should) when it comes to signals, but there are other problems. And, apart from the already mentioned problems with signals-from-irq, I think the main problem is tasklist_lock in do_wait/exit/etc pathes. And we still have the problems with signals which should be fixed. de_thread() can miss a signal, vfork() should be interruptible, do_coredump() should be interruptible. But first of all we need to define better the behaviour of explicit SIGKILL and what it means after exit_signals(). This should be fixed first, I think. > 3. I suspect most people find the rules of ->sighand pretty > confusing. Just look at > > arch/tile/kernel/hardwall.c:do_hardwall_trap() > > the use of siglock there looks buggy to me. Indeed, I agree. It shouldn't use __group_send_sig_info() at all. I'll send the patch. Nobody outside of signal code should play with ->sighand, this is almost always wrong. There is another problem, historically we have a lot, a lot of send-signal helpers, but you can never find the right one. And the naming sucks. > Do you have any recollection of the cleanups? signal_struct needs to be > put on a diet for sure. I was going to remove ->sighand from fs/proc first, probably I should try to resend these patches... Then we should remove the "sighand != NULL" checks, we need the new helper, and btw it should be used instead of pid_alive(). Then something else... boring ;) Oleg. ^ permalink raw reply [flat|nested] 29+ messages in thread
* arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand 2011-04-18 16:45 ` Oleg Nesterov @ 2011-04-21 19:03 ` Oleg Nesterov 2011-04-22 13:04 ` Chris Metcalf 2011-04-26 9:46 ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming 1 sibling, 1 reply; 29+ messages in thread From: Oleg Nesterov @ 2011-04-21 19:03 UTC (permalink / raw) To: Matt Fleming, Chris Metcalf Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On 04/18, Oleg Nesterov wrote: > > On 04/16, Matt Fleming wrote: > > > > 3. I suspect most people find the rules of ->sighand pretty > > confusing. Just look at > > > > arch/tile/kernel/hardwall.c:do_hardwall_trap() > > > > the use of siglock there looks buggy to me. > > Indeed, I agree. It shouldn't use __group_send_sig_info() at all. > I'll send the patch. Nobody outside of signal code should play with > ->sighand, this is almost always wrong. Hmm. It turns out, I can't make the patch because I do not understand what this code tries to do. hardwall_activate() adds the thread to hardwall_list, but do_hardwall_trap() sends the signal to the whole process. I know nothing about arch/tile and probably this is correct, but could you confirm this? Note that SIGILL can be delivered to another thread in the thread-group, is it correct? Also. Is it supposed that SIGILL can have a hanlder or can be blocked, or it should always kill the whole thread group? I think we need the patch below, assuming that SIGILL should be sent to the single thread and it is fine to have a handler for SIGILL. Oleg. --- sigprocmask/arch/tile/kernel/hardwall.c~1_sighand 2011-04-06 21:33:42.000000000 +0200 +++ sigprocmask/arch/tile/kernel/hardwall.c 2011-04-21 20:56:36.000000000 +0200 @@ -268,12 +268,10 @@ void __kprobes do_hardwall_trap(struct p found_processes = 0; list_for_each_entry(p, &rect->task_head, thread.hardwall_list) { BUG_ON(p->thread.hardwall != rect); - if (p->sighand) { + if (!(p->flags & PF_EXITING)) { found_processes = 1; pr_notice("hardwall: killing %d\n", p->pid); - spin_lock(&p->sighand->siglock); - __group_send_sig_info(info.si_signo, &info, p); - spin_unlock(&p->sighand->siglock); + do_send_sig_info(info.si_signo, &info, p, false); } } if (!found_processes) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand 2011-04-21 19:03 ` arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand Oleg Nesterov @ 2011-04-22 13:04 ` Chris Metcalf 2011-04-26 20:36 ` [PATCH 0/1] tile: do_hardwall_trap: do not play with task->sighand Oleg Nesterov 0 siblings, 1 reply; 29+ messages in thread From: Chris Metcalf @ 2011-04-22 13:04 UTC (permalink / raw) To: Oleg Nesterov Cc: Matt Fleming, Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On 4/21/2011 9:03 PM, Oleg Nesterov wrote: > On 04/18, Oleg Nesterov wrote: >> On 04/16, Matt Fleming wrote: >>> 3. I suspect most people find the rules of ->sighand pretty >>> confusing. Just look at >>> >>> arch/tile/kernel/hardwall.c:do_hardwall_trap() >>> >>> the use of siglock there looks buggy to me. >> Indeed, I agree. It shouldn't use __group_send_sig_info() at all. >> I'll send the patch. Nobody outside of signal code should play with >> ->sighand, this is almost always wrong. > Hmm. It turns out, I can't make the patch because I do not understand > what this code tries to do. > > hardwall_activate() adds the thread to hardwall_list, but do_hardwall_trap() > sends the signal to the whole process. I know nothing about arch/tile and > probably this is correct, but could you confirm this? Yes, the intended behavior is to send the signal to the process, as a way of indicating the OS's displeasure with sending a malformed packet on the user network. But I think sending it to the specific thread is reasonable too; I don't have a strong preference in this design. > Note that SIGILL can be delivered to another thread in the thread-group, is > it correct? > > Also. Is it supposed that SIGILL can have a hanlder or can be blocked, or > it should always kill the whole thread group? A handler would be reasonable for the process. > I think we need the patch below, assuming that SIGILL should be sent to > the single thread and it is fine to have a handler for SIGILL. Thanks; I appreciate the additional code review in any case. I'll look at the ramifications of the change in more detail when I return from vacation late next week. > Oleg. > > --- sigprocmask/arch/tile/kernel/hardwall.c~1_sighand 2011-04-06 21:33:42.000000000 +0200 > +++ sigprocmask/arch/tile/kernel/hardwall.c 2011-04-21 20:56:36.000000000 +0200 > @@ -268,12 +268,10 @@ void __kprobes do_hardwall_trap(struct p > found_processes = 0; > list_for_each_entry(p, &rect->task_head, thread.hardwall_list) { > BUG_ON(p->thread.hardwall != rect); > - if (p->sighand) { > + if (!(p->flags & PF_EXITING)) { > found_processes = 1; > pr_notice("hardwall: killing %d\n", p->pid); > - spin_lock(&p->sighand->siglock); > - __group_send_sig_info(info.si_signo, &info, p); > - spin_unlock(&p->sighand->siglock); > + do_send_sig_info(info.si_signo, &info, p, false); > } > } > if (!found_processes) > -- Chris Metcalf, Tilera Corp. http://www.tilera.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 0/1] tile: do_hardwall_trap: do not play with task->sighand 2011-04-22 13:04 ` Chris Metcalf @ 2011-04-26 20:36 ` Oleg Nesterov 2011-04-26 20:37 ` [PATCH 1/1] " Oleg Nesterov 0 siblings, 1 reply; 29+ messages in thread From: Oleg Nesterov @ 2011-04-26 20:36 UTC (permalink / raw) To: Chris Metcalf Cc: Matt Fleming, Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On 04/22, Chris Metcalf wrote: > > On 4/21/2011 9:03 PM, Oleg Nesterov wrote: > > Hmm. It turns out, I can't make the patch because I do not understand > > what this code tries to do. > > > > hardwall_activate() adds the thread to hardwall_list, but do_hardwall_trap() > > sends the signal to the whole process. I know nothing about arch/tile and > > probably this is correct, but could you confirm this? > > Yes, the intended behavior is to send the signal to the process, as a way > of indicating the OS's displeasure with sending a malformed packet on the > user network. But I think sending it to the specific thread is reasonable > too; I don't have a strong preference in this design. > > > Note that SIGILL can be delivered to another thread in the thread-group, is > > it correct? > > > > Also. Is it supposed that SIGILL can have a hanlder or can be blocked, or > > it should always kill the whole thread group? > > A handler would be reasonable for the process. OK. In this case the thread-specific SIGILL makes more sense afaics. > > I think we need the patch below, assuming that SIGILL should be sent to > > the single thread and it is fine to have a handler for SIGILL. > > Thanks; I appreciate the additional code review in any case. I'll look at > the ramifications of the change in more detail when I return from vacation > late next week. Great. I am sending the same patch + the changelog. Please do not forget, I know _nothing_ about arch/tile, and of course the patch was not tested. Oleg. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/1] tile: do_hardwall_trap: do not play with task->sighand 2011-04-26 20:36 ` [PATCH 0/1] tile: do_hardwall_trap: do not play with task->sighand Oleg Nesterov @ 2011-04-26 20:37 ` Oleg Nesterov 2011-05-02 22:42 ` Chris Metcalf 0 siblings, 1 reply; 29+ messages in thread From: Oleg Nesterov @ 2011-04-26 20:37 UTC (permalink / raw) To: Chris Metcalf Cc: Matt Fleming, Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming 1. do_hardwall_trap() checks ->sighand != NULL and then takes ->siglock. This is unsafe even if the task can't run (I assume it is pinned to the same CPU), its parent can reap the task and set ->sighand = NULL right after this check. Even if the compiler dosn't read ->sighand twice and this memory can't to away __group_send_sig_info() is wrong after that. Use do_send_sig_info(). 2. Send SIGILL to the thread, not to the whole process. Unless it has the handler or blocked this kills the whole thread-group as before. IIUC, different threads can be bound to different rect's. 3. Check PF_EXITING instead of ->sighand. A zombie thread can go away but its ->sighand can be !NULL. Reported-by: Matt Fleming <matt@console-pimps.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- arch/tile/kernel/hardwall.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) --- sigprocmask/arch/tile/kernel/hardwall.c~1_sighand 2011-04-06 21:33:42.000000000 +0200 +++ sigprocmask/arch/tile/kernel/hardwall.c 2011-04-21 20:56:36.000000000 +0200 @@ -268,12 +268,10 @@ void __kprobes do_hardwall_trap(struct p found_processes = 0; list_for_each_entry(p, &rect->task_head, thread.hardwall_list) { BUG_ON(p->thread.hardwall != rect); - if (p->sighand) { + if (!(p->flags & PF_EXITING)) { found_processes = 1; pr_notice("hardwall: killing %d\n", p->pid); - spin_lock(&p->sighand->siglock); - __group_send_sig_info(info.si_signo, &info, p); - spin_unlock(&p->sighand->siglock); + do_send_sig_info(info.si_signo, &info, p, false); } } if (!found_processes) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/1] tile: do_hardwall_trap: do not play with task->sighand 2011-04-26 20:37 ` [PATCH 1/1] " Oleg Nesterov @ 2011-05-02 22:42 ` Chris Metcalf 0 siblings, 0 replies; 29+ messages in thread From: Chris Metcalf @ 2011-05-02 22:42 UTC (permalink / raw) To: Oleg Nesterov Cc: Matt Fleming, Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On 4/26/2011 4:37 PM, Oleg Nesterov wrote: > 1. do_hardwall_trap() checks ->sighand != NULL and then takes ->siglock. > > This is unsafe even if the task can't run (I assume it is pinned to > the same CPU), its parent can reap the task and set ->sighand = NULL > right after this check. Even if the compiler dosn't read ->sighand > twice and this memory can't to away __group_send_sig_info() is wrong > after that. Use do_send_sig_info(). > > 2. Send SIGILL to the thread, not to the whole process. Unless it has > the handler or blocked this kills the whole thread-group as before. > IIUC, different threads can be bound to different rect's. > > 3. Check PF_EXITING instead of ->sighand. A zombie thread can go away > but its ->sighand can be !NULL. > > Reported-by: Matt Fleming <matt@console-pimps.org> > Signed-off-by: Oleg Nesterov <oleg@redhat.com> Thanks, I've taken this change into the tile tree. -- Chris Metcalf, Tilera Corp. http://www.tilera.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock 2011-04-18 16:45 ` Oleg Nesterov 2011-04-21 19:03 ` arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand Oleg Nesterov @ 2011-04-26 9:46 ` Matt Fleming 1 sibling, 0 replies; 29+ messages in thread From: Matt Fleming @ 2011-04-26 9:46 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming, Chris Metcalf [Sorry it's taken me so long to reply to this email] On Mon, 18 Apr 2011 18:45:13 +0200 Oleg Nesterov <oleg@redhat.com> wrote: > Can't understand... > > OK, someone does get_sighand(). Now, what de_thread() should do if it > wants to change ->sighand? > > And I don't really understand the point. You can read *sighand lockless. > But you need some per-CLONE_SIGHAND lock if you want to modify it anyway. I think at this stage the best thing for me to do is to write a patch to demonstrate what I'm talking about. > > Now, at the moment that suggestion just seems like needless overhead > > because siglock already provides the features we want. But, my problem > > with siglock is, > > > > 1. It needs to be acquired to stop a task passing through > > __exit_signal(). > > > > 2. It protects bits of signal_struct and that struct is getting > > pretty bloated and siglock is being used to protect lots of > > different things. > > Yes, this is the main problem: it is overused. > > We need the better locking. Honestly, _personally_ I do not really care > about scalability (but perhaps I should) when it comes to signals, but > there are other problems. And, apart from the already mentioned problems > with signals-from-irq, I think the main problem is tasklist_lock in > do_wait/exit/etc pathes. Is the tasklist_lock problem that we acquire the write lock for these paths? Or is it a problem acquiring the read lock too? > And we still have the problems with signals which should be fixed. > de_thread() can miss a signal, vfork() should be interruptible, > do_coredump() should be interruptible. But first of all we need to > define better the behaviour of explicit SIGKILL and what it means > after exit_signals(). This should be fixed first, I think. Hmm.. interesting. Does the SIGKILL case cause any bugs? Or is it more of a theoretical scenario? I must admit that I can't see any problems. > > Do you have any recollection of the cleanups? signal_struct needs to be > > put on a diet for sure. > > I was going to remove ->sighand from fs/proc first, probably I should > try to resend these patches... Then we should remove the "sighand != NULL" > checks, we need the new helper, and btw it should be used instead of > pid_alive(). Then something else... boring ;) Heh. I'd be interested in reviewing these patches if you could find and submit them. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 29+ messages in thread
* [RFC][PATCH 3/5] ia64: Catch up with new sighand action spinlock 2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming @ 2011-04-05 19:21 ` Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 4/5] signals: Introduce __dequeue_private_signal helper function Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery Matt Fleming 4 siblings, 0 replies; 29+ messages in thread From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw) To: Tejun Heo, Oleg Nesterov Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming From: Matt Fleming <matt.fleming@linux.intel.com> Also delete the comment that says holding siglock might not be necessary - it is, parts of signal code require that the handlers for a sighand do not change. Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com> --- arch/ia64/kernel/signal.c | 13 ++----------- 1 files changed, 2 insertions(+), 11 deletions(-) diff --git a/arch/ia64/kernel/signal.c b/arch/ia64/kernel/signal.c index 7bdafc8..d3b37c4 100644 --- a/arch/ia64/kernel/signal.c +++ b/arch/ia64/kernel/signal.c @@ -308,18 +308,9 @@ force_sigsegv_info (int sig, void __user *addr) struct siginfo si; if (sig == SIGSEGV) { - /* - * Acquiring siglock around the sa_handler-update is almost - * certainly overkill, but this isn't a - * performance-critical path and I'd rather play it safe - * here than having to debug a nasty race if and when - * something changes in kernel/signal.c that would make it - * no longer safe to modify sa_handler without holding the - * lock. - */ - spin_lock_irqsave(¤t->sighand->siglock, flags); + write_lock_irqsave(¤t->sighand->action_lock, flags); current->sighand->action[sig - 1].sa.sa_handler = SIG_DFL; - spin_unlock_irqrestore(¤t->sighand->siglock, flags); + write_unlock_irqrestore(¤t->sighand->action_lock, flags); } si.si_signo = SIGSEGV; si.si_errno = 0; -- 1.7.4 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [RFC][PATCH 4/5] signals: Introduce __dequeue_private_signal helper function 2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming ` (2 preceding siblings ...) 2011-04-05 19:21 ` [RFC][PATCH 3/5] ia64: Catch up with new sighand action spinlock Matt Fleming @ 2011-04-05 19:21 ` Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery Matt Fleming 4 siblings, 0 replies; 29+ messages in thread From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw) To: Tejun Heo, Oleg Nesterov Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming From: Matt Fleming <matt.fleming@linux.intel.com> To keep things symmetric with __dequeue_shared_signal() create a __dequeue_private_signal() function which helps to hide the required locking for accessing tsk->pending. Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com> --- kernel/signal.c | 19 ++++++++++++++----- 1 files changed, 14 insertions(+), 5 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index c67f27a..db0ce0e 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -642,6 +642,18 @@ static int __dequeue_shared_signal(struct task_struct *tsk, return signr; } +static int __dequeue_private_signal(struct task_struct *tsk, + sigset_t *mask, siginfo_t *info) +{ + int signr; + + spin_lock_irq(&tsk->siglock); + signr = __dequeue_signal(&tsk->pending, mask, info); + spin_unlock_irq(&tsk->siglock); + + return signr; +} + /* * Dequeue a signal and return the element to the caller, which is * expected to free it. @@ -665,11 +677,8 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) /* We only dequeue private signals from ourselves, we don't let * signalfd steal them */ - if (!signr) { - spin_lock(&tsk->siglock); - signr = __dequeue_signal(&tsk->pending, mask, info); - spin_unlock(&tsk->siglock); - } + if (!signr) + signr = __dequeue_private_signal(tsk, mask, info); recalc_sigpending(); if (!signr) -- 1.7.4 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery 2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming ` (3 preceding siblings ...) 2011-04-05 19:21 ` [RFC][PATCH 4/5] signals: Introduce __dequeue_private_signal helper function Matt Fleming @ 2011-04-05 19:21 ` Matt Fleming 2011-04-13 20:12 ` Oleg Nesterov 4 siblings, 1 reply; 29+ messages in thread From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw) To: Tejun Heo, Oleg Nesterov Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming From: Matt Fleming <matt.fleming@linux.intel.com> To reduce the contention on the shared siglock this patch pushes the responsibility of acquiring and releasing the shared siglock down into the functions that need it. That way, if we don't call a function that needs to be run under the shared siglock, we can run without acquiring it at all. Note that this does not make signal delivery lockless. A signal must still be dequeued from either the shared or private signal queues. However, in the private signal case we can now get by with just acquiring the per-thread siglock which massively reduces contention on the shared siglock. Also update tracehook.h to indicate it's not called with siglock held anymore. With this commit signal delivery now scales much better (though not linearly with the number of threads) as can be seen in these two graphs which execute the signal1 testcase from the Will It Scale benchmark suite, http://userweb.kernel.org/~mfleming/will-it-scale/signals-per-thread-siglock/ Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com> --- fs/signalfd.c | 5 -- include/linux/tracehook.h | 3 +- kernel/compat.c | 7 ++- kernel/signal.c | 124 +++++++++++++++++++++++++++------------------ 4 files changed, 79 insertions(+), 60 deletions(-) diff --git a/fs/signalfd.c b/fs/signalfd.c index ed49c40..a9d5c6f 100644 --- a/fs/signalfd.c +++ b/fs/signalfd.c @@ -146,7 +146,6 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info, ssize_t ret; DECLARE_WAITQUEUE(wait, current); - spin_lock_irq(¤t->sighand->siglock); ret = dequeue_signal(current, &ctx->sigmask, info); switch (ret) { case 0: @@ -154,7 +153,6 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info, break; ret = -EAGAIN; default: - spin_unlock_irq(¤t->sighand->siglock); return ret; } @@ -168,11 +166,8 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info, ret = -ERESTARTSYS; break; } - spin_unlock_irq(¤t->sighand->siglock); schedule(); - spin_lock_irq(¤t->sighand->siglock); } - spin_unlock_irq(¤t->sighand->siglock); remove_wait_queue(¤t->sighand->signalfd_wqh, &wait); __set_current_state(TASK_RUNNING); diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index b073f3c..849336d 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -447,7 +447,7 @@ static inline int tracehook_force_sigpending(void) * @return_ka: sigaction for synthetic signal * * Return zero to check for a real pending signal normally. - * Return -1 after releasing the siglock to repeat the check. + * Return -1 to repeat the check. * Return a signal number to induce an artifical signal delivery, * setting *@info and *@return_ka to specify its details and behavior. * @@ -458,7 +458,6 @@ static inline int tracehook_force_sigpending(void) * %SIGTSTP for stop unless in an orphaned pgrp), but the signal number * reported will be @info->si_signo instead. * - * Called with @task->sighand->siglock held, before dequeuing pending signals. */ static inline int tracehook_get_signal(struct task_struct *task, struct pt_regs *regs, diff --git a/kernel/compat.c b/kernel/compat.c index 38b1d2c..c7c5f9f 100644 --- a/kernel/compat.c +++ b/kernel/compat.c @@ -912,7 +912,6 @@ compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese, return -EINVAL; } - spin_lock_irq(¤t->sighand->siglock); sig = dequeue_signal(current, &s, &info); if (!sig) { timeout = MAX_SCHEDULE_TIMEOUT; @@ -920,6 +919,7 @@ compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese, timeout = timespec_to_jiffies(&t) +(t.tv_sec || t.tv_nsec); if (timeout) { + spin_lock_irq(¤t->sighand->siglock); current->real_blocked = current->blocked; sigandsets(¤t->blocked, ¤t->blocked, &s); @@ -928,14 +928,15 @@ compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese, timeout = schedule_timeout_interruptible(timeout); - spin_lock_irq(¤t->sighand->siglock); sig = dequeue_signal(current, &s, &info); + + spin_lock_irq(¤t->sighand->siglock); current->blocked = current->real_blocked; siginitset(¤t->real_blocked, 0); recalc_sigpending(); + spin_unlock_irq(¤t->sighand->siglock); } } - spin_unlock_irq(¤t->sighand->siglock); if (sig) { ret = sig; diff --git a/kernel/signal.c b/kernel/signal.c index db0ce0e..d31a97a 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -595,8 +595,7 @@ static int __dequeue_shared_signal(struct task_struct *tsk, { int signr; - assert_spin_locked(&tsk->sighand->siglock); - + spin_lock_irq(&tsk->sighand->siglock); signr = __dequeue_signal(&tsk->signal->shared_pending, mask, info); /* @@ -639,6 +638,7 @@ static int __dequeue_shared_signal(struct task_struct *tsk, current->group_stop |= GROUP_STOP_DEQUEUED; } + spin_unlock_irq(&tsk->sighand->siglock); return signr; } @@ -657,12 +657,10 @@ static int __dequeue_private_signal(struct task_struct *tsk, /* * Dequeue a signal and return the element to the caller, which is * expected to free it. - * - * All callers have to hold the siglock. */ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) { - int signr; + int signr = 0; /* * Dequeue shared signals first since this is where SIGSTOP @@ -672,7 +670,8 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) * the shared queue, e.g. we'd starve shared signals from * being serviced. */ - signr = __dequeue_shared_signal(tsk, mask, info); + if (PENDING(&tsk->signal->shared_pending, &tsk->blocked)) + signr = __dequeue_shared_signal(tsk, mask, info); /* We only dequeue private signals from ourselves, we don't let * signalfd steal them @@ -684,17 +683,9 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) if (!signr) return 0; - if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private) { - /* - * Release the siglock to ensure proper locking order - * of timer locks outside of siglocks. Note, we leave - * irqs disabled here, since the posix-timers code is - * about to disable them again anyway. - */ - spin_unlock(&tsk->sighand->siglock); + if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private) do_schedule_next_timer(info); - spin_lock(&tsk->sighand->siglock); - } + return signr; } @@ -2119,6 +2110,7 @@ static int ptrace_signal(int signr, siginfo_t *info, if (!task_ptrace(current)) return signr; + spin_lock_irq(¤t->sighand->siglock); ptrace_signal_deliver(regs, cookie); /* Let the debugger run. */ @@ -2127,7 +2119,7 @@ static int ptrace_signal(int signr, siginfo_t *info, /* We're back. Did the debugger cancel the sig? */ signr = current->exit_code; if (signr == 0) - return signr; + goto out; current->exit_code = 0; @@ -2161,35 +2153,36 @@ static int ptrace_signal(int signr, siginfo_t *info, signr = 0; } +out: + spin_unlock_irq(¤t->sighand->siglock); return signr; } -int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, - struct pt_regs *regs, void *cookie) +static inline int __do_signal_stop(struct sighand_struct *sighand) { - struct sighand_struct *sighand = current->sighand; - struct signal_struct *signal = current->signal; - int signr; + int stopped = 0; + spin_lock_irq(&sighand->siglock); + if (current->group_stop & GROUP_STOP_PENDING) + stopped = do_signal_stop(0); -relock: - /* - * We'll jump back here after any time we were stopped in TASK_STOPPED. - * While in TASK_STOPPED, we were considered "frozen enough". - * Now that we woke up, it's crucial if we're supposed to be - * frozen that we freeze now before running anything substantial. - */ - try_to_freeze(); + if (!stopped) + spin_unlock_irq(&sighand->siglock); + + return stopped; +} + +static inline int __do_notify_parent(struct signal_struct *signal, + struct sighand_struct *sighand) +{ + struct task_struct *leader; + int notified = 0; + int why; - spin_lock_irq(&sighand->siglock); /* - * Every stopped thread goes here after wakeup. Check to see if - * we should notify the parent, prepare_signal(SIGCONT) encodes - * the CLD_ si_code into SIGNAL_CLD_MASK bits. + * We could have raced with another writer. */ - if (unlikely(signal->flags & SIGNAL_CLD_MASK)) { - struct task_struct *leader; - int why; - + spin_lock_irq(&sighand->siglock); + if (signal->flags & SIGNAL_CLD_MASK) { if (signal->flags & SIGNAL_CLD_CONTINUED) why = CLD_CONTINUED; else @@ -2216,8 +2209,39 @@ relock: do_notify_parent_cldstop(leader, true, why); read_unlock(&tasklist_lock); + notified = 1; + } else { + spin_unlock_irq(&sighand->siglock); + notified = 0; + } + + return notified; +} + +int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, + struct pt_regs *regs, void *cookie) +{ + struct sighand_struct *sighand = current->sighand; + struct signal_struct *signal = current->signal; + int signr; + +relock: + /* + * We'll jump back here after any time we were stopped in TASK_STOPPED. + * While in TASK_STOPPED, we were considered "frozen enough". + * Now that we woke up, it's crucial if we're supposed to be + * frozen that we freeze now before running anything substantial. + */ + try_to_freeze(); - goto relock; + /* + * Every stopped thread goes here after wakeup. Check to see if + * we should notify the parent, prepare_signal(SIGCONT) encodes + * the CLD_ si_code into SIGNAL_CLD_MASK bits. + */ + if (unlikely(signal->flags & SIGNAL_CLD_MASK)) { + if (__do_notify_parent(signal, sighand)) + goto relock; } for (;;) { @@ -2234,8 +2258,13 @@ relock: ka = return_ka; else { if (unlikely(current->group_stop & - GROUP_STOP_PENDING) && do_signal_stop(0)) - goto relock; + GROUP_STOP_PENDING)) { + int stopped; + + stopped = __do_signal_stop(sighand); + if (stopped) + goto relock; + } signr = dequeue_signal(current, ¤t->blocked, info); @@ -2302,20 +2331,18 @@ relock: * We need to check for that and bail out if necessary. */ if (signr != SIGSTOP) { - spin_unlock_irq(&sighand->siglock); - /* signals can be posted during this window */ - if (is_current_pgrp_orphaned()) goto relock; - spin_lock_irq(&sighand->siglock); } + spin_lock_irq(&sighand->siglock); if (likely(do_signal_stop(info->si_signo))) { /* It released the siglock. */ goto relock; } + spin_unlock_irq(&sighand->siglock); /* * We didn't actually stop, due to a race @@ -2324,8 +2351,6 @@ relock: continue; } - spin_unlock_irq(&sighand->siglock); - /* * Anything else is fatal, maybe with a core dump. */ @@ -2351,7 +2376,6 @@ relock: do_group_exit(info->si_signo); /* NOTREACHED */ } - spin_unlock_irq(&sighand->siglock); return signr; } @@ -2640,7 +2664,6 @@ SYSCALL_DEFINE4(rt_sigtimedwait, const sigset_t __user *, uthese, return -EINVAL; } - spin_lock_irq(¤t->sighand->siglock); sig = dequeue_signal(current, &these, &info); if (!sig) { timeout = MAX_SCHEDULE_TIMEOUT; @@ -2652,6 +2675,7 @@ SYSCALL_DEFINE4(rt_sigtimedwait, const sigset_t __user *, uthese, /* None ready -- temporarily unblock those we're * interested while we are sleeping in so that we'll * be awakened when they arrive. */ + spin_lock_irq(¤t->sighand->siglock); current->real_blocked = current->blocked; sigandsets(¤t->blocked, ¤t->blocked, &these); recalc_sigpending(); @@ -2664,9 +2688,9 @@ SYSCALL_DEFINE4(rt_sigtimedwait, const sigset_t __user *, uthese, current->blocked = current->real_blocked; siginitset(¤t->real_blocked, 0); recalc_sigpending(); + spin_unlock_irq(¤t->sighand->siglock); } } - spin_unlock_irq(¤t->sighand->siglock); if (sig) { ret = sig; -- 1.7.4 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery 2011-04-05 19:21 ` [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery Matt Fleming @ 2011-04-13 20:12 ` Oleg Nesterov 2011-04-14 10:57 ` Matt Fleming 0 siblings, 1 reply; 29+ messages in thread From: Oleg Nesterov @ 2011-04-13 20:12 UTC (permalink / raw) To: Matt Fleming Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On 04/05, Matt Fleming wrote: > > To reduce the contention on the shared siglock this patch pushes the > responsibility of acquiring and releasing the shared siglock down into > the functions that need it. That way, if we don't call a function that > needs to be run under the shared siglock, we can run without acquiring > it at all. This adds new races. And this time I do not even understand the intent. I mean, it is not clear to me why this change can really help to speed up get_signal_to_deliver(). > Note that this does not make signal delivery lockless. A signal must > still be dequeued from either the shared or private signal > queues. However, in the private signal case we can now get by with > just acquiring the per-thread siglock OK, we can dequeue the signal. But dequeue_signal()->recalc_sigpending() becomes even more wrong. We do not hold any lock, we can race with both shared/private signal sending. > Also update tracehook.h to indicate it's not called with siglock held > anymore. Heh. This breaks this tracehook completely ;) OK, nobody cares about the out-of-tree users, forget. Also. get_signal_to_deliver() does signr = dequeue_signal(current, ¤t->blocked, info); ... ka = &sighand->action[signr-1]; ... if (ka->sa.sa_handler != SIG_DFL) { /* Run the handler. */ *return_ka = *ka; This memcpy() can race with sys_rt_sigaction(), we can't read *ka atomically. Actually, even SIG_DFL/SIG_IGN checks can race, although this is minor... But still not correct. if (ka->sa.sa_flags & SA_ONESHOT) { write_lock(&sighand->action_lock); ka->sa.sa_handler = SIG_DFL; write_unlock(&sighand->action_lock); We should check SA_ONESHOT under ->action_lock. But even then this will bw racy, although we can probably ignore this... Suppose that SA_ONESHOT was set after we dequeued the signal. Oleg. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery 2011-04-13 20:12 ` Oleg Nesterov @ 2011-04-14 10:57 ` Matt Fleming 2011-04-14 19:20 ` Oleg Nesterov 0 siblings, 1 reply; 29+ messages in thread From: Matt Fleming @ 2011-04-14 10:57 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On Wed, 13 Apr 2011 22:12:19 +0200 Oleg Nesterov <oleg@redhat.com> wrote: > On 04/05, Matt Fleming wrote: > > > > To reduce the contention on the shared siglock this patch pushes the > > responsibility of acquiring and releasing the shared siglock down into > > the functions that need it. That way, if we don't call a function that > > needs to be run under the shared siglock, we can run without acquiring > > it at all. > > This adds new races. And this time I do not even understand the intent. > I mean, it is not clear to me why this change can really help to speed > up get_signal_to_deliver(). Again, it's not necessarily speeding up get_signal_to_deliver(), but rather it's reducing the contention on the shared siglock. For example, without this patch, if you've got someone sending a signal to a task group, you can't run get_signal_to_deliver() in parallel because you'll be waiting for the sending thread to release the shared siglock. Which, if you were going to dequeue a private signal anyway and didn't need to access signal->shared_pending, is unnecessary overhead :-( As it turns out, the shared siglock protects more than just signal->shared_pending, so in certain cases you need to acquire it anyway (like the fatal signal code paths) so this isn't as optimised as it could be, which is a shame. > > Note that this does not make signal delivery lockless. A signal must > > still be dequeued from either the shared or private signal > > queues. However, in the private signal case we can now get by with > > just acquiring the per-thread siglock > > OK, we can dequeue the signal. But dequeue_signal()->recalc_sigpending() > becomes even more wrong. We do not hold any lock, we can race with both > shared/private signal sending. Yep, this was covered in the previous patch review. > > Also update tracehook.h to indicate it's not called with siglock held > > anymore. > > Heh. This breaks this tracehook completely ;) OK, nobody cares about > the out-of-tree users, forget. I was hoping you'd say that ;-) > Also. get_signal_to_deliver() does > > signr = dequeue_signal(current, ¤t->blocked, > info); > ... > > ka = &sighand->action[signr-1]; > > ... > > if (ka->sa.sa_handler != SIG_DFL) { > /* Run the handler. */ > *return_ka = *ka; > > This memcpy() can race with sys_rt_sigaction(), we can't read *ka > atomically. Eek! I hadn't noticed that. Thanks. > Actually, even SIG_DFL/SIG_IGN checks can race, although this is minor... > But still not correct. > > if (ka->sa.sa_flags & SA_ONESHOT) { > write_lock(&sighand->action_lock); > ka->sa.sa_handler = SIG_DFL; > write_unlock(&sighand->action_lock); > > We should check SA_ONESHOT under ->action_lock. But even then this > will bw racy, although we can probably ignore this... Suppose that > SA_ONESHOT was set after we dequeued the signal. Right, most of this side is wrong wrt to the action_lock. Thanks Oleg. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery 2011-04-14 10:57 ` Matt Fleming @ 2011-04-14 19:20 ` Oleg Nesterov 2011-04-16 13:27 ` Matt Fleming 0 siblings, 1 reply; 29+ messages in thread From: Oleg Nesterov @ 2011-04-14 19:20 UTC (permalink / raw) To: Matt Fleming Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On 04/14, Matt Fleming wrote: > > On Wed, 13 Apr 2011 22:12:19 +0200 > Oleg Nesterov <oleg@redhat.com> wrote: > > > This adds new races. And this time I do not even understand the intent. > > I mean, it is not clear to me why this change can really help to speed > > up get_signal_to_deliver(). > > Again, it's not necessarily speeding up get_signal_to_deliver(), but > rather it's reducing the contention on the shared siglock. Yes, sorry for confusion. I used the "speed up" term wrongly throughout. I understand what are you trying to do. But yes, in this case I probably missed the intent, > For example, without this patch, if you've got someone sending a signal > to a task group, you can't run get_signal_to_deliver() in parallel I missed the simple fact, get_signal_to_deliver() could avoid ->siglock completely if it dequeues the private signal. Btw, I forgot to mention another problem. We should not dequeue from signal->shared_pending before task->pending. There are various reasons why we shouldn't, but in particular please look at a27341cd "Prioritize synchronous signals over 'normal' signals". Oleg. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery 2011-04-14 19:20 ` Oleg Nesterov @ 2011-04-16 13:27 ` Matt Fleming 0 siblings, 0 replies; 29+ messages in thread From: Matt Fleming @ 2011-04-16 13:27 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin, Matt Fleming On Thu, 14 Apr 2011 21:20:59 +0200 Oleg Nesterov <oleg@redhat.com> wrote: > Btw, I forgot to mention another problem. We should not dequeue from > signal->shared_pending before task->pending. There are various reasons > why we shouldn't, but in particular please look at a27341cd > "Prioritize synchronous signals over 'normal' signals". Oh, that's an interesting point! I hadn't thought of that. OK, I'll swap the dequeue order back to the way it was. This actually makes things a little easier as we can now get away with not reading the list of pending shared signals in the fast path. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2011-05-02 22:42 UTC | newest] Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' Matt Fleming 2011-04-05 20:19 ` Oleg Nesterov 2011-04-05 20:50 ` Matt Fleming 2011-04-06 12:57 ` Oleg Nesterov 2011-04-06 13:09 ` Tejun Heo 2011-04-06 13:30 ` Matt Fleming 2011-04-06 13:15 ` Matt Fleming 2011-04-11 18:50 ` Oleg Nesterov 2011-04-11 19:24 ` Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming 2011-04-13 19:42 ` Oleg Nesterov 2011-04-14 10:34 ` Matt Fleming 2011-04-14 19:00 ` Oleg Nesterov 2011-04-16 13:08 ` Matt Fleming 2011-04-18 16:45 ` Oleg Nesterov 2011-04-21 19:03 ` arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand Oleg Nesterov 2011-04-22 13:04 ` Chris Metcalf 2011-04-26 20:36 ` [PATCH 0/1] tile: do_hardwall_trap: do not play with task->sighand Oleg Nesterov 2011-04-26 20:37 ` [PATCH 1/1] " Oleg Nesterov 2011-05-02 22:42 ` Chris Metcalf 2011-04-26 9:46 ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 3/5] ia64: Catch up with new sighand action spinlock Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 4/5] signals: Introduce __dequeue_private_signal helper function Matt Fleming 2011-04-05 19:21 ` [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery Matt Fleming 2011-04-13 20:12 ` Oleg Nesterov 2011-04-14 10:57 ` Matt Fleming 2011-04-14 19:20 ` Oleg Nesterov 2011-04-16 13:27 ` Matt Fleming
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.