* [RFC][PATCH 0/5] Improve signal delivery scalability
@ 2011-04-05 19:21 Matt Fleming
  2011-04-05 19:21 ` [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' Matt Fleming
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin,
	Matt Fleming

From: Matt Fleming <matt.fleming@linux.intel.com>

This series is most definitely RFC material. At this early stage I'm
just asking for feedback on whether the concepts/ideas are sound. For
example, I'm fairly sure that these patches break ia64 because it
accesses tsk->pending in asm and I haven't had time to look closely at
that yet.

This patch series was motivated by the signal1_threads testcase from
the Will It Scale benchmark suite. Currently, signal generation and
delivery in a thread group is serialised by the thread group's
siglock, tsk->sighand->siglock. This lock is a huge point of
contention in multithreaded applications, even when sending a signal
to one thread in a thread group.

In light of this, the patch series tries to improve scalability by,

	- introducing a per-thread siglock to protect the per-thread
	  pending signal queue

	- avoiding acquiring the shared siglock wherever possible

	- using a rwlock to protect signal handlers

This series is based on the assumption that it is OK to acquire and
release the shared siglock across function calls and do things like
execute PENDING() without holding any locks.
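
For reference, the locking order the series ends up with (documented in
full by a comment block added in patch 2) is,

	tsk->sighand->action_lock	(protects sighand->action)
	  tsk->sighand->siglock		(shared_pending, tsk->blocked)
	    tsk->siglock		(per-thread tsk->pending)

where the per-thread siglock may also be taken directly under
action_lock when only a private queue is touched.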

The improvement on the "signal delivery" testcase can be seen here,
http://userweb.kernel.org/~mfleming/will-it-scale/signals-per-thread-siglock/

Tejun, these patches are based on your ptrace branch as both you and
Oleg have touched a lot of the same code paths lately.

All comments appreciated!

Matt Fleming (5):
  signals: Always place SIGCONT and SIGSTOP on 'shared_pending'
  signals: Introduce per-thread siglock and action rwlock
  ia64: Catch up with new sighand action spinlock
  signals: Introduce __dequeue_private_signal helper function
  signals: Don't hold shared siglock across signal delivery

 arch/ia64/kernel/signal.c |   13 +--
 fs/autofs4/waitq.c        |    2 +
 fs/exec.c                 |    2 +
 fs/signalfd.c             |    7 +-
 include/linux/init_task.h |    2 +
 include/linux/sched.h     |    4 +
 include/linux/signal.h    |    5 +
 include/linux/tracehook.h |    3 +-
 kernel/compat.c           |    7 +-
 kernel/exit.c             |   12 +-
 kernel/fork.c             |    2 +
 kernel/kmod.c             |    8 +-
 kernel/ptrace.c           |    4 +-
 kernel/signal.c           |  429 +++++++++++++++++++++++++++++++++------------
 security/selinux/hooks.c  |    6 +-
 15 files changed, 360 insertions(+), 146 deletions(-)

-- 
1.7.4



* [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending'
  2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming
@ 2011-04-05 19:21 ` Matt Fleming
  2011-04-05 20:19   ` Oleg Nesterov
  2011-04-05 19:21 ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin,
	Matt Fleming

From: Matt Fleming <matt.fleming@linux.intel.com>

Because SIGCONT and SIGSTOP affect an entire thread group, we can
place them on the 'shared_pending' queue. This means that we no longer
have to iterate over every thread in a thread group to remove
SIGCONT/SIGSTOP signals from their private 'pending' queues.

SIGCONT and SIGSTOP signals are now placed only on the shared queue
and never on the private queue, so we need to prevent them from being
starved of service by prioritizing signals on the shared queue over
signals on the private queue. Without this priority, a task could keep
handling private signals even though they were delivered _after_ a
SIGSTOP. Note that POSIX does not mandate any kind of priority between
signals sent to a thread group and signals sent to a specific thread,
so this change in behaviour should be safe.

This is an optimization to reduce the amount of code that we execute
while holding ->siglock, and is preparation for introducing a
per-thread siglock for modifying the private signal queue.
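
Condensed, the new dequeue order in dequeue_signal() looks like this
(the full hunk is in the diff below),

	/* SIGSTOP/SIGCONT can now only live on the shared queue. */
	signr = __dequeue_shared_signal(tsk, mask, info);
	if (!signr)
		signr = __dequeue_signal(&tsk->pending, mask, info);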

Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com>
---
 kernel/signal.c |  123 +++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 79 insertions(+), 44 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 4f7312b..9595d86 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -545,51 +545,41 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
 }
 
 /*
- * Dequeue a signal and return the element to the caller, which is 
- * expected to free it.
- *
- * All callers have to hold the siglock.
+ * All callers must hold tsk->sighand->siglock.
  */
-int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
+static int __dequeue_shared_signal(struct task_struct *tsk,
+				   sigset_t *mask, siginfo_t *info)
 {
 	int signr;
 
-	/* We only dequeue private signals from ourselves, we don't let
-	 * signalfd steal them
+	assert_spin_locked(&tsk->sighand->siglock);
+
+	signr = __dequeue_signal(&tsk->signal->shared_pending,
+				 mask, info);
+	/*
+	 * itimer signal ?
+	 *
+	 * itimers are process shared and we restart periodic
+	 * itimers in the signal delivery path to prevent DoS
+	 * attacks in the high resolution timer case. This is
+	 * compliant with the old way of self restarting
+	 * itimers, as the SIGALRM is a legacy signal and only
+	 * queued once. Changing the restart behaviour to
+	 * restart the timer in the signal dequeue path is
+	 * reducing the timer noise on heavy loaded !highres
+	 * systems too.
 	 */
-	signr = __dequeue_signal(&tsk->pending, mask, info);
-	if (!signr) {
-		signr = __dequeue_signal(&tsk->signal->shared_pending,
-					 mask, info);
-		/*
-		 * itimer signal ?
-		 *
-		 * itimers are process shared and we restart periodic
-		 * itimers in the signal delivery path to prevent DoS
-		 * attacks in the high resolution timer case. This is
-		 * compliant with the old way of self restarting
-		 * itimers, as the SIGALRM is a legacy signal and only
-		 * queued once. Changing the restart behaviour to
-		 * restart the timer in the signal dequeue path is
-		 * reducing the timer noise on heavy loaded !highres
-		 * systems too.
-		 */
-		if (unlikely(signr == SIGALRM)) {
-			struct hrtimer *tmr = &tsk->signal->real_timer;
-
-			if (!hrtimer_is_queued(tmr) &&
-			    tsk->signal->it_real_incr.tv64 != 0) {
-				hrtimer_forward(tmr, tmr->base->get_time(),
-						tsk->signal->it_real_incr);
-				hrtimer_restart(tmr);
-			}
+	if (unlikely(signr == SIGALRM)) {
+		struct hrtimer *tmr = &tsk->signal->real_timer;
+
+		if (!hrtimer_is_queued(tmr) &&
+		    tsk->signal->it_real_incr.tv64 != 0) {
+			hrtimer_forward(tmr, tmr->base->get_time(),
+					tsk->signal->it_real_incr);
+			hrtimer_restart(tmr);
 		}
 	}
 
-	recalc_sigpending();
-	if (!signr)
-		return 0;
-
 	if (unlikely(sig_kernel_stop(signr))) {
 		/*
 		 * Set a marker that we have dequeued a stop signal.  Our
@@ -605,6 +595,40 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 		 */
 		current->group_stop |= GROUP_STOP_DEQUEUED;
 	}
+
+	return signr;
+}
+
+/*
+ * Dequeue a signal and return the element to the caller, which is 
+ * expected to free it.
+ *
+ * All callers have to hold the siglock.
+ */
+int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
+{
+	int signr;
+
+	/*
+	 * Dequeue shared signals first since this is where SIGSTOP
+	 * and SIGCONTs will be. If we don't dequeue these first
+	 * there's a chance that we could keep handling private
+	 * signals even when there are SIGSTOPs or SIGCONTs pending in
+	 * the shared queue, e.g. we'd starve shared signals from
+	 * being serviced.
+	 */
+	signr = __dequeue_shared_signal(tsk, mask, info);
+
+	/* We only dequeue private signals from ourselves, we don't let
+	 * signalfd steal them
+	 */
+	if (!signr)
+		signr = __dequeue_signal(&tsk->pending, mask, info);
+
+	recalc_sigpending();
+	if (!signr)
+		return 0;
+
 	if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private) {
 		/*
 		 * Release the siglock to ensure proper locking order
@@ -779,13 +803,10 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		 */
 	} else if (sig_kernel_stop(sig)) {
 		/*
-		 * This is a stop signal.  Remove SIGCONT from all queues.
+		 * This is a stop signal.  Remove SIGCONT from the
+		 * shared queue.
 		 */
 		rm_from_queue(sigmask(SIGCONT), &signal->shared_pending);
-		t = p;
-		do {
-			rm_from_queue(sigmask(SIGCONT), &t->pending);
-		} while_each_thread(p, t);
 	} else if (sig == SIGCONT) {
 		unsigned int why;
 		/*
@@ -795,7 +816,6 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		t = p;
 		do {
 			task_clear_group_stop_pending(t);
-			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
 			wake_up_state(t, __TASK_STOPPED);
 		} while_each_thread(p, t);
 
@@ -945,7 +965,22 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 	if (!prepare_signal(sig, t, from_ancestor_ns))
 		return 0;
 
-	pending = group ? &t->signal->shared_pending : &t->pending;
+	/*
+	 * We always enqueue SIGSTOP or SIGCONT signals on the shared
+	 * queue. This means that a SIGSTOP or SIGCONT signal _cannot_
+	 * be present on a thread's private pending queue.
+	 *
+	 * This makes prepare_signal() more optimal as we do not have
+	 * to remove signals from each thread's pending queue and so
+	 * can avoid iterating over all threads in the thread group
+	 * (and therefore avoid the locking that would be necessary to
+	 * do that safely).
+	 */
+	if (group || sig_kernel_stop(sig) || sig == SIGCONT)
+		pending = &t->signal->shared_pending;
+	else
+		pending = &t->pending;
+
 	/*
 	 * Short-circuit ignored signals and support queuing
 	 * exactly one non-rt signal, so that we can get more
-- 
1.7.4



* [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock
  2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming
  2011-04-05 19:21 ` [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' Matt Fleming
@ 2011-04-05 19:21 ` Matt Fleming
  2011-04-13 19:42   ` Oleg Nesterov
  2011-04-05 19:21 ` [RFC][PATCH 3/5] ia64: Catch up with new sighand action spinlock Matt Fleming
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin,
	Matt Fleming

From: Matt Fleming <matt.fleming@linux.intel.com>

Try to reduce the amount of time we hold the shared siglock by
introducing a per-thread siglock that protects each thread's 'pending'
queue. Currently, the shared siglock protects both the shared and
private signal queues. As such, it is a huge point of contention when
generating and delivering signals to single threads in a multithreaded
process. For example, even if a thread is delivering a signal to
itself (thereby putting a signal on its private signal queue) it must
acquire the shared siglock.

To improve signal generation scalability we only acquire the shared
siglock when generating a shared signal. If we're generating a private
signal which will be delivered to a specific thread (and that signal
does not affect an entire thread group) then we can optimise that case
by only taking the per-thread siglock. However, the case where we're
sending a fatal signal to a specific thread is special: because we
need to modify tsk->signal->flags, we must hold the shared siglock.

Note that we now hold both the shared and per-thread siglock when
delivering a private signal. That will be fixed in the next patch so
that signal delivery scales with signal generation.

Also, because we now run signal code paths without holding the shared
siglock at all, it can no longer be used to protect tsk->sighand->action.
So we introduce an rwlock to protect tsk->sighand->action. An rwlock
is better than a spinlock here because readers vastly outnumber
writers, so it scales much better.

Move the majority of the shared signal code into a helper function to
make the code in dequeue_signal() easier to read.
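
Condensed (irq flags and the lock_task_sighand() error handling
omitted), the lock selection in do_send_sig_info() becomes,

	read_lock(&p->sighand->action_lock);
	if (!group && !sig_shared(sig) && !sig_fatal(p, sig)) {
		/* Private, non-fatal: the per-thread lock is enough. */
		spin_lock(&p->siglock);
		ret = send_signal(sig, info, p, group);
		spin_unlock(&p->siglock);
	} else {
		/* Group-wide, stop/cont or fatal: shared siglock. */
		spin_lock(&p->sighand->siglock);
		ret = send_signal(sig, info, p, group);
		spin_unlock(&p->sighand->siglock);
	}
	read_unlock(&p->sighand->action_lock);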

Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com>
---
 fs/autofs4/waitq.c        |    2 +
 fs/exec.c                 |    2 +
 fs/signalfd.c             |    2 +
 include/linux/init_task.h |    2 +
 include/linux/sched.h     |    4 +
 include/linux/signal.h    |    5 +
 kernel/exit.c             |   12 ++-
 kernel/fork.c             |    2 +
 kernel/kmod.c             |    8 +-
 kernel/ptrace.c           |    4 +-
 kernel/signal.c           |  189 ++++++++++++++++++++++++++++++++++++++-------
 security/selinux/hooks.c  |    6 +-
 12 files changed, 199 insertions(+), 39 deletions(-)

diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index 5601005..a7adbf8 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -83,7 +83,9 @@ static int autofs4_write(struct file *file, const void *addr, int bytes)
 	   SIGPIPE unless it was already supposed to get one */
 	if (wr == -EPIPE && !sigpipe) {
 		spin_lock_irqsave(&current->sighand->siglock, flags);
+		spin_lock(&current->siglock);
 		sigdelset(&current->pending.signal, SIGPIPE);
+		spin_unlock(&current->siglock);
 		recalc_sigpending();
 		spin_unlock_irqrestore(&current->sighand->siglock, flags);
 	}
diff --git a/fs/exec.c b/fs/exec.c
index 8328beb..5cc4a6c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1661,8 +1661,10 @@ static int zap_process(struct task_struct *start, int exit_code)
 	do {
 		task_clear_group_stop_pending(t);
 		if (t != current && t->mm) {
+			spin_lock(&t->siglock);
 			sigaddset(&t->pending.signal, SIGKILL);
 			signal_wake_up(t, 1);
+			spin_unlock(&t->siglock);
 			nr++;
 		}
 	} while_each_thread(start, t);
diff --git a/fs/signalfd.c b/fs/signalfd.c
index 492465b..ed49c40 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -48,10 +48,12 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait)
 	poll_wait(file, &current->sighand->signalfd_wqh, wait);
 
 	spin_lock_irq(&current->sighand->siglock);
+	spin_lock(&current->siglock);
 	if (next_signal(&current->pending, &ctx->sigmask) ||
 	    next_signal(&current->signal->shared_pending,
 			&ctx->sigmask))
 		events |= POLLIN;
+	spin_unlock(&current->siglock);
 	spin_unlock_irq(&current->sighand->siglock);
 
 	return events;
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index caa151f..a948d24 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -47,6 +47,7 @@ extern struct nsproxy init_nsproxy;
 	.action		= { { { .sa_handler = SIG_DFL, } }, },		\
 	.siglock	= __SPIN_LOCK_UNLOCKED(sighand.siglock),	\
 	.signalfd_wqh	= __WAIT_QUEUE_HEAD_INITIALIZER(sighand.signalfd_wqh),	\
+	.action_lock	= __RW_LOCK_UNLOCKED(sighand.action_lock),	\
 }
 
 extern struct group_info init_groups;
@@ -168,6 +169,7 @@ extern struct cred init_cred;
 	.signal		= &init_signals,				\
 	.sighand	= &init_sighand,				\
 	.nsproxy	= &init_nsproxy,				\
+	.siglock	= __SPIN_LOCK_UNLOCKED(tsk.siglock),		\
 	.pending	= {						\
 		.list = LIST_HEAD_INIT(tsk.pending.list),		\
 		.signal = {{0}}},					\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8cef82d..7b2ea86 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -443,6 +443,9 @@ struct sighand_struct {
 	struct k_sigaction	action[_NSIG];
 	spinlock_t		siglock;
 	wait_queue_head_t	signalfd_wqh;
+
+	/* protects action */
+	rwlock_t		action_lock;
 };
 
 struct pacct_struct {
@@ -1358,6 +1361,7 @@ struct task_struct {
 
 	sigset_t blocked, real_blocked;
 	sigset_t saved_sigmask;	/* restored if set_restore_sigmask() was used */
+	spinlock_t siglock;	/* protects pending */
 	struct sigpending pending;
 
 	unsigned long sas_ss_sp;
diff --git a/include/linux/signal.h b/include/linux/signal.h
index fcd2b14..9aad2b0 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -376,6 +376,11 @@ int unhandled_signal(struct task_struct *tsk, int sig);
 	(!siginmask(signr, SIG_KERNEL_IGNORE_MASK|SIG_KERNEL_STOP_MASK) && \
 	 (t)->sighand->action[(signr)-1].sa.sa_handler == SIG_DFL)
 
+/* Should the signal be placed on shared_pending? */
+#define sig_shared(sig)	\
+	(((sig) < SIGRTMIN) && \
+	 siginmask(sig, SIG_KERNEL_STOP_MASK | rt_sigmask(SIGCONT)))
+
 void signals_init(void);
 
 #endif /* __KERNEL__ */
diff --git a/kernel/exit.c b/kernel/exit.c
index 1a0f10f..7774126 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -142,7 +142,9 @@ static void __exit_signal(struct task_struct *tsk)
 	 * Do this under ->siglock, we can race with another thread
 	 * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals.
 	 */
+	spin_lock(&tsk->siglock);
 	flush_sigqueue(&tsk->pending);
+	spin_unlock(&tsk->siglock);
 	tsk->sighand = NULL;
 	spin_unlock(&sighand->siglock);
 
@@ -386,7 +388,8 @@ int allow_signal(int sig)
 	if (!valid_signal(sig) || sig < 1)
 		return -EINVAL;
 
-	spin_lock_irq(&current->sighand->siglock);
+	write_lock_irq(&current->sighand->action_lock);
+	spin_lock(&current->sighand->siglock);
 	/* This is only needed for daemonize()'ed kthreads */
 	sigdelset(&current->blocked, sig);
 	/*
@@ -396,7 +399,8 @@ int allow_signal(int sig)
 	 */
 	current->sighand->action[(sig)-1].sa.sa_handler = (void __user *)2;
 	recalc_sigpending();
-	spin_unlock_irq(&current->sighand->siglock);
+	spin_unlock(&current->sighand->siglock);
+	write_unlock_irq(&current->sighand->action_lock);
 	return 0;
 }
 
@@ -407,10 +411,10 @@ int disallow_signal(int sig)
 	if (!valid_signal(sig) || sig < 1)
 		return -EINVAL;
 
-	spin_lock_irq(&current->sighand->siglock);
+	write_lock_irq(&current->sighand->action_lock);
 	current->sighand->action[(sig)-1].sa.sa_handler = SIG_IGN;
 	recalc_sigpending();
-	spin_unlock_irq(&current->sighand->siglock);
+	write_unlock_irq(&current->sighand->action_lock);
 	return 0;
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index f2b494d..50fa84f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1081,6 +1081,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->vfork_done = NULL;
 	spin_lock_init(&p->alloc_lock);
 
+	spin_lock_init(&p->siglock);
 	init_sigpending(&p->pending);
 
 	p->utime = cputime_zero;
@@ -1494,6 +1495,7 @@ static void sighand_ctor(void *data)
 
 	spin_lock_init(&sighand->siglock);
 	init_waitqueue_head(&sighand->signalfd_wqh);
+	rwlock_init(&sighand->action_lock);
 }
 
 void __init proc_caches_init(void)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 9cd0591..551a6e9 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -134,9 +134,9 @@ static int ____call_usermodehelper(void *data)
 	struct subprocess_info *sub_info = data;
 	int retval;
 
-	spin_lock_irq(&current->sighand->siglock);
+	write_lock_irq(&current->sighand->action_lock);
 	flush_signal_handlers(current, 1);
-	spin_unlock_irq(&current->sighand->siglock);
+	write_unlock_irq(&current->sighand->action_lock);
 
 	/* We can run anywhere, unlike our parent keventd(). */
 	set_cpus_allowed_ptr(current, cpu_all_mask);
@@ -178,9 +178,9 @@ static int wait_for_helper(void *data)
 	pid_t pid;
 
 	/* If SIGCLD is ignored sys_wait4 won't populate the status. */
-	spin_lock_irq(&current->sighand->siglock);
+	write_lock_irq(&current->sighand->action_lock);
 	current->sighand->action[SIGCHLD-1].sa.sa_handler = SIG_DFL;
-	spin_unlock_irq(&current->sighand->siglock);
+	write_unlock_irq(&current->sighand->action_lock);
 
 	pid = kernel_thread(____call_usermodehelper, sub_info, SIGCHLD);
 	if (pid < 0) {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 20d5efd..c72d40d 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -294,10 +294,10 @@ static int ptrace_traceme(void)
 static int ignoring_children(struct sighand_struct *sigh)
 {
 	int ret;
-	spin_lock(&sigh->siglock);
+	read_lock(&sigh->action_lock);
 	ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
 	      (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
-	spin_unlock(&sigh->siglock);
+	read_unlock(&sigh->action_lock);
 	return ret;
 }
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 9595d86..c67f27a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -38,6 +38,45 @@
 #include "audit.h"	/* audit_signal_info() */
 
 /*
+ * Signal locking.
+ *
+ * There are three locks to consider when manipulating signal queues,
+ *
+ *	- tsk->sighand->action_lock	(sighand action rwlock)
+ *	- tsk->sighand->siglock		(shared siglock)
+ *	- tsk->siglock			(per-thread siglock)
+ *
+ * The sighand action rwlock needs to be acquired when modifying
+ * tsk->sighand->action or when you need to prevent modifications.
+ *
+ * When modifying the shared signal queue (signal->shared_pending),
+ * modifying tsk->blocked or handling stop signals (see
+ * do_signal_stop), you need to be holding the shared siglock.
+ *
+ * When modifying the per-thread pending signal queue (tsk->pending)
+ * you need to be holding the per-thread siglock.
+ *
+ * You MUST NOT acquire the shared siglock if already holding the
+ * per-thread siglock because various places iterate over the threads
+ * in a thread group and acquire their per-thread siglocks while
+ * holding the shared siglock - see zap_process().
+ *
+ * You MUST NOT acquire the sighand action lock if already holding the
+ * per-thread siglock.
+ *
+ * So the locking order is...
+ *
+ * tsk->sighand->action_lock
+ *   tsk->sighand->siglock
+ *     tsk->siglock
+ *
+ * OR..
+ *
+ * tsk->sighand->action_lock
+ *   tsk->siglock
+ */
+
+/*
  * SLAB caches for signal bits.
  */
 
@@ -367,7 +406,9 @@ void flush_sigqueue(struct sigpending *queue)
 void __flush_signals(struct task_struct *t)
 {
 	clear_tsk_thread_flag(t, TIF_SIGPENDING);
+	spin_lock(&t->siglock);
 	flush_sigqueue(&t->pending);
+	spin_unlock(&t->siglock);
 	flush_sigqueue(&t->signal->shared_pending);
 }
 
@@ -409,7 +450,9 @@ void flush_itimer_signals(void)
 	unsigned long flags;
 
 	spin_lock_irqsave(&tsk->sighand->siglock, flags);
+	spin_lock(&tsk->siglock);
 	__flush_itimer_signals(&tsk->pending);
+	spin_unlock(&tsk->siglock);
 	__flush_itimer_signals(&tsk->signal->shared_pending);
 	spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
 }
@@ -622,8 +665,11 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 	/* We only dequeue private signals from ourselves, we don't let
 	 * signalfd steal them
 	 */
-	if (!signr)
+	if (!signr) {
+		spin_lock(&tsk->siglock);
 		signr = __dequeue_signal(&tsk->pending, mask, info);
+		spin_unlock(&tsk->siglock);
+	}
 
 	recalc_sigpending();
 	if (!signr)
@@ -931,7 +977,9 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 			t = p;
 			do {
 				task_clear_group_stop_pending(t);
+				spin_lock(&t->siglock);
 				sigaddset(&t->pending.signal, SIGKILL);
+				spin_unlock(&t->siglock);
 				signal_wake_up(t, 1);
 			} while_each_thread(p, t);
 			return;
@@ -960,8 +1008,6 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 
 	trace_signal_generate(sig, info, t);
 
-	assert_spin_locked(&t->sighand->siglock);
-
 	if (!prepare_signal(sig, t, from_ancestor_ns))
 		return 0;
 
@@ -976,7 +1022,12 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 	 * (and therefore avoid the locking that would be necessary to
 	 * do that safely).
 	 */
-	if (group || sig_kernel_stop(sig) || sig == SIGCONT)
+	if (group || sig_shared(sig) || sig_fatal(t, sig))
+		assert_spin_locked(&t->sighand->siglock);
+	else
+		assert_spin_locked(&t->siglock);
+
+	if (group || sig_shared(sig))
 		pending = &t->signal->shared_pending;
 	else
 		pending = &t->pending;
@@ -988,6 +1039,7 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 	 */
 	if (legacy_queue(pending, sig))
 		return 0;
+
 	/*
 	 * fast-pathed signals for kernel-internal things like SIGSTOP
 	 * or SIGKILL.
@@ -1123,11 +1175,34 @@ int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p,
 	unsigned long flags;
 	int ret = -ESRCH;
 
-	if (lock_task_sighand(p, &flags)) {
+	read_lock_irqsave(&p->sighand->action_lock, flags);
+
+	/*
+	 * We can optimize for the case where we're sending a signal
+	 * to a specific thread. If we're only sending a signal to one
+	 * thread we only need to acquire that thread's siglock and
+	 * not the global siglock.
+	 *
+	 * Fatal signals have the potential to tear down an entire
+	 * thread group, even if we're only sending the signal to a
+	 * specific thread! If we're killing an entire thread group we
+	 * need the shared siglock. Note that we can't check if the
+	 * signal is fatal until we're holding 'action_lock', because
+	 * someone could modify the handler for the signal (some
+	 * signals are only fatal when the handler is SIG_DFL).
+	 */
+	if (!group && !sig_shared(sig) && !sig_fatal(p, sig)) {
+		spin_lock(&p->siglock);
 		ret = send_signal(sig, info, p, group);
-		unlock_task_sighand(p, &flags);
+		spin_unlock(&p->siglock);
+	} else {
+		if (lock_task_sighand(p, &flags)) {
+			ret = send_signal(sig, info, p, group);
+			unlock_task_sighand(p, &flags);
+		}
 	}
 
+	read_unlock_irqrestore(&p->sighand->action_lock, flags);
 	return ret;
 }
 
@@ -1148,8 +1223,10 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
 	unsigned long int flags;
 	int ret, blocked, ignored;
 	struct k_sigaction *action;
+	int shared_siglock;
 
-	spin_lock_irqsave(&t->sighand->siglock, flags);
+	write_lock_irqsave(&t->sighand->action_lock, flags);
+	spin_lock(&t->sighand->siglock);
 	action = &t->sighand->action[sig-1];
 	ignored = action->sa.sa_handler == SIG_IGN;
 	blocked = sigismember(&t->blocked, sig);
@@ -1162,9 +1239,21 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
 	}
 	if (action->sa.sa_handler == SIG_DFL)
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
+
+	shared_siglock = sig_shared(sig) || sig_fatal(t, sig);
+	if (!shared_siglock) {
+		spin_unlock(&t->sighand->siglock);
+		spin_lock(&t->siglock);
+	}
+
 	ret = specific_send_sig_info(sig, info, t);
-	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
+	if (!shared_siglock)
+		spin_unlock(&t->siglock);
+	else
+		spin_unlock(&t->sighand->siglock);
+
+	write_unlock_irqrestore(&t->sighand->action_lock, flags);
 	return ret;
 }
 
@@ -1185,7 +1274,9 @@ int zap_other_threads(struct task_struct *p)
 		/* Don't bother with already dead threads */
 		if (t->exit_state)
 			continue;
+		spin_lock(&t->siglock);
 		sigaddset(&t->pending.signal, SIGKILL);
+		spin_unlock(&t->siglock);
 		signal_wake_up(t, 1);
 	}
 
@@ -1314,11 +1405,15 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
 		goto out_unlock;
 
 	if (sig) {
-		if (lock_task_sighand(p, &flags)) {
+		unsigned long _flags;
+
+		read_lock_irqsave(&p->sighand->action_lock, flags);
+		if (lock_task_sighand(p, &_flags)) {
 			ret = __send_signal(sig, info, p, 1, 0);
-			unlock_task_sighand(p, &flags);
+			unlock_task_sighand(p, &_flags);
 		} else
 			ret = -ESRCH;
+		read_unlock_irqrestore(&p->sighand->action_lock, flags);
 	}
 out_unlock:
 	rcu_read_unlock();
@@ -1411,9 +1506,9 @@ force_sigsegv(int sig, struct task_struct *p)
 {
 	if (sig == SIGSEGV) {
 		unsigned long flags;
-		spin_lock_irqsave(&p->sighand->siglock, flags);
+		write_lock_irqsave(&p->sighand->action_lock, flags);
 		p->sighand->action[sig - 1].sa.sa_handler = SIG_DFL;
-		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+		write_unlock_irqrestore(&p->sighand->action_lock, flags);
 	}
 	force_sig(SIGSEGV, p);
 	return 0;
@@ -1485,14 +1580,24 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 {
 	int sig = q->info.si_signo;
 	struct sigpending *pending;
-	unsigned long flags;
+	unsigned long flags, _flags;
+	int shared_siglock;
 	int ret;
 
 	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
 
 	ret = -1;
-	if (!likely(lock_task_sighand(t, &flags)))
-		goto ret;
+	read_lock_irqsave(&t->sighand->action_lock, flags);
+	shared_siglock = group || sig_shared(sig) || sig_fatal(t, sig);
+	if (!shared_siglock) {
+		spin_lock(&t->siglock);
+		pending = &t->pending;
+	} else {
+		if (!likely(lock_task_sighand(t, &_flags)))
+			goto ret;
+
+		pending = &t->signal->shared_pending;
+	}
 
 	ret = 1; /* the signal is ignored */
 	if (!prepare_signal(sig, t, 0))
@@ -1511,12 +1616,18 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 	q->info.si_overrun = 0;
 
 	signalfd_notify(t, sig);
-	pending = group ? &t->signal->shared_pending : &t->pending;
+
 	list_add_tail(&q->list, &pending->list);
 	sigaddset(&pending->signal, sig);
 	complete_signal(sig, t, group);
+
 out:
-	unlock_task_sighand(t, &flags);
+	if (shared_siglock)
+		unlock_task_sighand(t, &_flags);
+	else
+		spin_unlock(&t->siglock);
+
+	read_unlock_irqrestore(&t->sighand->action_lock, flags);
 ret:
 	return ret;
 }
@@ -1578,7 +1689,8 @@ int do_notify_parent(struct task_struct *tsk, int sig)
 	}
 
 	psig = tsk->parent->sighand;
-	spin_lock_irqsave(&psig->siglock, flags);
+	read_lock_irqsave(&psig->action_lock, flags);
+	spin_lock(&psig->siglock);
 	if (!task_ptrace(tsk) && sig == SIGCHLD &&
 	    (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
 	     (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
@@ -1604,7 +1716,8 @@ int do_notify_parent(struct task_struct *tsk, int sig)
 	if (valid_signal(sig) && sig > 0)
 		__group_send_sig_info(sig, &info, tsk->parent);
 	__wake_up_parent(tsk, tsk->parent);
-	spin_unlock_irqrestore(&psig->siglock, flags);
+	spin_unlock(&psig->siglock);
+	read_unlock_irqrestore(&psig->action_lock, flags);
 
 	return ret;
 }
@@ -1666,7 +1779,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  	}
 
 	sighand = parent->sighand;
-	spin_lock_irqsave(&sighand->siglock, flags);
+	read_lock_irqsave(&sighand->action_lock, flags);
+	spin_lock(&sighand->siglock);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
 		__group_send_sig_info(SIGCHLD, &info, parent);
@@ -1674,7 +1788,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
 	__wake_up_parent(tsk, parent);
-	spin_unlock_irqrestore(&sighand->siglock, flags);
+	spin_unlock(&sighand->siglock);
+	read_unlock_irqrestore(&sighand->action_lock, flags);
 }
 
 static inline int may_ptrace_stop(void)
@@ -2021,7 +2136,19 @@ static int ptrace_signal(int signr, siginfo_t *info,
 
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr)) {
+		/*
+		 * We're called with the shared siglock held but if
+		 * we're going to modify the private signal queue we
+		 * also need to acquire the per-thread siglock.
+		 */
+		if (!sig_shared(signr) && !sig_fatal(current, signr))
+			spin_lock(&current->siglock);
+
 		specific_send_sig_info(signr, info, current);
+
+		if (!sig_shared(signr) && !sig_fatal(current, signr))
+			spin_unlock(&current->siglock);
+
 		signr = 0;
 	}
 
@@ -2126,8 +2253,11 @@ relock:
 			/* Run the handler.  */
 			*return_ka = *ka;
 
-			if (ka->sa.sa_flags & SA_ONESHOT)
+			if (ka->sa.sa_flags & SA_ONESHOT) {
+				write_lock(&sighand->action_lock);
 				ka->sa.sa_handler = SIG_DFL;
+				write_unlock(&sighand->action_lock);
+			}
 
 			break; /* will return non-zero "signr" value */
 		}
@@ -2157,9 +2287,8 @@ relock:
 			 * The default action is to stop all threads in
 			 * the thread group.  The job control signals
 			 * do nothing in an orphaned pgrp, but SIGSTOP
-			 * always works.  Note that siglock needs to be
-			 * dropped during the call to is_orphaned_pgrp()
-			 * because of lock ordering with tasklist_lock.
+			 * always works.  Note that we are not holding
+			 * siglock during the call to is_orphaned_pgrp().
 			 * This allows an intervening SIGCONT to be posted.
 			 * We need to check for that and bail out if necessary.
 			 */
@@ -2374,8 +2503,10 @@ long do_sigpending(void __user *set, unsigned long sigsetsize)
 		goto out;
 
 	spin_lock_irq(&current->sighand->siglock);
+	spin_lock(&current->siglock);
 	sigorsets(&pending, &current->pending.signal,
 		  &current->signal->shared_pending.signal);
+	spin_unlock(&current->siglock);
 	spin_unlock_irq(&current->sighand->siglock);
 
 	/* Outside the lock because only this thread touches it.  */
@@ -2691,9 +2822,10 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
 	if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig)))
 		return -EINVAL;
 
+	read_lock_irq(&t->sighand->action_lock);
 	k = &t->sighand->action[sig-1];
 
-	spin_lock_irq(&current->sighand->siglock);
+	spin_lock(&current->sighand->siglock);
 	if (oact)
 		*oact = *k;
 
@@ -2717,13 +2849,16 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
 			sigaddset(&mask, sig);
 			rm_from_queue_full(&mask, &t->signal->shared_pending);
 			do {
+				spin_lock(&t->siglock);
 				rm_from_queue_full(&mask, &t->pending);
+				spin_unlock(&t->siglock);
 				t = next_thread(t);
 			} while (t != current);
 		}
 	}
 
-	spin_unlock_irq(&current->sighand->siglock);
+	spin_unlock(&current->sighand->siglock);
+	read_unlock_irq(&t->sighand->action_lock);
 	return 0;
 }
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6475e1f..91e2450 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2244,13 +2244,15 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
 		memset(&itimer, 0, sizeof itimer);
 		for (i = 0; i < 3; i++)
 			do_setitimer(i, &itimer, NULL);
-		spin_lock_irq(&current->sighand->siglock);
+		write_lock_irq(&current->sighand->action_lock);
+		spin_lock(&current->sighand->siglock);
 		if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
 			__flush_signals(current);
 			flush_signal_handlers(current, 1);
 			sigemptyset(&current->blocked);
 		}
-		spin_unlock_irq(&current->sighand->siglock);
+		spin_unlock(&current->sighand->siglock);
+		write_unlock_irq(&current->sighand->action_lock);
 	}
 
 	/* Wake up the parent if it is waiting so that it can recheck
-- 
1.7.4



* [RFC][PATCH 3/5] ia64: Catch up with new sighand action spinlock
  2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming
  2011-04-05 19:21 ` [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' Matt Fleming
  2011-04-05 19:21 ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming
@ 2011-04-05 19:21 ` Matt Fleming
  2011-04-05 19:21 ` [RFC][PATCH 4/5] signals: Introduce __dequeue_private_signal helper function Matt Fleming
  2011-04-05 19:21 ` [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery Matt Fleming
  4 siblings, 0 replies; 29+ messages in thread
From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin,
	Matt Fleming

From: Matt Fleming <matt.fleming@linux.intel.com>

Also delete the comment that says holding siglock might not be
necessary - it is necessary, because parts of the signal code rely on
the handlers for a sighand not changing underneath them.

Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com>
---
 arch/ia64/kernel/signal.c |   13 ++-----------
 1 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/ia64/kernel/signal.c b/arch/ia64/kernel/signal.c
index 7bdafc8..d3b37c4 100644
--- a/arch/ia64/kernel/signal.c
+++ b/arch/ia64/kernel/signal.c
@@ -308,18 +308,9 @@ force_sigsegv_info (int sig, void __user *addr)
 	struct siginfo si;
 
 	if (sig == SIGSEGV) {
-		/*
-		 * Acquiring siglock around the sa_handler-update is almost
-		 * certainly overkill, but this isn't a
-		 * performance-critical path and I'd rather play it safe
-		 * here than having to debug a nasty race if and when
-		 * something changes in kernel/signal.c that would make it
-		 * no longer safe to modify sa_handler without holding the
-		 * lock.
-		 */
-		spin_lock_irqsave(&current->sighand->siglock, flags);
+		write_lock_irqsave(&current->sighand->action_lock, flags);
 		current->sighand->action[sig - 1].sa.sa_handler = SIG_DFL;
-		spin_unlock_irqrestore(&current->sighand->siglock, flags);
+		write_unlock_irqrestore(&current->sighand->action_lock, flags);
 	}
 	si.si_signo = SIGSEGV;
 	si.si_errno = 0;
-- 
1.7.4



* [RFC][PATCH 4/5] signals: Introduce __dequeue_private_signal helper function
  2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming
                   ` (2 preceding siblings ...)
  2011-04-05 19:21 ` [RFC][PATCH 3/5] ia64: Catch up with new sighand action spinlock Matt Fleming
@ 2011-04-05 19:21 ` Matt Fleming
  2011-04-05 19:21 ` [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery Matt Fleming
  4 siblings, 0 replies; 29+ messages in thread
From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin,
	Matt Fleming

From: Matt Fleming <matt.fleming@linux.intel.com>

To keep things symmetric with __dequeue_shared_signal(), create a
__dequeue_private_signal() helper which hides the locking required
for accessing tsk->pending.

Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com>
---
 kernel/signal.c |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index c67f27a..db0ce0e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -642,6 +642,18 @@ static int __dequeue_shared_signal(struct task_struct *tsk,
 	return signr;
 }
 
+static int __dequeue_private_signal(struct task_struct *tsk,
+				    sigset_t *mask, siginfo_t *info)
+{
+	int signr;
+
+	spin_lock_irq(&tsk->siglock);
+	signr = __dequeue_signal(&tsk->pending, mask, info);
+	spin_unlock_irq(&tsk->siglock);
+
+	return signr;
+}
+
 /*
  * Dequeue a signal and return the element to the caller, which is 
  * expected to free it.
@@ -665,11 +677,8 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 	/* We only dequeue private signals from ourselves, we don't let
 	 * signalfd steal them
 	 */
-	if (!signr) {
-		spin_lock(&tsk->siglock);
-		signr = __dequeue_signal(&tsk->pending, mask, info);
-		spin_unlock(&tsk->siglock);
-	}
+	if (!signr)
+		signr = __dequeue_private_signal(tsk, mask, info);
 
 	recalc_sigpending();
 	if (!signr)
-- 
1.7.4



* [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery
  2011-04-05 19:21 [RFC][PATCH 0/5] Improve signal delivery scalability Matt Fleming
                   ` (3 preceding siblings ...)
  2011-04-05 19:21 ` [RFC][PATCH 4/5] signals: Introduce __dequeue_private_signal helper function Matt Fleming
@ 2011-04-05 19:21 ` Matt Fleming
  2011-04-13 20:12   ` Oleg Nesterov
  4 siblings, 1 reply; 29+ messages in thread
From: Matt Fleming @ 2011-04-05 19:21 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin,
	Matt Fleming

From: Matt Fleming <matt.fleming@linux.intel.com>

To reduce the contention on the shared siglock this patch pushes the
responsibility of acquiring and releasing the shared siglock down into
the functions that need it. That way, if we don't call a function that
needs to be run under the shared siglock, we can run without acquiring
it at all.

Note that this does not make signal delivery lockless. A signal must
still be dequeued from either the shared or private signal
queues. However, in the private signal case we can now get by with
just acquiring the per-thread siglock, which massively reduces
contention on the shared siglock.
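
The key to this is an unlocked peek at the shared queue before taking
the shared siglock at all; __dequeue_shared_signal() now acquires and
releases the lock internally. Condensed from the diff below,

	if (PENDING(&tsk->signal->shared_pending, &tsk->blocked))
		signr = __dequeue_shared_signal(tsk, mask, info);
	if (!signr)
		signr = __dequeue_private_signal(tsk, mask, info);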

Also update tracehook.h to note that tracehook_get_signal() is no
longer called with siglock held.

With this commit, signal delivery scales much better (though not
linearly with the number of threads), as can be seen in these two
graphs of the signal1 testcase from the Will It Scale benchmark
suite,

http://userweb.kernel.org/~mfleming/will-it-scale/signals-per-thread-siglock/

Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com>
---
 fs/signalfd.c             |    5 --
 include/linux/tracehook.h |    3 +-
 kernel/compat.c           |    7 ++-
 kernel/signal.c           |  124 +++++++++++++++++++++++++++------------------
 4 files changed, 79 insertions(+), 60 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index ed49c40..a9d5c6f 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -146,7 +146,6 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info,
 	ssize_t ret;
 	DECLARE_WAITQUEUE(wait, current);
 
-	spin_lock_irq(&current->sighand->siglock);
 	ret = dequeue_signal(current, &ctx->sigmask, info);
 	switch (ret) {
 	case 0:
@@ -154,7 +153,6 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info,
 			break;
 		ret = -EAGAIN;
 	default:
-		spin_unlock_irq(&current->sighand->siglock);
 		return ret;
 	}
 
@@ -168,11 +166,8 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info,
 			ret = -ERESTARTSYS;
 			break;
 		}
-		spin_unlock_irq(&current->sighand->siglock);
 		schedule();
-		spin_lock_irq(&current->sighand->siglock);
 	}
-	spin_unlock_irq(&current->sighand->siglock);
 
 	remove_wait_queue(&current->sighand->signalfd_wqh, &wait);
 	__set_current_state(TASK_RUNNING);
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index b073f3c..849336d 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -447,7 +447,7 @@ static inline int tracehook_force_sigpending(void)
  * @return_ka:		sigaction for synthetic signal
  *
  * Return zero to check for a real pending signal normally.
- * Return -1 after releasing the siglock to repeat the check.
+ * Return -1 to repeat the check.
  * Return a signal number to induce an artifical signal delivery,
  * setting *@info and *@return_ka to specify its details and behavior.
  *
@@ -458,7 +458,6 @@ static inline int tracehook_force_sigpending(void)
  * %SIGTSTP for stop unless in an orphaned pgrp), but the signal number
  * reported will be @info->si_signo instead.
  *
- * Called with @task->sighand->siglock held, before dequeuing pending signals.
  */
 static inline int tracehook_get_signal(struct task_struct *task,
 				       struct pt_regs *regs,
diff --git a/kernel/compat.c b/kernel/compat.c
index 38b1d2c..c7c5f9f 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -912,7 +912,6 @@ compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese,
 			return -EINVAL;
 	}
 
-	spin_lock_irq(&current->sighand->siglock);
 	sig = dequeue_signal(current, &s, &info);
 	if (!sig) {
 		timeout = MAX_SCHEDULE_TIMEOUT;
@@ -920,6 +919,7 @@ compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese,
 			timeout = timespec_to_jiffies(&t)
 				+(t.tv_sec || t.tv_nsec);
 		if (timeout) {
+			spin_lock_irq(&current->sighand->siglock);
 			current->real_blocked = current->blocked;
 			sigandsets(&current->blocked, &current->blocked, &s);
 
@@ -928,14 +928,15 @@ compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese,
 
 			timeout = schedule_timeout_interruptible(timeout);
 
-			spin_lock_irq(&current->sighand->siglock);
 			sig = dequeue_signal(current, &s, &info);
+
+			spin_lock_irq(&current->sighand->siglock);
 			current->blocked = current->real_blocked;
 			siginitset(&current->real_blocked, 0);
 			recalc_sigpending();
+			spin_unlock_irq(&current->sighand->siglock);
 		}
 	}
-	spin_unlock_irq(&current->sighand->siglock);
 
 	if (sig) {
 		ret = sig;
diff --git a/kernel/signal.c b/kernel/signal.c
index db0ce0e..d31a97a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -595,8 +595,7 @@ static int __dequeue_shared_signal(struct task_struct *tsk,
 {
 	int signr;
 
-	assert_spin_locked(&tsk->sighand->siglock);
-
+	spin_lock_irq(&tsk->sighand->siglock);
 	signr = __dequeue_signal(&tsk->signal->shared_pending,
 				 mask, info);
 	/*
@@ -639,6 +638,7 @@ static int __dequeue_shared_signal(struct task_struct *tsk,
 		current->group_stop |= GROUP_STOP_DEQUEUED;
 	}
 
+	spin_unlock_irq(&tsk->sighand->siglock);
 	return signr;
 }
 
@@ -657,12 +657,10 @@ static int __dequeue_private_signal(struct task_struct *tsk,
 /*
  * Dequeue a signal and return the element to the caller, which is 
  * expected to free it.
- *
- * All callers have to hold the siglock.
  */
 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 {
-	int signr;
+	int signr = 0;
 
 	/*
 	 * Dequeue shared signals first since this is where SIGSTOP
@@ -672,7 +670,8 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 	 * the shared queue, e.g. we'd starve shared signals from
 	 * being serviced.
 	 */
-	signr = __dequeue_shared_signal(tsk, mask, info);
+	if (PENDING(&tsk->signal->shared_pending, &tsk->blocked))
+		signr = __dequeue_shared_signal(tsk, mask, info);
 
 	/* We only dequeue private signals from ourselves, we don't let
 	 * signalfd steal them
@@ -684,17 +683,9 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 	if (!signr)
 		return 0;
 
-	if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private) {
-		/*
-		 * Release the siglock to ensure proper locking order
-		 * of timer locks outside of siglocks.  Note, we leave
-		 * irqs disabled here, since the posix-timers code is
-		 * about to disable them again anyway.
-		 */
-		spin_unlock(&tsk->sighand->siglock);
+	if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private)
 		do_schedule_next_timer(info);
-		spin_lock(&tsk->sighand->siglock);
-	}
+
 	return signr;
 }
 
@@ -2119,6 +2110,7 @@ static int ptrace_signal(int signr, siginfo_t *info,
 	if (!task_ptrace(current))
 		return signr;
 
+	spin_lock_irq(&current->sighand->siglock);
 	ptrace_signal_deliver(regs, cookie);
 
 	/* Let the debugger run.  */
@@ -2127,7 +2119,7 @@ static int ptrace_signal(int signr, siginfo_t *info,
 	/* We're back.  Did the debugger cancel the sig?  */
 	signr = current->exit_code;
 	if (signr == 0)
-		return signr;
+		goto out;
 
 	current->exit_code = 0;
 
@@ -2161,35 +2153,36 @@ static int ptrace_signal(int signr, siginfo_t *info,
 		signr = 0;
 	}
 
+out:
+	spin_unlock_irq(&current->sighand->siglock);
 	return signr;
 }
 
-int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
-			  struct pt_regs *regs, void *cookie)
+static inline int __do_signal_stop(struct sighand_struct *sighand)
 {
-	struct sighand_struct *sighand = current->sighand;
-	struct signal_struct *signal = current->signal;
-	int signr;
+	int stopped = 0;
+	spin_lock_irq(&sighand->siglock);
+	if (current->group_stop & GROUP_STOP_PENDING)
+		stopped = do_signal_stop(0);
 
-relock:
-	/*
-	 * We'll jump back here after any time we were stopped in TASK_STOPPED.
-	 * While in TASK_STOPPED, we were considered "frozen enough".
-	 * Now that we woke up, it's crucial if we're supposed to be
-	 * frozen that we freeze now before running anything substantial.
-	 */
-	try_to_freeze();
+	if (!stopped)
+		spin_unlock_irq(&sighand->siglock);
+
+	return stopped;
+}
+
+static inline int __do_notify_parent(struct signal_struct *signal,
+				     struct sighand_struct *sighand)
+{
+	struct task_struct *leader;
+	int notified = 0;
+	int why;
 
-	spin_lock_irq(&sighand->siglock);
 	/*
-	 * Every stopped thread goes here after wakeup. Check to see if
-	 * we should notify the parent, prepare_signal(SIGCONT) encodes
-	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
+	 * We could have raced with another writer.
 	 */
-	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
-		struct task_struct *leader;
-		int why;
-
+	spin_lock_irq(&sighand->siglock);
+	if (signal->flags & SIGNAL_CLD_MASK) {
 		if (signal->flags & SIGNAL_CLD_CONTINUED)
 			why = CLD_CONTINUED;
 		else
@@ -2216,8 +2209,39 @@ relock:
 			do_notify_parent_cldstop(leader, true, why);
 
 		read_unlock(&tasklist_lock);
+		notified = 1;
+	} else {
+		spin_unlock_irq(&sighand->siglock);
+		notified = 0;
+	}
+
+	return notified;
+}
+
+int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
+			  struct pt_regs *regs, void *cookie)
+{
+	struct sighand_struct *sighand = current->sighand;
+	struct signal_struct *signal = current->signal;
+	int signr;
+
+relock:
+	/*
+	 * We'll jump back here after any time we were stopped in TASK_STOPPED.
+	 * While in TASK_STOPPED, we were considered "frozen enough".
+	 * Now that we woke up, it's crucial if we're supposed to be
+	 * frozen that we freeze now before running anything substantial.
+	 */
+	try_to_freeze();
 
-		goto relock;
+	/*
+	 * Every stopped thread goes here after wakeup. Check to see if
+	 * we should notify the parent, prepare_signal(SIGCONT) encodes
+	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
+	 */
+	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
+		if (__do_notify_parent(signal, sighand))
+			goto relock;
 	}
 
 	for (;;) {
@@ -2234,8 +2258,13 @@ relock:
 			ka = return_ka;
 		else {
 			if (unlikely(current->group_stop &
-				     GROUP_STOP_PENDING) && do_signal_stop(0))
-				goto relock;
+				     GROUP_STOP_PENDING)) {
+				int stopped;
+
+				stopped = __do_signal_stop(sighand);
+				if (stopped)
+					goto relock;
+			}
 
 			signr = dequeue_signal(current, &current->blocked,
 					       info);
@@ -2302,20 +2331,18 @@ relock:
 			 * We need to check for that and bail out if necessary.
 			 */
 			if (signr != SIGSTOP) {
-				spin_unlock_irq(&sighand->siglock);
-
 				/* signals can be posted during this window */
-
 				if (is_current_pgrp_orphaned())
 					goto relock;
 
-				spin_lock_irq(&sighand->siglock);
 			}
 
+			spin_lock_irq(&sighand->siglock);
 			if (likely(do_signal_stop(info->si_signo))) {
 				/* It released the siglock.  */
 				goto relock;
 			}
+			spin_unlock_irq(&sighand->siglock);
 
 			/*
 			 * We didn't actually stop, due to a race
@@ -2324,8 +2351,6 @@ relock:
 			continue;
 		}
 
-		spin_unlock_irq(&sighand->siglock);
-
 		/*
 		 * Anything else is fatal, maybe with a core dump.
 		 */
@@ -2351,7 +2376,6 @@ relock:
 		do_group_exit(info->si_signo);
 		/* NOTREACHED */
 	}
-	spin_unlock_irq(&sighand->siglock);
 	return signr;
 }
 
@@ -2640,7 +2664,6 @@ SYSCALL_DEFINE4(rt_sigtimedwait, const sigset_t __user *, uthese,
 			return -EINVAL;
 	}
 
-	spin_lock_irq(&current->sighand->siglock);
 	sig = dequeue_signal(current, &these, &info);
 	if (!sig) {
 		timeout = MAX_SCHEDULE_TIMEOUT;
@@ -2652,6 +2675,7 @@ SYSCALL_DEFINE4(rt_sigtimedwait, const sigset_t __user *, uthese,
 			/* None ready -- temporarily unblock those we're
 			 * interested while we are sleeping in so that we'll
 			 * be awakened when they arrive.  */
+			spin_lock_irq(&current->sighand->siglock);
 			current->real_blocked = current->blocked;
 			sigandsets(&current->blocked, &current->blocked, &these);
 			recalc_sigpending();
@@ -2664,9 +2688,9 @@ SYSCALL_DEFINE4(rt_sigtimedwait, const sigset_t __user *, uthese,
 			current->blocked = current->real_blocked;
 			siginitset(&current->real_blocked, 0);
 			recalc_sigpending();
+			spin_unlock_irq(&current->sighand->siglock);
 		}
 	}
-	spin_unlock_irq(&current->sighand->siglock);
 
 	if (sig) {
 		ret = sig;
-- 
1.7.4



* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending'
  2011-04-05 19:21 ` [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending' Matt Fleming
@ 2011-04-05 20:19   ` Oleg Nesterov
  2011-04-05 20:50     ` Matt Fleming
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-05 20:19 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

Hi Matt,

I'll try to study this series, but not before Friday, sorry.

Only one thing,

On 04/05, Matt Fleming wrote:
> 
> Because SIGCONT and SIGSTOP affect an entire thread group,

Yes, the effect is global, but

> we can
> place them on the 'shared_pending' queue.

I don't think we can.

> -	pending = group ? &t->signal->shared_pending : &t->pending;
> +	/*
> +	 * We always enqueue SIGSTOP or SIGCONT signals on the shared
> +	 * queue. This means that a SIGSTOP or SIGCONT signal _cannot_
> +	 * be present on a thread's private pending queue.
> +	 *
> +	 * This makes prepare_signal() more optimal as we do not have
> +	 * to remove signals from each thread's pending queue and so
> +	 * can avoid iterating over all threads in the thread group
> +	 * (and therefore avoid the locking that would be necessary to
> +	 * do that safely).
> +	 */
> +	if (group || sig_kernel_stop(sig) || sig == SIGCONT)
> +		pending = &t->signal->shared_pending;
> +	else
> +		pending = &t->pending;

How so? Suppose the process has a handler for SIGCONT. Suppose this
process is not stopped. tkill(SIGCONT) should deliver the signal to
the right thread.

SIGSTOP can't have the handler, still we shouldn't place it on the
shared list, debuggers won't be happy.
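
A minimal userspace sketch of the SIGCONT-handler case (illustrative
only, not part of the series; pthread_kill() is the glibc wrapper
around the tgkill() syscall): the handler must run in the targeted
thread, so the signal cannot simply sit on the group-wide queue.

	#include <pthread.h>
	#include <signal.h>
	#include <string.h>
	#include <unistd.h>

	static void handler(int sig)
	{
		/* Async-signal-safe proof that the handler ran. */
		write(1, "SIGCONT handled\n", 16);
	}

	static void *worker(void *arg)
	{
		pause();	/* wait for the thread-directed signal */
		return NULL;
	}

	int main(void)
	{
		struct sigaction sa;
		pthread_t t;

		memset(&sa, 0, sizeof(sa));
		sa.sa_handler = handler;
		sigemptyset(&sa.sa_mask);
		sigaction(SIGCONT, &sa, NULL);

		pthread_create(&t, NULL, worker, NULL);
		sleep(1);	/* crude: let the worker reach pause() */
		pthread_kill(t, SIGCONT); /* must go to 't', not the group */
		pthread_join(t, NULL);
		return 0;
	}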

Also. This code was changed very much, please do these changes on
top of
http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=refs/heads/ptrace

Oleg.



* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending'
  2011-04-05 20:19   ` Oleg Nesterov
@ 2011-04-05 20:50     ` Matt Fleming
  2011-04-06 12:57       ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Matt Fleming @ 2011-04-05 20:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On Tue, 5 Apr 2011 22:19:58 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> Hi Matt,
> 
> I'll try to study this series, but not before Friday, sorry.

No problem!

> Only one thing,
> 
> On 04/05, Matt Fleming wrote:
> > 
> > Because SIGCONT and SIGSTOP affect an entire thread group,
> 
> Yes, the effect is global, but
> 
> > we can
> > place them on the 'shared_pending' queue.
> 
> I don't think we can.
> 
>  -	pending = group ? &t->signal->shared_pending : &t->pending;
> > +	/*
> > +	 * We always enqueue SIGSTOP or SIGCONT signals on the
> > shared
> > +	 * queue. This means that a SIGSTOP or SIGCONT signal
> > _cannot_
> > +	 * be present on a thread's private pending queue.
> > +	 *
> > +	 * This makes prepare_signal() more optimal as we do not
> > have
> > +	 * to remove signals from each thread's pending queue and
> > so
> > +	 * can avoid iterating over all threads in the thread group
> > +	 * (and therefore avoid the locking that would be
> > necessary to
> > +	 * do that safely).
> > +	 */
> > +	if (group || sig_kernel_stop(sig) || sig == SIGCONT)
> > +		pending = &t->signal->shared_pending;
> > +	else
> > +		pending = &t->pending;
> 
> How so? Suppose the process has a handler for SIGCONT. Suppose this
> process is not stopped. tkill(SIGCONT) should deliver the signal to
> the right thread.

D'oh, yes. I think I got confused here. You're right, this won't work.
 
> SIGSTOP can't have the handler, still we shouldn't place it on the
> shared list, debuggers won't be happy.

Urgh, debuggers actually peek at shared_pending and pending?

> Also. This code was changed very much, please do these changes on
> top of
> http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=refs/heads/ptrace

My patches are already based on that tree.

-- 
Matt Fleming, Intel Open Source Technology Center


* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending'
  2011-04-05 20:50     ` Matt Fleming
@ 2011-04-06 12:57       ` Oleg Nesterov
  2011-04-06 13:09         ` Tejun Heo
  2011-04-06 13:15         ` Matt Fleming
  0 siblings, 2 replies; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-06 12:57 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On 04/05, Matt Fleming wrote:
>
> On Tue, 5 Apr 2011 22:19:58 +0200
>
> > SIGSTOP can't have a handler, but we still shouldn't place it on the
> > shared list; debuggers won't be happy.
>
> Urgh, debuggers actually peek at shared_pending and pending?

Argh, sorry, I was not clear.

First of all, only SIGSTOP is never delivered to user-space; other
sig_kernel_stop() signals can have a handler, and in that case, say,
SIGTTIN doesn't stop but acts like a normal signal. This means you
can't put it into shared_pending.
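For example (a userspace sketch, illustration only):

	#include <signal.h>
	#include <stdio.h>
	#include <unistd.h>

	static void ttin_handler(int sig)
	{
		/* write() is async-signal-safe, printf() is not */
		(void)write(STDOUT_FILENO, "handled SIGTTIN\n", 16);
	}

	int main(void)
	{
		signal(SIGTTIN, ttin_handler);	/* handler => no job-control stop */
		kill(getpid(), SIGTTIN);	/* runs the handler, does not stop us */
		puts("still running");
		return 0;
	}

With a handler installed, SIGTTIN behaves like any other signal, so it
has to be routed like one.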

But even SIGSTOP should be routed properly. If the process is ptraced,
the tracee reports SIGSTOP to the debugger first. This means that
tkill(SIGSTOP) should be delivered to the right target.

> > Also. This code was changed very much, please do these changes on
> > top of
> > http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=refs/heads/ptrace
>
> My patches are already based on that tree.

Ah, good.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending'
  2011-04-06 12:57       ` Oleg Nesterov
@ 2011-04-06 13:09         ` Tejun Heo
  2011-04-06 13:30           ` Matt Fleming
  2011-04-06 13:15         ` Matt Fleming
  1 sibling, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2011-04-06 13:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Fleming, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

Hey, guys.

On Wed, Apr 06, 2011 at 02:57:57PM +0200, Oleg Nesterov wrote:
> But even SIGSTOP should be routed properly. If the process is ptraced,
> the tracee reports SIGSTOP to the debugger first. This means that
> tkill(SIGSTOP) should be delivered to the right target.

I think the more important part is that there really isn't much point
in optimizing SIGSTOP/CONT.  They inherently involve heavy,
walk-every-thread operations of putting them to sleep and reversing it
and there isn't much point in optimizing sending SIGSTOP to stopped
processes or CONT to running ones.  In addition, STOP/CONT interaction
is already scary enough so I'd like to avoid adding complexities there
if at all possible.

I think it would be better to concentrate on more usual signals.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending'
  2011-04-06 12:57       ` Oleg Nesterov
  2011-04-06 13:09         ` Tejun Heo
@ 2011-04-06 13:15         ` Matt Fleming
  2011-04-11 18:50           ` Oleg Nesterov
  1 sibling, 1 reply; 29+ messages in thread
From: Matt Fleming @ 2011-04-06 13:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On Wed, 6 Apr 2011 14:57:57 +0200
Oleg Nesterov <oleg@redhat.com> wrote:
 
> Argh, sorry, I was not clear.
> 
> First of all, only SIGSTOP is never delivered to user-space; other
> sig_kernel_stop() signals can have a handler, and in that case, say,
> SIGTTIN doesn't stop but acts like a normal signal. This means you
> can't put it into shared_pending.

Yeah, good point. This patch definitely needs dropping from the series.

> But even SIGSTOP should be routed properly. If the process is ptraced,
> the tracee reports SIGSTOP to the debugger first. This means that
> tkill(SIGSTOP) should be delivered to the right target.

Right, thanks.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending'
  2011-04-06 13:09         ` Tejun Heo
@ 2011-04-06 13:30           ` Matt Fleming
  0 siblings, 0 replies; 29+ messages in thread
From: Matt Fleming @ 2011-04-06 13:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On Wed, 6 Apr 2011 06:09:28 -0700
Tejun Heo <tj@kernel.org> wrote:

> Hey, guys.
> 
> On Wed, Apr 06, 2011 at 02:57:57PM +0200, Oleg Nesterov wrote:
> > But even SIGSTOP should be routed properly. If the process is
> > ptraced, the tracee reports SIGSTOP to the debugger first. This
> > means that tkill(SIGSTOP) should be delivered to the right target.
> 
> I think the more important part is that there really isn't much point
> in optimizing SIGSTOP/CONT.  They inherently involve heavy,
> walk-every-thread operations of putting them to sleep and reversing it
> and there isn't much point in optimizing sending SIGSTOP to stopped
> processes or CONT to running ones.  In addition, STOP/CONT interaction
> is already scary enough so I'd like to avoid adding complexities there
> if at all possible.
> 
> I think it would be better to concentrate on more usual signals.

Fair point. Note that none of the other patches try to optimize
SIGSTOP/CONT paths.

This patch was also my attempt to make my brain not explode while
figuring out the locking order. This was the first patch I wrote in
the series and it was before I'd decided on the order. In other words,
I was trying to eliminate any code where we'd do,

    tsk->sighand->action_lock
        tsk->siglock
            [Dequeue STOP signal from tsk->pending]
                tsk->sighand->siglock

because that complicates the locking order and this patch seemed like a
worthwhile cleanup. As it turns out, it's not a worthwhile/correct
cleanup, so I'll have to think about how I can handle those paths
safely with a different locking order.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending'
  2011-04-06 13:15         ` Matt Fleming
@ 2011-04-11 18:50           ` Oleg Nesterov
  2011-04-11 19:24             ` Matt Fleming
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-11 18:50 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On 04/06, Matt Fleming wrote:
>
> Yeah, good point. This patch definitely needs dropping from the series.

Yes.

Sorry for the delay again. I'll try to read the rest of this series tomorrow;
I can no longer look at signal.c today.

Oh, these changes do not look trivial. I must admit, I am a bit sceptical,
but this is only my first impression based on the amount of changes and the
more complex locking.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 1/5] signals: Always place SIGCONT and SIGSTOP on 'shared_pending'
  2011-04-11 18:50           ` Oleg Nesterov
@ 2011-04-11 19:24             ` Matt Fleming
  0 siblings, 0 replies; 29+ messages in thread
From: Matt Fleming @ 2011-04-11 19:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On Mon, 11 Apr 2011 20:50:05 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> Sorry for the delay again. I'll try to read the rest of this series
> tomorrow; I can no longer look at signal.c today.

No problem.

> Oh, these changes do not look trivial. I must admit, I am a bit
> sceptical, but this is only my first impression based on the amount of
> changes and the more complex locking.

Yeah, the locking is more complex and I suspect that because I'm
dropping this patch the next iteration might become even more complex.
However, I definitely think the complexity is worth it because of how
much better signal delivery scales with these patches.

If you find anything that you think is overly complex, I'm sure I could
clean it up, either with comments or by refactoring the code.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock
  2011-04-05 19:21 ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming
@ 2011-04-13 19:42   ` Oleg Nesterov
  2011-04-14 10:34     ` Matt Fleming
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-13 19:42 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

Matt, sorry for the huge delay.

On 04/05, Matt Fleming wrote:
>
> Try to reduce the amount of time we hold the shared siglock by
> introducing a per-thread siglock that protects each thread's 'pending'
> queue. Currently, the shared siglock protects both the shared and
> private signal queues. As such, it is a huge point of contention when
> generating and delivering signals to single threads in a multithreaded
> process. For example, even if a thread is delivering a signal to
> itself (thereby putting a signal on its private signal queue) it must
> acquire the shared siglock.
>
> To improve signal generation scalability we only acquire the shared
> siglock when generating a shared signal. If we're generating a private
> signal which will be delivered to a specific thread (and that signal
> does not affect an entire thread group) then we can optimise that case
> by only taking the per-thread siglock. However, the case where we're
> sending a fatal signal to a specific thread is special because we need
> to modify tsk->signal->flags, so we need to be holding the shared
> siglock.
>
> Note that we now hold both the shared and per-thread siglock when
> delivering a private signal. That will be fixed in the next patch so
> that signal delivery scales with signal generation.
>
> Also, because we now run signal code paths without holding the shared
> siglock at all, it can no longer be used to protect tsk->sighand->action.
> So we introduce a rwlock for protecting tsk->sighand->action. A rwlock
> is better than a spinlock in this case because there are many more
> readers than writers and a rwlock allows us to scale much better than
> a spinlock.

So. This complicates the locking enormously. I must admit, I dislike
this very much. Yes, signals are slow... But this returns us to the
original question: do we really care? Of course, it is always nice to
make things faster, but I am not sure the added complexity is worth
the trouble.

Random example,

	int
	force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
	{
		unsigned long int flags;
		int ret, blocked, ignored;
		struct k_sigaction *action;
		int shared_siglock;

		write_lock_irqsave(&t->sighand->action_lock, flags);
		spin_lock(&t->sighand->siglock);
		action = &t->sighand->action[sig-1];
		ignored = action->sa.sa_handler == SIG_IGN;
		blocked = sigismember(&t->blocked, sig);
		if (blocked || ignored) {
			action->sa.sa_handler = SIG_DFL;
			if (blocked) {
				sigdelset(&t->blocked, sig);
				recalc_sigpending_and_wake(t);
			}
		}
		if (action->sa.sa_handler == SIG_DFL)
			t->signal->flags &= ~SIGNAL_UNKILLABLE;

		shared_siglock = sig_shared(sig) || sig_fatal(t, sig);
		if (!shared_siglock) {
			spin_unlock(&t->sighand->siglock);
			spin_lock(&t->siglock);
		}

		ret = specific_send_sig_info(sig, info, t);

		if (!shared_siglock)
			spin_unlock(&t->siglock);
		else
			spin_unlock(&t->sighand->siglock);

		write_unlock_irqrestore(&t->sighand->action_lock, flags);
		return ret;
	}

This doesn't look very nice ;)

To me personally, the only "real" performance problem is kill-from-irq
(posix timers are the worst case); this should be solved somehow, but
these changes can't help...

Anyway, the patches do not look right.

> +/* Should the signal be placed on shared_pending? */
> +#define sig_shared(sig)	\
> +	(((sig) < SIGRTMIN) && \
> +	 siginmask(sig, SIG_KERNEL_STOP_MASK | rt_sigmask(SIGCONT)))

OK, this is wrong but we already discussed this.

> @@ -142,7 +142,9 @@ static void __exit_signal(struct task_struct *tsk)
>  	 * Do this under ->siglock, we can race with another thread
>  	 * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals.
>  	 */
> +	spin_lock(&tsk->siglock);
>  	flush_sigqueue(&tsk->pending);
> +	spin_unlock(&tsk->siglock);

This only protects flush_sigqueue(), but this is not enough.

tkill() can run without ->siglock held, how can it ensure the target
can't exit before we take task->siglock?
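For reference, this is roughly what the current lock_task_sighand() does;
note it is exactly ->siglock which makes the ->sighand revalidation work:

	struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
						   unsigned long *flags)
	{
		struct sighand_struct *sighand;

		rcu_read_lock();
		for (;;) {
			sighand = rcu_dereference(tsk->sighand);
			if (unlikely(sighand == NULL))
				break;

			spin_lock_irqsave(&sighand->siglock, *flags);
			if (likely(tsk->sighand == sighand))
				break;
			spin_unlock_irqrestore(&sighand->siglock, *flags);
		}
		rcu_read_unlock();

		return sighand;
	}

If the send path only takes task->siglock, nothing revalidates the
target's state in the same way.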


> @@ -976,7 +1022,12 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
>  	 * (and therefore avoid the locking that would be necessary to
>  	 * do that safely).
>  	 */
> -	if (group || sig_kernel_stop(sig) || sig == SIGCONT)
> +	if (group || sig_shared(sig) || sig_fatal(t, sig))
> +		assert_spin_locked(&t->sighand->siglock);
> +	else
> +		assert_spin_locked(&t->siglock);
> +
> +	if (group || sig_shared(sig))
>  		pending = &t->signal->shared_pending;
>  	else
>  		pending = &t->pending;

Well. But later then we call complete_signal()->signal_wake_up(). And this
needs ->siglock. Now I recall why it is needed. Obviously, to serialize with
recalc_sigpending().

Note also that if you use task->siglock for sigaddset(t->pending), then
recalc_sigpending() should take this lock as well.
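To spell out the lost-wakeup interleaving (a sketch, not runnable code):

	/* CPU0: sender, holding only t->siglock */
	spin_lock(&t->siglock);
	sigaddset(&t->pending.signal, sig);
	signal_wake_up(t, 0);		/* sets TIF_SIGPENDING */
	spin_unlock(&t->siglock);

	/* CPU1: t itself, e.g. in sys_rt_sigprocmask(), no t->siglock held */
	recalc_sigpending();		/* tests t->pending racily and can
					 * clear TIF_SIGPENDING again, so
					 * the wakeup is lost */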

> @@ -1123,11 +1175,34 @@ int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p,
>  	unsigned long flags;
>  	int ret = -ESRCH;
>
> -	if (lock_task_sighand(p, &flags)) {
> +	read_lock_irqsave(&p->sighand->action_lock, flags);

How so? This is unsafe, p can exit, ->sighand can be NULL. Or exec can
change ->sighand, we can take the wrong lock.

And there are more wrong changes like this...

> @@ -1485,14 +1580,24 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
>  {
>  	int sig = q->info.si_signo;
>  	struct sigpending *pending;
> -	unsigned long flags;
> +	unsigned long flags, _flags;
> +	int shared_siglock;
>  	int ret;
>
>  	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
>
>  	ret = -1;
> -	if (!likely(lock_task_sighand(t, &flags)))
> -		goto ret;
> +	read_lock_irqsave(&t->sighand->action_lock, flags);
> +	shared_siglock = group || sig_shared(sig) || sig_fatal(t, sig);
> +	if (!shared_siglock) {
> +		spin_lock(&t->siglock);
> +		pending = &t->pending;
> +	} else {
> +		if (!likely(lock_task_sighand(t, &_flags)))
> +			goto ret;
> +
> +		pending = &t->signal->shared_pending;

This looks wrong. We shouldn't do s/pending/shared_pending/ if
sig_fatal(). The signal can be blocked, or this task can be ptraced.

> @@ -1666,7 +1779,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>   	}
>
>  	sighand = parent->sighand;
> -	spin_lock_irqsave(&sighand->siglock, flags);
> +	read_lock_irqsave(&sighand->action_lock, flags);
> +	spin_lock(&sighand->siglock);

Why do we need both? (the same for do_notify_parent)

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery
  2011-04-05 19:21 ` [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery Matt Fleming
@ 2011-04-13 20:12   ` Oleg Nesterov
  2011-04-14 10:57     ` Matt Fleming
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-13 20:12 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On 04/05, Matt Fleming wrote:
>
> To reduce the contention on the shared siglock this patch pushes the
> responsibility of acquiring and releasing the shared siglock down into
> the functions that need it. That way, if we don't call a function that
> needs to be run under the shared siglock, we can run without acquiring
> it at all.

This adds new races. And this time I do not even understand the intent.
I mean, it is not clear to me why this change can really help to speed
up get_signal_to_deliver().

> Note that this does not make signal delivery lockless. A signal must
> still be dequeued from either the shared or private signal
> queues. However, in the private signal case we can now get by with
> just acquiring the per-thread siglock

OK, we can dequeue the signal. But dequeue_signal()->recalc_sigpending()
becomes even more wrong. We do not hold any lock, so we can race with both
shared/private signal sending.

> Also update tracehook.h to indicate it's not called with siglock held
> anymore.

Heh. This breaks this tracehook completely ;) OK, nobody cares about
the out-of-tree users, forget.

Also. get_signal_to_deliver() does

		signr = dequeue_signal(current, &current->blocked,
					       info);
		...

		ka = &sighand->action[signr-1];

		...

		if (ka->sa.sa_handler != SIG_DFL) {
			/* Run the handler.  */
			*return_ka = *ka;

This memcpy() can race with sys_rt_sigaction(), we can't read *ka
atomically.

Actually, even SIG_DFL/SIG_IGN checks can race, although this is minor...
But still not correct.

			if (ka->sa.sa_flags & SA_ONESHOT) {
				write_lock(&sighand->action_lock);
				ka->sa.sa_handler = SIG_DFL;
				write_unlock(&sighand->action_lock);

We should check SA_ONESHOT under ->action_lock. But even then this
will be racy, although we can probably ignore this... Suppose that
SA_ONESHOT was set after we dequeued the signal.
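IOW, the test and the reset should at least move under the lock together,
something like (a sketch):

	write_lock(&sighand->action_lock);
	if (ka->sa.sa_flags & SA_ONESHOT)
		ka->sa.sa_handler = SIG_DFL;
	write_unlock(&sighand->action_lock);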

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock
  2011-04-13 19:42   ` Oleg Nesterov
@ 2011-04-14 10:34     ` Matt Fleming
  2011-04-14 19:00       ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Matt Fleming @ 2011-04-14 10:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On Wed, 13 Apr 2011 21:42:31 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> Matt, sorry for the huge delay.

No problem. Thanks for the review!
 
> So. This complicates the locking enormously. I must admit, I dislike
> this very much. Yes, signals are slow... But this returns us to the
> original question: do we really care? Of course, it is always nice to
> make things faster, but I am not sure the added complexity is worth
> the trouble.

Well, it's not really that signals are slow (though that might be
true!), it's more that they don't scale. So, this patch series was not
designed to speed up signals in a single-threaded app, but rather to
make signals scale better in multithreaded apps. We do that by
reducing the contention on the shared siglock. Signal delivery isn't
getting any faster with these patches; they just try to stop it getting
slower when you add more threads ;-)

As for the complexity... OK, point taken. Since we've already
established that the patch series doesn't work as-is because the first
patch is bogus, I'll treat the rest of the review as an academic
exercise in helping to highlight any of my misunderstandings about the
signal code. If/when I respin this series I'll certainly try to keep it
simpler.

> Random example,
> 
> 	int
> 	force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
> 	{
> 		unsigned long int flags;
> 		int ret, blocked, ignored;
> 		struct k_sigaction *action;
> 		int shared_siglock;
> 
> 		write_lock_irqsave(&t->sighand->action_lock, flags);
> 		spin_lock(&t->sighand->siglock);
> 		action = &t->sighand->action[sig-1];
> 		ignored = action->sa.sa_handler == SIG_IGN;
> 		blocked = sigismember(&t->blocked, sig);
> 		if (blocked || ignored) {
> 			action->sa.sa_handler = SIG_DFL;
> 			if (blocked) {
> 				sigdelset(&t->blocked, sig);
> 				recalc_sigpending_and_wake(t);
> 			}
> 		}
> 		if (action->sa.sa_handler == SIG_DFL)
> 			t->signal->flags &= ~SIGNAL_UNKILLABLE;
> 
> 		shared_siglock = sig_shared(sig) || sig_fatal(t, sig);
> 		if (!shared_siglock) {
> 			spin_unlock(&t->sighand->siglock);
> 			spin_lock(&t->siglock);
> 		}
> 
> 		ret = specific_send_sig_info(sig, info, t);
> 
> 		if (!shared_siglock)
> 			spin_unlock(&t->siglock);
> 		else
> 			spin_unlock(&t->sighand->siglock);
> 
> 		write_unlock_irqrestore(&t->sighand->action_lock, flags);
> 		return ret;
> 	}
> 
> This doesn't look very nice ;)

Yeah, not my finest work ;-)

But seriously, you're not wrong. This code was written this way as a
result of the locking design, which suggests I need to go back to the
drawing board and come up with something simpler.

> To me personally, the only "real" performance problem is kill-from-irq
> (posix timers are the worst case); this should be solved somehow, but
> these changes can't help...

True.

> > @@ -142,7 +142,9 @@ static void __exit_signal(struct task_struct *tsk)
> >  	 * Do this under ->siglock, we can race with another thread
> >  	 * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals.
> >  	 */
> > +	spin_lock(&tsk->siglock);
> >  	flush_sigqueue(&tsk->pending);
> > +	spin_unlock(&tsk->siglock);
> 
> This only protects flush_sigqueue(), but this is not enough.
> 
> tkill() can run without ->siglock held, how can it ensure the target
> can't exit before we take task->siglock?

And by "target can't exit" you mean, how can we ensure that the target
doesn't execute __exit_signal() and set tsk->signal to NULL if we're
not holding the shared siglock in the tkill() path? Good point, that's
totally a bug.

While we're discussing the technique for stopping tasks from exiting...

Is there a reason that a short-term reference counter isn't used to
prevent this, instead of taking the siglock? Obviously, the siglock
already exists and we would have to add a new atomic_t to sighand for
the short-term ref, but is there any other fundamental reason why the
siglock acquiring method is better? Maybe a short-term ref would make
things more complicated? I dunno. siglock does seem to be used for a
few different things at the moment though, so currently things are
already quite complicated.

Of course, the reason I'm asking is that this entire series relies on
avoiding acquiring the shared siglock wherever possible. As you've
pointed out above, by not acquiring the shared siglock in the tkill()
path we can race with the task exiting, and somehow I need to fix that.
Though note that I don't think it's just the tkill() path that is the
problem here, it's signal generation in general, i.e. anybody who
calls send_signal() without the shared siglock held.

> > @@ -976,7 +1022,12 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
> >  	 * (and therefore avoid the locking that would be necessary to
> >  	 * do that safely).
> >  	 */
> > -	if (group || sig_kernel_stop(sig) || sig == SIGCONT)
> > +	if (group || sig_shared(sig) || sig_fatal(t, sig))
> > +		assert_spin_locked(&t->sighand->siglock);
> > +	else
> > +		assert_spin_locked(&t->siglock);
> > +
> > +	if (group || sig_shared(sig))
> >  		pending = &t->signal->shared_pending;
> >  	else
> >  		pending = &t->pending;
> 
> Well. But later then we call complete_signal()->signal_wake_up(). And this
> needs ->siglock. Now I recall why it is needed. Obviously, to serialize with
> recalc_sigpending().

Ah, it's the set_tsk_thread_flag(t, TIF_SIGPENDING) that we need to do
atomically. Right, damn, that's a race.

> Note also that if you use task->siglock for sigaddset(t->pending), then
> recalc_sigpending() should take this lock as well.

Right.

> > @@ -1123,11 +1175,34 @@ int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p,
> >  	unsigned long flags;
> >  	int ret = -ESRCH;
> >
> > -	if (lock_task_sighand(p, &flags)) {
> > +	read_lock_irqsave(&p->sighand->action_lock, flags);
> 
> How so? This is unsafe, p can exit, ->sighand can be NULL. Or exec can
> change ->sighand, we can take the wrong lock.
> 
> And there are more wrong changes like this...

Yeah, this is a fundamental problem with this series.
 
> > @@ -1485,14 +1580,24 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
> >  {
> >  	int sig = q->info.si_signo;
> >  	struct sigpending *pending;
> > -	unsigned long flags;
> > +	unsigned long flags, _flags;
> > +	int shared_siglock;
> >  	int ret;
> >
> >  	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> >
> >  	ret = -1;
> > -	if (!likely(lock_task_sighand(t, &flags)))
> > -		goto ret;
> > +	read_lock_irqsave(&t->sighand->action_lock, flags);
> > +	shared_siglock = group || sig_shared(sig) || sig_fatal(t, sig);
> > +	if (!shared_siglock) {
> > +		spin_lock(&t->siglock);
> > +		pending = &t->pending;
> > +	} else {
> > +		if (!likely(lock_task_sighand(t, &_flags)))
> > +			goto ret;
> > +
> > +		pending = &t->signal->shared_pending;
> 
> This looks wrong. We shouldn't do s/pending/shared_pending/ if
> sig_fatal(). The signal can be blocked, or this task can be ptraced.

D'oh, yes, my bad. I've conflated picking the pending queue with picking
the siglock, and that's wrong. OK, maybe this new locking is
complicated ;-)
 
> > @@ -1666,7 +1779,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
> >   	}
> >
> >  	sighand = parent->sighand;
> > -	spin_lock_irqsave(&sighand->siglock, flags);
> > +	read_lock_irqsave(&sighand->action_lock, flags);
> > +	spin_lock(&sighand->siglock);
> 
> Why do we need both? (the same for do_notify_parent)

We need action_lock because we're reading sighand->action and making
decisions based upon its value, so we need it to not change. Also,
__send_signal() needs us to have the action_lock at least for reading
because we call sig_fatal() in complete_signal(), which checks tsk's
action handlers, so again, we need to prevent the handlers from being
modified.

We need sighand->siglock because we're sending a group signal, and
therefore modifying signal->shared_pending.

The answer also applies to do_notify_parent().
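So the intended nesting in those paths is (sketch):

	read_lock_irqsave(&sighand->action_lock, flags);  /* ->action stable for sig_fatal() */
	spin_lock(&sighand->siglock);                     /* we modify signal->shared_pending */
	__group_send_sig_info(SIGCHLD, &info, parent);
	spin_unlock(&sighand->siglock);
	read_unlock_irqrestore(&sighand->action_lock, flags);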

Thanks for the review!

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery
  2011-04-13 20:12   ` Oleg Nesterov
@ 2011-04-14 10:57     ` Matt Fleming
  2011-04-14 19:20       ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Matt Fleming @ 2011-04-14 10:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On Wed, 13 Apr 2011 22:12:19 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 04/05, Matt Fleming wrote:
> >
> > To reduce the contention on the shared siglock this patch pushes the
> > responsibility of acquiring and releasing the shared siglock down into
> > the functions that need it. That way, if we don't call a function that
> > needs to be run under the shared siglock, we can run without acquiring
> > it at all.
> 
> This adds new races. And this time I do not even understand the intent.
> I mean, it is not clear to me why this change can really help to speed
> up get_signal_to_deliver().

Again, it's not necessarily speeding up get_signal_to_deliver(), but
rather it's reducing the contention on the shared siglock.

For example, without this patch, if you've got someone sending a signal
to a task group, you can't run get_signal_to_deliver() in parallel
because you'll be waiting for the sending thread to release the shared
siglock. Which, if you were going to dequeue a private signal anyway
and didn't need to access signal->shared_pending, is unnecessary
overhead :-(

As it turns out, the shared siglock protects more than just
signal->shared_pending, so in certain cases (like the fatal signal
code paths) you need to acquire it anyway, and this isn't as optimised
as it could be, which is a shame.

> > Note that this does not make signal delivery lockless. A signal must
> > still be dequeued from either the shared or private signal
> > queues. However, in the private signal case we can now get by with
> > just acquiring the per-thread siglock
> 
> OK, we can dequeue the signal. But dequeue_signal()->recalc_sigpending()
> becomes even more wrong. We do not hold any lock, so we can race with both
> shared/private signal sending.

Yep, this was covered in the previous patch review.
 
> > Also update tracehook.h to indicate it's not called with siglock held
> > anymore.
> 
> Heh. This breaks this tracehook completely ;) OK, nobody cares about
> the out-of-tree users, forget.

I was hoping you'd say that ;-)

> Also. get_signal_to_deliver() does
> 
> 		signr = dequeue_signal(current, &current->blocked,
> 					       info);
> 		...
> 
> 		ka = &sighand->action[signr-1];
> 
> 		...
> 
> 		if (ka->sa.sa_handler != SIG_DFL) {
> 			/* Run the handler.  */
> 			*return_ka = *ka;
> 
> This memcpy() can race with sys_rt_sigaction(), we can't read *ka
> atomically.

Eek! I hadn't noticed that. Thanks.
 
> Actually, even SIG_DFL/SIG_IGN checks can race, although this is minor...
> But still not correct.
> 
> 			if (ka->sa.sa_flags & SA_ONESHOT) {
> 				write_lock(&sighand->action_lock);
> 				ka->sa.sa_handler = SIG_DFL;
> 				write_unlock(&sighand->action_lock);
> 
> We should check SA_ONESHOT under ->action_lock. But even then this
> will be racy, although we can probably ignore this... Suppose that
> SA_ONESHOT was set after we dequeued the signal.

Right, most of this side is wrong wrt the action_lock.

Thanks Oleg.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock
  2011-04-14 10:34     ` Matt Fleming
@ 2011-04-14 19:00       ` Oleg Nesterov
  2011-04-16 13:08         ` Matt Fleming
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-14 19:00 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On 04/14, Matt Fleming wrote:
>
> Well, it's not really that signals are slow (though that might be
> true!), it's more that they don't scale. So, this patch series was not
> designed to speed up signals in a single-threaded app, but rather to
> make signals scale better in multithreaded apps. We do that by
> reducing the contention on the shared siglock. Signal delivery isn't
> getting any faster with these patches, they just try to stop it getting
> slower when you add more threads ;-)

Yes, I understand. Even the private signal needs the "global" per-process
lock, but it is a rwlock.

> > > @@ -142,7 +142,9 @@ static void __exit_signal(struct task_struct *tsk)
> > >  	 * Do this under ->siglock, we can race with another thread
> > >  	 * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals.
> > >  	 */
> > > +	spin_lock(&tsk->siglock);
> > >  	flush_sigqueue(&tsk->pending);
> > > +	spin_unlock(&tsk->siglock);
> >
> > This only protects flush_sigqueue(), but this is not enough.
> >
> > tkill() can run without ->siglock held, how can it ensure the target
> > can't exit before we take task->siglock?
>
> And by "target can't exit" you mean, how can we ensure that the target
> doesn't execute __exit_signal() and set tsk->signal to NULL

No, tsk->signal can't go away, ->sighand can. This means that
prepare_signal()->sig_ignored() is not safe.

Another problem is, __send_signal() shouldn't add the new sigqueue to
tsk->pending if this tsk has already passed __exit_signal()->flush_sigqueue().

> While we're discussing the technique for stopping tasks from exiting...
>
> Is there a reason that a short-term reference counter isn't used to
> prevent this, instead of taking the siglock?

Well, sighand->count is the reference counter. The problem is, ->sighand
is not per-process, we can share it with another CLONE_SIGHAND process
and de_thread() can change ->sighand during exec.

Also, we have code which checks ->sighand != NULL to decide whether this
thread was released or not.

I was going to try to add some cleanups here after the scope of ->signal
was changed, but right now I can't recall what I had in mind. Anyway,
everything in sighand_struct needs ->siglock, so a lockless access
doesn't buy too much currently.

> > > @@ -1666,7 +1779,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
> > >   	}
> > >
> > >  	sighand = parent->sighand;
> > > -	spin_lock_irqsave(&sighand->siglock, flags);
> > > +	read_lock_irqsave(&sighand->action_lock, flags);
> > > +	spin_lock(&sighand->siglock);
> >
> > Why do we need both? (the same for do_notify_parent)
>
> We need action_lock because we're reading sighand->action and making
> decisions based upon its value, so we need it to not change. Also,
> __send_signal()

Ah, indeed, thanks. Somehow I misread the code as if it takes task->siglock,
not sighand->siglock. But anyway I was wrong, I forgot we are going to send
the signal.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery
  2011-04-14 10:57     ` Matt Fleming
@ 2011-04-14 19:20       ` Oleg Nesterov
  2011-04-16 13:27         ` Matt Fleming
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-14 19:20 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On 04/14, Matt Fleming wrote:
>
> On Wed, 13 Apr 2011 22:12:19 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > This adds new races. And this time I do not even understand the intent.
> > I mean, it is not clear to me why this change can really help to speed
> > up get_signal_to_deliver().
>
> Again, it's not necessarily speeding up get_signal_to_deliver(), but
> rather it's reducing the contention on the shared siglock.

Yes, sorry for the confusion. I used the "speed up" term wrongly throughout.
I understand what are you trying to do.

But yes, in this case I probably missed the intent,

> For example, without this patch, if you've got someone sending a signal
> to a task group, you can't run get_signal_to_deliver() in parallel

I missed the simple fact that get_signal_to_deliver() could avoid ->siglock
completely if it dequeues a private signal.

Btw, I forgot to mention another problem. We should not dequeue from
signal->shared_pending before task->pending. There are various reasons
why we shouldn't, but in particular please look at a27341cd
"Prioritize synchronous signals over 'normal' signals".

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock
  2011-04-14 19:00       ` Oleg Nesterov
@ 2011-04-16 13:08         ` Matt Fleming
  2011-04-18 16:45           ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Matt Fleming @ 2011-04-16 13:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming, Chris Metcalf

On Thu, 14 Apr 2011 21:00:12 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

[Added Chris to Cc. Chris, check out point 3 below]
 
> No, tsk->signal can't go away, ->sighand can. This means that
> prepare_signal()->sig_ignored() is not safe.

Sorry, that was a typo - I meant tsk->sighand.
 
> Another problem is, __send_signal() shouldn't add the new sigqueue to
> tsk->pending if this tsk has already passed __exit_signal()->flush_sigqueue().

Right.

> > While we're discussing the technique for stopping tasks from exiting...
> >
> > Is there a reason that a short-term reference counter isn't used to
> > prevent this, instead of taking the siglock?
> 
> Well, sighand->count is the reference counter. The problem is, ->sighand
> is not per-process, we can share it with abother CLONE_SIGHAND process
> and de_thread() can change ->sighand during exec.

What I meant was, a reference to say "You can't change/free ->sighand
because I'm reading/modifying it". So you'd have two new functions,
get_sighand() and put_sighand(), which would protect the sighand from
changing while you were looking at it. Obviously, you'd still need to
check whether ->sighand == NULL, but you wouldn't need to grab the
shared siglock.

Note how this is different from sighand->count. sighand->count is a
much longer term reference which stops it being freed while a task is
using it, kinda like a "Don't free _MY_ sighand" reference, whereas
what I'm talking about is a "I'm touching YOUR sighand, so don't
change/free it" reference, e.g. a short term ref for when we're
operating on a target task. It could be that the two references can
really be just one atomic_t; I would have to write the code to figure
that out.

(Note I haven't thought this through all the way yet. There may be a
reason that the above idea is the most stupid thing anyone's ever heard)
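But roughly, I'm imagining something like this -- completely untested,
and 'active_refs'/'ref_waitq' would be new, hypothetical fields in
sighand_struct:

	static struct sighand_struct *get_sighand(struct task_struct *tsk)
	{
		struct sighand_struct *sighand;

		rcu_read_lock();
		sighand = rcu_dereference(tsk->sighand);
		if (sighand && !atomic_inc_not_zero(&sighand->active_refs))
			sighand = NULL;		/* being released or replaced */
		rcu_read_unlock();

		return sighand;
	}

	static void put_sighand(struct sighand_struct *sighand)
	{
		if (atomic_dec_and_test(&sighand->active_refs))
			wake_up(&sighand->ref_waitq);	/* __exit_signal()/de_thread() wait here */
	}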

Now, at the moment that suggestion just seems like needless overhead
because siglock already provides the features we want. But, my problem
with siglock is,

	1. It needs to be acquired to stop a task passing through
	__exit_signal().

	2. It protects bits of signal_struct and that struct is getting
	pretty bloated and siglock is being used to protect lots of
	different things. Do we really need to completely prevent
	signals being generated/delivered to a thread group while we
	modify signal->it[] in kernel/itimer.c?

	3. I suspect most people find the rules of ->sighand pretty
	confusing. Just look at

		arch/tile/kernel/hardwall.c:do_hardwall_trap()

	the use of siglock there looks buggy to me. What if 'p' passes
	through __exit_signal()? 'p' might not have reached free_task()
	yet and so won't have called hardwall_deactivate(), which
	means 'p' could still be on rect->task_head. Chris?

	4. It's a heavily contended lock in thread groups (which was
	the reason for this patch series).

> I was going to try to add some cleanups here after the scope of ->signal
> was changed, but right now I can't recall what I had in mind. Anyway,
> everything in sighand_struct needs ->siglock anyway, a lockless access
> doesn't buy too much currently.

Well, as I think my patch series shows, (in theory) not needing to
acquire ->siglock gets us decent scalability improvements (in practice
it was buggy as hell but that should be fixable ;-).

Do you have any recollection of the cleanups? signal_struct needs to be
put on a diet for sure.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery
  2011-04-14 19:20       ` Oleg Nesterov
@ 2011-04-16 13:27         ` Matt Fleming
  0 siblings, 0 replies; 29+ messages in thread
From: Matt Fleming @ 2011-04-16 13:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On Thu, 14 Apr 2011 21:20:59 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> Btw, I forgot to mention another problem. We should not dequeue from
> signal->shared_pending before task->pending. There are various reasons
> why we shouldn't, but in particular please look at a27341cd
> "Prioritize synchronous signals over 'normal' signals".

Oh, that's an interesting point! I hadn't thought of that.

OK, I'll swap the dequeue order back to the way it was. This actually
makes things a little easier as we can now get away with not reading
the list of pending shared signals in the fast path.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock
  2011-04-16 13:08         ` Matt Fleming
@ 2011-04-18 16:45           ` Oleg Nesterov
  2011-04-21 19:03             ` arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand Oleg Nesterov
  2011-04-26  9:46             ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming
  0 siblings, 2 replies; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-18 16:45 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming, Chris Metcalf

On 04/16, Matt Fleming wrote:
>
> On Thu, 14 Apr 2011 21:00:12 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > > Is there a reason that a short-term reference counter isn't used to
> > > prevent this, instead of taking the siglock?
> >
> > Well, sighand->count is the reference counter. The problem is, ->sighand
> > is not per-process, we can share it with another CLONE_SIGHAND process
> > and de_thread() can change ->sighand during exec.
>
> What I meant was, a reference to say "You can't change/free ->sighand
> because I'm reading/modifying it". So you'd have two new functions,
> get_sighand() and put_sighand(), which would protect the sighand from
> changing while you were looking at it. Obviously, you'd still need to
> check whether ->sighand == NULL, but you wouldn't need to grab the
> shared siglock.
>
> Note how this is different from sighand->count. sighand->count is a
> much longer term reference which stops it being freed while a task is
> using it, kinda like a "Don't free _MY_ sighand" reference, whereas
> what I'm talking about is a "I'm touching YOUR sighand, so don't
> change/free it" reference, e.g. a short term ref for when we're
> operating on a target task. It could be that the two references can
> really be just one atomic_t; I would have to write the code to figure
> that out.

Can't understand...

OK, someone does get_sighand(). Now, what should de_thread() do if it
wants to change ->sighand?

And I don't really understand the point. You can read *sighand lockless.
But you need some per-CLONE_SIGHAND lock if you want to modify it anyway.

> Now, at the moment that suggestion just seems like needless overhead
> because siglock already provides the features we want. But, my problem
> with siglock is,
>
> 	1. It needs to be acquired to stop a task passing through
> 	__exit_signal().
>
> 	2. It protects bits of signal_struct and that struct is getting
> 	pretty bloated and siglock is being used to protect lots of
> 	different things.

Yes, this is the main problem: it is overused.

We need better locking. Honestly, _personally_ I do not really care
about scalability (but perhaps I should) when it comes to signals, but
there are other problems. And, apart from the already mentioned problems
with signals-from-irq, I think the main problem is tasklist_lock in
do_wait/exit/etc paths.

And we still have the problems with signals which should be fixed.
de_thread() can miss a signal, vfork() should be interruptible,
do_coredump() should be interruptible. But first of all we need to
better define the behaviour of explicit SIGKILL and what it means
after exit_signals(). This should be fixed first, I think.

> 	3. I suspect most people find the rules of ->sighand pretty
> 	confusing. Just look at
>
> 		arch/tile/kernel/hardwall.c:do_hardwall_trap()
>
> 	the use of siglock there looks buggy to me.

Indeed, I agree. It shouldn't use __group_send_sig_info() at all.
I'll send the patch. Nobody outside of signal code should play with
->sighand, this is almost always wrong.

There is another problem: historically we have a lot, a lot of send-signal
helpers, but you can never find the right one. And the naming sucks.

> Do you have any recollection of the cleanups? signal_struct needs to be
> put on a diet for sure.

I was going to remove ->sighand from fs/proc first; probably I should
try to resend these patches... Then we should remove the "sighand != NULL"
checks; we need a new helper, and btw it should be used instead of
pid_alive(). Then something else... boring ;)

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand
  2011-04-18 16:45           ` Oleg Nesterov
@ 2011-04-21 19:03             ` Oleg Nesterov
  2011-04-22 13:04               ` Chris Metcalf
  2011-04-26  9:46             ` [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock Matt Fleming
  1 sibling, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-21 19:03 UTC (permalink / raw)
  To: Matt Fleming, Chris Metcalf
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming

On 04/18, Oleg Nesterov wrote:
>
> On 04/16, Matt Fleming wrote:
> >
> > 	3. I suspect most people find the rules of ->sighand pretty
> > 	confusing. Just look at
> >
> > 		arch/tile/kernel/hardwall.c:do_hardwall_trap()
> >
> > 	the use of siglock there looks buggy to me.
>
> Indeed, I agree. It shouldn't use __group_send_sig_info() at all.
> I'll send the patch. Nobody outside of signal code should play with
> ->sighand, this is almost always wrong.

Hmm. It turns out, I can't make the patch because I do not understand
what this code tries to do.

hardwall_activate() adds the thread to hardwall_list, but do_hardwall_trap()
sends the signal to the whole process. I know nothing about arch/tile and
probably this is correct, but could you confirm this?

Note that SIGILL can be delivered to another thread in the thread-group; is
that correct?

Also, is it expected that SIGILL can have a handler or can be blocked, or
should it always kill the whole thread group?

I think we need the patch below, assuming that SIGILL should be sent to
the single thread and it is fine to have a handler for SIGILL.

Oleg.

--- sigprocmask/arch/tile/kernel/hardwall.c~1_sighand	2011-04-06 21:33:42.000000000 +0200
+++ sigprocmask/arch/tile/kernel/hardwall.c	2011-04-21 20:56:36.000000000 +0200
@@ -268,12 +268,10 @@ void __kprobes do_hardwall_trap(struct p
 	found_processes = 0;
 	list_for_each_entry(p, &rect->task_head, thread.hardwall_list) {
 		BUG_ON(p->thread.hardwall != rect);
-		if (p->sighand) {
+		if (!(p->flags & PF_EXITING)) {
 			found_processes = 1;
 			pr_notice("hardwall: killing %d\n", p->pid);
-			spin_lock(&p->sighand->siglock);
-			__group_send_sig_info(info.si_signo, &info, p);
-			spin_unlock(&p->sighand->siglock);
+			do_send_sig_info(info.si_signo, &info, p, false);
 		}
 	}
 	if (!found_processes)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand
  2011-04-21 19:03             ` arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand Oleg Nesterov
@ 2011-04-22 13:04               ` Chris Metcalf
  2011-04-26 20:36                 ` [PATCH 0/1] tile: do_hardwall_trap: do not play with task->sighand Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Chris Metcalf @ 2011-04-22 13:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Fleming, Tejun Heo, linux-kernel, Thomas Gleixner,
	Peter Zijlstra, H. Peter Anvin, Matt Fleming

On 4/21/2011 9:03 PM, Oleg Nesterov wrote:
> On 04/18, Oleg Nesterov wrote:
>> On 04/16, Matt Fleming wrote:
>>> 	3. I suspect most people find the rules of ->sighand pretty
>>> 	confusing. Just look at
>>>
>>> 		arch/tile/kernel/hardwall.c:do_hardwall_trap()
>>>
>>> 	the use of siglock there looks buggy to me.
>> Indeed, I agree. It shouldn't use __group_send_sig_info() at all.
>> I'll send the patch. Nobody outside of signal code should play with
>> ->sighand, this is almost always wrong.
> Hmm. It turns out, I can't make the patch because I do not understand
> what this code tries to do.
>
> hardwall_activate() adds the thread to hardwall_list, but do_hardwall_trap()
> sends the signal to the whole process. I know nothing about arch/tile and
> probably this is correct, but could you confirm this?

Yes, the intended behavior is to send the signal to the process, as a way
of indicating the OS's displeasure with sending a malformed packet on the
user network.  But I think sending it to the specific thread is reasonable
too; I don't have a strong preference in this design.

> Note that SIGILL can be delivered to another thread in the thread-group; is
> that correct?
>
> Also, is it expected that SIGILL can have a handler or can be blocked, or
> should it always kill the whole thread group?

A handler would be reasonable for the process.

> I think we need the patch below, assuming that SIGILL should be sent to
> the single thread and it is fine to have a handler for SIGILL.

Thanks; I appreciate the additional code review in any case.  I'll look at
the ramifications of the change in more detail when I return from vacation
late next week.

> Oleg.
>
> --- sigprocmask/arch/tile/kernel/hardwall.c~1_sighand	2011-04-06 21:33:42.000000000 +0200
> +++ sigprocmask/arch/tile/kernel/hardwall.c	2011-04-21 20:56:36.000000000 +0200
> @@ -268,12 +268,10 @@ void __kprobes do_hardwall_trap(struct p
>  	found_processes = 0;
>  	list_for_each_entry(p, &rect->task_head, thread.hardwall_list) {
>  		BUG_ON(p->thread.hardwall != rect);
> -		if (p->sighand) {
> +		if (!(p->flags & PF_EXITING)) {
>  			found_processes = 1;
>  			pr_notice("hardwall: killing %d\n", p->pid);
> -			spin_lock(&p->sighand->siglock);
> -			__group_send_sig_info(info.si_signo, &info, p);
> -			spin_unlock(&p->sighand->siglock);
> +			do_send_sig_info(info.si_signo, &info, p, false);
>  		}
>  	}
>  	if (!found_processes)
>

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC][PATCH 2/5] signals: Introduce per-thread siglock and action rwlock
  2011-04-18 16:45           ` Oleg Nesterov
  2011-04-21 19:03             ` arch/tile/kernel/hardwall.c:do_hardwall_trap unsafe/wrong usage of ->sighand Oleg Nesterov
@ 2011-04-26  9:46             ` Matt Fleming
  1 sibling, 0 replies; 29+ messages in thread
From: Matt Fleming @ 2011-04-26  9:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	H. Peter Anvin, Matt Fleming, Chris Metcalf

[Sorry it's taken me so long to reply to this email]

On Mon, 18 Apr 2011 18:45:13 +0200
Oleg Nesterov <oleg@redhat.com> wrote:
 
> Can't understand...
> 
> OK, someone does get_sighand(). Now, what should de_thread() do if it
> wants to change ->sighand?
> 
> And I don't really understand the point. You can read *sighand lockless.
> But you need some per-CLONE_SIGHAND lock if you want to modify it anyway.

I think at this stage the best thing for me to do is to write a patch to
demonstrate what I'm talking about.
 
> > Now, at the moment that suggestion just seems like needless overhead
> > because siglock already provides the features we want. But, my problem
> > with siglock is,
> >
> > 	1. It needs to be acquired to stop a task passing through
> > 	__exit_signal().
> >
> > 	2. It protects bits of signal_struct and that struct is getting
> > 	pretty bloated and siglock is being used to protect lots of
> > 	different things.
> 
> Yes, this is the main problem: it is overused.
> 
> We need better locking. Honestly, _personally_ I do not really care
> about scalability (but perhaps I should) when it comes to signals, but
> there are other problems. And, apart from the already mentioned problems
> with signals-from-irq, I think the main problem is tasklist_lock in
> do_wait/exit/etc paths.

Is the tasklist_lock problem that we acquire the write lock for these
paths? Or is it a problem acquiring the read lock too?
 
> And we still have the problems with signals which should be fixed.
> de_thread() can miss a signal, vfork() should be interruptible,
> do_coredump() should be interruptible. But first of all we need to
> better define the behaviour of explicit SIGKILL and what it means
> after exit_signals(). This should be fixed first, I think.

Hmm.. interesting. Does the SIGKILL case cause any bugs? Or is it more
of a theoretical scenario? I must admit that I can't see any problems.

> > Do you have any recollection of the cleanups? signal_struct needs to be
> > put on a diet for sure.
> 
> I was going to remove ->sighand from fs/proc first; probably I should
> try to resend these patches... Then we should remove the "sighand != NULL"
> checks; we need a new helper, and btw it should be used instead of
> pid_alive(). Then something else... boring ;)

Heh. I'd be interested in reviewing these patches if you could find and
submit them.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 0/1] tile: do_hardwall_trap: do not play with task->sighand
  2011-04-22 13:04               ` Chris Metcalf
@ 2011-04-26 20:36                 ` Oleg Nesterov
  2011-04-26 20:37                   ` [PATCH 1/1] " Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-26 20:36 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: Matt Fleming, Tejun Heo, linux-kernel, Thomas Gleixner,
	Peter Zijlstra, H. Peter Anvin, Matt Fleming

On 04/22, Chris Metcalf wrote:
>
> On 4/21/2011 9:03 PM, Oleg Nesterov wrote:
> > Hmm. It turns out, I can't make the patch because I do not understand
> > what this code tries to do.
> >
> > hardwall_activate() adds the thread to hardwall_list, but do_hardwall_trap()
> > sends the signal to the whole process. I know nothing about arch/tile and
> > probably this is correct, but could you confirm this?
>
> Yes, the intended behavior is to send the signal to the process, as a way
> of indicating the OS's displeasure with sending a malformed packet on the
> user network.  But I think sending it to the specific thread is reasonable
> too; I don't have a strong preference in this design.
>
> > Note that SIGILL can be delivered to another thread in the thread-group; is
> > that correct?
> >
> > Also, is it expected that SIGILL can have a handler or can be blocked, or
> > should it always kill the whole thread group?
>
> A handler would be reasonable for the process.

OK. In this case the thread-specific SIGILL makes more sense afaics.

> > I think we need the patch below, assuming that SIGILL should be sent to
> > the single thread and it is fine to have a handler for SIGILL.
>
> Thanks; I appreciate the additional code review in any case.  I'll look at
> the ramifications of the change in more detail when I return from vacation
> late next week.

Great. I am sending the same patch + the changelog.

Please do not forget, I know _nothing_ about arch/tile, and of course the
patch was not tested.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/1] tile: do_hardwall_trap: do not play with task->sighand
  2011-04-26 20:36                 ` [PATCH 0/1] tile: do_hardwall_trap: do not play with task->sighand Oleg Nesterov
@ 2011-04-26 20:37                   ` Oleg Nesterov
  2011-05-02 22:42                     ` Chris Metcalf
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2011-04-26 20:37 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: Matt Fleming, Tejun Heo, linux-kernel, Thomas Gleixner,
	Peter Zijlstra, H. Peter Anvin, Matt Fleming

1. do_hardwall_trap() checks ->sighand != NULL and then takes ->siglock.

   This is unsafe even if the task can't run (I assume it is pinned to
   the same CPU); its parent can reap the task and set ->sighand = NULL
   right after this check. Even if the compiler doesn't read ->sighand
   twice and this memory can't go away, __group_send_sig_info() is wrong
   after that. Use do_send_sig_info().

2. Send SIGILL to the thread, not to the whole process. Unless it has
   a handler or is blocked, this kills the whole thread-group as before.
   IIUC, different threads can be bound to different rect's.

3. Check PF_EXITING instead of ->sighand. A zombie thread can go away
   but its ->sighand can be !NULL.

Reported-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 arch/tile/kernel/hardwall.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- sigprocmask/arch/tile/kernel/hardwall.c~1_sighand	2011-04-06 21:33:42.000000000 +0200
+++ sigprocmask/arch/tile/kernel/hardwall.c	2011-04-21 20:56:36.000000000 +0200
@@ -268,12 +268,10 @@ void __kprobes do_hardwall_trap(struct p
 	found_processes = 0;
 	list_for_each_entry(p, &rect->task_head, thread.hardwall_list) {
 		BUG_ON(p->thread.hardwall != rect);
-		if (p->sighand) {
+		if (!(p->flags & PF_EXITING)) {
 			found_processes = 1;
 			pr_notice("hardwall: killing %d\n", p->pid);
-			spin_lock(&p->sighand->siglock);
-			__group_send_sig_info(info.si_signo, &info, p);
-			spin_unlock(&p->sighand->siglock);
+			do_send_sig_info(info.si_signo, &info, p, false);
 		}
 	}
 	if (!found_processes)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tile: do_hardwall_trap: do not play with task->sighand
  2011-04-26 20:37                   ` [PATCH 1/1] " Oleg Nesterov
@ 2011-05-02 22:42                     ` Chris Metcalf
  0 siblings, 0 replies; 29+ messages in thread
From: Chris Metcalf @ 2011-05-02 22:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Fleming, Tejun Heo, linux-kernel, Thomas Gleixner,
	Peter Zijlstra, H. Peter Anvin, Matt Fleming

On 4/26/2011 4:37 PM, Oleg Nesterov wrote:
> 1. do_hardwall_trap() checks ->sighand != NULL and then takes ->siglock.
>
>    This is unsafe even if the task can't run (I assume it is pinned to
>    the same CPU); its parent can reap the task and set ->sighand = NULL
>    right after this check. Even if the compiler doesn't read ->sighand
>    twice and this memory can't go away, __group_send_sig_info() is wrong
>    after that. Use do_send_sig_info().
>
> 2. Send SIGILL to the thread, not to the whole process. Unless it has
>    a handler or is blocked, this kills the whole thread-group as before.
>    IIUC, different threads can be bound to different rect's.
>
> 3. Check PF_EXITING instead of ->sighand. A zombie thread can go away
>    but its ->sighand can be !NULL.
>
> Reported-by: Matt Fleming <matt@console-pimps.org>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Thanks, I've taken this change into the tile tree.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com



^ permalink raw reply	[flat|nested] 29+ messages in thread
