* [RFC][PATCH 0/5] Signal scalability series
@ 2011-09-30 15:12 Matt Fleming
  2011-09-30 15:12 ` [RFC][PATCH 1/5] signal: Document signal locking rules Matt Fleming
                   ` (6 more replies)
  0 siblings, 7 replies; 38+ messages in thread
From: Matt Fleming @ 2011-09-30 15:12 UTC (permalink / raw)
  To: Oleg Nesterov, Tejun Heo; +Cc: linux-kernel, Tony Luck, Matt Fleming

From: Matt Fleming <matt.fleming@intel.com>

Here's another attempt at improving signal scalability in the
kernel. It's not as drastic as my first attempt and consequently the
speedups aren't as impressive. However, it's a good first step, and
hopefully keeping it relatively simple will make it easier to
review.

Basically it's the same idea as last time - introduce more locks to
reduce contention on the per-process siglock, sighand->siglock. This
lock becomes highly contended in applications with multiple threads,
and I know the -rt guys have had scenarios where the per-process lock
has caused performance problems.

With this series I saw a ~120% improvement in the number of
signals per second that can be handled in the Will It Scale signal1
benchmark on a dual-socket machine.
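
For anyone who wants to see the kind of load this measures without the
harness, here's a minimal sketch (illustrative only, not the actual
signal1 source): every iteration sends and handles a private signal, so
it goes through the kernel's send and dequeue paths and their locking.

/*
 * Illustrative stand-in for a signal1-style workload: each thread
 * repeatedly sends itself SIGUSR1 and handles it.  The counter is
 * racy, which is good enough for a sketch.
 */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static volatile unsigned long handled;

static void handler(int sig)
{
        handled++;
}

static void *worker(void *arg)
{
        for (;;)
                pthread_kill(pthread_self(), SIGUSR1);
        return NULL;
}

int main(int argc, char **argv)
{
        int i, nthreads = argc > 1 ? atoi(argv[1]) : 4;
        struct sigaction sa;
        pthread_t tid;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = handler;
        sigaction(SIGUSR1, &sa, NULL);

        for (i = 0; i < nthreads; i++)
                pthread_create(&tid, NULL, worker, NULL);

        sleep(10);
        printf("%lu signals in 10s\n", handled);
        return 0;
}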

There are two remaining scalability problems:

  1. lock_task_sighand() still grabs sighand->siglock
  2. signal->ctrl_lock must be taken in recalc_sigpending()

Fixing these problems will be left to future patches.

And just like last time, these patches break ia64 because it
implements rt_sigprocmask() in assembly. I'll work on the ia64 stuff
while these patches are being reviewed.

Matt Fleming (5):
  signal: Document signal locking rules
  signal: Add rwlock to protect sighand->action
  signal: Reduce sighand->siglock hold time in get_signal_to_deliver()
  signal: Add signal->ctrl_lock for job control
  signal: Split siglock into shared_siglock and per-thread siglock

 arch/ia64/kernel/signal.c           |    4 +-
 drivers/block/nbd.c                 |    2 +-
 drivers/usb/gadget/f_mass_storage.c |    2 +-
 drivers/usb/gadget/file_storage.c   |    2 +-
 fs/autofs4/waitq.c                  |    5 +-
 fs/exec.c                           |   17 +-
 fs/jffs2/background.c               |    2 +-
 fs/ncpfs/sock.c                     |    2 +
 fs/proc/array.c                     |    2 +
 fs/signalfd.c                       |   11 +-
 include/linux/init_task.h           |    4 +
 include/linux/sched.h               |   23 +-
 kernel/exit.c                       |   29 +-
 kernel/fork.c                       |    4 +
 kernel/freezer.c                    |   10 +-
 kernel/kmod.c                       |    8 +-
 kernel/posix-timers.c               |    5 +-
 kernel/ptrace.c                     |   68 ++--
 kernel/signal.c                     |  737 +++++++++++++++++++++++++++--------
 net/9p/client.c                     |    6 +-
 net/sunrpc/svc.c                    |    3 -
 security/selinux/hooks.c            |   11 +-
 22 files changed, 677 insertions(+), 280 deletions(-)

-- 
1.7.4.4



* [RFC][PATCH 1/5] signal: Document signal locking rules
  2011-09-30 15:12 [RFC][PATCH 0/5] Signal scalability series Matt Fleming
@ 2011-09-30 15:12 ` Matt Fleming
  2011-09-30 15:12 ` [RFC][PATCH 2/5] signal: Add rwlock to protect sighand->action Matt Fleming
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 38+ messages in thread
From: Matt Fleming @ 2011-09-30 15:12 UTC (permalink / raw)
  To: Oleg Nesterov, Tejun Heo
  Cc: linux-kernel, Tony Luck, Matt Fleming, Peter Zijlstra,
	Thomas Gleixner, Anirudh Badam

From: Matt Fleming <matt.fleming@intel.com>

The current locking rules for signal handling and job control are
complex and pretty much undocumented. Fix that by explaining which
data structures are involved. At the moment, all the necessary locking
is covered by the per-process sighand->siglock but patches later in
the series will change that.
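
As a concrete example of the rule being documented, the usual shape for
looking at another task's signal state is roughly the following
(illustrative only; the helper name is made up, the pattern is the same
one ptrace_getsiginfo() uses):

/*
 * Illustrative only: tsk->sighand can be changed by de_thread() and
 * cleared by __exit_signal(), so for a task other than current it has
 * to be pinned with lock_task_sighand() before being dereferenced.
 */
static int example_read_last_siginfo(struct task_struct *tsk, siginfo_t *info)
{
        unsigned long flags;
        int ret = -ESRCH;

        if (lock_task_sighand(tsk, &flags)) {
                if (tsk->last_siginfo) {
                        *info = *tsk->last_siginfo;
                        ret = 0;
                }
                unlock_task_sighand(tsk, &flags);
        }

        return ret;
}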

Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Anirudh Badam <abadam@cs.princeton.edu>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
 kernel/signal.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 3868e66..54fc552 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -38,6 +38,33 @@
 #include "audit.h"	/* audit_signal_info() */
 
 /*
+ * signal locking rules:
+ *
+ *   - sighand->siglock (spinlock) protects,
+ *
+ *        * the tsk->sighand pointer from being modified, see de_thread() and
+ *          __exit_signal(). If you need to dereference tsk->sighand (for
+ *          example when locking ->siglock) and tsk is not current, you must
+ *          call lock_task_sighand().
+ *
+ *        * most things under tsk->signal
+ *
+ *        * tsk->sighand->action[]
+ *
+ *        * tsk->last_siginfo
+ *        * tsk->group_stop
+ *        * tsk->pending
+ *        * tsk->jobctl
+ *
+ *        * the atomic operation of checking tsk->jobctl, tsk->pending and
+ *          tsk->signal->shared_pending and setting/clearing TIF_SIGPENDING,
+ *          see recalc_sigpending().
+ *
+ *        * tsk->cpu_timers
+ *
+ */
+
+/*
  * SLAB caches for signal bits.
  */
 
-- 
1.7.4.4



* [RFC][PATCH 2/5] signal: Add rwlock to protect sighand->action
  2011-09-30 15:12 [RFC][PATCH 0/5] Signal scalability series Matt Fleming
  2011-09-30 15:12 ` [RFC][PATCH 1/5] signal: Document signal locking rules Matt Fleming
@ 2011-09-30 15:12 ` Matt Fleming
  2011-09-30 15:25   ` Peter Zijlstra
  2011-09-30 15:56   ` Matt Fleming
  2011-09-30 15:12 ` [RFC][PATCH 3/5] signal: Reduce sighand->siglock hold time in get_signal_to_deliver() Matt Fleming
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 38+ messages in thread
From: Matt Fleming @ 2011-09-30 15:12 UTC (permalink / raw)
  To: Oleg Nesterov, Tejun Heo
  Cc: linux-kernel, Tony Luck, Matt Fleming, Peter Zijlstra,
	Thomas Gleixner, Anirudh Badam

From: Matt Fleming <matt.fleming@intel.com>

In a future patch, sighand->siglock will no longer serialise access to
sighand->action. Therefore, we need to provide a new lock to protect it.

As sighand->action is read much more frequently than it is written, an
rwlock makes the most sense here.
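
The intended usage pattern is read-mostly, roughly like this
(illustrative sketch; the example_* helpers are made up, the real
conversions are in the hunks below):

/* Readers only need a consistent view of a handler. */
static int example_sigchld_ignored(struct sighand_struct *sigh)
{
        int ignored;

        read_lock(&sigh->action_lock);
        ignored = sigh->action[SIGCHLD - 1].sa.sa_handler == SIG_IGN;
        read_unlock(&sigh->action_lock);

        return ignored;
}

/* Writers, e.g. resetting a handler, take the lock exclusively. */
static void example_reset_handler(struct sighand_struct *sigh, int sig)
{
        unsigned long flags;

        write_lock_irqsave(&sigh->action_lock, flags);
        sigh->action[sig - 1].sa.sa_handler = SIG_DFL;
        write_unlock_irqrestore(&sigh->action_lock, flags);
}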

Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Anirudh Badam <abadam@cs.princeton.edu>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
 arch/ia64/kernel/signal.c |    4 +-
 fs/ncpfs/sock.c           |    2 +
 fs/proc/array.c           |    2 +
 include/linux/init_task.h |    1 +
 include/linux/sched.h     |    1 +
 kernel/exit.c             |    6 ++
 kernel/fork.c             |    1 +
 kernel/kmod.c             |    8 ++--
 kernel/ptrace.c           |    4 +-
 kernel/signal.c           |  117 ++++++++++++++++++++++++++++++++++++++++-----
 security/selinux/hooks.c  |    2 +
 11 files changed, 128 insertions(+), 20 deletions(-)

diff --git a/arch/ia64/kernel/signal.c b/arch/ia64/kernel/signal.c
index 7523501..c77c845 100644
--- a/arch/ia64/kernel/signal.c
+++ b/arch/ia64/kernel/signal.c
@@ -311,9 +311,9 @@ force_sigsegv_info (int sig, void __user *addr)
 		 * no longer safe to modify sa_handler without holding the
 		 * lock.
 		 */
-		spin_lock_irqsave(&current->sighand->siglock, flags);
+		write_lock_irqsave(&current->sighand->action_lock, flags);
 		current->sighand->action[sig - 1].sa.sa_handler = SIG_DFL;
-		spin_unlock_irqrestore(&current->sighand->siglock, flags);
+		write_unlock_irqrestore(&current->sighand->action_lock, flags);
 	}
 	si.si_signo = SIGSEGV;
 	si.si_errno = 0;
diff --git a/fs/ncpfs/sock.c b/fs/ncpfs/sock.c
index 850768a..3e69aec 100644
--- a/fs/ncpfs/sock.c
+++ b/fs/ncpfs/sock.c
@@ -763,10 +763,12 @@ static int ncp_do_request(struct ncp_server *server, int size,
 			   What if we've blocked it ourselves?  What about
 			   alarms?  Why, in fact, are we mucking with the
 			   sigmask at all? -- r~ */
+			read_lock(&current->sighand->action_lock);
 			if (current->sighand->action[SIGINT - 1].sa.sa_handler == SIG_DFL)
 				mask |= sigmask(SIGINT);
 			if (current->sighand->action[SIGQUIT - 1].sa.sa_handler == SIG_DFL)
 				mask |= sigmask(SIGQUIT);
+			read_unlock(&current->sighand->action_lock);
 		}
 		siginitsetinv(&blocked, mask);
 		set_current_blocked(&blocked);
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 3a1dafd..0322ac9 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -239,6 +239,7 @@ static void collect_sigign_sigcatch(struct task_struct *p, sigset_t *ign,
 	struct k_sigaction *k;
 	int i;
 
+	read_lock(&p->sighand->action_lock);
 	k = p->sighand->action;
 	for (i = 1; i <= _NSIG; ++i, ++k) {
 		if (k->sa.sa_handler == SIG_IGN)
@@ -246,6 +247,7 @@ static void collect_sigign_sigcatch(struct task_struct *p, sigset_t *ign,
 		else if (k->sa.sa_handler != SIG_DFL)
 			sigaddset(catch, i);
 	}
+	read_unlock(&p->sighand->action_lock);
 }
 
 static inline void task_sig(struct seq_file *m, struct task_struct *p)
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index d14e058..1a66552 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -56,6 +56,7 @@ extern struct nsproxy init_nsproxy;
 	.action		= { { { .sa_handler = SIG_DFL, } }, },		\
 	.siglock	= __SPIN_LOCK_UNLOCKED(sighand.siglock),	\
 	.signalfd_wqh	= __WAIT_QUEUE_HEAD_INITIALIZER(sighand.signalfd_wqh),	\
+	.action_lock	= __RW_LOCK_UNLOCKED(sighand.action_lock),	\
 }
 
 extern struct group_info init_groups;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dcacb26..f067bbc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -443,6 +443,7 @@ struct sighand_struct {
 	struct k_sigaction	action[_NSIG];
 	spinlock_t		siglock;
 	wait_queue_head_t	signalfd_wqh;
+	rwlock_t		action_lock;
 };
 
 struct pacct_struct {
diff --git a/kernel/exit.c b/kernel/exit.c
index 2913b35..a8a83ac 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -380,7 +380,10 @@ int allow_signal(int sig)
 	 * know it'll be handled, so that they don't get converted to
 	 * SIGKILL or just silently dropped.
 	 */
+	write_lock(&current->sighand->action_lock);
 	current->sighand->action[(sig)-1].sa.sa_handler = (void __user *)2;
+	write_unlock(&current->sighand->action_lock);
+
 	recalc_sigpending();
 	spin_unlock_irq(&current->sighand->siglock);
 	return 0;
@@ -394,7 +397,10 @@ int disallow_signal(int sig)
 		return -EINVAL;
 
 	spin_lock_irq(&current->sighand->siglock);
+	write_lock(&current->sighand->action_lock);
 	current->sighand->action[(sig)-1].sa.sa_handler = SIG_IGN;
+	write_unlock(&current->sighand->action_lock);
+
 	recalc_sigpending();
 	spin_unlock_irq(&current->sighand->siglock);
 	return 0;
diff --git a/kernel/fork.c b/kernel/fork.c
index 8104ace..8871ae8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1588,6 +1588,7 @@ static void sighand_ctor(void *data)
 
 	spin_lock_init(&sighand->siglock);
 	init_waitqueue_head(&sighand->signalfd_wqh);
+	rwlock_init(&sighand->action_lock);
 }
 
 void __init proc_caches_init(void)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index ddc7644..30be456 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -143,9 +143,9 @@ static int ____call_usermodehelper(void *data)
 	struct cred *new;
 	int retval;
 
-	spin_lock_irq(&current->sighand->siglock);
+	write_lock_irq(&current->sighand->action_lock);
 	flush_signal_handlers(current, 1);
-	spin_unlock_irq(&current->sighand->siglock);
+	write_unlock_irq(&current->sighand->action_lock);
 
 	/* We can run anywhere, unlike our parent keventd(). */
 	set_cpus_allowed_ptr(current, cpu_all_mask);
@@ -202,9 +202,9 @@ static int wait_for_helper(void *data)
 	pid_t pid;
 
 	/* If SIGCLD is ignored sys_wait4 won't populate the status. */
-	spin_lock_irq(&current->sighand->siglock);
+	write_lock_irq(&current->sighand->action_lock);
 	current->sighand->action[SIGCHLD-1].sa.sa_handler = SIG_DFL;
-	spin_unlock_irq(&current->sighand->siglock);
+	write_unlock_irq(&current->sighand->action_lock);
 
 	pid = kernel_thread(____call_usermodehelper, sub_info, SIGCHLD);
 	if (pid < 0) {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9de3ecf..eb46323 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -350,10 +350,10 @@ static int ptrace_traceme(void)
 static int ignoring_children(struct sighand_struct *sigh)
 {
 	int ret;
-	spin_lock(&sigh->siglock);
+	read_lock(&sigh->action_lock);
 	ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
 	      (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
-	spin_unlock(&sigh->siglock);
+	read_unlock(&sigh->action_lock);
 	return ret;
 }
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 54fc552..7f2de6d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -38,7 +38,7 @@
 #include "audit.h"	/* audit_signal_info() */
 
 /*
- * signal locking rules:
+ * signal locking rules (and locking order):
  *
  *   - sighand->siglock (spinlock) protects,
  *
@@ -49,8 +49,6 @@
  *
  *        * most things under tsk->signal
  *
- *        * tsk->sighand->action[]
- *
  *        * tsk->last_siginfo
  *        * tsk->group_stop
  *        * tsk->pending
@@ -62,6 +60,12 @@
  *
  *        * tsk->cpu_timers
  *
+ *   - sighand->action_lock (rwlock) protects,
+ *
+ *        * sighand->action[]. Most callers only need to acquire action_lock
+ *          for reading, see lock_action() for when the write-lock is
+ *          necessary.
+ *
  */
 
 /*
@@ -72,6 +76,10 @@ static struct kmem_cache *sigqueue_cachep;
 
 int print_fatal_signals __read_mostly;
 
+/*
+ * Must be called with t->sighand->action_lock at least held for
+ * reading.
+ */
 static void __user *sig_handler(struct task_struct *t, int sig)
 {
 	return t->sighand->action[sig - 1].sa.sa_handler;
@@ -485,6 +493,9 @@ void flush_itimer_signals(void)
 	spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
 }
 
+/*
+ * Must be called with t->sighand->action_lock held for writing.
+ */
 void ignore_signals(struct task_struct *t)
 {
 	int i;
@@ -497,8 +508,9 @@ void ignore_signals(struct task_struct *t)
 
 /*
  * Flush all handlers for a task.
+ *
+ * Must be called with t->sighand->action_lock write-locked.
  */
-
 void
 flush_signal_handlers(struct task_struct *t, int force_default)
 {
@@ -1186,10 +1198,23 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
+static int
+__group_send_sig_info_locked(int sig, struct siginfo *info,
+			     struct task_struct *p)
+{
+	return send_signal(sig, info, p, 1);
+}
+
 int
 __group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, 1);
+	int ret;
+
+	read_lock(&p->sighand->action_lock);
+	ret = send_signal(sig, info, p, 1);
+	read_unlock(&p->sighand->action_lock);
+
+	return ret;
 }
 
 static int
@@ -1205,7 +1230,9 @@ int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p,
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
+		read_lock(&p->sighand->action_lock);
 		ret = send_signal(sig, info, p, group);
+		read_unlock(&p->sighand->action_lock);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1231,6 +1258,7 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
 	struct k_sigaction *action;
 
 	spin_lock_irqsave(&t->sighand->siglock, flags);
+	write_lock(&t->sighand->action_lock);
 	action = &t->sighand->action[sig-1];
 	ignored = action->sa.sa_handler == SIG_IGN;
 	blocked = sigismember(&t->blocked, sig);
@@ -1244,6 +1272,7 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
 	if (action->sa.sa_handler == SIG_DFL)
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
 	ret = specific_send_sig_info(sig, info, t);
+	write_unlock(&t->sighand->action_lock);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1402,7 +1431,9 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
+			read_lock(&p->sighand->action_lock);
 			ret = __send_signal(sig, info, p, 1, 0);
+			read_unlock(&p->sighand->action_lock);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -1497,9 +1528,9 @@ force_sigsegv(int sig, struct task_struct *p)
 {
 	if (sig == SIGSEGV) {
 		unsigned long flags;
-		spin_lock_irqsave(&p->sighand->siglock, flags);
+		write_lock_irqsave(&p->sighand->action_lock, flags);
 		p->sighand->action[sig - 1].sa.sa_handler = SIG_DFL;
-		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+		write_unlock_irqrestore(&p->sighand->action_lock, flags);
 	}
 	force_sig(SIGSEGV, p);
 	return 0;
@@ -1665,6 +1696,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 
 	psig = tsk->parent->sighand;
 	spin_lock_irqsave(&psig->siglock, flags);
+	read_lock(&psig->action_lock);
 	if (!tsk->ptrace && sig == SIGCHLD &&
 	    (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
 	     (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
@@ -1688,8 +1720,9 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 			sig = 0;
 	}
 	if (valid_signal(sig) && sig)
-		__group_send_sig_info(sig, &info, tsk->parent);
+		__group_send_sig_info_locked(sig, &info, tsk->parent);
 	__wake_up_parent(tsk, tsk->parent);
+	read_unlock(&psig->action_lock);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
 	return autoreap;
@@ -1753,13 +1786,15 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 
 	sighand = parent->sighand;
 	spin_lock_irqsave(&sighand->siglock, flags);
+	read_lock(&sighand->action_lock);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		__group_send_sig_info_locked(SIGCHLD, &info, parent);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
 	__wake_up_parent(tsk, parent);
+	read_unlock(&sighand->action_lock);
 	spin_unlock_irqrestore(&sighand->siglock, flags);
 }
 
@@ -2154,13 +2189,58 @@ static int ptrace_signal(int signr, siginfo_t *info,
 
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr)) {
+		read_lock(&current->sighand->action_lock);
 		specific_send_sig_info(signr, info, current);
+		read_unlock(&current->sighand->action_lock);
 		signr = 0;
 	}
 
 	return signr;
 }
 
+/**
+ * lock_action - acquire @sighand->action_lock for read/write
+ * @sighand: the signal handler struct to protect
+ * @signr: the signal being delivered
+ *
+ * If the caller needs to modify @sighand->action[] then we need to acquire
+ * @sighand->action_lock for writing, which only happens when %SA_ONESHOT is
+ * set in sa.sa_flags. Otherwise we can just acquire the read lock to prevent
+ * writers changing things under our feet.
+ *
+ * RETURNS:
+ * %false if @sighand->action_lock is read-locked
+ * %true if @sighand->action_lock is write-locked
+ */
+static bool lock_action(struct sighand_struct *sighand, int signr)
+{
+	struct k_sigaction *ka;
+
+	read_lock(&sighand->action_lock);
+	ka = &sighand->action[signr-1];
+
+	/*
+	 * Take the read lock in the hope that we don't need to modify the
+	 * action, but if we do need to update the action we must take the
+	 * write lock to prevent readers from reading a stale signal handler.
+	 */
+	if (ka->sa.sa_flags & SA_ONESHOT) {
+		read_unlock(&sighand->action_lock);
+		write_lock(&sighand->action_lock);
+		return true;
+	}
+
+	return false;
+}
+
+static void unlock_action(struct sighand_struct *sighand, bool write_locked)
+{
+	if (write_locked)
+		write_unlock(&sighand->action_lock);
+	else
+		read_unlock(&sighand->action_lock);
+}
+
 int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
 			  struct pt_regs *regs, void *cookie)
 {
@@ -2216,6 +2296,7 @@ relock:
 
 	for (;;) {
 		struct k_sigaction *ka;
+		bool write_locked;
 
 		if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
 		    do_signal_stop(0))
@@ -2239,13 +2320,17 @@ relock:
 				continue;
 		}
 
+		write_locked = lock_action(sighand, signr);
 		ka = &sighand->action[signr-1];
 
 		/* Trace actually delivered signals. */
 		trace_signal_deliver(signr, info, ka);
 
-		if (ka->sa.sa_handler == SIG_IGN) /* Do nothing.  */
+		if (ka->sa.sa_handler == SIG_IGN) { /* Do nothing.  */
+			unlock_action(sighand, write_locked);
 			continue;
+		}
+
 		if (ka->sa.sa_handler != SIG_DFL) {
 			/* Run the handler.  */
 			*return_ka = *ka;
@@ -2253,14 +2338,19 @@ relock:
 			if (ka->sa.sa_flags & SA_ONESHOT)
 				ka->sa.sa_handler = SIG_DFL;
 
+			unlock_action(sighand, write_locked);
 			break; /* will return non-zero "signr" value */
 		}
 
 		/*
 		 * Now we are doing the default action for this signal.
 		 */
-		if (sig_kernel_ignore(signr)) /* Default is nothing. */
+		if (sig_kernel_ignore(signr)) { /* Default is nothing. */
+			unlock_action(sighand, write_locked);
 			continue;
+		}
+
+		unlock_action(sighand, write_locked);
 
 		/*
 		 * Global init gets no signals it doesn't want.
@@ -2935,9 +3025,11 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
 	if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig)))
 		return -EINVAL;
 
+	spin_lock_irq(&current->sighand->siglock);
+	read_lock(&current->sighand->action_lock);
+
 	k = &t->sighand->action[sig-1];
 
-	spin_lock_irq(&current->sighand->siglock);
 	if (oact)
 		*oact = *k;
 
@@ -2967,6 +3059,7 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
 		}
 	}
 
+	read_unlock(&current->sighand->action_lock);
 	spin_unlock_irq(&current->sighand->siglock);
 	return 0;
 }
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 266a229..574956c 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2274,7 +2274,9 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
 		spin_lock_irq(&current->sighand->siglock);
 		if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
 			__flush_signals(current);
+			write_lock(&current->sighand->action_lock);
 			flush_signal_handlers(current, 1);
+			write_unlock(&current->sighand->action_lock);
 			sigemptyset(&current->blocked);
 		}
 		spin_unlock_irq(&current->sighand->siglock);
-- 
1.7.4.4



* [RFC][PATCH 3/5] signal: Reduce sighand->siglock hold time in get_signal_to_deliver()
  2011-09-30 15:12 [RFC][PATCH 0/5] Signal scalability series Matt Fleming
  2011-09-30 15:12 ` [RFC][PATCH 1/5] signal: Document signal locking rules Matt Fleming
  2011-09-30 15:12 ` [RFC][PATCH 2/5] signal: Add rwlock to protect sighand->action Matt Fleming
@ 2011-09-30 15:12 ` Matt Fleming
  2011-09-30 15:12 ` [RFC][PATCH 4/5] signal: Add signal->ctrl_lock for job control Matt Fleming
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 38+ messages in thread
From: Matt Fleming @ 2011-09-30 15:12 UTC (permalink / raw)
  To: Oleg Nesterov, Tejun Heo
  Cc: linux-kernel, Tony Luck, Matt Fleming, Peter Zijlstra,
	Thomas Gleixner, Anirudh Badam

From: Matt Fleming <matt.fleming@intel.com>

We don't need to hold sighand->siglock across large chunks of code in
the signal delivery path. Instead we can just acquire the lock when
we're modifying a data structure that is protected by it. In other
words, we're pushing all the locking down into the functions that need
it, which allows us to get rid of dequeue_signal_lock() and stop
playing asymmetric locking games, e.g. in do_signal_stop() where we
may or may not return with the siglock held.

This patch greatly reduces the contention on the per-process siglock.
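
The end result is that dequeue_signal() takes and drops the siglock
itself, so callers such as nbd, jffs2 and signalfd no longer need a
locked wrapper. A heavily abridged sketch of the resulting shape (the
real function also deals with signalfd notification, stop signals and
posix timers):

int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
{
        unsigned long flags;
        int signr;

        spin_lock_irqsave(&tsk->sighand->siglock, flags);

        /* Private signals first, then the shared queue. */
        signr = __dequeue_signal(&tsk->pending, mask, info);
        if (!signr)
                signr = __dequeue_signal(&tsk->signal->shared_pending,
                                         mask, info);

        recalc_sigpending();
        spin_unlock_irqrestore(&tsk->sighand->siglock, flags);

        return signr;
}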

Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Anirudh Badam <abadam@cs.princeton.edu>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
 drivers/block/nbd.c                 |    2 +-
 drivers/usb/gadget/f_mass_storage.c |    2 +-
 drivers/usb/gadget/file_storage.c   |    2 +-
 fs/jffs2/background.c               |    2 +-
 fs/signalfd.c                       |    5 -
 include/linux/sched.h               |   12 ---
 kernel/signal.c                     |  157 +++++++++++++++++++++--------------
 7 files changed, 100 insertions(+), 82 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index f533f33..06aa43e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -199,7 +199,7 @@ static int sock_xmit(struct nbd_device *lo, int send, void *buf, int size,
 			siginfo_t info;
 			printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
 				task_pid_nr(current), current->comm,
-				dequeue_signal_lock(current, &current->blocked, &info));
+				dequeue_signal(current, &current->blocked, &info));
 			result = -EINTR;
 			sock_shutdown(lo, !send);
 			break;
diff --git a/drivers/usb/gadget/f_mass_storage.c b/drivers/usb/gadget/f_mass_storage.c
index 5b93395..8688de2 100644
--- a/drivers/usb/gadget/f_mass_storage.c
+++ b/drivers/usb/gadget/f_mass_storage.c
@@ -2463,7 +2463,7 @@ static void handle_exception(struct fsg_common *common)
 	 */
 	for (;;) {
 		int sig =
-			dequeue_signal_lock(current, &current->blocked, &info);
+			dequeue_signal(current, &current->blocked, &info);
 		if (!sig)
 			break;
 		if (sig != SIGUSR1) {
diff --git a/drivers/usb/gadget/file_storage.c b/drivers/usb/gadget/file_storage.c
index 639e14a..89206e5 100644
--- a/drivers/usb/gadget/file_storage.c
+++ b/drivers/usb/gadget/file_storage.c
@@ -2896,7 +2896,7 @@ static void handle_exception(struct fsg_dev *fsg)
 	/* Clear the existing signals.  Anything but SIGUSR1 is converted
 	 * into a high-priority EXIT exception. */
 	for (;;) {
-		sig = dequeue_signal_lock(current, &current->blocked, &info);
+		sig = dequeue_signal(current, &current->blocked, &info);
 		if (!sig)
 			break;
 		if (sig != SIGUSR1) {
diff --git a/fs/jffs2/background.c b/fs/jffs2/background.c
index 404111b..127f511 100644
--- a/fs/jffs2/background.c
+++ b/fs/jffs2/background.c
@@ -122,7 +122,7 @@ static int jffs2_garbage_collect_thread(void *_c)
 			if (try_to_freeze())
 				goto again;
 
-			signr = dequeue_signal_lock(current, &current->blocked, &info);
+			signr = dequeue_signal(current, &current->blocked, &info);
 
 			switch(signr) {
 			case SIGSTOP:
diff --git a/fs/signalfd.c b/fs/signalfd.c
index 492465b..728681d 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -144,7 +144,6 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info,
 	ssize_t ret;
 	DECLARE_WAITQUEUE(wait, current);
 
-	spin_lock_irq(&current->sighand->siglock);
 	ret = dequeue_signal(current, &ctx->sigmask, info);
 	switch (ret) {
 	case 0:
@@ -152,7 +151,6 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info,
 			break;
 		ret = -EAGAIN;
 	default:
-		spin_unlock_irq(&current->sighand->siglock);
 		return ret;
 	}
 
@@ -166,11 +164,8 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info,
 			ret = -ERESTARTSYS;
 			break;
 		}
-		spin_unlock_irq(&current->sighand->siglock);
 		schedule();
-		spin_lock_irq(&current->sighand->siglock);
 	}
-	spin_unlock_irq(&current->sighand->siglock);
 
 	remove_wait_queue(&current->sighand->signalfd_wqh, &wait);
 	__set_current_state(TASK_RUNNING);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f067bbc..e4cd6bb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2145,18 +2145,6 @@ extern void ignore_signals(struct task_struct *);
 extern void flush_signal_handlers(struct task_struct *, int force_default);
 extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info);
 
-static inline int dequeue_signal_lock(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
-{
-	unsigned long flags;
-	int ret;
-
-	spin_lock_irqsave(&tsk->sighand->siglock, flags);
-	ret = dequeue_signal(tsk, mask, info);
-	spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
-
-	return ret;
-}
-
 extern void block_all_signals(int (*notifier)(void *priv), void *priv,
 			      sigset_t *mask);
 extern void unblock_all_signals(void);
diff --git a/kernel/signal.c b/kernel/signal.c
index 7f2de6d..b69c5a9 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -632,13 +632,14 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
 /*
  * Dequeue a signal and return the element to the caller, which is
  * expected to free it.
- *
- * All callers have to hold the siglock.
  */
 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 {
+	unsigned long flags;
 	int signr;
 
+	spin_lock_irqsave(&current->sighand->siglock, flags);
+
 	/* We only dequeue private signals from ourselves, we don't let
 	 * signalfd steal them
 	 */
@@ -672,8 +673,10 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 	}
 
 	recalc_sigpending();
-	if (!signr)
+	if (!signr) {
+		spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
 		return 0;
+	}
 
 	if (unlikely(sig_kernel_stop(signr))) {
 		/*
@@ -690,17 +693,12 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 		 */
 		current->jobctl |= JOBCTL_STOP_DEQUEUED;
 	}
-	if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private) {
-		/*
-		 * Release the siglock to ensure proper locking order
-		 * of timer locks outside of siglocks.  Note, we leave
-		 * irqs disabled here, since the posix-timers code is
-		 * about to disable them again anyway.
-		 */
-		spin_unlock(&tsk->sighand->siglock);
+
+	spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
+
+	if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private)
 		do_schedule_next_timer(info);
-		spin_lock(&tsk->sighand->siglock);
-	}
+
 	return signr;
 }
 
@@ -1997,16 +1995,15 @@ void ptrace_notify(int exit_code)
  * If %JOBCTL_STOP_PENDING is not set yet, initiate group stop with @signr
  * and participate in it.  If already set, participate in the existing
  * group stop.  If participated in a group stop (and thus slept), %true is
- * returned with siglock released.
+ * returned.
  *
  * If ptraced, this function doesn't handle stop itself.  Instead,
- * %JOBCTL_TRAP_STOP is scheduled and %false is returned with siglock
- * untouched.  The caller must ensure that INTERRUPT trap handling takes
- * places afterwards.
+ * %JOBCTL_TRAP_STOP is scheduled and %false is returned.  The caller
+ * must ensure that INTERRUPT trap handling takes places afterwards.
  *
  * CONTEXT:
- * Must be called with @current->sighand->siglock held, which is released
- * on %true return.
+ * Must be called with @current->sighand->siglock held, which may be
+ * released and re-acquired before returning with intervening sleep.
  *
  * RETURNS:
  * %false if group stop is already cancelled or ptrace trap is scheduled.
@@ -2014,6 +2011,7 @@ void ptrace_notify(int exit_code)
  */
 static bool do_signal_stop(int signr)
 	__releases(&current->sighand->siglock)
+	__acquires(&current->sighand->siglock)
 {
 	struct signal_struct *sig = current->signal;
 
@@ -2105,6 +2103,8 @@ static bool do_signal_stop(int signr)
 
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		schedule();
+
+		spin_lock_irq(&current->sighand->siglock);
 		return true;
 	} else {
 		/*
@@ -2241,31 +2241,26 @@ static void unlock_action(struct sighand_struct *sighand, bool write_locked)
 		read_unlock(&sighand->action_lock);
 }
 
-int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
-			  struct pt_regs *regs, void *cookie)
+/**
+ * __notify_parent - should we notify the parent of a stop/continue?
+ * @task: task that needs to notify parent
+ *
+ * Check to see if we should notify the parent, prepare_signal(SIGCONT)
+ * encodes the CLD_ si_code into %SIGNAL_CLD_MASK bits.
+ *
+ * RETURNS:
+ * %false if we didn't notify the parent
+ * %true if we did notify the parent
+ */
+static inline bool __notify_parent(struct task_struct *task)
 {
-	struct sighand_struct *sighand = current->sighand;
-	struct signal_struct *signal = current->signal;
-	int signr;
-
-relock:
-	/*
-	 * We'll jump back here after any time we were stopped in TASK_STOPPED.
-	 * While in TASK_STOPPED, we were considered "frozen enough".
-	 * Now that we woke up, it's crucial if we're supposed to be
-	 * frozen that we freeze now before running anything substantial.
-	 */
-	try_to_freeze();
+	struct sighand_struct *sighand = task->sighand;
+	struct signal_struct *signal = task->signal;
+	int why;
 
 	spin_lock_irq(&sighand->siglock);
-	/*
-	 * Every stopped thread goes here after wakeup. Check to see if
-	 * we should notify the parent, prepare_signal(SIGCONT) encodes
-	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
-	 */
-	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
-		int why;
 
+	if (signal->flags & SIGNAL_CLD_MASK) {
 		if (signal->flags & SIGNAL_CLD_CONTINUED)
 			why = CLD_CONTINUED;
 		else
@@ -2288,24 +2283,64 @@ relock:
 
 		if (ptrace_reparented(current->group_leader))
 			do_notify_parent_cldstop(current->group_leader,
-						true, why);
+						 true, why);
 		read_unlock(&tasklist_lock);
 
-		goto relock;
+		return true;
+	}
+
+	spin_unlock_irq(&sighand->siglock);
+	return false;
+}
+
+int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
+			  struct pt_regs *regs, void *cookie)
+{
+	struct sighand_struct *sighand = current->sighand;
+	struct signal_struct *signal = current->signal;
+	int signr;
+
+freeze:
+	/*
+	 * We'll jump back here after any time we were stopped in TASK_STOPPED.
+	 * While in TASK_STOPPED, we were considered "frozen enough".
+	 * Now that we woke up, it's crucial if we're supposed to be
+	 * frozen that we freeze now before running anything substantial.
+	 */
+	try_to_freeze();
+
+	/*
+	 * Every stopped thread goes here after wakeup.
+	 */
+	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
+		if (__notify_parent(current))
+			goto freeze;
 	}
 
 	for (;;) {
 		struct k_sigaction *ka;
 		bool write_locked;
 
-		if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
-		    do_signal_stop(0))
-			goto relock;
+		if (unlikely(current->jobctl & JOBCTL_STOP_PENDING)) {
+			bool stopped = false;
+
+			spin_lock_irq(&sighand->siglock);
+			if (current->jobctl & JOBCTL_STOP_PENDING)
+				stopped = do_signal_stop(0);
+			spin_unlock_irq(&sighand->siglock);
+
+			if (stopped)
+				goto freeze;
+		}
 
 		if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
-			do_jobctl_trap();
+			spin_lock_irq(&sighand->siglock);
+			if (current->jobctl & JOBCTL_TRAP_MASK) {
+				do_jobctl_trap();
+				spin_unlock_irq(&sighand->siglock);
+				goto freeze;
+			}
 			spin_unlock_irq(&sighand->siglock);
-			goto relock;
 		}
 
 		signr = dequeue_signal(current, &current->blocked, info);
@@ -2314,8 +2349,10 @@ relock:
 			break; /* will return 0 */
 
 		if (unlikely(current->ptrace) && signr != SIGKILL) {
+			spin_lock_irq(&sighand->siglock);
 			signr = ptrace_signal(signr, info,
 					      regs, cookie);
+			spin_unlock_irq(&sighand->siglock);
 			if (!signr)
 				continue;
 		}
@@ -2367,6 +2404,8 @@ relock:
 			continue;
 
 		if (sig_kernel_stop(signr)) {
+			bool stopped;
+
 			/*
 			 * The default action is to stop all threads in
 			 * the thread group.  The job control signals
@@ -2378,20 +2417,18 @@ relock:
 			 * We need to check for that and bail out if necessary.
 			 */
 			if (signr != SIGSTOP) {
-				spin_unlock_irq(&sighand->siglock);
-
 				/* signals can be posted during this window */
 
 				if (is_current_pgrp_orphaned())
-					goto relock;
-
-				spin_lock_irq(&sighand->siglock);
+					goto freeze;
 			}
 
-			if (likely(do_signal_stop(info->si_signo))) {
-				/* It released the siglock.  */
-				goto relock;
-			}
+			spin_lock_irq(&sighand->siglock);
+			stopped = do_signal_stop(info->si_signo);
+			spin_unlock_irq(&sighand->siglock);
+
+			if (likely(stopped))
+				goto freeze;
 
 			/*
 			 * We didn't actually stop, due to a race
@@ -2400,8 +2437,6 @@ relock:
 			continue;
 		}
 
-		spin_unlock_irq(&sighand->siglock);
-
 		/*
 		 * Anything else is fatal, maybe with a core dump.
 		 */
@@ -2427,7 +2462,7 @@ relock:
 		do_group_exit(info->si_signo);
 		/* NOTREACHED */
 	}
-	spin_unlock_irq(&sighand->siglock);
+
 	return signr;
 }
 
@@ -2795,7 +2830,6 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info,
 	sigdelsetmask(&mask, sigmask(SIGKILL) | sigmask(SIGSTOP));
 	signotset(&mask);
 
-	spin_lock_irq(&tsk->sighand->siglock);
 	sig = dequeue_signal(tsk, &mask, info);
 	if (!sig && timeout) {
 		/*
@@ -2804,6 +2838,7 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info,
 		 * they arrive. Unblocking is always fine, we can avoid
 		 * set_current_blocked().
 		 */
+		spin_lock_irq(&tsk->sighand->siglock);
 		tsk->real_blocked = tsk->blocked;
 		sigandsets(&tsk->blocked, &tsk->blocked, &mask);
 		recalc_sigpending();
@@ -2814,9 +2849,9 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info,
 		spin_lock_irq(&tsk->sighand->siglock);
 		__set_task_blocked(tsk, &tsk->real_blocked);
 		siginitset(&tsk->real_blocked, 0);
+		spin_unlock_irq(&tsk->sighand->siglock);
 		sig = dequeue_signal(tsk, &mask, info);
 	}
-	spin_unlock_irq(&tsk->sighand->siglock);
 
 	if (sig)
 		return sig;
-- 
1.7.4.4



* [RFC][PATCH 4/5] signal: Add signal->ctrl_lock for job control
  2011-09-30 15:12 [RFC][PATCH 0/5] Signal scalability series Matt Fleming
                   ` (2 preceding siblings ...)
  2011-09-30 15:12 ` [RFC][PATCH 3/5] signal: Reduce sighand->siglock hold time in get_signal_to_deliver() Matt Fleming
@ 2011-09-30 15:12 ` Matt Fleming
  2011-09-30 15:30   ` Peter Zijlstra
  2011-09-30 15:12 ` [RFC][PATCH 5/5] signal: Split siglock into shared_siglock and per-thread siglock Matt Fleming
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 38+ messages in thread
From: Matt Fleming @ 2011-09-30 15:12 UTC (permalink / raw)
  To: Oleg Nesterov, Tejun Heo
  Cc: linux-kernel, Tony Luck, Matt Fleming, Peter Zijlstra,
	Thomas Gleixner, Anirudh Badam

From: Matt Fleming <matt.fleming@intel.com>

Instead of using sighand->siglock to synchronise access to job control
data structures, provide a new lock, 'ctrl_lock'. This helps to reduce
contention on the per-process siglock.
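
As an illustration of the new rule, job control state such as
tsk->jobctl is now read and written under ctrl_lock instead of siglock,
e.g. (example_stop_is_pending() is a made-up helper, the real
conversions are in the hunks below):

static bool example_stop_is_pending(struct task_struct *tsk)
{
        unsigned long flags;
        bool pending;

        spin_lock_irqsave(&tsk->signal->ctrl_lock, flags);
        pending = tsk->jobctl & JOBCTL_STOP_PENDING;
        spin_unlock_irqrestore(&tsk->signal->ctrl_lock, flags);

        return pending;
}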

Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Anirudh Badam <abadam@cs.princeton.edu>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
 fs/exec.c                 |   17 +++---
 include/linux/init_task.h |    1 +
 include/linux/sched.h     |    4 +-
 kernel/exit.c             |   19 ++++---
 kernel/fork.c             |    1 +
 kernel/ptrace.c           |   64 +++++++++++------------
 kernel/signal.c           |  128 ++++++++++++++++++++++++++++++++------------
 security/selinux/hooks.c  |    2 +
 8 files changed, 150 insertions(+), 86 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 8433e5d..367d3df 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -868,7 +868,6 @@ static int de_thread(struct task_struct *tsk)
 {
 	struct signal_struct *sig = tsk->signal;
 	struct sighand_struct *oldsighand = tsk->sighand;
-	spinlock_t *lock = &oldsighand->siglock;
 
 	if (thread_group_empty(tsk))
 		goto no_thread_group;
@@ -876,13 +875,13 @@ static int de_thread(struct task_struct *tsk)
 	/*
 	 * Kill all other threads in the thread group.
 	 */
-	spin_lock_irq(lock);
+	spin_lock_irq(&sig->ctrl_lock);
 	if (signal_group_exit(sig)) {
 		/*
 		 * Another group action in progress, just
 		 * return so that the signal is processed.
 		 */
-		spin_unlock_irq(lock);
+		spin_unlock_irq(&sig->ctrl_lock);
 		return -EAGAIN;
 	}
 
@@ -893,11 +892,11 @@ static int de_thread(struct task_struct *tsk)
 
 	while (sig->notify_count) {
 		__set_current_state(TASK_UNINTERRUPTIBLE);
-		spin_unlock_irq(lock);
+		spin_unlock_irq(&sig->ctrl_lock);
 		schedule();
-		spin_lock_irq(lock);
+		spin_lock_irq(&sig->ctrl_lock);
 	}
-	spin_unlock_irq(lock);
+	spin_unlock_irq(&sig->ctrl_lock);
 
 	/*
 	 * At this point all other threads have exited, all we have to
@@ -1845,12 +1844,12 @@ static inline int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
 	unsigned long flags;
 	int nr = -EAGAIN;
 
-	spin_lock_irq(&tsk->sighand->siglock);
+	spin_lock_irq(&tsk->signal->ctrl_lock);
 	if (!signal_group_exit(tsk->signal)) {
 		mm->core_state = core_state;
 		nr = zap_process(tsk, exit_code);
 	}
-	spin_unlock_irq(&tsk->sighand->siglock);
+	spin_unlock_irq(&tsk->signal->ctrl_lock);
 	if (unlikely(nr < 0))
 		return nr;
 
@@ -1897,7 +1896,9 @@ static inline int zap_threads(struct task_struct *tsk, struct mm_struct *mm,
 			if (p->mm) {
 				if (unlikely(p->mm == mm)) {
 					lock_task_sighand(p, &flags);
+					spin_lock(&p->signal->ctrl_lock);
 					nr += zap_process(p, exit_code);
+					spin_unlock(&p->signal->ctrl_lock);
 					unlock_task_sighand(p, &flags);
 				}
 				break;
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 1a66552..80baa1d 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -33,6 +33,7 @@ extern struct fs_struct init_fs;
 #define INIT_SIGNALS(sig) {						\
 	.nr_threads	= 1,						\
 	.wait_chldexit	= __WAIT_QUEUE_HEAD_INITIALIZER(sig.wait_chldexit),\
+	.ctrl_lock	= __SPIN_LOCK_UNLOCKED(sig.ctrl_lock),		\
 	.shared_pending	= { 						\
 		.list = LIST_HEAD_INIT(sig.shared_pending.list),	\
 		.signal =  {{0}}},					\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e4cd6bb..e35ce4a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -537,6 +537,8 @@ struct signal_struct {
 	/* shared signal handling: */
 	struct sigpending	shared_pending;
 
+	spinlock_t		ctrl_lock;
+
 	/* thread group exit support */
 	int			group_exit_code;
 	/* overloaded:
@@ -1293,7 +1295,7 @@ struct task_struct {
 	int exit_state;
 	int exit_code, exit_signal;
 	int pdeath_signal;  /*  The signal sent when the parent dies  */
-	unsigned int jobctl;	/* JOBCTL_*, siglock protected */
+	unsigned int jobctl;	/* JOBCTL_*, ctrl_lock protected */
 	/* ??? */
 	unsigned int personality;
 	unsigned did_exec:1;
diff --git a/kernel/exit.c b/kernel/exit.c
index a8a83ac..379a13d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -106,8 +106,10 @@ static void __exit_signal(struct task_struct *tsk)
 		 * If there is any task waiting for the group exit
 		 * then notify it:
 		 */
+		spin_lock(&sig->ctrl_lock);
 		if (sig->notify_count > 0 && !--sig->notify_count)
 			wake_up_process(sig->group_exit_task);
+		spin_unlock(&sig->ctrl_lock);
 
 		if (tsk == sig->curr_target)
 			sig->curr_target = next_thread(tsk);
@@ -1086,8 +1088,7 @@ do_group_exit(int exit_code)
 	if (signal_group_exit(sig))
 		exit_code = sig->group_exit_code;
 	else if (!thread_group_empty(current)) {
-		struct sighand_struct *const sighand = current->sighand;
-		spin_lock_irq(&sighand->siglock);
+		spin_lock_irq(&sig->ctrl_lock);
 		if (signal_group_exit(sig))
 			/* Another thread got here before we took the lock.  */
 			exit_code = sig->group_exit_code;
@@ -1096,7 +1097,7 @@ do_group_exit(int exit_code)
 			sig->flags = SIGNAL_GROUP_EXIT;
 			zap_other_threads(current);
 		}
-		spin_unlock_irq(&sighand->siglock);
+		spin_unlock_irq(&sig->ctrl_lock);
 	}
 
 	do_exit(exit_code);
@@ -1381,7 +1382,7 @@ static int *task_stopped_code(struct task_struct *p, bool ptrace)
  *
  * CONTEXT:
  * read_lock(&tasklist_lock), which is released if return value is
- * non-zero.  Also, grabs and releases @p->sighand->siglock.
+ * non-zero.  Also, grabs and releases @p->signal->ctrl_lock.
  *
  * RETURNS:
  * 0 if wait condition didn't exist and search for other wait conditions
@@ -1407,7 +1408,7 @@ static int wait_task_stopped(struct wait_opts *wo,
 		return 0;
 
 	exit_code = 0;
-	spin_lock_irq(&p->sighand->siglock);
+	spin_lock_irq(&p->signal->ctrl_lock);
 
 	p_code = task_stopped_code(p, ptrace);
 	if (unlikely(!p_code))
@@ -1422,7 +1423,7 @@ static int wait_task_stopped(struct wait_opts *wo,
 
 	uid = task_uid(p);
 unlock_sig:
-	spin_unlock_irq(&p->sighand->siglock);
+	spin_unlock_irq(&p->signal->ctrl_lock);
 	if (!exit_code)
 		return 0;
 
@@ -1485,16 +1486,16 @@ static int wait_task_continued(struct wait_opts *wo, struct task_struct *p)
 	if (!(p->signal->flags & SIGNAL_STOP_CONTINUED))
 		return 0;
 
-	spin_lock_irq(&p->sighand->siglock);
+	spin_lock_irq(&p->signal->ctrl_lock);
 	/* Re-check with the lock held.  */
 	if (!(p->signal->flags & SIGNAL_STOP_CONTINUED)) {
-		spin_unlock_irq(&p->sighand->siglock);
+		spin_unlock_irq(&p->signal->ctrl_lock);
 		return 0;
 	}
 	if (!unlikely(wo->wo_flags & WNOWAIT))
 		p->signal->flags &= ~SIGNAL_STOP_CONTINUED;
 	uid = task_uid(p);
-	spin_unlock_irq(&p->sighand->siglock);
+	spin_unlock_irq(&p->signal->ctrl_lock);
 
 	pid = task_pid_vnr(p);
 	get_task_struct(p);
diff --git a/kernel/fork.c b/kernel/fork.c
index 8871ae8..8c5cf19 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -993,6 +993,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 		sig->flags |= SIGNAL_UNKILLABLE;
 	sig->curr_target = tsk;
 	init_sigpending(&sig->shared_pending);
+	spin_lock_init(&sig->ctrl_lock);
 	INIT_LIST_HEAD(&sig->posix_timers);
 
 	hrtimer_init(&sig->real_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index eb46323..e4966b0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -81,7 +81,7 @@ void __ptrace_unlink(struct task_struct *child)
 	child->parent = child->real_parent;
 	list_del_init(&child->ptrace_entry);
 
-	spin_lock(&child->sighand->siglock);
+	spin_lock(&child->signal->ctrl_lock);
 
 	/*
 	 * Clear all pending traps and TRAPPING.  TRAPPING should be
@@ -108,7 +108,7 @@ void __ptrace_unlink(struct task_struct *child)
 	if (child->jobctl & JOBCTL_STOP_PENDING || task_is_traced(child))
 		signal_wake_up(child, task_is_traced(child));
 
-	spin_unlock(&child->sighand->siglock);
+	spin_unlock(&child->signal->ctrl_lock);
 }
 
 /**
@@ -123,7 +123,7 @@ void __ptrace_unlink(struct task_struct *child)
  * state.
  *
  * CONTEXT:
- * Grabs and releases tasklist_lock and @child->sighand->siglock.
+ * Grabs and releases tasklist_lock and @child->signal->ctrl_lock.
  *
  * RETURNS:
  * 0 on success, -ESRCH if %child is not ready.
@@ -141,16 +141,12 @@ int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	 */
 	read_lock(&tasklist_lock);
 	if ((child->ptrace & PT_PTRACED) && child->parent == current) {
-		/*
-		 * child->sighand can't be NULL, release_task()
-		 * does ptrace_unlink() before __exit_signal().
-		 */
-		spin_lock_irq(&child->sighand->siglock);
+		spin_lock_irq(&child->signal->ctrl_lock);
 		WARN_ON_ONCE(task_is_stopped(child));
 		if (ignore_state || (task_is_traced(child) &&
 				     !(child->jobctl & JOBCTL_LISTENING)))
 			ret = 0;
-		spin_unlock_irq(&child->sighand->siglock);
+		spin_unlock_irq(&child->signal->ctrl_lock);
 	}
 	read_unlock(&tasklist_lock);
 
@@ -275,7 +271,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 	if (!seize)
 		send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
 
-	spin_lock(&task->sighand->siglock);
+	spin_lock(&task->signal->ctrl_lock);
 
 	/*
 	 * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
@@ -292,13 +288,13 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * ATTACH, the wait(2) may fail due to the transient RUNNING.
 	 *
 	 * The following task_is_stopped() test is safe as both transitions
-	 * in and out of STOPPED are protected by siglock.
+	 * in and out of STOPPED are protected by ctrl_lock.
 	 */
 	if (task_is_stopped(task) &&
 	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
 		signal_wake_up(task, 1);
 
-	spin_unlock(&task->sighand->siglock);
+	spin_unlock(&task->signal->ctrl_lock);
 
 	retval = 0;
 unlock_tasklist:
@@ -538,32 +534,30 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
 {
 	unsigned long flags;
-	int error = -ESRCH;
+	int error = -EINVAL;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			*info = *child->last_siginfo;
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
+	spin_lock_irqsave(&child->signal->ctrl_lock, flags);
+	if (likely(child->last_siginfo != NULL)) {
+		*info = *child->last_siginfo;
+		error = 0;
 	}
+	spin_unlock_irqrestore(&child->signal->ctrl_lock, flags);
+
 	return error;
 }
 
 static int ptrace_setsiginfo(struct task_struct *child, const siginfo_t *info)
 {
 	unsigned long flags;
-	int error = -ESRCH;
+	int error = -EINVAL;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			*child->last_siginfo = *info;
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
+	spin_lock_irqsave(&child->signal->ctrl_lock, flags);
+	if (likely(child->last_siginfo != NULL)) {
+		*child->last_siginfo = *info;
+		error = 0;
 	}
+	spin_unlock_irqrestore(&child->signal->ctrl_lock, flags);
+
 	return error;
 }
 
@@ -715,7 +709,7 @@ int ptrace_request(struct task_struct *child, long request,
 		 * The actual trap might not be PTRACE_EVENT_STOP trap but
 		 * the pending condition is cleared regardless.
 		 */
-		if (unlikely(!seized || !lock_task_sighand(child, &flags)))
+		if (unlikely(!seized))
 			break;
 
 		/*
@@ -724,10 +718,11 @@ int ptrace_request(struct task_struct *child, long request,
 		 * STOP, this INTERRUPT should clear LISTEN and re-trap
 		 * tracee into STOP.
 		 */
+		spin_lock_irqsave(&child->signal->ctrl_lock, flags);
 		if (likely(task_set_jobctl_pending(child, JOBCTL_TRAP_STOP)))
 			signal_wake_up(child, child->jobctl & JOBCTL_LISTENING);
 
-		unlock_task_sighand(child, &flags);
+		spin_unlock_irqrestore(&child->signal->ctrl_lock, flags);
 		ret = 0;
 		break;
 
@@ -740,12 +735,15 @@ int ptrace_request(struct task_struct *child, long request,
 		 * again.  Alternatively, ptracer can issue INTERRUPT to
 		 * finish listening and re-trap tracee into STOP.
 		 */
-		if (unlikely(!seized || !lock_task_sighand(child, &flags)))
+		if (unlikely(!seized))
 			break;
 
+		spin_lock_irqsave(&child->signal->ctrl_lock, flags);
 		si = child->last_siginfo;
-		if (unlikely(!si || si->si_code >> 8 != PTRACE_EVENT_STOP))
+		if (unlikely(!si || si->si_code >> 8 != PTRACE_EVENT_STOP)) {
+			spin_unlock_irqrestore(&child->signal->ctrl_lock, flags);
 			break;
+		}
 
 		child->jobctl |= JOBCTL_LISTENING;
 
@@ -756,7 +754,7 @@ int ptrace_request(struct task_struct *child, long request,
 		if (child->jobctl & JOBCTL_TRAP_NOTIFY)
 			signal_wake_up(child, true);
 
-		unlock_task_sighand(child, &flags);
+		spin_unlock_irqrestore(&child->signal->ctrl_lock, flags);
 		ret = 0;
 		break;
 
diff --git a/kernel/signal.c b/kernel/signal.c
index b69c5a9..ca99c2d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -50,13 +50,7 @@
  *        * most things under tsk->signal
  *
  *        * tsk->last_siginfo
- *        * tsk->group_stop
  *        * tsk->pending
- *        * tsk->jobctl
- *
- *        * the atomic operation of checking tsk->jobctl, tsk->pending and
- *          tsk->signal->shared_pending and setting/clearing TIF_SIGPENDING,
- *          see recalc_sigpending().
  *
  *        * tsk->cpu_timers
  *
@@ -66,6 +60,19 @@
  *          for reading, see lock_action() for when the write-lock is
  *          necessary.
  *
+ *   - signal->ctrl_lock (spinlock) protects,
+ *
+ *        * tsk->signal->group_exit_code
+ *        * tsk->signal->group_exit_task
+ *        * tsk->signal->notify_count
+ *        * tsk->signal->group_stop
+ *        * tsk->signal->flags
+ *        * tsk->group_stop
+ *        * tsk->jobctl
+ *
+ *        * the atomic operation of checking tsk->jobctl, tsk->pending and
+ *          tsk->signal->shared_pending and setting/clearing TIF_SIGPENDING,
+ *          see recalc_sigpending().
  */
 
 /*
@@ -157,8 +164,23 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)
 
 #define PENDING(p,b) has_pending_signals(&(p)->signal, (b))
 
+/**
+ * recalc_sigpending_tsk - check for pending signals
+ * @t: task to calculate pending signals for
+ *
+ * CONTEXT:
+ * Must be called with both t->signal->ctrl_lock and t->sighand->siglock held,
+ * because TIF_SIGPENDING can be set by a task holding either of them.
+ *
+ * RETURNS:
+ * 0 if there are no pending signals. Otherwise we return 1 and set
+ * TIF_SIGPENDING.
+ */
 static int recalc_sigpending_tsk(struct task_struct *t)
 {
+	assert_spin_locked(&t->sighand->siglock);
+	assert_spin_locked(&t->signal->ctrl_lock);
+
 	if ((t->jobctl & JOBCTL_PENDING_MASK) ||
 	    PENDING(&t->pending, &t->blocked) ||
 	    PENDING(&t->signal->shared_pending, &t->blocked)) {
@@ -179,15 +201,22 @@ static int recalc_sigpending_tsk(struct task_struct *t)
  */
 void recalc_sigpending_and_wake(struct task_struct *t)
 {
+	struct signal_struct *sig = t->signal;
+
+	spin_lock(&sig->ctrl_lock);
 	if (recalc_sigpending_tsk(t))
 		signal_wake_up(t, 0);
+	spin_unlock(&sig->ctrl_lock);
 }
 
 void recalc_sigpending(void)
 {
+	struct signal_struct *sig = current->signal;
+
+	spin_lock(&sig->ctrl_lock);
 	if (!recalc_sigpending_tsk(current) && !freezing(current))
 		clear_thread_flag(TIF_SIGPENDING);
-
+	spin_unlock(&sig->ctrl_lock);
 }
 
 /* Given the mask, find the first available signal that should be serviced. */
@@ -268,7 +297,7 @@ static inline void print_dropped_signal(int sig)
  * becomes noop.
  *
  * CONTEXT:
- * Must be called with @task->sighand->siglock held.
+ * Must be called with @task->signal->ctrl_lock held.
  *
  * RETURNS:
  * %true if @mask is set, %false if made noop because @task was dying.
@@ -278,6 +307,7 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned int mask)
 	BUG_ON(mask & ~(JOBCTL_PENDING_MASK | JOBCTL_STOP_CONSUME |
 			JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
 	BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
+	assert_spin_locked(&task->signal->ctrl_lock);
 
 	if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
 		return false;
@@ -299,10 +329,12 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned int mask)
  * ptracer.
  *
  * CONTEXT:
- * Must be called with @task->sighand->siglock held.
+ * Must be called with @task->signal->ctrl_lock held.
  */
 void task_clear_jobctl_trapping(struct task_struct *task)
 {
+	assert_spin_locked(&task->signal->ctrl_lock);
+
 	if (unlikely(task->jobctl & JOBCTL_TRAPPING)) {
 		task->jobctl &= ~JOBCTL_TRAPPING;
 		wake_up_bit(&task->jobctl, JOBCTL_TRAPPING_BIT);
@@ -322,11 +354,12 @@ void task_clear_jobctl_trapping(struct task_struct *task)
  * task_clear_jobctl_trapping().
  *
  * CONTEXT:
- * Must be called with @task->sighand->siglock held.
+ * Must be called with @task->signal->ctrl_lock held.
  */
 void task_clear_jobctl_pending(struct task_struct *task, unsigned int mask)
 {
 	BUG_ON(mask & ~JOBCTL_PENDING_MASK);
+	assert_spin_locked(&task->signal->ctrl_lock);
 
 	if (mask & JOBCTL_STOP_PENDING)
 		mask |= JOBCTL_STOP_CONSUME | JOBCTL_STOP_DEQUEUED;
@@ -347,7 +380,7 @@ void task_clear_jobctl_pending(struct task_struct *task, unsigned int mask)
  * stop, the appropriate %SIGNAL_* flags are set.
  *
  * CONTEXT:
- * Must be called with @task->sighand->siglock held.
+ * Must be called with @task->signal->ctrl_lock held.
  *
  * RETURNS:
  * %true if group stop completion should be notified to the parent, %false
@@ -359,6 +392,7 @@ static bool task_participate_group_stop(struct task_struct *task)
 	bool consume = task->jobctl & JOBCTL_STOP_CONSUME;
 
 	WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING));
+	assert_spin_locked(&sig->ctrl_lock);
 
 	task_clear_jobctl_pending(task, JOBCTL_STOP_PENDING);
 
@@ -672,8 +706,10 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 		}
 	}
 
+	spin_lock(&current->signal->ctrl_lock);
 	recalc_sigpending();
 	if (!signr) {
+		spin_unlock(&current->signal->ctrl_lock);
 		spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
 		return 0;
 	}
@@ -694,6 +730,7 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 		current->jobctl |= JOBCTL_STOP_DEQUEUED;
 	}
 
+	spin_unlock(&current->signal->ctrl_lock);
 	spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
 
 	if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private)
@@ -706,8 +743,8 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
  * Tell a process that it has a new active signal..
  *
  * NOTE! we rely on the previous spin_lock to
- * lock interrupts for us! We can only be called with
- * "siglock" held, and the local interrupt must
+ * lock interrupts for us! We can only be called with either
+ * "siglock"  or "ctrl_lock" held, and the local interrupt must
  * have been disabled when that got acquired!
  *
  * No need to set need_resched since signal event passing
@@ -870,12 +907,12 @@ static int check_kill_permission(int sig, struct siginfo *info,
  * are finished by PTRACE_CONT.
  *
  * CONTEXT:
- * Must be called with @task->sighand->siglock held.
+ * Must be called with @task->signal->ctrl_lock held.
  */
 static void ptrace_trap_notify(struct task_struct *t)
 {
 	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
-	assert_spin_locked(&t->sighand->siglock);
+	assert_spin_locked(&t->signal->ctrl_lock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
 	signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -896,6 +933,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 	struct signal_struct *signal = p->signal;
 	struct task_struct *t;
 
+	spin_lock(&signal->ctrl_lock);
 	if (unlikely(signal->flags & SIGNAL_GROUP_EXIT)) {
 		/*
 		 * The process is in the middle of dying, nothing to do.
@@ -950,6 +988,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 			signal->group_exit_code = 0;
 		}
 	}
+	spin_unlock(&signal->ctrl_lock);
 
 	return !sig_ignored(p, sig, from_ancestor_ns);
 }
@@ -1016,6 +1055,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 	 * Found a killable thread.  If the signal will be fatal,
 	 * then start taking the whole group down immediately.
 	 */
+	spin_lock(&signal->ctrl_lock);
 	if (sig_fatal(p, sig) &&
 	    !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
 	    !sigismember(&t->real_blocked, sig) &&
@@ -1039,6 +1079,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 				sigaddset(&t->pending.signal, SIGKILL);
 				signal_wake_up(t, 1);
 			} while_each_thread(p, t);
+			spin_unlock(&signal->ctrl_lock);
 			return;
 		}
 	}
@@ -1048,6 +1089,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 	 * Tell the chosen thread to wake up and dequeue it.
 	 */
 	signal_wake_up(t, sig == SIGKILL);
+	spin_unlock(&signal->ctrl_lock);
 	return;
 }
 
@@ -1840,7 +1882,10 @@ static int sigkill_pending(struct task_struct *tsk)
 static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
+	__releases(&current->signal->ctrl_lock)
+	__acquires(&current->signal->ctrl_lock)
 {
+	struct signal_struct *sig = current->signal;
 	bool gstop_done = false;
 
 	if (arch_ptrace_stop_needed(exit_code, info)) {
@@ -1855,9 +1900,11 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 		 * siglock.  That must prevent us from sleeping in TASK_TRACED.
 		 * So after regaining the lock, we must check for SIGKILL.
 		 */
+		spin_unlock(&sig->ctrl_lock);
 		spin_unlock_irq(&current->sighand->siglock);
 		arch_ptrace_stop(exit_code, info);
 		spin_lock_irq(&current->sighand->siglock);
+		spin_lock(&sig->ctrl_lock);
 		if (sigkill_pending(current))
 			return;
 	}
@@ -1892,6 +1939,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	/* entering a trap, clear TRAPPING */
 	task_clear_jobctl_trapping(current);
 
+	spin_unlock(&sig->ctrl_lock);
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
 	if (may_ptrace_stop()) {
@@ -1947,11 +1995,12 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	try_to_freeze();
 
 	/*
-	 * We are back.  Now reacquire the siglock before touching
+	 * We are back.  Now reacquire the ctrl_lock before touching
 	 * last_siginfo, so that we are sure to have synchronized with
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
+	spin_lock(&sig->ctrl_lock);
 	current->last_siginfo = NULL;
 
 	/* LISTENING can be set only during STOP traps, clear it */
@@ -2002,7 +2051,7 @@ void ptrace_notify(int exit_code)
  * must ensure that INTERRUPT trap handling takes places afterwards.
  *
  * CONTEXT:
- * Must be called with @current->sighand->siglock held, which may be
+ * Must be called with @current->signal->ctrl_lock held, which may be
  * released and re-acquired before returning with intervening sleep.
  *
  * RETURNS:
@@ -2010,8 +2059,8 @@ void ptrace_notify(int exit_code)
  * %true if participated in group stop.
  */
 static bool do_signal_stop(int signr)
-	__releases(&current->sighand->siglock)
-	__acquires(&current->sighand->siglock)
+	__releases(&current->signal->ctrl_lock)
+	__acquires(&current->signal->ctrl_lock)
 {
 	struct signal_struct *sig = current->signal;
 
@@ -2084,7 +2133,7 @@ static bool do_signal_stop(int signr)
 			notify = CLD_STOPPED;
 
 		__set_current_state(TASK_STOPPED);
-		spin_unlock_irq(&current->sighand->siglock);
+		spin_unlock_irq(&sig->ctrl_lock);
 
 		/*
 		 * Notify the parent of the group stop completion.  Because
@@ -2104,7 +2153,7 @@ static bool do_signal_stop(int signr)
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		schedule();
 
-		spin_lock_irq(&current->sighand->siglock);
+		spin_lock_irq(&sig->ctrl_lock);
 		return true;
 	} else {
 		/*
@@ -2128,7 +2177,7 @@ static bool do_signal_stop(int signr)
  * number as exit_code and no siginfo.
  *
  * CONTEXT:
- * Must be called with @current->sighand->siglock held, which may be
+ * Must be called with @current->signal->ctrl_lock held, which may be
  * released and re-acquired before returning with intervening sleep.
  */
 static void do_jobctl_trap(void)
@@ -2153,6 +2202,8 @@ static void do_jobctl_trap(void)
 static int ptrace_signal(int signr, siginfo_t *info,
 			 struct pt_regs *regs, void *cookie)
 {
+	assert_spin_locked(&current->signal->ctrl_lock);
+
 	ptrace_signal_deliver(regs, cookie);
 	/*
 	 * We do not check sig_kernel_stop(signr) but set this marker
@@ -2189,9 +2240,11 @@ static int ptrace_signal(int signr, siginfo_t *info,
 
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr)) {
+		spin_unlock_irq(&current->signal->ctrl_lock);
 		read_lock(&current->sighand->action_lock);
 		specific_send_sig_info(signr, info, current);
 		read_unlock(&current->sighand->action_lock);
+		spin_lock_irq(&current->signal->ctrl_lock);
 		signr = 0;
 	}
 
@@ -2254,11 +2307,10 @@ static void unlock_action(struct sighand_struct *sighand, bool write_locked)
  */
 static inline bool __notify_parent(struct task_struct *task)
 {
-	struct sighand_struct *sighand = task->sighand;
 	struct signal_struct *signal = task->signal;
 	int why;
 
-	spin_lock_irq(&sighand->siglock);
+	spin_lock_irq(&signal->ctrl_lock);
 
 	if (signal->flags & SIGNAL_CLD_MASK) {
 		if (signal->flags & SIGNAL_CLD_CONTINUED)
@@ -2268,7 +2320,7 @@ static inline bool __notify_parent(struct task_struct *task)
 
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
-		spin_unlock_irq(&sighand->siglock);
+		spin_unlock_irq(&signal->ctrl_lock);
 
 		/*
 		 * Notify the parent that we're continuing.  This event is
@@ -2289,7 +2341,7 @@ static inline bool __notify_parent(struct task_struct *task)
 		return true;
 	}
 
-	spin_unlock_irq(&sighand->siglock);
+	spin_unlock_irq(&signal->ctrl_lock);
 	return false;
 }
 
@@ -2324,23 +2376,23 @@ freeze:
 		if (unlikely(current->jobctl & JOBCTL_STOP_PENDING)) {
 			bool stopped = false;
 
-			spin_lock_irq(&sighand->siglock);
+			spin_lock_irq(&signal->ctrl_lock);
 			if (current->jobctl & JOBCTL_STOP_PENDING)
 				stopped = do_signal_stop(0);
-			spin_unlock_irq(&sighand->siglock);
+			spin_unlock_irq(&signal->ctrl_lock);
 
 			if (stopped)
 				goto freeze;
 		}
 
 		if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
-			spin_lock_irq(&sighand->siglock);
+			spin_lock_irq(&signal->ctrl_lock);
 			if (current->jobctl & JOBCTL_TRAP_MASK) {
 				do_jobctl_trap();
-				spin_unlock_irq(&sighand->siglock);
+				spin_unlock_irq(&signal->ctrl_lock);
 				goto freeze;
 			}
-			spin_unlock_irq(&sighand->siglock);
+			spin_unlock_irq(&signal->ctrl_lock);
 		}
 
 		signr = dequeue_signal(current, &current->blocked, info);
@@ -2349,10 +2401,10 @@ freeze:
 			break; /* will return 0 */
 
 		if (unlikely(current->ptrace) && signr != SIGKILL) {
-			spin_lock_irq(&sighand->siglock);
+			spin_lock_irq(&signal->ctrl_lock);
 			signr = ptrace_signal(signr, info,
 					      regs, cookie);
-			spin_unlock_irq(&sighand->siglock);
+			spin_unlock_irq(&signal->ctrl_lock);
 			if (!signr)
 				continue;
 		}
@@ -2423,9 +2475,9 @@ freeze:
 					goto freeze;
 			}
 
-			spin_lock_irq(&sighand->siglock);
+			spin_lock_irq(&signal->ctrl_lock);
 			stopped = do_signal_stop(info->si_signo);
-			spin_unlock_irq(&sighand->siglock);
+			spin_unlock_irq(&signal->ctrl_lock);
 
 			if (likely(stopped))
 				goto freeze;
@@ -2538,6 +2590,7 @@ void exit_signals(struct task_struct *tsk)
 	if (!signal_pending(tsk))
 		goto out;
 
+	spin_lock(&tsk->signal->ctrl_lock);
 	unblocked = tsk->blocked;
 	signotset(&unblocked);
 	retarget_shared_pending(tsk, &unblocked);
@@ -2545,6 +2598,7 @@ void exit_signals(struct task_struct *tsk)
 	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
 	    task_participate_group_stop(tsk))
 		group_stop = CLD_STOPPED;
+	spin_unlock(&tsk->signal->ctrl_lock);
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);
 
@@ -2612,7 +2666,9 @@ void set_current_blocked(const sigset_t *newset)
 	struct task_struct *tsk = current;
 
 	spin_lock_irq(&tsk->sighand->siglock);
+	spin_lock(&tsk->signal->ctrl_lock);
 	__set_task_blocked(tsk, newset);
+	spin_unlock(&tsk->signal->ctrl_lock);
 	spin_unlock_irq(&tsk->sighand->siglock);
 }
 EXPORT_SYMBOL(set_current_blocked);
@@ -2847,8 +2903,10 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info,
 		timeout = schedule_timeout_interruptible(timeout);
 
 		spin_lock_irq(&tsk->sighand->siglock);
+		spin_lock(&tsk->signal->ctrl_lock);
 		__set_task_blocked(tsk, &tsk->real_blocked);
 		siginitset(&tsk->real_blocked, 0);
+		spin_unlock(&tsk->signal->ctrl_lock);
 		spin_unlock_irq(&tsk->sighand->siglock);
 		sig = dequeue_signal(tsk, &mask, info);
 	}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 574956c..47f278f 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2272,6 +2272,7 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
 		for (i = 0; i < 3; i++)
 			do_setitimer(i, &itimer, NULL);
 		spin_lock_irq(&current->sighand->siglock);
+		spin_lock(&current->signal->ctrl_lock);
 		if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
 			__flush_signals(current);
 			write_lock(&current->sighand->action_lock);
@@ -2279,6 +2280,7 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
 			write_unlock(&current->sighand->action_lock);
 			sigemptyset(&current->blocked);
 		}
+		spin_unlock(&current->signal->ctrl_lock);
 		spin_unlock_irq(&current->sighand->siglock);
 	}
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC][PATCH 5/5] signal: Split siglock into shared_siglock and per-thread siglock
  2011-09-30 15:12 [RFC][PATCH 0/5] Signal scalability series Matt Fleming
                   ` (3 preceding siblings ...)
  2011-09-30 15:12 ` [RFC][PATCH 4/5] signal: Add signal->ctrl_lock for job control Matt Fleming
@ 2011-09-30 15:12 ` Matt Fleming
  2011-09-30 16:52 ` [RFC][PATCH 0/5] Signal scalability series Oleg Nesterov
  2011-09-30 22:30 ` Andi Kleen
  6 siblings, 0 replies; 38+ messages in thread
From: Matt Fleming @ 2011-09-30 15:12 UTC (permalink / raw)
  To: Oleg Nesterov, Tejun Heo
  Cc: linux-kernel, Tony Luck, Matt Fleming, Peter Zijlstra,
	Thomas Gleixner, Anirudh Badam

From: Matt Fleming <matt.fleming@intel.com>

Create two new locks specifically for synchronising access to
tsk->pending and tsk->signal->shared_pending. This helps to reduce
contention on the per-process siglock and improves scalability by
splitting signal delivery into a fastpath and slowpath.

The signal delivery fastpath dequeues a signal from the per-thread
queue (tsk->pending) which means it only has to acquire the per-thread
siglock. The slowpath, on the other hand, must acquire the per-process
shared_siglock (and potentially sighand->siglock if an itimer signal is
dequeued) and dequeue a signal from the shared queue.
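
For reference, a condensed sketch of the resulting dequeue path
(stop-signal bookkeeping and the itimer special case omitted, so this is
illustrative rather than the literal patched code below):

	int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
	{
		unsigned long flags;
		int signr;

		/* fastpath: private queue, per-thread siglock only */
		spin_lock_irqsave(&tsk->siglock, flags);
		signr = __dequeue_signal(&tsk->pending, mask, info);
		spin_unlock_irqrestore(&tsk->siglock, flags);

		/* slowpath: shared queue, behind the contended shared_siglock */
		if (!signr)
			signr = dequeue_slowpath(tsk, mask, info);

		recalc_sigpending();
		return signr;
	}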

Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Anirudh Badam <abadam@cs.princeton.edu>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
 fs/autofs4/waitq.c        |    5 +-
 fs/signalfd.c             |    6 +-
 include/linux/init_task.h |    2 +
 include/linux/sched.h     |    6 +-
 kernel/exit.c             |   12 +-
 kernel/fork.c             |    2 +
 kernel/freezer.c          |   10 +-
 kernel/posix-timers.c     |    5 +-
 kernel/signal.c           |  428 ++++++++++++++++++++++++++++++++-------------
 net/9p/client.c           |    6 +-
 net/sunrpc/svc.c          |    3 -
 security/selinux/hooks.c  |   11 +-
 12 files changed, 338 insertions(+), 158 deletions(-)

diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index 58ba49a..d2fdcdc 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -82,10 +82,11 @@ static int autofs4_write(struct file *file, const void *addr, int bytes)
 	/* Keep the currently executing process from receiving a
 	   SIGPIPE unless it was already supposed to get one */
 	if (wr == -EPIPE && !sigpipe) {
-		spin_lock_irqsave(&current->sighand->siglock, flags);
+		spin_lock_irqsave(&current->siglock, flags);
 		sigdelset(&current->pending.signal, SIGPIPE);
+		spin_unlock_irqrestore(&current->siglock, flags);
+
 		recalc_sigpending();
-		spin_unlock_irqrestore(&current->sighand->siglock, flags);
 	}
 
 	return (bytes > 0);
diff --git a/fs/signalfd.c b/fs/signalfd.c
index 728681d..26ea662 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -47,12 +47,14 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait)
 
 	poll_wait(file, &current->sighand->signalfd_wqh, wait);
 
-	spin_lock_irq(&current->sighand->siglock);
+	spin_lock_irq(&current->siglock);
+	spin_lock(&current->signal->shared_siglock);
 	if (next_signal(&current->pending, &ctx->sigmask) ||
 	    next_signal(&current->signal->shared_pending,
 			&ctx->sigmask))
 		events |= POLLIN;
-	spin_unlock_irq(&current->sighand->siglock);
+	spin_unlock(&current->signal->shared_siglock);
+	spin_unlock_irq(&current->siglock);
 
 	return events;
 }
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 80baa1d..6448863 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -34,6 +34,7 @@ extern struct fs_struct init_fs;
 	.nr_threads	= 1,						\
 	.wait_chldexit	= __WAIT_QUEUE_HEAD_INITIALIZER(sig.wait_chldexit),\
 	.ctrl_lock	= __SPIN_LOCK_UNLOCKED(sig.ctrl_lock),		\
+	.shared_siglock	= __SPIN_LOCK_UNLOCKED(sig.shared_siglock),	\
 	.shared_pending	= { 						\
 		.list = LIST_HEAD_INIT(sig.shared_pending.list),	\
 		.signal =  {{0}}},					\
@@ -171,6 +172,7 @@ extern struct cred init_cred;
 	.signal		= &init_signals,				\
 	.sighand	= &init_sighand,				\
 	.nsproxy	= &init_nsproxy,				\
+	.siglock	= __SPIN_LOCK_UNLOCKED(tsk.siglock),		\
 	.pending	= {						\
 		.list = LIST_HEAD_INIT(tsk.pending.list),		\
 		.signal = {{0}}},					\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e35ce4a..c04048f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -534,6 +534,9 @@ struct signal_struct {
 	/* current thread group signal load-balancing target: */
 	struct task_struct	*curr_target;
 
+	/* protects shared_pending */
+	spinlock_t		shared_siglock;
+
 	/* shared signal handling: */
 	struct sigpending	shared_pending;
 
@@ -1395,6 +1398,7 @@ struct task_struct {
 
 	sigset_t blocked, real_blocked;
 	sigset_t saved_sigmask;	/* restored if set_restore_sigmask() was used */
+	spinlock_t siglock;	/* protects pending */
 	struct sigpending pending;
 
 	unsigned long sas_ss_sp;
@@ -2166,7 +2170,7 @@ extern void force_sig(int, struct task_struct *);
 extern int send_sig(int, struct task_struct *, int);
 extern int zap_other_threads(struct task_struct *p);
 extern struct sigqueue *sigqueue_alloc(void);
-extern void sigqueue_free(struct sigqueue *);
+extern void sigqueue_free(struct sigqueue *, int group);
 extern int send_sigqueue(struct sigqueue *,  struct task_struct *, int group);
 extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
 extern int do_sigaltstack(const stack_t __user *, stack_t __user *, unsigned long);
diff --git a/kernel/exit.c b/kernel/exit.c
index 379a13d..32775ca 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -374,7 +374,6 @@ int allow_signal(int sig)
 	if (!valid_signal(sig) || sig < 1)
 		return -EINVAL;
 
-	spin_lock_irq(&current->sighand->siglock);
 	/* This is only needed for daemonize()'ed kthreads */
 	sigdelset(&current->blocked, sig);
 	/*
@@ -382,12 +381,11 @@ int allow_signal(int sig)
 	 * know it'll be handled, so that they don't get converted to
 	 * SIGKILL or just silently dropped.
 	 */
-	write_lock(&current->sighand->action_lock);
+	write_lock_irq(&current->sighand->action_lock);
 	current->sighand->action[(sig)-1].sa.sa_handler = (void __user *)2;
-	write_unlock(&current->sighand->action_lock);
+	write_unlock_irq(&current->sighand->action_lock);
 
 	recalc_sigpending();
-	spin_unlock_irq(&current->sighand->siglock);
 	return 0;
 }
 
@@ -398,13 +396,11 @@ int disallow_signal(int sig)
 	if (!valid_signal(sig) || sig < 1)
 		return -EINVAL;
 
-	spin_lock_irq(&current->sighand->siglock);
-	write_lock(&current->sighand->action_lock);
+	write_lock_irq(&current->sighand->action_lock);
 	current->sighand->action[(sig)-1].sa.sa_handler = SIG_IGN;
-	write_unlock(&current->sighand->action_lock);
+	write_unlock_irq(&current->sighand->action_lock);
 
 	recalc_sigpending();
-	spin_unlock_irq(&current->sighand->siglock);
 	return 0;
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c5cf19..606604a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -992,6 +992,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	if (clone_flags & CLONE_NEWPID)
 		sig->flags |= SIGNAL_UNKILLABLE;
 	sig->curr_target = tsk;
+	spin_lock_init(&sig->shared_siglock);
 	init_sigpending(&sig->shared_pending);
 	spin_lock_init(&sig->ctrl_lock);
 	INIT_LIST_HEAD(&sig->posix_timers);
@@ -1166,6 +1167,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->vfork_done = NULL;
 	spin_lock_init(&p->alloc_lock);
 
+	spin_lock_init(&p->siglock);
 	init_sigpending(&p->pending);
 
 	p->utime = cputime_zero;
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 7b01de9..a990da8 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -40,9 +40,7 @@ void refrigerator(void)
 	save = current->state;
 	pr_debug("%s entered refrigerator\n", current->comm);
 
-	spin_lock_irq(&current->sighand->siglock);
 	recalc_sigpending(); /* We sent fake signal, clean it up */
-	spin_unlock_irq(&current->sighand->siglock);
 
 	/* prevent accounting of that task to load */
 	current->flags |= PF_FREEZING;
@@ -66,9 +64,9 @@ static void fake_signal_wake_up(struct task_struct *p)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&p->sighand->siglock, flags);
+	spin_lock_irqsave(&p->siglock, flags);
 	signal_wake_up(p, 0);
-	spin_unlock_irqrestore(&p->sighand->siglock, flags);
+	spin_unlock_irqrestore(&p->siglock, flags);
 }
 
 /**
@@ -122,14 +120,10 @@ bool freeze_task(struct task_struct *p, bool sig_only)
 
 void cancel_freezing(struct task_struct *p)
 {
-	unsigned long flags;
-
 	if (freezing(p)) {
 		pr_debug("  clean up: %s\n", p->comm);
 		clear_freeze_flag(p);
-		spin_lock_irqsave(&p->sighand->siglock, flags);
 		recalc_sigpending_and_wake(p);
-		spin_unlock_irqrestore(&p->sighand->siglock, flags);
 	}
 }
 
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 4556182..bb1b2c8 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -502,6 +502,8 @@ static void k_itimer_rcu_free(struct rcu_head *head)
 #define IT_ID_NOT_SET	0
 static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
 {
+	int shared;
+
 	if (it_id_set) {
 		unsigned long flags;
 		spin_lock_irqsave(&idr_lock, flags);
@@ -509,7 +511,8 @@ static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
 		spin_unlock_irqrestore(&idr_lock, flags);
 	}
 	put_pid(tmr->it_pid);
-	sigqueue_free(tmr->sigq);
+	shared = !(tmr->it_sigev_notify & SIGEV_THREAD_ID);
+	sigqueue_free(tmr->sigq, shared);
 	call_rcu(&tmr->it.rcu, k_itimer_rcu_free);
 }
 
diff --git a/kernel/signal.c b/kernel/signal.c
index ca99c2d..c8176ea 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -50,7 +50,6 @@
  *        * most things under tsk->signal
  *
  *        * tsk->last_siginfo
- *        * tsk->pending
  *
  *        * tsk->cpu_timers
  *
@@ -67,12 +66,24 @@
  *        * tsk->signal->notify_count
  *        * tsk->signal->group_stop
  *        * tsk->signal->flags
+ *        * tsk->real_blocked
  *        * tsk->group_stop
  *        * tsk->jobctl
  *
  *        * the atomic operation of checking tsk->jobctl, tsk->pending and
  *          tsk->signal->shared_pending and setting/clearing TIF_SIGPENDING,
  *          see recalc_sigpending().
+ *
+ *   - ->siglock (spinlock) protects,
+ *
+ *        * tsk->notifier_data
+ *        * tsk->notifier_mask
+ *        * tsk->notifier
+ *        * tsk->pending
+ *
+ *   - signal->shared_siglock (spinlock) protects,
+ *
+ *        * tsk->signal->shared_pending
  */
 
 /*
@@ -178,8 +189,8 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)
  */
 static int recalc_sigpending_tsk(struct task_struct *t)
 {
-	assert_spin_locked(&t->sighand->siglock);
 	assert_spin_locked(&t->signal->ctrl_lock);
+	assert_spin_locked(&t->siglock);
 
 	if ((t->jobctl & JOBCTL_PENDING_MASK) ||
 	    PENDING(&t->pending, &t->blocked) ||
@@ -202,21 +213,32 @@ static int recalc_sigpending_tsk(struct task_struct *t)
 void recalc_sigpending_and_wake(struct task_struct *t)
 {
 	struct signal_struct *sig = t->signal;
+	unsigned long flags;
 
-	spin_lock(&sig->ctrl_lock);
+	spin_lock_irqsave(&sig->ctrl_lock, flags);
+	spin_lock(&t->siglock);
 	if (recalc_sigpending_tsk(t))
 		signal_wake_up(t, 0);
-	spin_unlock(&sig->ctrl_lock);
+	spin_unlock(&t->siglock);
+	spin_unlock_irqrestore(&sig->ctrl_lock, flags);
+}
+
+static void __recalc_sigpending(void)
+{
+	if (!recalc_sigpending_tsk(current) && !freezing(current))
+		clear_thread_flag(TIF_SIGPENDING);
 }
 
 void recalc_sigpending(void)
 {
 	struct signal_struct *sig = current->signal;
+	unsigned long flags;
 
-	spin_lock(&sig->ctrl_lock);
-	if (!recalc_sigpending_tsk(current) && !freezing(current))
-		clear_thread_flag(TIF_SIGPENDING);
-	spin_unlock(&sig->ctrl_lock);
+	spin_lock_irqsave(&sig->ctrl_lock, flags);
+	spin_lock(&current->siglock);
+	__recalc_sigpending();
+	spin_unlock(&current->siglock);
+	spin_unlock_irqrestore(&sig->ctrl_lock, flags);
 }
 
 /* Given the mask, find the first available signal that should be serviced. */
@@ -479,6 +501,9 @@ void flush_sigqueue(struct sigpending *queue)
  */
 void __flush_signals(struct task_struct *t)
 {
+	assert_spin_locked(&t->siglock);
+	assert_spin_locked(&t->signal->shared_siglock);
+
 	clear_tsk_thread_flag(t, TIF_SIGPENDING);
 	flush_sigqueue(&t->pending);
 	flush_sigqueue(&t->signal->shared_pending);
@@ -488,9 +513,11 @@ void flush_signals(struct task_struct *t)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&t->sighand->siglock, flags);
+	spin_lock_irqsave(&t->siglock, flags);
+	spin_lock(&t->signal->shared_siglock);
 	__flush_signals(t);
-	spin_unlock_irqrestore(&t->sighand->siglock, flags);
+	spin_unlock(&t->signal->shared_siglock);
+	spin_unlock_irqrestore(&t->siglock, flags);
 }
 
 static void __flush_itimer_signals(struct sigpending *pending)
@@ -521,10 +548,12 @@ void flush_itimer_signals(void)
 	struct task_struct *tsk = current;
 	unsigned long flags;
 
-	spin_lock_irqsave(&tsk->sighand->siglock, flags);
+	spin_lock_irqsave(&tsk->siglock, flags);
+	spin_lock(&tsk->signal->shared_siglock);
 	__flush_itimer_signals(&tsk->pending);
 	__flush_itimer_signals(&tsk->signal->shared_pending);
-	spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
+	spin_unlock(&tsk->signal->shared_siglock);
+	spin_unlock_irqrestore(&tsk->siglock, flags);
 }
 
 /*
@@ -584,11 +613,11 @@ block_all_signals(int (*notifier)(void *priv), void *priv, sigset_t *mask)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&current->sighand->siglock, flags);
+	spin_lock_irqsave(&current->siglock, flags);
 	current->notifier_mask = mask;
 	current->notifier_data = priv;
 	current->notifier = notifier;
-	spin_unlock_irqrestore(&current->sighand->siglock, flags);
+	spin_unlock_irqrestore(&current->siglock, flags);
 }
 
 /* Notify the system that blocking has ended. */
@@ -598,11 +627,13 @@ unblock_all_signals(void)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&current->sighand->siglock, flags);
+	spin_lock_irqsave(&current->signal->ctrl_lock, flags);
+	spin_lock(&current->siglock);
 	current->notifier = NULL;
 	current->notifier_data = NULL;
-	recalc_sigpending();
-	spin_unlock_irqrestore(&current->sighand->siglock, flags);
+	__recalc_sigpending();
+	spin_unlock(&current->siglock);
+	spin_unlock_irqrestore(&current->signal->ctrl_lock, flags);
 }
 
 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info)
@@ -664,6 +695,49 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
 }
 
 /*
+ * This is the slowpath because we need to acquire the heavily contended
+ * shared_siglock.
+ */
+static int dequeue_slowpath(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
+{
+	unsigned long flags;
+	int signr;
+
+	spin_lock_irqsave(&tsk->signal->shared_siglock, flags);
+	signr = __dequeue_signal(&tsk->signal->shared_pending,
+				 mask, info);
+	spin_unlock_irqrestore(&tsk->signal->shared_siglock, flags);
+
+	/*
+	 * itimer signal ?
+	 *
+	 * itimers are process shared and we restart periodic
+	 * itimers in the signal delivery path to prevent DoS
+	 * attacks in the high resolution timer case. This is
+	 * compliant with the old way of self-restarting
+	 * itimers, as the SIGALRM is a legacy signal and only
+	 * queued once. Changing the restart behaviour to
+	 * restart the timer in the signal dequeue path is
+	 * reducing the timer noise on heavy loaded !highres
+	 * systems too.
+	 */
+	if (unlikely(signr == SIGALRM)) {
+		struct hrtimer *tmr = &tsk->signal->real_timer;
+
+		spin_lock_irqsave(&tsk->sighand->siglock, flags);
+		if (!hrtimer_is_queued(tmr) &&
+		    tsk->signal->it_real_incr.tv64 != 0) {
+			hrtimer_forward(tmr, tmr->base->get_time(),
+					tsk->signal->it_real_incr);
+			hrtimer_restart(tmr);
+		}
+		spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
+	}
+
+	return signr;
+}
+
+/*
  * Dequeue a signal and return the element to the caller, which is
  * expected to free it.
  */
@@ -672,47 +746,19 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 	unsigned long flags;
 	int signr;
 
-	spin_lock_irqsave(&current->sighand->siglock, flags);
+	spin_lock_irqsave(&current->siglock, flags);
 
 	/* We only dequeue private signals from ourselves, we don't let
 	 * signalfd steal them
 	 */
 	signr = __dequeue_signal(&tsk->pending, mask, info);
-	if (!signr) {
-		signr = __dequeue_signal(&tsk->signal->shared_pending,
-					 mask, info);
-		/*
-		 * itimer signal ?
-		 *
-		 * itimers are process shared and we restart periodic
-		 * itimers in the signal delivery path to prevent DoS
-		 * attacks in the high resolution timer case. This is
-		 * compliant with the old way of self-restarting
-		 * itimers, as the SIGALRM is a legacy signal and only
-		 * queued once. Changing the restart behaviour to
-		 * restart the timer in the signal dequeue path is
-		 * reducing the timer noise on heavy loaded !highres
-		 * systems too.
-		 */
-		if (unlikely(signr == SIGALRM)) {
-			struct hrtimer *tmr = &tsk->signal->real_timer;
-
-			if (!hrtimer_is_queued(tmr) &&
-			    tsk->signal->it_real_incr.tv64 != 0) {
-				hrtimer_forward(tmr, tmr->base->get_time(),
-						tsk->signal->it_real_incr);
-				hrtimer_restart(tmr);
-			}
-		}
-	}
+	spin_unlock_irqrestore(&current->siglock, flags);
+	if (!signr)
+		signr = dequeue_slowpath(tsk, mask, info);
 
-	spin_lock(&current->signal->ctrl_lock);
 	recalc_sigpending();
-	if (!signr) {
-		spin_unlock(&current->signal->ctrl_lock);
-		spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
+	if (!signr)
 		return 0;
-	}
 
 	if (unlikely(sig_kernel_stop(signr))) {
 		/*
@@ -727,12 +773,11 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 		 * is to alert stop-signal processing code when another
 		 * processor has come along and cleared the flag.
 		 */
+		spin_lock_irqsave(&current->signal->ctrl_lock, flags);
 		current->jobctl |= JOBCTL_STOP_DEQUEUED;
+		spin_unlock_irqrestore(&current->signal->ctrl_lock, flags);
 	}
 
-	spin_unlock(&current->signal->ctrl_lock);
-	spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
-
 	if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private)
 		do_schedule_next_timer(info);
 
@@ -932,8 +977,8 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 {
 	struct signal_struct *signal = p->signal;
 	struct task_struct *t;
+	unsigned long flags;
 
-	spin_lock(&signal->ctrl_lock);
 	if (unlikely(signal->flags & SIGNAL_GROUP_EXIT)) {
 		/*
 		 * The process is in the middle of dying, nothing to do.
@@ -942,21 +987,32 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		/*
 		 * This is a stop signal.  Remove SIGCONT from all queues.
 		 */
+		spin_lock(&signal->shared_siglock);
 		rm_from_queue(sigmask(SIGCONT), &signal->shared_pending);
+		spin_unlock(&signal->shared_siglock);
 		t = p;
 		do {
+			spin_lock(&t->siglock);
 			rm_from_queue(sigmask(SIGCONT), &t->pending);
+			spin_unlock(&t->siglock);
 		} while_each_thread(p, t);
 	} else if (sig == SIGCONT) {
 		unsigned int why;
+
+		spin_lock_irqsave(&signal->ctrl_lock, flags);
+
 		/*
 		 * Remove all stop signals from all queues, wake all threads.
 		 */
+		spin_lock(&signal->shared_siglock);
 		rm_from_queue(SIG_KERNEL_STOP_MASK, &signal->shared_pending);
+		spin_unlock(&signal->shared_siglock);
 		t = p;
 		do {
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
+			spin_lock(&t->siglock);
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
+			spin_unlock(&t->siglock);
 			if (likely(!(t->ptrace & PT_SEIZED)))
 				wake_up_state(t, __TASK_STOPPED);
 			else
@@ -987,8 +1043,8 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 			signal->group_stop_count = 0;
 			signal->group_exit_code = 0;
 		}
+		spin_unlock_irqrestore(&signal->ctrl_lock, flags);
 	}
-	spin_unlock(&signal->ctrl_lock);
 
 	return !sig_ignored(p, sig, from_ancestor_ns);
 }
@@ -1014,6 +1070,47 @@ static inline int wants_signal(int sig, struct task_struct *p)
 	return task_curr(p) || !signal_pending(p);
 }
 
+/**
+ * complete_fatal_signal - send a fatal signal to a task
+ * @p: task sending the fatal signal
+ * @t: target task receiving the fatal signal
+ * @sig: fatal signal number
+ *
+ * RETURNS:
+ * %true if a fatal signal was delivered, %false otherwise.
+ */
+static bool complete_fatal_signal(struct task_struct *p, struct task_struct *t,
+				  int sig)
+{
+	struct signal_struct *signal = p->signal;
+
+	assert_spin_locked(&signal->ctrl_lock);
+
+	if (!sig_kernel_coredump(sig) && !sigismember(&t->real_blocked, sig) &&
+	    !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT))) {
+		/*
+		 * Start a group exit and wake everybody up.
+		 * This way we don't have other threads
+		 * running and doing things after a slower
+		 * thread has the fatal signal pending.
+		 */
+		signal->flags = SIGNAL_GROUP_EXIT;
+		signal->group_exit_code = sig;
+		signal->group_stop_count = 0;
+		t = p;
+		do {
+			task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
+			spin_lock(&t->siglock);
+			sigaddset(&t->pending.signal, SIGKILL);
+			spin_unlock(&t->siglock);
+			signal_wake_up(t, 1);
+		} while_each_thread(p, t);
+		return true;
+	}
+
+	return false;
+}
+
 static void complete_signal(int sig, struct task_struct *p, int group)
 {
 	struct signal_struct *signal = p->signal;
@@ -1055,33 +1152,32 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 	 * Found a killable thread.  If the signal will be fatal,
 	 * then start taking the whole group down immediately.
 	 */
-	spin_lock(&signal->ctrl_lock);
-	if (sig_fatal(p, sig) &&
-	    !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
-	    !sigismember(&t->real_blocked, sig) &&
-	    (sig == SIGKILL || !t->ptrace)) {
+	if (sig_fatal(p, sig) && (sig == SIGKILL || !t->ptrace)) {
+		bool fatal;
+
+		if (group)
+			spin_unlock(&signal->shared_siglock);
+		spin_unlock(&p->siglock);
+
 		/*
 		 * This signal will be fatal to the whole group.
 		 */
-		if (!sig_kernel_coredump(sig)) {
-			/*
-			 * Start a group exit and wake everybody up.
-			 * This way we don't have other threads
-			 * running and doing things after a slower
-			 * thread has the fatal signal pending.
-			 */
-			signal->flags = SIGNAL_GROUP_EXIT;
-			signal->group_exit_code = sig;
-			signal->group_stop_count = 0;
-			t = p;
-			do {
-				task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-				sigaddset(&t->pending.signal, SIGKILL);
-				signal_wake_up(t, 1);
-			} while_each_thread(p, t);
-			spin_unlock(&signal->ctrl_lock);
+		spin_lock(&signal->ctrl_lock);
+		fatal = complete_fatal_signal(p, t, sig);
+		spin_unlock(&signal->ctrl_lock);
+
+		/*
+		 * When we set TIF_SIGPENDING in signal_wake_up() the
+		 * task might get spurious wakeups if it has already
+		 * seen the pending signal after we unlocked
+		 * siglock. It's probably not a big deal.
+		 */
+		spin_lock(&p->siglock);
+		if (group)
+			spin_lock(&signal->shared_siglock);
+
+		if (fatal)
 			return;
-		}
 	}
 
 	/*
@@ -1089,7 +1185,6 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 	 * Tell the chosen thread to wake up and dequeue it.
 	 */
 	signal_wake_up(t, sig == SIGKILL);
-	spin_unlock(&signal->ctrl_lock);
 	return;
 }
 
@@ -1098,21 +1193,36 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
-			int group, int from_ancestor_ns)
+/**
+ * enqueue_signal - add a signal to a pending queue
+ * @t: task's pending queue
+ * @info: signal's siginfo
+ * @sig: signal number
+ * @group: does the signal affect the whole thread group?
+ * @from_ancestor_ns: ancestor namespace
+ *
+ * CONTEXT:
+ * Must be called with @t->siglock held. If @group is set then
+ * @t->signal->shared_siglock must also be held.
+ *
+ * RETURNS:
+ * < 0 if an error occurred
+ * 0 if no signal was enqueued
+ * > 0 if a signal was added to the pending queue.
+ */
+static int enqueue_signal(struct task_struct *t, struct siginfo *info,
+			  int sig, int group, int from_ancestor_ns)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
 	int override_rlimit;
 
-	trace_signal_generate(sig, info, t);
-
-	assert_spin_locked(&t->sighand->siglock);
-
-	if (!prepare_signal(sig, t, from_ancestor_ns))
-		return 0;
+	assert_spin_locked(&t->siglock);
+	if (group)
+		assert_spin_locked(&t->signal->shared_siglock);
 
 	pending = group ? &t->signal->shared_pending : &t->pending;
+
 	/*
 	 * Short-circuit ignored signals and support queuing
 	 * exactly one non-rt signal, so that we can get more
@@ -1188,8 +1298,35 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 out_set:
 	signalfd_notify(t, sig);
 	sigaddset(&pending->signal, sig);
-	complete_signal(sig, t, group);
-	return 0;
+	return 1;
+}
+
+static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
+			int group, int from_ancestor_ns)
+{
+	unsigned long flags;
+	int ret;
+
+	trace_signal_generate(sig, info, t);
+
+	if (!prepare_signal(sig, t, from_ancestor_ns))
+		return 0;
+
+	spin_lock_irqsave(&t->siglock, flags);
+	if (group)
+		spin_lock(&t->signal->shared_siglock);
+
+	ret = enqueue_signal(t, info, sig, group, from_ancestor_ns);
+	if (ret > 0) {
+		complete_signal(sig, t, group);
+		ret = 0;
+	}
+
+	if (group)
+		spin_unlock(&t->signal->shared_siglock);
+	spin_unlock_irqrestore(&t->siglock, flags);
+
+	return ret;
 }
 
 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
@@ -1296,24 +1433,50 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
 	unsigned long int flags;
 	int ret, blocked, ignored;
 	struct k_sigaction *action;
+	bool fatal;
+	int from_ancestor_ns = 0;
 
-	spin_lock_irqsave(&t->sighand->siglock, flags);
-	write_lock(&t->sighand->action_lock);
+	write_lock_irqsave(&t->sighand->action_lock, flags);
 	action = &t->sighand->action[sig-1];
 	ignored = action->sa.sa_handler == SIG_IGN;
+
+	spin_lock(&t->signal->ctrl_lock);
+	spin_lock(&t->siglock);
 	blocked = sigismember(&t->blocked, sig);
 	if (blocked || ignored) {
 		action->sa.sa_handler = SIG_DFL;
 		if (blocked) {
 			sigdelset(&t->blocked, sig);
-			recalc_sigpending_and_wake(t);
+			if (recalc_sigpending_tsk(t))
+				signal_wake_up(t, 0);
 		}
 	}
 	if (action->sa.sa_handler == SIG_DFL)
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = specific_send_sig_info(sig, info, t);
-	write_unlock(&t->sighand->action_lock);
-	spin_unlock_irqrestore(&t->sighand->siglock, flags);
+
+#ifdef CONFIG_PID_NS
+	from_ancestor_ns = si_fromuser(info) &&
+			   !task_pid_nr_ns(current, task_active_pid_ns(t));
+#endif
+	ret = enqueue_signal(t, info, sig, 0, from_ancestor_ns);
+	spin_unlock(&t->siglock);
+
+	/*
+	 * There's no need to call wants_signal() like
+	 * complete_signal() becuase 't' doesn't get a choice.
+	 */
+	if (ret > 0) {
+		fatal = false;
+		if (sig_fatal(t, sig) && (sig == SIGKILL || !t->ptrace))
+			fatal = complete_fatal_signal(t, t, sig);
+
+		if (!fatal)
+			signal_wake_up(t, sig == SIGKILL);
+		ret = 0;
+	}
+
+	spin_unlock(&t->signal->ctrl_lock);
+	write_unlock_irqrestore(&t->sighand->action_lock, flags);
 
 	return ret;
 }
@@ -1335,7 +1498,9 @@ int zap_other_threads(struct task_struct *p)
 		/* Don't bother with already dead threads */
 		if (t->exit_state)
 			continue;
+		spin_lock(&t->siglock);
 		sigaddset(&t->pending.signal, SIGKILL);
+		spin_unlock(&t->siglock);
 		signal_wake_up(t, 1);
 	}
 
@@ -1613,17 +1778,23 @@ struct sigqueue *sigqueue_alloc(void)
 	return q;
 }
 
-void sigqueue_free(struct sigqueue *q)
+void sigqueue_free(struct sigqueue *q, int group)
 {
 	unsigned long flags;
-	spinlock_t *lock = &current->sighand->siglock;
+	spinlock_t *lock;
 
 	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
 	/*
-	 * We must hold ->siglock while testing q->list
-	 * to serialize with collect_signal() or with
+	 * We must hold ->siglock or ->shared_siglock
+	 * while testing q->list to serialize with
+	 * collect_signal() or with
 	 * __exit_signal()->flush_sigqueue().
 	 */
+	if (group)
+		lock = &current->signal->shared_siglock;
+	else
+		lock = &current->siglock;
+
 	spin_lock_irqsave(lock, flags);
 	q->flags &= ~SIGQUEUE_PREALLOC;
 	/*
@@ -1655,6 +1826,10 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 	if (!prepare_signal(sig, t, 0))
 		goto out;
 
+	spin_lock(&t->siglock);
+	if (group)
+		spin_lock(&t->signal->shared_siglock);
+
 	ret = 0;
 	if (unlikely(!list_empty(&q->list))) {
 		/*
@@ -1663,7 +1838,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 		 */
 		BUG_ON(q->info.si_code != SI_TIMER);
 		q->info.si_overrun++;
-		goto out;
+		goto out_siglock;
 	}
 	q->info.si_overrun = 0;
 
@@ -1672,6 +1847,10 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 	list_add_tail(&q->list, &pending->list);
 	sigaddset(&pending->signal, sig);
 	complete_signal(sig, t, group);
+out_siglock:
+	if (group)
+		spin_unlock(&t->signal->shared_siglock);
+	spin_unlock(&t->siglock);
 out:
 	unlock_task_sighand(t, &flags);
 ret:
@@ -1735,8 +1914,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	}
 
 	psig = tsk->parent->sighand;
-	spin_lock_irqsave(&psig->siglock, flags);
-	read_lock(&psig->action_lock);
+	read_lock_irqsave(&psig->action_lock, flags);
 	if (!tsk->ptrace && sig == SIGCHLD &&
 	    (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
 	     (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
@@ -1762,8 +1940,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	if (valid_signal(sig) && sig)
 		__group_send_sig_info_locked(sig, &info, tsk->parent);
 	__wake_up_parent(tsk, tsk->parent);
-	read_unlock(&psig->action_lock);
-	spin_unlock_irqrestore(&psig->siglock, flags);
+	read_unlock_irqrestore(&psig->action_lock, flags);
 
 	return autoreap;
 }
@@ -2011,7 +2188,9 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	 * So check for any that we should take before resuming user mode.
 	 * This sets TIF_SIGPENDING, but never clears it.
 	 */
+	spin_lock(&current->siglock);
 	recalc_sigpending_tsk(current);
+	spin_unlock(&current->siglock);
 }
 
 static void ptrace_do_notify(int signr, int exit_code, int why)
@@ -2581,16 +2760,15 @@ void exit_signals(struct task_struct *tsk)
 		return;
 	}
 
-	spin_lock_irq(&tsk->sighand->siglock);
 	/*
 	 * From now this task is not visible for group-wide signals,
 	 * see wants_signal(), do_signal_stop().
 	 */
 	tsk->flags |= PF_EXITING;
 	if (!signal_pending(tsk))
-		goto out;
+		return;
 
-	spin_lock(&tsk->signal->ctrl_lock);
+	spin_lock_irq(&tsk->signal->ctrl_lock);
 	unblocked = tsk->blocked;
 	signotset(&unblocked);
 	retarget_shared_pending(tsk, &unblocked);
@@ -2598,9 +2776,7 @@ void exit_signals(struct task_struct *tsk)
 	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
 	    task_participate_group_stop(tsk))
 		group_stop = CLD_STOPPED;
-	spin_unlock(&tsk->signal->ctrl_lock);
-out:
-	spin_unlock_irq(&tsk->sighand->siglock);
+	spin_unlock_irq(&tsk->signal->ctrl_lock);
 
 	/*
 	 * If group stop has completed, deliver the notification.  This
@@ -2651,7 +2827,7 @@ static void __set_task_blocked(struct task_struct *tsk, const sigset_t *newset)
 		retarget_shared_pending(tsk, &newblocked);
 	}
 	tsk->blocked = *newset;
-	recalc_sigpending();
+	__recalc_sigpending();
 }
 
 /**
@@ -2665,11 +2841,11 @@ void set_current_blocked(const sigset_t *newset)
 {
 	struct task_struct *tsk = current;
 
-	spin_lock_irq(&tsk->sighand->siglock);
-	spin_lock(&tsk->signal->ctrl_lock);
+	spin_lock_irq(&tsk->signal->ctrl_lock);
+	spin_lock(&tsk->siglock);
 	__set_task_blocked(tsk, newset);
-	spin_unlock(&tsk->signal->ctrl_lock);
-	spin_unlock_irq(&tsk->sighand->siglock);
+	spin_unlock(&tsk->siglock);
+	spin_unlock_irq(&tsk->signal->ctrl_lock);
 }
 EXPORT_SYMBOL(set_current_blocked);
 
@@ -2753,10 +2929,12 @@ long do_sigpending(void __user *set, unsigned long sigsetsize)
 	if (sigsetsize > sizeof(sigset_t))
 		goto out;
 
-	spin_lock_irq(&current->sighand->siglock);
+	spin_lock_irq(&current->siglock);
+	spin_lock(&current->signal->shared_siglock);
 	sigorsets(&pending, &current->pending.signal,
 		  &current->signal->shared_pending.signal);
-	spin_unlock_irq(&current->sighand->siglock);
+	spin_unlock(&current->signal->shared_siglock);
+	spin_unlock_irq(&current->siglock);
 
 	/* Outside the lock because only this thread touches it.  */
 	sigandsets(&pending, &current->blocked, &pending);
@@ -2894,20 +3072,22 @@ int do_sigtimedwait(const sigset_t *which, siginfo_t *info,
 		 * they arrive. Unblocking is always fine, we can avoid
 		 * set_current_blocked().
 		 */
-		spin_lock_irq(&tsk->sighand->siglock);
+		spin_lock_irq(&tsk->signal->ctrl_lock);
 		tsk->real_blocked = tsk->blocked;
 		sigandsets(&tsk->blocked, &tsk->blocked, &mask);
-		recalc_sigpending();
-		spin_unlock_irq(&tsk->sighand->siglock);
+		spin_lock(&tsk->siglock);
+		__recalc_sigpending();
+		spin_unlock(&tsk->siglock);
+		spin_unlock_irq(&tsk->signal->ctrl_lock);
 
 		timeout = schedule_timeout_interruptible(timeout);
 
-		spin_lock_irq(&tsk->sighand->siglock);
-		spin_lock(&tsk->signal->ctrl_lock);
+		spin_lock_irq(&tsk->signal->ctrl_lock);
+		spin_lock(&tsk->siglock);
 		__set_task_blocked(tsk, &tsk->real_blocked);
+		spin_unlock(&tsk->siglock);
 		siginitset(&tsk->real_blocked, 0);
-		spin_unlock(&tsk->signal->ctrl_lock);
-		spin_unlock_irq(&tsk->sighand->siglock);
+		spin_unlock_irq(&tsk->signal->ctrl_lock);
 		sig = dequeue_signal(tsk, &mask, info);
 	}
 
diff --git a/net/9p/client.c b/net/9p/client.c
index 0505a03..2ba5608 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -594,7 +594,6 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	va_list ap;
 	int tag, err;
 	struct p9_req_t *req;
-	unsigned long flags;
 	int sigpending;
 
 	P9_DPRINTK(P9_DEBUG_MUX, "client %p op %d\n", c, type);
@@ -664,11 +663,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 			err = 0;
 	}
 
-	if (sigpending) {
-		spin_lock_irqsave(&current->sighand->siglock, flags);
+	if (sigpending)
 		recalc_sigpending();
-		spin_unlock_irqrestore(&current->sighand->siglock, flags);
-	}
 
 	if (err < 0)
 		goto reterr;
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 6a69a11..5532009 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -930,7 +930,6 @@ static void __svc_unregister(const u32 program, const u32 version,
 static void svc_unregister(const struct svc_serv *serv)
 {
 	struct svc_program *progp;
-	unsigned long flags;
 	unsigned int i;
 
 	clear_thread_flag(TIF_SIGPENDING);
@@ -948,9 +947,7 @@ static void svc_unregister(const struct svc_serv *serv)
 		}
 	}
 
-	spin_lock_irqsave(&current->sighand->siglock, flags);
 	recalc_sigpending();
-	spin_unlock_irqrestore(&current->sighand->siglock, flags);
 }
 
 /*
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 47f278f..391bb8e 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2271,17 +2271,20 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
 		memset(&itimer, 0, sizeof itimer);
 		for (i = 0; i < 3; i++)
 			do_setitimer(i, &itimer, NULL);
-		spin_lock_irq(&current->sighand->siglock);
-		spin_lock(&current->signal->ctrl_lock);
+		spin_lock_irq(&current->signal->ctrl_lock);
 		if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
+			spin_lock(&current->siglock);
+			spin_lock(&current->signal->shared_siglock);
 			__flush_signals(current);
+			spin_unlock(&current->signal->shared_siglock);
+			spin_unlock(&current->siglock);
+
 			write_lock(&current->sighand->action_lock);
 			flush_signal_handlers(current, 1);
 			write_unlock(&current->sighand->action_lock);
 			sigemptyset(&current->blocked);
 		}
-		spin_unlock(&current->signal->ctrl_lock);
-		spin_unlock_irq(&current->sighand->siglock);
+		spin_unlock_irq(&current->signal->ctrl_lock);
 	}
 
 	/* Wake up the parent if it is waiting so that it can recheck
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 2/5] signal: Add rwlock to protect sighand->action
  2011-09-30 15:12 ` [RFC][PATCH 2/5] signal: Add rwlock to protect sighand->action Matt Fleming
@ 2011-09-30 15:25   ` Peter Zijlstra
  2011-09-30 15:56   ` Matt Fleming
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2011-09-30 15:25 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Oleg Nesterov, Tejun Heo, linux-kernel, Tony Luck, Matt Fleming,
	Thomas Gleixner, Anirudh Badam

On Fri, 2011-09-30 at 16:12 +0100, Matt Fleming wrote:
> 
> As sighand->action is read much more frequently than written a rwlock
> makes the most sense here. 

Ha! you would think so, but then you'd forget that
read_lock()+read_unlock() are atomic ops modifying the lock state as
well. Furthermore rwlocks aren't fair by any means.

Therefore rwlock_t should never be used; use a spinlock_t, possibly in
combination with RCU or a seqcount, etc.
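
For illustration, the seqcount variant of that suggestion could look
roughly like the sketch below; the action_seq field is invented for the
example and is not part of this series:

	/* Reader: no lock-word writes, retry if a writer raced with us. */
	static __sighandler_t get_handler(struct sighand_struct *sighand, int sig)
	{
		__sighandler_t handler;
		unsigned seq;

		do {
			seq = read_seqcount_begin(&sighand->action_seq);
			handler = sighand->action[sig - 1].sa.sa_handler;
		} while (read_seqcount_retry(&sighand->action_seq, seq));

		return handler;
	}

	/* Writer: still serialised by siglock, just bumps the sequence. */
	static void set_handler(struct sighand_struct *sighand, int sig,
				__sighandler_t handler)
	{
		spin_lock_irq(&sighand->siglock);
		write_seqcount_begin(&sighand->action_seq);
		sighand->action[sig - 1].sa.sa_handler = handler;
		write_seqcount_end(&sighand->action_seq);
		spin_unlock_irq(&sighand->siglock);
	}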

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 4/5] signal: Add signal->ctrl_lock for job control
  2011-09-30 15:12 ` [RFC][PATCH 4/5] signal: Add signal->ctrl_lock for job control Matt Fleming
@ 2011-09-30 15:30   ` Peter Zijlstra
  2011-09-30 15:36     ` Matt Fleming
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2011-09-30 15:30 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Oleg Nesterov, Tejun Heo, linux-kernel, Tony Luck, Matt Fleming,
	Thomas Gleixner, Anirudh Badam

On Fri, 2011-09-30 at 16:12 +0100, Matt Fleming wrote:
> +       assert_spin_locked(&t->sighand->siglock);
> +       assert_spin_locked(&t->signal->ctrl_lock);

There's also lockdep_assert_held(), the difference is that
assert_spin_locked() will always generate code, and only checks that the
lock is held, not that we are the current owner.

lockdep_assert_held() will only generate code for lockdep kernels and
will in fact check that the specified lock is held by the current task.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 4/5] signal: Add signal->ctrl_lock for job control
  2011-09-30 15:30   ` Peter Zijlstra
@ 2011-09-30 15:36     ` Matt Fleming
  0 siblings, 0 replies; 38+ messages in thread
From: Matt Fleming @ 2011-09-30 15:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Tejun Heo, linux-kernel, Tony Luck,
	Thomas Gleixner, Anirudh Badam

On Fri, 2011-09-30 at 17:30 +0200, Peter Zijlstra wrote:
> On Fri, 2011-09-30 at 16:12 +0100, Matt Fleming wrote:
> > +       assert_spin_locked(&t->sighand->siglock);
> > +       assert_spin_locked(&t->signal->ctrl_lock);
> 
> There's also lockdep_assert_held(), the difference is that
> assert_spin_locked() will always generate code, and only checks that the
> lock is held, not that we are the current owner.
> 
> lockdep_assert_held() will only generate code for lockdep kernels and
> will in fact check that the specified lock is held by the current task.

Aha! Yes, that's what I want, not assert_spin_locked(). Thanks!

-- 
Matt Fleming, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 2/5] signal: Add rwlock to protect sighand->action
  2011-09-30 15:12 ` [RFC][PATCH 2/5] signal: Add rwlock to protect sighand->action Matt Fleming
  2011-09-30 15:25   ` Peter Zijlstra
@ 2011-09-30 15:56   ` Matt Fleming
  1 sibling, 0 replies; 38+ messages in thread
From: Matt Fleming @ 2011-09-30 15:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, Tony Luck, Peter Zijlstra,
	Thomas Gleixner, Anirudh Badam

On Fri, 2011-09-30 at 16:12 +0100, Matt Fleming wrote:
> @@ -2935,9 +3025,11 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
>  	if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig)))
>  		return -EINVAL;
>  
> +	spin_lock_irq(&current->sighand->siglock);
> +	read_lock(&current->sighand->action_lock);
> +
>  	k = &t->sighand->action[sig-1];
>  
> -	spin_lock_irq(&current->sighand->siglock);
>  	if (oact)
>  		*oact = *k;
>  
> @@ -2967,6 +3059,7 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
>  		}
>  	}
>  
> +	read_unlock(&current->sighand->action_lock);
>  	spin_unlock_irq(&current->sighand->siglock);
>  	return 0;
>  }

Guh! I just noticed this. That should obviously be write-locked.
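
i.e. something along these lines (a sketch of the fix against the hunk
quoted above, not a tested patch):

	spin_lock_irq(&current->sighand->siglock);
	write_lock(&current->sighand->action_lock);
	...
	write_unlock(&current->sighand->action_lock);
	spin_unlock_irq(&current->sighand->siglock);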

-- 
Matt Fleming, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-09-30 15:12 [RFC][PATCH 0/5] Signal scalability series Matt Fleming
                   ` (4 preceding siblings ...)
  2011-09-30 15:12 ` [RFC][PATCH 5/5] signal: Split siglock into shared_siglock and per-thread siglock Matt Fleming
@ 2011-09-30 16:52 ` Oleg Nesterov
  2011-09-30 18:54   ` Oleg Nesterov
  2011-09-30 20:00   ` Matt Fleming
  2011-09-30 22:30 ` Andi Kleen
  6 siblings, 2 replies; 38+ messages in thread
From: Oleg Nesterov @ 2011-09-30 16:52 UTC (permalink / raw)
  To: Matt Fleming; +Cc: Tejun Heo, linux-kernel, Tony Luck, Matt Fleming

On 09/30, Matt Fleming wrote:
>
> However, it's a good first step and
> hopefully by keeping it relatively simple it'll make it easier to
> review.

Cough. I'll try to read this series next week, but currently I feel
I will never be able to understand this code. It surely complicates
things a lot.

But. All I can do is to _try_ to check this series from the correctness
pov. I can't believe (at least at first glance) this is worth the trouble,
but otoh I won't argue unless I find bugs.

>  arch/ia64/kernel/signal.c           |    4 +-
>  drivers/block/nbd.c                 |    2 +-
>  drivers/usb/gadget/f_mass_storage.c |    2 +-
>  drivers/usb/gadget/file_storage.c   |    2 +-
>  fs/autofs4/waitq.c                  |    5 +-
>  fs/exec.c                           |   17 +-
>  fs/jffs2/background.c               |    2 +-
>  fs/ncpfs/sock.c                     |    2 +
>  fs/proc/array.c                     |    2 +
>  fs/signalfd.c                       |   11 +-
>  include/linux/init_task.h           |    4 +
>  include/linux/sched.h               |   23 +-
>  kernel/exit.c                       |   29 +-
>  kernel/fork.c                       |    4 +
>  kernel/freezer.c                    |   10 +-
>  kernel/kmod.c                       |    8 +-
>  kernel/posix-timers.c               |    5 +-
>  kernel/ptrace.c                     |   68 ++--
>  kernel/signal.c                     |  737 +++++++++++++++++++++++++++--------
>  net/9p/client.c                     |    6 +-
>  net/sunrpc/svc.c                    |    3 -
>  security/selinux/hooks.c            |   11 +-
>  22 files changed, 677 insertions(+), 280 deletions(-)

And, this patch adds 4 new locks:

	sighand_struct->action_lock

	signal_struct->ctrl_lock
	signal_struct->shared_siglock

	task_struct->siglock

Nice ;) For what? This should be justified, imho.


Hmm. Just out of curiosity, I blindly applied the whole series and picked
a _random_ function to look at, dequeue_signal(). And it looks wrong.

	spin_lock_irqsave(&current->signal->ctrl_lock, flags);
	current->jobctl |= JOBCTL_STOP_DEQUEUED;

This signal->ctrl_lock can't help. A sig_kernel_stop() should be
dequeued under the same lock, and we shouldn't release it unless we
set JOBCTL_STOP_DEQUEUED. Otherwise we race with SIGCONT.
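
An illustrative interleaving of the kind of race being described
(hand-worked, assuming the dequeue of the stop signal and the flag update
really can be separated, as in the patched dequeue_signal()):

	dequeuing thread                        thread sending SIGCONT
	----------------                        ----------------------
	dequeue SIGTSTP under shared_siglock
	drop shared_siglock
	                                        prepare_signal(SIGCONT):
	                                          removes queued stop signals,
	                                          clears JOBCTL_STOP_* state
	take ctrl_lock
	set JOBCTL_STOP_DEQUEUED                <- stale: the stop it refers to
	                                           was already cancelled

	do_signal_stop() later sees JOBCTL_STOP_DEQUEUED and stops anyway.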

Maybe do_signal_stop() does something special? At first glance it doesn't.
But wait, it does while_each_thread() under ->ctrl_lock, why is this safe?

Maybe I was just lucky ;)

Oleg.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-09-30 16:52 ` [RFC][PATCH 0/5] Signal scalability series Oleg Nesterov
@ 2011-09-30 18:54   ` Oleg Nesterov
  2011-09-30 20:00   ` Matt Fleming
  1 sibling, 0 replies; 38+ messages in thread
From: Oleg Nesterov @ 2011-09-30 18:54 UTC (permalink / raw)
  To: Matt Fleming; +Cc: Tejun Heo, linux-kernel, Tony Luck, Matt Fleming

On 09/30, Oleg Nesterov wrote:
>
> On 09/30, Matt Fleming wrote:
> >
> > However, it's a good first step and
> > hopefully by keeping it relatively simple it'll make it easier to
> > review.
>
> Cough. I'll try to read this series next week, but currently I feel
> I will never be able to understand this code. It surely complicates
> things a lot.
>
> But. All I can do is to _try_ to check this series from the correctness
> pov. I can't believe (at least at first glance) this is worth the trouble,
> but otoh I won't argue unless I find bugs.
>
> >  arch/ia64/kernel/signal.c           |    4 +-
> >  drivers/block/nbd.c                 |    2 +-
> >  drivers/usb/gadget/f_mass_storage.c |    2 +-
> >  drivers/usb/gadget/file_storage.c   |    2 +-
> >  fs/autofs4/waitq.c                  |    5 +-
> >  fs/exec.c                           |   17 +-
> >  fs/jffs2/background.c               |    2 +-
> >  fs/ncpfs/sock.c                     |    2 +
> >  fs/proc/array.c                     |    2 +
> >  fs/signalfd.c                       |   11 +-
> >  include/linux/init_task.h           |    4 +
> >  include/linux/sched.h               |   23 +-
> >  kernel/exit.c                       |   29 +-
> >  kernel/fork.c                       |    4 +
> >  kernel/freezer.c                    |   10 +-
> >  kernel/kmod.c                       |    8 +-
> >  kernel/posix-timers.c               |    5 +-
> >  kernel/ptrace.c                     |   68 ++--
> >  kernel/signal.c                     |  737 +++++++++++++++++++++++++++--------
> >  net/9p/client.c                     |    6 +-
> >  net/sunrpc/svc.c                    |    3 -
> >  security/selinux/hooks.c            |   11 +-
> >  22 files changed, 677 insertions(+), 280 deletions(-)
>
> And, this patch adds 4 new locks:
>
> 	sighand_struct->action_lock
>
> 	signal_struct->ctrl_lock
> 	signal_struct->shared_siglock
>
> 	task_struct->siglock
>
> Nice ;) For what? This should be justified, imho.

Yes. I did the quick and dirty check (under kvm),

Before this series:

	[tst@myhost ~]$ time perl -wle '$SIG{HUP}=sub{}; kill HUP, $$ for 1..100_000'

	real    0m2.451s
	user    0m0.350s
	sys     0m2.097s
	[tst@myhost ~]$ time perl -wle '$SIG{HUP}=sub{}; kill HUP, $$ for 1..100_000'

	real    0m2.475s
	user    0m0.357s
	sys     0m2.117s
	[tst@myhost ~]$ time perl -wle '$SIG{HUP}=sub{}; kill HUP, $$ for 1..100_000'

	real    0m2.443s
	user    0m0.330s
	sys     0m2.113s

After:

	tst@myhost ~]$ time perl -wle '$SIG{HUP}=sub{}; kill HUP, $$ for 1..100_000'

	real    0m3.194s
	user    0m0.283s
	sys     0m2.910s
	[tst@myhost ~]$ time perl -wle '$SIG{HUP}=sub{}; kill HUP, $$ for 1..100_000'

	real    0m3.212s
	user    0m0.357s
	sys     0m2.853s
	[tst@myhost ~]$ time perl -wle '$SIG{HUP}=sub{}; kill HUP, $$ for 1..100_000'

	real    0m3.196s
	user    0m0.350s
	sys     0m2.846s

Doesn't look very good (maybe only under kvm?). In fact I am really
surprised, I didn't expect the difference would be that noticeable.

Yes, yes, I understand that your goal is scalability, but still.

Oleg.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-09-30 16:52 ` [RFC][PATCH 0/5] Signal scalability series Oleg Nesterov
  2011-09-30 18:54   ` Oleg Nesterov
@ 2011-09-30 20:00   ` Matt Fleming
  2011-09-30 23:56     ` Tejun Heo
  2011-10-03 13:16     ` Oleg Nesterov
  1 sibling, 2 replies; 38+ messages in thread
From: Matt Fleming @ 2011-09-30 20:00 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Tejun Heo, linux-kernel, Tony Luck

On Fri, 2011-09-30 at 18:52 +0200, Oleg Nesterov wrote:
> And, this patch adds 4 new locks:
> 
> 	sighand_struct->action_lock
> 
> 	signal_struct->ctrl_lock
> 	signal_struct->shared_siglock
> 
> 	task_struct->siglock
> 
> Nice ;) For what? This should be justified, imho.

Well, sighand->siglock is seriously overused. It protects so much and I
think it's pretty confusing. It took me long enough to figure out how
many locks were really needed. But that's beside the point, having a
single lock doesn't scale at all, and that's what this series is about.

> Hmm. Just out of curiosity, I blindly applied the whole series and picked
> a _random_ function to look at, dequeue_signal(). And it looks wrong.
> 
> 	spin_lock_irqsave(&current->signal->ctrl_lock, flags);
> 	current->jobctl |= JOBCTL_STOP_DEQUEUED;
> 
> This signal->ctrl_lock can't help. A sig_kernel_stop() should be
> dequeued under the same lock, and we shouldn't release it unless we
> set JOBCTL_STOP_DEQUEUED. Otherwise we race with SIGCONT.

Hmm.. is that really a problem? Does the dequeue and setting
JOBCTL_STOP_DEQUEUED actually need to be atomic? Does it matter if we
have SIGCONT on the signal queue when we set JOBCTL_STOP_DEQUEUED?

> Maybe do_signal_stop() does something special? At first glance it doesn't.
> But wait, it does while_each_thread() under ->ctrl_lock, why is this safe?

Why is it not safe? What scenario are you thinking of where that isn't
safe?

> Maybe I was just lucky ;)

I doubt luck has anything to do with it ;-)

Thanks for the review!

-- 
Matt Fleming, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-09-30 15:12 [RFC][PATCH 0/5] Signal scalability series Matt Fleming
                   ` (5 preceding siblings ...)
  2011-09-30 16:52 ` [RFC][PATCH 0/5] Signal scalability series Oleg Nesterov
@ 2011-09-30 22:30 ` Andi Kleen
  2011-10-01  9:35   ` Matt Fleming
  6 siblings, 1 reply; 38+ messages in thread
From: Andi Kleen @ 2011-09-30 22:30 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Oleg Nesterov, Tejun Heo, linux-kernel, Tony Luck, Matt Fleming

Matt Fleming <matt@console-pimps.org> writes:
>
> And just like last time, these patches break ia64 because it
> implements rt_sigprocmask() in assembly. I'll work on the ia64 stuff
> while these patches are being reviewed.

Since we don't want to tie everyone changing signals in the future to
learning IA64 assembler, maybe the best approach would be simply to
change that function to just use the syscall. It can't be that
performance critical and everyone else has it as a syscall too.

I think that would be the best solution for long term maintainability.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-09-30 20:00   ` Matt Fleming
@ 2011-09-30 23:56     ` Tejun Heo
  2011-10-01 10:16       ` Matt Fleming
  2011-10-03 13:16     ` Oleg Nesterov
  1 sibling, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2011-09-30 23:56 UTC (permalink / raw)
  To: Matt Fleming; +Cc: Oleg Nesterov, linux-kernel, Tony Luck

Hello,

On Fri, Sep 30, 2011 at 09:00:23PM +0100, Matt Fleming wrote:
> On Fri, 2011-09-30 at 18:52 +0200, Oleg Nesterov wrote:
> Well, sighand->siglock is seriously overused. It protects so much and I
> think it's pretty confusing. It took me long enough to figure out how
> many locks were really needed. But that's beside the point, having a
> single lock doesn't scale at all, and that's what this series is about.

But scalability for what?  What are the use cases here?  Do we care
enough about benefits to those use cases to accept the increased
complexity?  Having locks with broad coverage isn't necessarily evil.
If things are simpler that way and the protected paths aren't that
hot, who cares?  If splitting the locking makes things simpler, sure
but that doesn't look like the case here, so we need pretty strong
rationale to justify the added complexity.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-09-30 22:30 ` Andi Kleen
@ 2011-10-01  9:35   ` Matt Fleming
  2011-10-03 15:28     ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Fleming @ 2011-10-01  9:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Oleg Nesterov, Tejun Heo, linux-kernel, Tony Luck,
	Thomas Gleixner, Peter Zijlstra, David Mosberger-Tang

On Fri, 2011-09-30 at 15:30 -0700, Andi Kleen wrote:
> Matt Fleming <matt@console-pimps.org> writes:
> >
> > And just like last time, these patches break ia64 because it
> > implements rt_sigprocmask() in assembly. I'll work on the ia64 stuff
> > while these patches are being reviewed.
> 
> Since we don't want to tie everyone changing signals in the future to
> learning IA64 assembler, maybe the best approach would be simply to
> change that function to just use the syscall. It can't be that
> performance critical and everyone else has it as a syscall too.
> 
> I think that would be the best solution for long term maintainability.

You and me both. Unfortunately, the ia64 guys had a different opinion,
http://www.spinics.net/lists/linux-ia64/msg08363.html

-- 
Matt Fleming, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-09-30 23:56     ` Tejun Heo
@ 2011-10-01 10:16       ` Matt Fleming
  2011-10-01 13:03         ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Fleming @ 2011-10-01 10:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, linux-kernel, Tony Luck, Thomas Gleixner, Peter Zijlstra

On Fri, 2011-09-30 at 16:56 -0700, Tejun Heo wrote:
> Hello,
> 
> On Fri, Sep 30, 2011 at 09:00:23PM +0100, Matt Fleming wrote:
> > On Fri, 2011-09-30 at 18:52 +0200, Oleg Nesterov wrote:
> > Well, sighand->siglock is seriously overused. It protects so much and I
> > think it's pretty confusing. It took me long enough to figure out how
> > many locks were really needed. But that's beside the point, having a
> > single lock doesn't scale at all, and that's what this series is about.
> 
> But scalability for what?  What are the use cases here?  Do we care
> enough about benefits to those use cases to accept the increased
> complexity?  Having locks with broad coverage isn't necessarily evil.
> If things are simpler that way and the protected paths aren't that
> hot, who cares?  If splitting the locking makes things simpler, sure
> but that doesn't look like the case here, so we need pretty strong
> rationale to justify the added complexity.

I've Cc'd some -rt folks who have mentioned this bottleneck in the past.

This patch series improves signal delivery scalability. Signals are
still used heavily in legacy rt applications (as I'm hoping the -rt
folks will attest to) and because everything is currently serialised
with the per-process siglock, any heavy usage quickly hits that
bottleneck. It's essentially a big-lock for signal delivery within a
process.

I also think Thomas/Peter mentioned something about latency in
delivering timer signals because of contention on the per-process
siglock. They might have some more details on that.

-- 
Matt Fleming, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-01 10:16       ` Matt Fleming
@ 2011-10-01 13:03         ` Peter Zijlstra
  2011-10-03  1:38           ` Tejun Heo
  2011-10-03 13:07           ` Oleg Nesterov
  0 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2011-10-01 13:03 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tejun Heo, Oleg Nesterov, linux-kernel, Tony Luck, Thomas Gleixner

On Sat, 2011-10-01 at 11:16 +0100, Matt Fleming wrote:
> I also think Thomas/Peter mentioned something about latency in
> delivering timer signals because of contention on the per-process
> siglock. They might have some more details on that. 

Right, so signal delivery is O(nr_threads), which precludes being able
to deliver signals from hardirq context, leading to lots of ugly in -rt.

The hope is that this work is a stepping stone to O(1) signal delivery.

Breaking up the multitude of uses of siglock certainly seems worthwhile
esp. if it also allows for a cleanup of the horrid mess called
signal_struct (which really should be called process_struct or so).

And yes, aside from that the siglock can be quite contended because its
pretty much the one lock serializing all of the process wide state.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-01 13:03         ` Peter Zijlstra
@ 2011-10-03  1:38           ` Tejun Heo
  2011-10-03 13:56             ` Thomas Gleixner
  2011-10-03 13:07           ` Oleg Nesterov
  1 sibling, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2011-10-03  1:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, Oleg Nesterov, linux-kernel, Tony Luck,
	Thomas Gleixner, akpm, torvalds

Hello,

(cc'ing Andrew and Linus)

On Sat, Oct 01, 2011 at 03:03:29PM +0200, Peter Zijlstra wrote:
> On Sat, 2011-10-01 at 11:16 +0100, Matt Fleming wrote:
> > I also think Thomas/Peter mentioned something about latency in
> > delivering timer signals because of contention on the per-process
> > siglock. They might have some more details on that. 
> 
> Right, so signal delivery is O(nr_threads), which precludes being able
> to deliver signals from hardirq context, leading to lots of ugly in -rt.

Signal delivery is O(#threads)?  Delivery of a fatal signal is, of course,
but where do we walk all threads during non-fatal signal deliveries?
What am I missing?

> Breaking up the multitude of uses of siglock certainly seems worthwhile
> esp. if it also allows for a cleanup of the horrid mess called
> signal_struct (which really should be called process_struct or so).
> 
> And yes, aside from that the siglock can be quite contended because its
> pretty much the one lock serializing all of the process wide state.

Hmmm... can you please be a bit more specific?  I personally have never
seen a case where siglock becomes a problem and IIUC Matt also doesn't
have an actual use case at hand.  Given the fragile nature of this part
of the kernel, it would be nice to know what the return is.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-01 13:03         ` Peter Zijlstra
  2011-10-03  1:38           ` Tejun Heo
@ 2011-10-03 13:07           ` Oleg Nesterov
  2011-10-03 15:22             ` Peter Zijlstra
  1 sibling, 1 reply; 38+ messages in thread
From: Oleg Nesterov @ 2011-10-03 13:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, Tejun Heo, linux-kernel, Tony Luck, Thomas Gleixner

On 10/01, Peter Zijlstra wrote:
>
> On Sat, 2011-10-01 at 11:16 +0100, Matt Fleming wrote:
> > I also think Thomas/Peter mentioned something about latency in
> > delivering timer signals because of contention on the per-process
> > siglock. They might have some more details on that. 
>
> Right, so signal delivery is O(nr_threads),

Yes, a !SIGEV_THREAD_ID timer needs to find a thread which doesn't
block the signal.

But this series can't help afaics. At least in its current state. It
only adds more locking to the sending paths.

And anyway it is wrong (afaics, and I didn't read it yet ;).

> which precludes being able
> to deliver signals from hardirq context, leading to lots of ugly in -rt.

I think, the best solution would be: never send the signal from irq
context, and ->siglock shouldn't disable irqs.

> The hope is that this work is a stepping stone to O(1) signal delivery.

Probably this is possible too. I was thinking about this when
set_current_blocked() was added. Unfortunately this needs a lot of
complications.

> Breaking up the multitude of uses of siglock certainly seems worthwhile
> esp.

Agreed. But I am not sure how much we should split the locking when
it comes to sending/dequeueing/etc signals. 5 locks seems too much.

> And yes, aside from that the siglock can be quite contended because its
> pretty much the one lock serializing all of the process wide state.

True.

Mostly this is because we moved misc stuff from tasklist to siglock,
previously this was a win. Today this doesn't look good.

Oleg.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-09-30 20:00   ` Matt Fleming
  2011-09-30 23:56     ` Tejun Heo
@ 2011-10-03 13:16     ` Oleg Nesterov
  2011-10-04  8:56       ` Matt Fleming
  1 sibling, 1 reply; 38+ messages in thread
From: Oleg Nesterov @ 2011-10-03 13:16 UTC (permalink / raw)
  To: Matt Fleming; +Cc: Tejun Heo, linux-kernel, Tony Luck

On 09/30, Matt Fleming wrote:
>
> On Fri, 2011-09-30 at 18:52 +0200, Oleg Nesterov wrote:
>
> > Hmm. Just out of curiosity, I blindly applied the whole series and picked
> > a _random_ function to look at, dequeue_signal(). And it looks wrong.
> >
> > 	spin_lock_irqsave(&current->signal->ctrl_lock, flags);
> > 	current->jobctl |= JOBCTL_STOP_DEQUEUED;
> >
> > This signal->ctrl_lock can't help. A sig_kernel_stop() should be
> > dequeued under the same lock, and we shouldn't release it unless we

s/unless/until/

> > set JOBCTL_STOP_DEQUEUED. Otherwise we race with SIGCONT.
>
> Hmm.. is that really a problem? Does the dequeue and setting
> JOBCTL_STOP_DEQUEUED actually need to be atomic?

It should be atomic wrt SIGCONT.

> Does it matter if we
> have SIGCONT on the signal queue when we set JOBCTL_STOP_DEQUEUED?

Why would we have it there? Usually SIGCONT is ignored. But this doesn't
matter: SIGCONT acts at sending time.

If SIGCONT is sent - the process must not stop. Since we drop the lock
we can't guarantee this.

> > Maybe do_signal_stop() does something special? At first glance it doesn't.
> > But wait, it does while_each_thread() under ->ctrl_lock, why is this safe?
>
> Why is it not safe? What scenario are you thinking of where that isn't
> safe?

This series doesn't add ->ctrl_lock into copy_process/__unhash_process,
or I misread the patches. This means we can't trust the ->thread_group list.

Even if this is safe (say, we can rely on rcu), we can't calculate
->group_stop_count correctly. In particular, without ->siglock we can
race with exit_signals(), which sets PF_EXITING. Note that the PF_EXITING
check in task_set_jobctl_pending() is important.

Oleg.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03  1:38           ` Tejun Heo
@ 2011-10-03 13:56             ` Thomas Gleixner
  2011-10-04  7:37               ` Tejun Heo
  0 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2011-10-03 13:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Matt Fleming, Oleg Nesterov, linux-kernel,
	Tony Luck, akpm, torvalds

On Sun, 2 Oct 2011, Tejun Heo wrote:
> On Sat, Oct 01, 2011 at 03:03:29PM +0200, Peter Zijlstra wrote:
> > On Sat, 2011-10-01 at 11:16 +0100, Matt Fleming wrote:
> > > I also think Thomas/Peter mentioned something about latency in
> > > delivering timer signals because of contention on the per-process
> > > siglock. They might have some more details on that. 
> > 
> > Right, so signal delivery is O(nr_threads), which precludes being able
> > to deliver signals from hardirq context, leading to lots of ugly in -rt.
> 
> Signal delivery is O(#threads)?  Delivery of a fatal signal is, of course,
> but where do we walk all threads during non-fatal signal deliveries?
> What am I missing?

Delivery of any process-wide signal can result in an O(threads) walk to
find a valid target. That's true for both user-space-originated and
kernel-space-originated (e.g. posix timers) signals.
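
Roughly, that walk is this bit (a simplified excerpt in the spirit of
complete_signal()/wants_signal() in kernel/signal.c, not verbatim):

	/* does this thread want the signal right now? (simplified) */
	static inline int wants_signal(int sig, struct task_struct *p)
	{
		if (sigismember(&p->blocked, sig))
			return 0;
		if (p->flags & PF_EXITING)
			return 0;
		if (sig == SIGKILL)
			return 1;
		if (task_is_stopped_or_traced(p))
			return 0;
		return task_curr(p) || !signal_pending(p);
	}

	/* in complete_signal(): pick a thread to wake for a group signal */
	t = signal->curr_target;
	while (!wants_signal(sig, t)) {
		t = next_thread(t);	/* O(nr_threads) in the worst case */
		if (t == signal->curr_target)
			return;		/* nobody can take it right now */
	}
	signal->curr_target = t;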
 
> > Breaking up the multitude of uses of siglock certainly seems worthwhile
> > esp. if it also allows for a cleanup of the horrid mess called
> > signal_struct (which really should be called process_struct or so).
> > 
> > And yes, aside from that the siglock can be quite contended because its
> > pretty much the one lock serializing all of the process wide state.
> 
> Hmmm... can you please be a bit more specific?  I personally have never
> seen a case where siglock becomes a problem and IIUC Matt also doesn't

Signal-heavy applications suffer massively from sighand->siglock
contention. sighand->siglock protects the world and some more, and Matt
has explained it quite properly. And we have rather large code paths
covered by it (posix-cpu-timers are the worst of all).

> have an actual use case at hand.  Given the fragile nature of this part
> of the kernel, it would be nice to know what the return is.

The return is finer-grained locking and, in the end, a faster signal
delivery path which benefits everyone: we no longer burden a random
interrupted task with the convoluted signal delivery; instead we put
that burden on the task actually using signals.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 13:07           ` Oleg Nesterov
@ 2011-10-03 15:22             ` Peter Zijlstra
  2011-10-04 17:14               ` Oleg Nesterov
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2011-10-03 15:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Fleming, Tejun Heo, linux-kernel, Tony Luck, Thomas Gleixner

On Mon, 2011-10-03 at 15:07 +0200, Oleg Nesterov wrote:
> On 10/01, Peter Zijlstra wrote:

> But this series can't help afaics. At least in its current state. It
> only adds more locking to the sending paths.

Right, so I was hoping Matt had a plan (TM)... :-)

> And anyway it is wrong (afaics, and I didn't read it yet ;).

I'll leave you to be the judge of that, I haven't bent my brain around
all this signal stuff yet...

> > which precludes being able
> > to deliver signals from hardirq context, leading to lots of ugly in -rt.
> 
> I think, the best solution would be: never send the signal from irq
> context, and ->siglock shouldn't disable irqs.

Bit hard that, posix timers need to deliver signals which pretty much
mandates we do something from irq context (and the round-trip through
softirq context really isn't pretty nor good for performance).

> > The hope is that this work is a stepping stone to O(1) signal delivery.
> 
> Probably this is possible too. I was thinking anout this when
> set_current_blocked() was added. Unfortunately this needs a lot of
> complications.

Right, so the thing Thomas and I have been promoting for a while now is
to update a signal target vector on every signal mask update. Mask
updates should be the slow path. This would leave us with a ready target
in O(1).

Although given that we've promoted this idea for a while now and it
hasn't happened yet I'm sure it's non-trivial :-)
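
A minimal sketch of the idea, with entirely hypothetical names (there is
no sig_target[] array today) -- the point is only that the bookkeeping
moves into the mask-update slow path so delivery can pick a target in O(1):

	struct signal_struct {
		/* ... existing fields ... */
		/* hypothetical: a thread known not to block each signal */
		struct task_struct __rcu *sig_target[_NSIG];
	};

	/* hypothetical extension of set_current_blocked() */
	void set_current_blocked(sigset_t *newset)
	{
		struct signal_struct *sig = current->signal;
		int i;

		/* ... the existing mask update and pending recalculation ... */

		for (i = 0; i < _NSIG; i++) {
			if (!sigismember(newset, i + 1))
				rcu_assign_pointer(sig->sig_target[i], current);
			else if (rcu_access_pointer(sig->sig_target[i]) == current) {
				/* a real version would hunt for another candidate */
				rcu_assign_pointer(sig->sig_target[i], NULL);
			}
		}
	}

Sending a group signal would then just look up sig_target[sig - 1] instead
of walking the thread list.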

> > Breaking up the multitude of uses of siglock certainly seems worthwhile
> > esp.
> 
> Agreed. But I am not sure how much we should split the locking when
> it comes to sending/dequeueing/etc signals. 5 locks seems too much.

It doesn't need all 5 locks to send a signal, does it? But then, I'm
somewhat out of my depth here, the whole signal delivery path always
loses me.

> > And yes, aside from that the siglock can be quite contended because its
> > pretty much the one lock serializing all of the process wide state.
> 
> True.
> 
> Mostly this is because we moved misc stuff from tasklist to siglock,
> previously this was a win. Today this doesn't look good.

Well a per-process lock still wins over a global lock, but yeah, it
wants to be broken up further.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-01  9:35   ` Matt Fleming
@ 2011-10-03 15:28     ` Linus Torvalds
  2011-10-03 15:43       ` Matt Fleming
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2011-10-03 15:28 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Andi Kleen, Oleg Nesterov, Tejun Heo, linux-kernel, Tony Luck,
	Thomas Gleixner, Peter Zijlstra, David Mosberger-Tang

On Sat, Oct 1, 2011 at 2:35 AM, Matt Fleming <matt@console-pimps.org> wrote:
>
> You and me both. Unfortunately, the ia64 guys had a different opinion,
> http://www.spinics.net/lists/linux-ia64/msg08363.html

Well, considering that it apparently slows down the single-threaded
case by 50% on an architecture that actually *matters*, I think the
whole discussion is somewhat academic, don't you?

"Scalability" is pretty much never a valid excuse for "slows down the
common case".

                       Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 15:28     ` Linus Torvalds
@ 2011-10-03 15:43       ` Matt Fleming
  2011-10-03 16:35         ` Oleg Nesterov
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Fleming @ 2011-10-03 15:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Oleg Nesterov, Tejun Heo, linux-kernel, Tony Luck,
	Thomas Gleixner, Peter Zijlstra, David Mosberger-Tang

On Mon, 2011-10-03 at 08:28 -0700, Linus Torvalds wrote:
> On Sat, Oct 1, 2011 at 2:35 AM, Matt Fleming <matt@console-pimps.org> wrote:
> >
> > You and me both. Unfortunately, the ia64 guys had a different opinion,
> > http://www.spinics.net/lists/linux-ia64/msg08363.html
> 
> Well, considering that it apparently slows down the single-threaded
> case by 50% on an architecture that actually *matters*, I think the
> whole discussion is somewhat academic, don't you?
> 
> "Scalability" is pretty much never a valid excuse for "slows down the
> common case".

Where did you get the 50% slow down figure from?

Oh, are you referring to Oleg's email about the slow down under kvm?
Yeah, admittedly that's a pretty clear indicator that something is wrong
with the approach in this patch series.

I thought I'd replied to that email but I guess not.

-- 
Matt Fleming, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 15:43       ` Matt Fleming
@ 2011-10-03 16:35         ` Oleg Nesterov
  2011-10-03 20:58           ` Matt Fleming
  0 siblings, 1 reply; 38+ messages in thread
From: Oleg Nesterov @ 2011-10-03 16:35 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Linus Torvalds, Andi Kleen, Tejun Heo, linux-kernel, Tony Luck,
	Thomas Gleixner, Peter Zijlstra, David Mosberger-Tang

On 10/03, Matt Fleming wrote:
>
> Oh, are you referring to Oleg's email about the slow down under kvm?
> Yeah, admittedly that's a pretty clear indicator that something is wrong
> with the approach in this patch series.

Or there was something wrong with my testing, please recheck. I specifically
mentioned I was surprised by the numbers. Maybe kvm, or lockdep...

Oleg.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 16:35         ` Oleg Nesterov
@ 2011-10-03 20:58           ` Matt Fleming
  2011-10-03 21:45             ` Linus Torvalds
  2011-10-03 22:13             ` Thomas Gleixner
  0 siblings, 2 replies; 38+ messages in thread
From: Matt Fleming @ 2011-10-03 20:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Andi Kleen, Tejun Heo, linux-kernel, Tony Luck,
	Thomas Gleixner, Peter Zijlstra, David Mosberger-Tang

On Mon, 2011-10-03 at 18:35 +0200, Oleg Nesterov wrote:
> On 10/03, Matt Fleming wrote:
> >
> > Oh, are you referring to Oleg's email about the slow down under kvm?
> > Yeah, admittedly that's a pretty clear indicator that something is wrong
> > with the approach in this patch series.
> 
> Or there was something wrong with my testing, please recheck. I specifically
> mentioned I was surprised by the numbers. Maybe kvm, or lockdep...

No, I don't think there was anything wrong with your testing method. I
ran your command-line under Qemu and saw similar results - with the
patches applied the single-threaded case slows down (not by 50%, it
looks more like 25%, but that's still unacceptable and not at all what I
had anticipated).

-- 
Matt Fleming, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 20:58           ` Matt Fleming
@ 2011-10-03 21:45             ` Linus Torvalds
  2011-10-03 22:13             ` Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2011-10-03 21:45 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Oleg Nesterov, Andi Kleen, Tejun Heo, linux-kernel, Tony Luck,
	Thomas Gleixner, Peter Zijlstra, David Mosberger-Tang

On Mon, Oct 3, 2011 at 1:58 PM, Matt Fleming <matt@console-pimps.org> wrote:
>
> No, I don't think there was anything wrong with your testing method. I
> ran your command-line under Qemu and saw similar results - with the
> patches applied the single-threaded case slows down (not by 50%, it
> looks more like 25%, but that's still unacceptable and not at all what I
> had anticipated).

Splitting up locks fairly easily causes these kinds of problems.

On many modern microarchitectures, the serialization implied by
locking can be a *big* performance hit. If a system call goes from a
single big lock to two split locks, that can easily make that system
call very noticeably slower. The individual locks may protect a much
smaller section and be "more scalable", but the end result is actually
clearly worse performance.

We've had that several times when we've made smaller locks (in the VM
in particular). One big lock that you take once can be way better than
two small ones that you have to take in sequence (or, worse still,
nested - that's when you can *really* get into exponential badness).

And with even a very limited number of threads (or processes passing
signals back-and-forth) you can get a "train effect": two cores
accessing the same two locks in order, so that they get synchronized.
The "get synchronized" event itself might even be rare, but once it
happens, things can stay synchronized.

And if the second one always then ends up blocking and/or just causing
cacheline ping-pongs, that slowdown can go up by an absolutely huge
amount because you basically make the "rare" case be the common one.

                           Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 20:58           ` Matt Fleming
  2011-10-03 21:45             ` Linus Torvalds
@ 2011-10-03 22:13             ` Thomas Gleixner
  2011-10-04  8:20               ` Matt Fleming
  2011-10-04 17:39               ` Oleg Nesterov
  1 sibling, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2011-10-03 22:13 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Oleg Nesterov, Linus Torvalds, Andi Kleen, Tejun Heo,
	linux-kernel, Tony Luck, Peter Zijlstra, David Mosberger-Tang

On Mon, 3 Oct 2011, Matt Fleming wrote:

> On Mon, 2011-10-03 at 18:35 +0200, Oleg Nesterov wrote:
> > On 10/03, Matt Fleming wrote:
> > >
> > > Oh, are you referring to Oleg's email about the slow down under kvm?
> > > Yeah, admittedly that's a pretty clear indicator that something is wrong
> > > with the approach in this patch series.
> > 
> > Or there was something wrong with my testing, please recheck. I specifically
> > mentioned I was surprised by the numbers. Maybe kvm, or lockdep...
> 
> No, I don't think there was anything wrong with your testing method. I
> ran your command-line under Qemu and saw similar results - with the
> patches applied the single-threaded case slows down (not by 50%, it
> looks more like 25%, but that's still unacceptable and not at all what I
> had anticipated).

After staring a bit at your patch I think that you need to tackle that
from a different angle.

The main nuisance of sighand->siglock is the exit race protection and
that's why we need to take it for everything and some more.

In order to disentangle the posix-(cpu)-timer and other stuff's
protection from that single lock, you need to introduce "independent"
locks which basically do the same dance as lock_task_sighand() does
and have to be taken in the exit() path in a well-defined order before
manipulating task->sighand.

That way you still cover the exit races, but you can break up the
locking for particular subsystems w/o the need of (much) nesting.
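
Something along these lines, with a hypothetical per-subsystem lock
(lock_task_timer() and tsk->timer_lock don't exist, they only stand in
for whatever lock gets introduced), assuming the exit() path takes the
same lock, in the agreed order, before tearing the state down:

	static bool lock_task_timer(struct task_struct *tsk, unsigned long *flags)
	{
		spin_lock_irqsave(&tsk->timer_lock, *flags);	/* hypothetical */
		if (unlikely(!tsk->sighand)) {
			/* tsk is already past the exit point we care about */
			spin_unlock_irqrestore(&tsk->timer_lock, *flags);
			return false;
		}
		return true;
	}

Callers could then do their posix-timer work under tsk->timer_lock
without touching sighand->siglock, and still not race with exit.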

Thanks,

	tglx







^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 13:56             ` Thomas Gleixner
@ 2011-10-04  7:37               ` Tejun Heo
  0 siblings, 0 replies; 38+ messages in thread
From: Tejun Heo @ 2011-10-04  7:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Matt Fleming, Oleg Nesterov, linux-kernel,
	Tony Luck, akpm, torvalds

Hello,

On Mon, Oct 03, 2011 at 03:56:26PM +0200, Thomas Gleixner wrote:
> > Signal delivery is O(#threads)?  Delivery of a fatal signal is, of course,
> > but where do we walk all threads during non-fatal signal deliveries?
> > What am I missing?
> 
> Delivery of any process wide signal can result in an O(thread) walk to
> find a valid target. That's true for user space originated and kernel
> space originated (e.g. posix timers) signals.

Ooh, right.  Wakeup currently does a linear scan of threads, but this is
mostly a separate issue.  As Peter suggested, tracking wakeup target
probably is one sane way to solve this.

Do you think siglock would still be a problem even after the O(N) problem
is solved?  Maybe that would lessen contention enough such that the
locking scope wouldn't matter anymore?  Unless breaking up of the
locks results in nice separate hot domains, being finer-grained
doesn't mean much after all.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 22:13             ` Thomas Gleixner
@ 2011-10-04  8:20               ` Matt Fleming
  2011-10-04 17:39               ` Oleg Nesterov
  1 sibling, 0 replies; 38+ messages in thread
From: Matt Fleming @ 2011-10-04  8:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Oleg Nesterov, Linus Torvalds, Andi Kleen, Tejun Heo,
	linux-kernel, Tony Luck, Peter Zijlstra, David Mosberger-Tang

On Tue, 2011-10-04 at 00:13 +0200, Thomas Gleixner wrote:
> 
> After staring a bit at your patch I think that you need to tackle that
> from a different angle.
> 
> The main nuisance of sighand->siglock is the exit race protection and
> that's why we need to take it for evrything and some more.

Right.

> In order to distangle the posix-(cpu)-timer and other stuffs
> protection from that single lock, you need to introduce "independent"
> locks which basically do the same dance as lock_task_sighand() does
> and have to be taken in the exit() path in a well defined order before
> manipulating task->sighand.

Yeah, I've been trying to avoid changing the tsk->sighand stuff (you'll
notice that lock_task_sighand() still works as before in this patch
series) because when I've attempted it before it has always resulted in
much more complicated code (and coming from me, that's saying
something!).

> That way you still cover the exit races, but you can break up the
> locking for particular subsystems w/o the need of (much) nesting.

Yeah, that approach seems interesting. I'll have a look at that, thanks!

-- 
Matt Fleming, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 13:16     ` Oleg Nesterov
@ 2011-10-04  8:56       ` Matt Fleming
  2011-10-04 17:29         ` Oleg Nesterov
  0 siblings, 1 reply; 38+ messages in thread
From: Matt Fleming @ 2011-10-04  8:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, linux-kernel, Tony Luck, Thomas Gleixner, Peter Zijlstra

On Mon, 2011-10-03 at 15:16 +0200, Oleg Nesterov wrote:
> Why would we have it there? Usually SIGCONT is ignored. But this doesn't
> matter: SIGCONT acts at sending time.
> 
> If SIGCONT is sent - the process must not stop. Since we drop the lock
> we can't guarantee this.

OK, I see, thanks.

> > > Maybe do_signal_stop() does something special? At first glance it doesn't.
> > > But wait, it does while_each_thread() under ->ctrl_lock, why is this safe?
> >
> > Why is it not safe? What scenario are you thinking of where that isn't
> > safe?
> 
> This series doesn't add ->ctrl_lock into copy_process/__unhash_process,
> or I misread the patches. This means we can't trust the ->thread_group list.

*facepalm*

Arrrrggghh! This is why I complain about sighand->siglock protecting too
much; I didn't even _REALISE_ it protected the ->thread_group list.
Thanks for pointing that out, Oleg!

> Even if this is safe (say, we can rely on rcu), we can't calculate
> ->group_stop_count correctly. In particular, without ->siglock we can
> race with exit_signals(), which sets PF_EXITING. Note that the PF_EXITING
> check in task_set_jobctl_pending() is important.

Ah, I think it was these lines that confused me into thinking
->ctrl_lock wasn't required around PF_EXITING,

void exit_signals(struct task_struct *tsk)
{
        int group_stop = 0;
        sigset_t unblocked;

        if (thread_group_empty(tsk) || signal_group_exit(tsk->signal)) {
                tsk->flags |= PF_EXITING;
                return;
        }

But I guess that's safe because either we're the only thread in the
group or the group is already going to exit?

-- 
Matt Fleming, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 15:22             ` Peter Zijlstra
@ 2011-10-04 17:14               ` Oleg Nesterov
  2011-10-04 17:52                 ` Oleg Nesterov
  2011-10-04 17:54                 ` Linus Torvalds
  0 siblings, 2 replies; 38+ messages in thread
From: Oleg Nesterov @ 2011-10-04 17:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, Tejun Heo, linux-kernel, Tony Luck, Thomas Gleixner

On 10/03, Peter Zijlstra wrote:
>
> On Mon, 2011-10-03 at 15:07 +0200, Oleg Nesterov wrote:
> > On 10/01, Peter Zijlstra wrote:
>
> > But this series can't help afaics. At least in its current state. It
> > only adds more locking to the sending paths.
>
> Right, so I was hoping Matt had a plan (TM)... :-)

Whatever we do with the locking, this can't remove O(nr_threads),
although read_lock() could help to reduce the contention.

> > I think, the best solution would be: never send the signal from irq
> > context, and ->siglock shouldn't disable irqs.
>
> Bit hard that, posix timers need to deliver signals which pretty much
> mandates we do something from irq context

Of course. We should notify a thread even if it blocks the signal.

> (and the round-trip through
> softirq context really isn't pretty nor good for performance).

No, no. I meant, it would be nice to offload the sending to the target.
The process itself should take care of it.

> > Probably this is possible too. I was thinking about this when
> > set_current_blocked() was added. Unfortunately this needs a lot of
> > complications.
>
> Right, so the thing Thomas and I have been promoting for a while now is
> to update a signal target vector on every signal mask update. Mask
> updates should be the slow path. This would leave us with a ready target
> in O(1).

Yes. This is the "obvious" solution ;) Now that we have
set_current_blocked() this is simple. Except, of course, this blows
signal_struct and set_current_blocked() can't rely on TIF_SIGPENDING.
But we can probably add TIF_YOU_ARE_LISTED_IN_CURR_TARGET_ARRAY.

And in fact this was _one_ of the reasons for set_current_blocked().

> > Agreed. But I am not sure how much we should split the locking when
> > it comes to sending/dequeueing/etc signals. 5 locks seems too much.
>
> It doesn't need all 5 locks to send a signal, does it?

Depending on group/private, 3 or 4. Plus ->ctrl_lock if the signal
is fatal, but I don't really understand this part... Fortunately I
think this is not needed ;)

And mostly (afaics) this tries to help dequeueing the signal, not
sending it. At least currently.

Oleg.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-04  8:56       ` Matt Fleming
@ 2011-10-04 17:29         ` Oleg Nesterov
  0 siblings, 0 replies; 38+ messages in thread
From: Oleg Nesterov @ 2011-10-04 17:29 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Tejun Heo, linux-kernel, Tony Luck, Thomas Gleixner, Peter Zijlstra

On 10/04, Matt Fleming wrote:
>
> Ah, I think it was these lines that confused me into thinking
> ->ctrl_lock wasn't required around PF_EXITING,
>
> void exit_signals(struct task_struct *tsk)
> {
>         int group_stop = 0;
>         sigset_t unblocked;
>
>         if (thread_group_empty(tsk) || signal_group_exit(tsk->signal)) {
>                 tsk->flags |= PF_EXITING;
>                 return;
>         }
>
> But I guess that's safe because either we're the only thread in the
> group or the group is already going to exit?

Yes. Except s/exit/exit or exec/.

And this reminds me... This is not exactly right. I do not mean
this particular function, but the whole logic. An execing process
can miss SIGSTOP. Or the coredumping signal.

Oleg.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-03 22:13             ` Thomas Gleixner
  2011-10-04  8:20               ` Matt Fleming
@ 2011-10-04 17:39               ` Oleg Nesterov
  1 sibling, 0 replies; 38+ messages in thread
From: Oleg Nesterov @ 2011-10-04 17:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Matt Fleming, Linus Torvalds, Andi Kleen, Tejun Heo,
	linux-kernel, Tony Luck, Peter Zijlstra, David Mosberger-Tang

On 10/04, Thomas Gleixner wrote:
>
> The main nuisance of sighand->siglock is the exit race protection and
> that's why we need to take it for everything and some more.
>
> In order to disentangle the posix-(cpu)-timer and other stuff's
> protection from that single lock, you need to introduce "independent"
> locks

Yes. And there is another (much more important imho) reason.

We need a separate lock to protect thread_group/children to avoid
tasklist_lock in exit_notify/wait/etc. And probably it should be
sleepable.

Oleg.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-04 17:14               ` Oleg Nesterov
@ 2011-10-04 17:52                 ` Oleg Nesterov
  2011-10-04 17:54                 ` Linus Torvalds
  1 sibling, 0 replies; 38+ messages in thread
From: Oleg Nesterov @ 2011-10-04 17:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, Tejun Heo, linux-kernel, Tony Luck, Thomas Gleixner

On 10/04, Oleg Nesterov wrote:
>
> On 10/03, Peter Zijlstra wrote:
> >
> > Right, so the thing Thomas and I have been promoting for a while now is
> > to update a signal target vector on every signal mask update. Mask
> > updates should be the slow path. This would leave us with a ready target
> > in O(1).
>
> Yes. This is the "obvious" solution ;) Now that we have
> set_current_blocked() this is simple. Except, of course, this blows
> signal_struct and set_current_blocked() can't rely on TIF_SIGPENDING.
> But we can probably add TIF_YOU_ARE_LISTED_IN_CURR_TARGET_ARRAY.

Forgot to mention... Unblocking becomes "nontrivial" too. But almost
all these changes are local to set_current_blocked().

Oleg.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-04 17:14               ` Oleg Nesterov
  2011-10-04 17:52                 ` Oleg Nesterov
@ 2011-10-04 17:54                 ` Linus Torvalds
  2011-10-04 18:13                   ` Oleg Nesterov
  1 sibling, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2011-10-04 17:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Matt Fleming, Tejun Heo, linux-kernel, Tony Luck,
	Thomas Gleixner

On Tue, Oct 4, 2011 at 10:14 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Whatever we do with the locking, this can't remove O(nr_threads),
> although read_lock() could help to reduce the contention.

Read locks often make things worse.

Yes, they allow "more concurrency" and avoid contention, but when
we're talking spinning read-locks, almost every time we've used them
we've found them to be *worse* than spinlocks in actual practice.

Why? One reason is that they are just fundamentally more expensive - a
spinlock is just simpler. As a practical example, a spinlock can do
the unlock without the locked cycle.

The other reason seems to be that they often cause more cacheline
bouncing on both the lock itself and the data it protects. For a
spinlock, it's often appropriate to just be exclusive so that you
basically "concentrate" the accesses to one CPU.

It's different for the sleeping rw-semaphores of course. There,
read-locks have major advantages, and if you actually end up having
sleeping entities, read-locks can be a huge huge improvement over a
regular mutex, and the "more concurrency" is a big advantage.

But the spinning rwlock should generally be avoided. The only *real*
reason for using it is if you have timers/interrupts that can take
things for reading, and using an rwlock means that most users can then
avoid the irq save/restore. That pattern improvement more than makes
up for the fact that the rwlock itself is more expensive than the
spinlock.

Almost every time you want to use an rwlock and you think you have a
lot of readers, you should start looking at RCU instead. Because
*that* can be a huge improvement over spinlocks.
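
To make that concrete, the usual RCU shape (a generic sketch, not tied to
any particular piece of the signal code or to this series): readers take
no lock at all; writers serialize among themselves and publish the new
version:

	struct cfg {
		int val;
		struct rcu_head rcu;
	};

	static struct cfg __rcu *cur_cfg;
	static DEFINE_SPINLOCK(cfg_update_lock);	/* writers only */

	int read_cfg_val(void)
	{
		struct cfg *c;
		int val = 0;

		rcu_read_lock();		/* no stores, nothing to bounce */
		c = rcu_dereference(cur_cfg);
		if (c)
			val = c->val;
		rcu_read_unlock();
		return val;
	}

	void publish_cfg(struct cfg *new)
	{
		struct cfg *old;

		spin_lock(&cfg_update_lock);
		old = rcu_dereference_protected(cur_cfg,
				lockdep_is_held(&cfg_update_lock));
		rcu_assign_pointer(cur_cfg, new);
		spin_unlock(&cfg_update_lock);
		if (old)
			kfree_rcu(old, rcu);	/* freed after a grace period */
	}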

                      Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC][PATCH 0/5] Signal scalability series
  2011-10-04 17:54                 ` Linus Torvalds
@ 2011-10-04 18:13                   ` Oleg Nesterov
  0 siblings, 0 replies; 38+ messages in thread
From: Oleg Nesterov @ 2011-10-04 18:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Matt Fleming, Tejun Heo, linux-kernel, Tony Luck,
	Thomas Gleixner

On 10/04, Linus Torvalds wrote:
>
> On Tue, Oct 4, 2011 at 10:14 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Whatever we do with the locking, this can't remove O(nr_threads),
> > although read_lock() could help to reduce the contention.
>
> Read locks often make things worse.

Yes, yes, I see, thanks.

I meant that probably "more concurrency" was the reason for the rwlock
this series adds.

Oleg.


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2011-10-04 18:17 UTC | newest]

Thread overview: 38+ messages
2011-09-30 15:12 [RFC][PATCH 0/5] Signal scalability series Matt Fleming
2011-09-30 15:12 ` [RFC][PATCH 1/5] signal: Document signal locking rules Matt Fleming
2011-09-30 15:12 ` [RFC][PATCH 2/5] signal: Add rwlock to protect sighand->action Matt Fleming
2011-09-30 15:25   ` Peter Zijlstra
2011-09-30 15:56   ` Matt Fleming
2011-09-30 15:12 ` [RFC][PATCH 3/5] signal: Reduce sighand->siglock hold time in get_signal_to_deliver() Matt Fleming
2011-09-30 15:12 ` [RFC][PATCH 4/5] signal: Add signal->ctrl_lock for job control Matt Fleming
2011-09-30 15:30   ` Peter Zijlstra
2011-09-30 15:36     ` Matt Fleming
2011-09-30 15:12 ` [RFC][PATCH 5/5] signal: Split siglock into shared_siglock and per-thread siglock Matt Fleming
2011-09-30 16:52 ` [RFC][PATCH 0/5] Signal scalability series Oleg Nesterov
2011-09-30 18:54   ` Oleg Nesterov
2011-09-30 20:00   ` Matt Fleming
2011-09-30 23:56     ` Tejun Heo
2011-10-01 10:16       ` Matt Fleming
2011-10-01 13:03         ` Peter Zijlstra
2011-10-03  1:38           ` Tejun Heo
2011-10-03 13:56             ` Thomas Gleixner
2011-10-04  7:37               ` Tejun Heo
2011-10-03 13:07           ` Oleg Nesterov
2011-10-03 15:22             ` Peter Zijlstra
2011-10-04 17:14               ` Oleg Nesterov
2011-10-04 17:52                 ` Oleg Nesterov
2011-10-04 17:54                 ` Linus Torvalds
2011-10-04 18:13                   ` Oleg Nesterov
2011-10-03 13:16     ` Oleg Nesterov
2011-10-04  8:56       ` Matt Fleming
2011-10-04 17:29         ` Oleg Nesterov
2011-09-30 22:30 ` Andi Kleen
2011-10-01  9:35   ` Matt Fleming
2011-10-03 15:28     ` Linus Torvalds
2011-10-03 15:43       ` Matt Fleming
2011-10-03 16:35         ` Oleg Nesterov
2011-10-03 20:58           ` Matt Fleming
2011-10-03 21:45             ` Linus Torvalds
2011-10-03 22:13             ` Thomas Gleixner
2011-10-04  8:20               ` Matt Fleming
2011-10-04 17:39               ` Oleg Nesterov
