* [PATCH] signal: Allow tasks to cache one sigqueue struct (again).
@ 2022-11-29 15:15 Sebastian Andrzej Siewior
  2022-11-29 18:22 ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-11-29 15:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Eric W. Biederman, Ben Segall, Christian Brauner,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Juri Lelli, Linus Torvalds, Mel Gorman, Oleg Nesterov,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot

From: Thomas Gleixner <tglx@linutronix.de>

The idea for this originates from the real time tree to make signal
delivery for realtime applications more efficient. In quite a few of
these application scenarios a control task signals workers to start
their computations. There is usually only one signal per worker in
flight. This works nicely as long as the kmem cache allocations do not
hit the slow path and cause latencies.

To cure this an optimistic caching was introduced (limited to RT tasks)
which allows a task to cache a single sigqueue in a pointer in task_struct
instead of handing it back to the kmem cache after consuming a signal. When
the next signal is sent to the task then the cached sigqueue is used
instead of allocating a new one. This solves the problem for this set of
application scenarios nicely.

The task cache is not preallocated, so the first signal sent to a task
always goes to the kmem cache allocator. The cached sigqueue stays
around until the task exits. Cache filling and depleting relies on
task::sighand::siglock. The sigqueue is cached until PF_EXITING is set;
after this flag is set, the cache is purged from __exit_signal().
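[ Editor's illustration: the fill/deplete pattern described above can be
sketched as a userspace analogue. This is NOT the kernel code (which is
in the diff below); struct task, sigq_alloc() and sigq_free() are
hypothetical stand-ins for task_struct, __sigqueue_alloc() and
__sigqueue_cache_or_free(), with a pthread mutex standing in for
sighand::siglock. ]

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct sigq { int sig; };

struct task {
	pthread_mutex_t lock;	/* stands in for sighand::siglock */
	struct sigq *cache;	/* stands in for task::sigqueue_cache */
	bool exiting;		/* stands in for PF_EXITING */
};

static struct sigq *sigq_alloc(struct task *t)
{
	struct sigq *q;

	pthread_mutex_lock(&t->lock);
	q = t->cache;		/* consume the cached entry if present */
	if (q)
		t->cache = NULL;
	pthread_mutex_unlock(&t->lock);

	if (!q)
		q = malloc(sizeof(*q));	/* slow path: the "slab" */
	return q;
}

static void sigq_free(struct task *t, struct sigq *q)
{
	pthread_mutex_lock(&t->lock);
	/* Keep one entry for the next signal, unless the task exits. */
	if (!t->cache && !t->exiting) {
		t->cache = q;
		q = NULL;
	}
	pthread_mutex_unlock(&t->lock);

	free(q);	/* NULL if it was cached; free(NULL) is a no-op */
}
```

Once the task sets its exiting flag, sigq_free() degenerates to a plain
free, so a later purge of the cache slot is the last operation on it.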

After posting this solution for mainline the discussion came up whether
this would be useful in general and should not be limited to realtime
tasks: https://lore.kernel.org/r/m11rcu7nbr.fsf@fess.ebiederm.org

One concern leading to the original limitation was to avoid a large
number of pointlessly cached sigqueues in alive tasks. The other
concern was RLIMIT_SIGPENDING, as these cached sigqueues are not
accounted for.

The accounting problem is real, but on the other hand slightly academic.
After gathering some statistics it turned out that after boot of a regular
distro install there are less than 10 sigqueues cached in ~1500 tasks.

In case of a 'mass fork and fire signal to child' scenario the extra 80
bytes of memory per task are well in the noise of the overall memory
consumption of the fork bomb.

If this should be limited then it would need an extra counter in struct
user, more atomic instructions and a separate rlimit: yet another
tunable which would be mostly unused.

The caching is actually used. After boot and a full kernel compile on a
32-CPU machine with make -j32 the number of 'allocations' looks like this:

  From slab:       152010
  From task cache: 349594

I.e. it reduces the number of slab cache operations by ~70%.
A typical pattern there is:

<...>-1428 __sigqueue_alloc: ALLOC SLAB  ffff88810b072218 sig 17 to bash-1427
 bash-1427 __sigqueue_cache: CACHE ffff88810b072218
 mesg-1429 __sigqueue_alloc: ALLOC CACHE ffff88810b072218 sig 17 to bash-1427
 bash-1427 __sigqueue_cache: CACHE ffff88810b072218
  cat-1430 __sigqueue_alloc: ALLOC CACHE ffff88810b072218 sig 17 to bash-1427
 bash-1427 __sigqueue_cache: CACHE ffff88810b072218

The interesting sequence is that the exiting task 1429 grabs the sigqueue
from bash's task cache to signal its exit and bash sticks it back into its
own cache. Same for 1430. Lather, rinse and repeat.

The caching is probably not noticeable for the general use case, but the
benefit for latency sensitive applications is clear. While kmem caches
usually serve from the fast path, slab merging (the default) can,
depending on the usage pattern of the merged slabs, cause occasional
slow path allocations.

The time saved per cached entry is a few microseconds per signal, which
is not relevant for e.g. a kernel build, but for signal heavy workloads
it's measurable.

As there is no real downside to this caching mechanism, making it
unconditionally available is preferred over more conditional code or new
magic tunables.

A previous version of this was reverted due to the incomplete handling
of exit races: b4b27b9eed8e ("Revert "signal: Allow tasks to cache one
sigqueue struct"").

The current version takes a substantially more robust approach to this:

  1) Caching a sigqueue is only allowed in context of the current task
     with current's sighand lock held.

     This avoids two issues of the previous version: the lockless
     mechanism and the caching from e.g. interrupt context.

  2) The exit race is prevented via PF_EXITING. PF_EXITING disables
     caching so that the following free of an already cached sigqueue
     is guaranteed to be the last operation on task::sigqueue_cache.

[ bigeasy: Refurbished commit after it got reverted for valid reasons. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/sched.h  |  1 +
 include/linux/signal.h |  1 +
 kernel/exit.c          |  1 +
 kernel/fork.c          |  1 +
 kernel/signal.c        | 65 +++++++++++++++++++++++++++++++++++++-----
 5 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffb6eb55cd135..ac44ed5bb0abc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1103,6 +1103,7 @@ struct task_struct {
 	/* Signal handlers: */
 	struct signal_struct		*signal;
 	struct sighand_struct __rcu		*sighand;
+	struct sigqueue			*sigqueue_cache;
 	sigset_t			blocked;
 	sigset_t			real_blocked;
 	/* Restored if set_restore_sigmask() was used: */
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 3b98e7a28538b..e140aa496e4fc 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -266,6 +266,7 @@ static inline void init_sigpending(struct sigpending *sig)
 }
 
 extern void flush_sigqueue(struct sigpending *queue);
+extern void exit_task_sigqueue_cache(struct task_struct *tsk);
 
 /* Test if 'sig' is valid signal. Use this instead of testing _NSIG directly */
 static inline int valid_signal(unsigned long sig)
diff --git a/kernel/exit.c b/kernel/exit.c
index 35e0a31a0315c..ee10207d62e13 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -157,6 +157,7 @@ static void __exit_signal(struct task_struct *tsk)
 	 */
 	flush_sigqueue(&tsk->pending);
 	tsk->sighand = NULL;
+	exit_task_sigqueue_cache(tsk);
 	spin_unlock(&sighand->siglock);
 
 	__cleanup_sighand(sighand);
diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5aa38d5..07ce3e490cad9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2141,6 +2141,7 @@ static __latent_entropy struct task_struct *copy_process(
 	spin_lock_init(&p->alloc_lock);
 
 	init_sigpending(&p->pending);
+	p->sigqueue_cache = NULL;
 
 	p->utime = p->stime = p->gtime = 0;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
diff --git a/kernel/signal.c b/kernel/signal.c
index d140672185a48..88dfe52f65e74 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -432,7 +432,15 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
 		return NULL;
 
 	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
-		q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
+		/*
+		 * The caching relies on sighand::siglock and so preallocation
+		 * can not use it.
+		 */
+		q = t->sigqueue_cache;
+		if (!q || sigqueue_flags)
+			q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
+		else
+			t->sigqueue_cache = NULL;
 	} else {
 		print_dropped_signal(sig);
 	}
@@ -447,15 +455,58 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
 	return q;
 }
 
-static void __sigqueue_free(struct sigqueue *q)
+void exit_task_sigqueue_cache(struct task_struct *tsk)
+{
+	/*
+	 * Race free because @tsk is mopped up. PF_EXITING is set and siglock
+	 * is held by the caller, which cannot be validated by lockdep because
+	 * tsk->sighand is already NULL.
+	 */
+	struct sigqueue *q = tsk->sigqueue_cache;
+
+	if (q)
+		kmem_cache_free(sigqueue_cachep, q);
+}
+
+static bool sigqueue_free_common(struct sigqueue *q)
 {
 	if (q->flags & SIGQUEUE_PREALLOC)
-		return;
+		return false;
 	if (q->ucounts) {
 		dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
 		q->ucounts = NULL;
 	}
-	kmem_cache_free(sigqueue_cachep, q);
+	return true;
+}
+
+static void __sigqueue_free(struct sigqueue *q)
+{
+	/*
+	 * Don't attempt to cache the sigqueue. This can be invoked from
+	 * interrupt context and has no requirement to hold siglock.
+	 */
+	if (sigqueue_free_common(q))
+		kmem_cache_free(sigqueue_cachep, q);
+}
+
+static void __sigqueue_cache_or_free(struct sigqueue *q)
+{
+	if (!sigqueue_free_common(q))
+		return;
+	/*
+	 * Try to cache one sigqueue per task. This pairs with the consumer
+	 * side in __sigqueue_alloc().
+	 * Caching is only possible in current's task context. Siglock must be
+	 * held, the task must be alive and not a kernel thread.
+	 *
+	 * Kernel threads are excluded from caching. No caching is done once the
+	 * task is exiting.
+	 */
+	if (current->sigqueue_cache || current->flags & (PF_EXITING | PF_KTHREAD)) {
+		kmem_cache_free(sigqueue_cachep, q);
+		return;
+	}
+	current->sigqueue_cache = q;
 }
 
 void flush_sigqueue(struct sigpending *queue)
@@ -594,7 +645,7 @@ static void collect_signal(int sig, struct sigpending *list, kernel_siginfo_t *i
 			(info->si_code == SI_TIMER) &&
 			(info->si_sys_private);
 
-		__sigqueue_free(first);
+		__sigqueue_cache_or_free(first);
 	} else {
 		/*
 		 * Ok, it wasn't in the queue.  This must be
@@ -745,7 +796,7 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
 still_pending:
 	list_del_init(&sync->list);
 	copy_siginfo(info, &sync->info);
-	__sigqueue_free(sync);
+	__sigqueue_cache_or_free(sync);
 	return info->si_signo;
 }
 
@@ -1955,7 +2006,7 @@ void sigqueue_free(struct sigqueue *q)
 	spin_unlock_irqrestore(lock, flags);
 
 	if (q)
-		__sigqueue_free(q);
+		__sigqueue_cache_or_free(q);
 }
 
 int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
-- 
2.38.1


* Re: [PATCH] signal: Allow tasks to cache one sigqueue struct (again).
  2022-11-29 15:15 [PATCH] signal: Allow tasks to cache one sigqueue struct (again) Sebastian Andrzej Siewior
@ 2022-11-29 18:22 ` Linus Torvalds
  2022-11-29 20:06   ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2022-11-29 18:22 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Eric W. Biederman, Ben Segall, Christian Brauner,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Juri Lelli, Mel Gorman, Oleg Nesterov, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Valentin Schneider,
	Vincent Guittot

On Tue, Nov 29, 2022 at 7:15 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> From: Thomas Gleixner <tglx@linutronix.de>
>
> The idea for this originates from the real time tree to make signal
> delivery for realtime applications more efficient.

NAK.

This was attempted once. It had unexplained problems, and very unclear locking.

And this new version is THE SAME CRAZY STUFF. That whole
sigqueue_cache_or_free() hasn't been improved, and is just a thousand
monkeys on crack typing wildly on a keyboard.

This updated patch seems to *attempt* to do everything under the
sighand->siglock, and several of the comments point to it.

But - exactly like the first version - that isn't actually always
true, so it then does *other* random tests instead.

And in at least one case - sigqueue_free() - this code *literally* does that

        spin_unlock_irqrestore(lock, flags);

        if (q)
                __sigqueue_cache_or_free(q);

where that "spin_unlock_irqrestore()" just released 'sighand->siglock'
which is what allegedly is protecting this all.

In other words, STOP DOING THIS.

Insanity is doing the same thing over and over again, expecting
different results. This is insanity.

We have a couple of alternatives here:

 (a) do it *properly* and obviously under that sighand->lock in EVERY
SINGLE PATH.

     None of these games with random code paths either having the lock
conditionally (the 'sigqueue_flags' in __sigqueue_alloc()).

     And none of these "I just dropped the lock, and then free the cache"

 (b) alternatively, do no locking at all, and depend on 'current'
being single-threaded, and _only_ do it in task-synchronous contexts
(so no interrupts)

 (c) use proper atomics that don't *need* locking (ie "xchg(p,NULL)"
kind of things)

but no, we do not repeat the mistake of mixing (a) and (b) with random code.
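[ Editor's illustration: alternative (c) above can be sketched in
userspace with C11 atomics. This is a hypothetical one-slot cache, not
kernel code; cache_alloc()/cache_free() are made-up names, and the real
kernel xchg() is stood in for by atomic_exchange(). The point is that
fill and deplete are each a single atomic exchange, so no external lock
needs to be held on either path. ]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct sigq { int sig; };

/* One cache slot; in the kernel this would live in task_struct. */
static _Atomic(struct sigq *) cache_slot;

static struct sigq *cache_alloc(void)
{
	/* xchg(p, NULL): take the cached entry, if any, atomically. */
	struct sigq *q = atomic_exchange(&cache_slot, NULL);

	if (!q)
		q = malloc(sizeof(*q));	/* slow path: the "slab" */
	return q;
}

static void cache_free(struct sigq *q)
{
	/*
	 * xchg(p, q): park this entry and free whatever was displaced.
	 * No lock required; concurrent callers cannot lose an object,
	 * at worst the slot ends up holding either one of them.
	 */
	q = atomic_exchange(&cache_slot, q);
	free(q);	/* NULL on first fill; free(NULL) is a no-op */
}
```

Whether this is preferable to doing everything under siglock (option
(a)) is exactly the design question being argued here; the sketch only
shows that the atomic variant needs no "is the lock held?" reasoning.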

And exactly because we've already done this once - badly - I want the
new code to be *obviously* correct.

That very much means "don't try to revive a patch that was already
reverted because it was broken and didn't make sense". Throw this
patch AWAY. It's not worth reviving. Start from scratch, with that
"this needs to be _obviously_ correct" as the starting point.

Honestly, I suspect the proper approach is:

 - absolutely *no* "test if task is dead". Free the last left-over
cached entry as you free the 'struct task_struct".

   Stop doing these "test random other unrelated flags".

   If you cannot base it on just the cache field and the lock, DON'T
DO IT AT ALL.

 - absolutely no "test the 'sigqueue_flags' argument".

   Again, stop doing these "test random flags".

   Either only do the "do I have a cached entry" in the path that
unambiguously already holds the lock!

   Or do a cache-specific lock (ie "use atomics") so that it doesn't
*matter* whether the lock is held.

 - make every step *so* obvious that you don't even need a comment.

   And then add the comment about it anyway.

Ok? Because I never want to see this broken approach again.

                Linus


* Re: [PATCH] signal: Allow tasks to cache one sigqueue struct (again).
  2022-11-29 18:22 ` Linus Torvalds
@ 2022-11-29 20:06   ` Linus Torvalds
  2022-11-30  9:58     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2022-11-29 20:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Eric W. Biederman, Ben Segall, Christian Brauner,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Juri Lelli, Mel Gorman, Oleg Nesterov, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Valentin Schneider,
	Vincent Guittot

[-- Attachment #1: Type: text/plain, Size: 2275 bytes --]

On Tue, Nov 29, 2022 at 10:22 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That very much means "don't try to revive a patch that was already
> reverted because it was broken and didn't make sense". Throw this
> patch AWAY. It's not worth reviving. Start from scratch, with that
> "this needs to be _obviously_ correct" as the starting point.

Ok, so I realize that I said "obviously correct", and then I'm sending
out this ENTIRELY UNTESTED PATCH as an attachment. And I even made it
not build, on purpose.

In other words, the attached patch is anything *but* obviously
correct, but I'm sending it out as a "start from scratch" suggestion
for "maybe something like this".

Maybe this doesn't catch enough cases to really be worth it, but as a
starting point it is

 (a) obviously done under the sighand->siglock or when the task is
dead and cannot be dereferenced

 (b) only adds sigqueues that have SIGQUEUE_PREALLOC set, and with an
empty queue

where (a) makes me at least go "the locking is fairly simple" and (b)
makes me go "and we don't have any rlimit counting issues" and it
really just acts as a single front-end cache.

As a result of (b), the freeing part should be just that kmem_cache_free().

And yeah, that doesn't actually build as-is, because 'sigqueue_cachep'
is local to kernel/signal.c, but rather than fix that I left this just
broken because I get the feeling that __exit_signal() should just be
*moved* into kernel/signal.c, but I didn't think about it enough.

In other words, this is *really* meant as not a good patch, but as a
"can we please go in this direction instead".

And yes, maybe some other "free this siginfo" paths should *also* do
that "try to add it to the cache, and if so, you're done" after doing
the accounting. So this is probably all mis-guided, but I ended up
wanting to just see how an alternative approach would look, and this
feels a lot safer to me.

We already have that SIGQUEUE_PREALLOC case, why not use some of the
same logic for the cached entry. Sure, it bypasses the rlimit, but
what else is new?

Maybe it all needs to do accounting, but at least in this form it
feels "safe", in that you have to free one of those unaccounted
entries in order to get another unaccounted entry.

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 3047 bytes --]

 include/linux/sched.h |  1 +
 kernel/exit.c         |  4 ++++
 kernel/fork.c         |  2 ++
 kernel/signal.c       | 20 ++++++++++++++++++++
 4 files changed, 27 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffb6eb55cd13..ac44ed5bb0ab 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1103,6 +1103,7 @@ struct task_struct {
 	/* Signal handlers: */
 	struct signal_struct		*signal;
 	struct sighand_struct __rcu		*sighand;
+	struct sigqueue			*sigqueue_cache;
 	sigset_t			blocked;
 	sigset_t			real_blocked;
 	/* Restored if set_restore_sigmask() was used: */
diff --git a/kernel/exit.c b/kernel/exit.c
index 35e0a31a0315..8d287c8481b4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -157,6 +157,10 @@ static void __exit_signal(struct task_struct *tsk)
 	 */
 	flush_sigqueue(&tsk->pending);
 	tsk->sighand = NULL;
+	if (tsk->sigqueue_cache) {
+		kmem_cache_free(sigqueue_cachep, tsk->sigqueue_cache);
+		tsk->sigqueue_cache = NULL;
+	}
 	spin_unlock(&sighand->siglock);
 
 	__cleanup_sighand(sighand);
diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5aa38d..96ffbce88aa3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -535,6 +535,8 @@ void put_task_stack(struct task_struct *tsk)
 
 void free_task(struct task_struct *tsk)
 {
+	if (tsk->sigqueue_cache)
+		kmem_cache_free(sigqueue_cachep, tsk->sigqueue_cache);
 	release_user_cpus_ptr(tsk);
 	scs_release(tsk);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index d140672185a4..ccc0c7aa84e6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1107,6 +1107,12 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	if ((sig == SIGKILL) || (t->flags & PF_KTHREAD))
 		goto out_set;
 
+	if (t->sigqueue_cache) {
+		q = t->sigqueue_cache;
+		t->sigqueue_cache = NULL;
+		goto add_sigqueue;
+	}
+
 	/*
 	 * Real-time signals must be queued if sent by sigqueue, or
 	 * some other real-time mechanism.  It is implementation
@@ -1124,6 +1130,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit, 0);
 
 	if (q) {
+add_sigqueue:
 		list_add_tail(&q->list, &pending->list);
 		switch ((unsigned long) info) {
 		case (unsigned long) SEND_SIG_NOINFO:
@@ -1933,6 +1940,13 @@ struct sigqueue *sigqueue_alloc(void)
 	return __sigqueue_alloc(-1, current, GFP_KERNEL, 0, SIGQUEUE_PREALLOC);
 }
 
+/*
+ * We only add sigqueues with SIGQUEUE_PREALLOC set
+ * and list_empty(&q->list) to the sigqueue_cache.
+ *
+ * That automatically means that we don't count them
+ * towards any limits.
+ */
 void sigqueue_free(struct sigqueue *q)
 {
 	unsigned long flags;
@@ -1945,6 +1959,12 @@ void sigqueue_free(struct sigqueue *q)
 	 * __exit_signal()->flush_sigqueue().
 	 */
 	spin_lock_irqsave(lock, flags);
+	if (!current->sigqueue_cache && list_empty(&q->list)) {
+		current->sigqueue_cache = q;
+		spin_unlock_irqrestore(lock, flags);
+		return;
+	}
+
 	q->flags &= ~SIGQUEUE_PREALLOC;
 	/*
 	 * If it is queued it will be freed when dequeued,


* Re: [PATCH] signal: Allow tasks to cache one sigqueue struct (again).
  2022-11-29 20:06   ` Linus Torvalds
@ 2022-11-30  9:58     ` Sebastian Andrzej Siewior
  2022-11-30 11:32       ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-11-30  9:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Eric W. Biederman, Ben Segall, Christian Brauner,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Juri Lelli, Mel Gorman, Oleg Nesterov, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Valentin Schneider,
	Vincent Guittot

On 2022-11-29 12:06:22 [-0800], Linus Torvalds wrote:
> In other words, this is *really* meant as not a good patch, but as a
> "can we please go in this direction instead".
> 
> And yes, maybe some other "free this siginfo" paths should *also* do
> that "try to add it to the cache, and if so, you're done" after doing
> the accounting. So this is probably all mis-guided, but I ended up
> wanting to just see how an alternative approach would look, and this
> feels a lot safer to me.

That one case which was not covered by the sighand lock was clearly an
oversight. It should have been under the sighand. The commit description
said so.

> We already have that SIGQUEUE_PREALLOC case, why not use some of the
> same logic for the cached entry. Sure, it bypasses the rlimit, but
> what else is new?

You need to set sigqueue_cache to NULL in copy_process().

The SIGQUEUE_PREALLOC signals work slightly differently. The sigqueue is
allocated once at timer_create(2) time and can deliver a signal multiple
times before it is finally removed by timer_delete(2). So the signal
delivery is, in a way, already cached. The caching here does not help
timer_create(2) because __send_signal_locked() (where you grab the
cached struct) isn't used by the code using SIGQUEUE_PREALLOC.
Instead that code allocates its sigqueue via sigqueue_alloc() while not
holding the sighand lock.
So after a timer_delete(2), the next signal the task receives (one which
is not from a posix timer / SIGQUEUE_PREALLOC) would come from the cache.

>               Linus

Sebastian


* Re: [PATCH] signal: Allow tasks to cache one sigqueue struct (again).
  2022-11-30  9:58     ` Sebastian Andrzej Siewior
@ 2022-11-30 11:32       ` Linus Torvalds
  0 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2022-11-30 11:32 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Eric W. Biederman, Ben Segall, Christian Brauner,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
	Juri Lelli, Mel Gorman, Oleg Nesterov, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Valentin Schneider,
	Vincent Guittot

On Wed, Nov 30, 2022 at 1:58 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> That one case which was not covered by the sighand lock was clearly an
> oversight. It should have been under the sighand. The commit description
> said so.

Well, I think it was more than an oversight - it looks like it came
from basically starting from that old known-bad patch, which did the
same thing, with __sigqueue_free() intentionally called outside the
lock.

Which really is why I'm unhappy about that patch. We _know_ there were
some (possibly not entirely debugged) problems with the original
patch, and from a quick look I could find at least one remaining
oddity, and several remaining ugly cases.

Who knows - that issue with sigqueue_free() calling __sigqueue_free()
outside the lock may be what was the problem originally too, and that
caused the revert in the first place (I'm not sure we ever actually
had it ever debugged more than "this is reported to cause problems").

I suspect it would be best to not have that "do the cache case in the
normal alloc/free path" exactly because the normal alloc/free path is
used in situations where we just don't hold that lock. Either because
of the 'sigqueue_flags' issue on the allocation path, or because - as
here - the freeing path was done separately after having dropped the
lock (to avoid lock contention, I assume).

Maybe just having the cache alloc/free be a separate function would solve it?

Or maybe that sigqueue_free() locking optimization should just be
undone, and maybe that fixes everything. Maybe that was always the
only actual bug in the original code. But I really dislike those other
games it plays too.

> You need set sigqueue_cache to NULL in copy_process().

Ack.

> The SIGQUEUE_PREALLOC signals work slightly different.

Yeah, I think that SIGQUEUE_PREALLOC hack of mine was a mistake, it
looked like a good idea in that sigqueue_free() path that I started
looking at, but it's probably a horrible idea in general.

But that "please don't use a known-broken patch as the starting point"
argument remains.

              Linus

