* Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19
@ 2020-04-28 15:45 Joe Korty
2020-04-30 16:18 ` [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2 Joe Korty
0 siblings, 1 reply; 4+ messages in thread
From: Joe Korty @ 2020-04-28 15:45 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: linux-rt-users, Sebastian Andrzej Siewior
Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19.
The following kernel splat can be made to occur when
running the attached test program:
virt_to_cache: Object is not a Slab page!
...
Call Trace:
free_uid+0x93/0xa0
__dequeue_signal+0x21c/0x230
dequeue_signal+0xb7/0x1b0
get_signal+0x266/0xce0
do_signal+0x36/0x640
This was discovered on 5.4.28-rt19. That kernel was
compiled nort, with preemption, and with lots of debug
options selected. Subsequent tests on the following
kernels show:
5.4.26-rt17 passes
5.4.28-rt18 fails
5.4.28-rt19 fails
5.4.34-rt21 fails
5.6.4-rt3 passes
The attached test, tb.c, does a pthread_create-n-join in a
continuous loop, while from other threads in the process
group, SIGUSR1 and SIGUSR2 are continually being sent to
the process group.
When I revert this rt patch in 5.4.28-rt19,
signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch
the kernel cannot be made to fail. However, since two
of the above kernels pass without removing this patch,
I do not believe that removing it will address whatever
the root cause is.
Normally, it takes five or six runs for the test to oops
the kernel. For some reason, when I run it as:
sudo chrt -f 1 taskset 2 ./tb
it fails, if it is going to fail, on first try.
once I get a splat the system keeps on running. But I
find I must reboot to be able to get another splat.
Regards,
Joe
/*
* Test interaction between pthread_create(2) and kill(2).
*
* Algorithm: In a loop, as fast as possible, create
* and wait on a NOP thread. While doing so, from other
* threads, send SIGUSR1 and SIGUSR2 to the process group
* as fast as possible. All threads except for the targeted
* test thread are blocked from receiving SIGUSR[12].
*
* This test runs for one second. It is known to generate
* a kernel splat on 5.4.28-rt19 when compiled preempt,
* nort, and with lots of debug options selected. Fails at
* most once per boot. Test may have to be run several
* times times before failure. The below seems to trigger
* the problem more easily:
*
* chrt -f 1 taskset 2 ./tb
*
* This program is a revamp/simplification of the LTP test
* pthread_create/14-1.c.
*
* Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <semaphore.h>
#include <pthread.h>
sigset_t usersigs;
sem_t semsig1;
sem_t semsig2;
volatile int doit = 1;
volatile unsigned long threaded_count, siguser1_count, siguser2_count;
/* thread to send SIGUSR1 serially to this process as fast as possible */
void *sendsig1(void *arg)
{
pid_t pid = getpid();
pthread_sigmask(SIG_BLOCK, &usersigs, NULL);
while (doit) {
siguser1_count++;
sem_wait(&semsig1);
kill(pid, SIGUSR1);
}
return NULL;
}
/* SIGUSR1 handler merely lets the sender know it can send another signal */
void siguser1(int sig)
{
sem_post(&semsig1);
}
/* thread to send SIGUSR2 serially to this process as fast as possible */
void *sendsig2(void *arg)
{
pid_t pid = getpid();
pthread_sigmask(SIG_BLOCK, &usersigs, NULL);
while (doit) {
siguser2_count++;
sem_wait(&semsig2);
kill(pid, SIGUSR2);
}
return NULL;
}
void siguser2(int sig)
{
sem_post(&semsig2);
}
/* dummy thread, does no work */
void *threaded(void *arg)
{
return NULL;
}
void *test(void *arg)
{
pthread_t child_id;
/* expose this thread & its children to SIGUSR[12] */
pthread_sigmask(SIG_UNBLOCK, &usersigs, NULL);
/* create and re-create a thread as fast as possible */
while (doit) {
threaded_count++;
pthread_create(&child_id, NULL, threaded, NULL);
pthread_join(child_id, NULL);
}
return NULL;
}
int main(void)
{
pthread_t th_work, th_sig1, th_sig2;
struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sa.sa_handler = siguser1;
sigaction(SIGUSR1, &sa, NULL);
sa.sa_handler = siguser2;
sigaction(SIGUSR2, &sa, NULL);
/* block the main thread from seeing SIGUSR[12] */
sigemptyset(&usersigs);
sigaddset(&usersigs, SIGUSR1);
sigaddset(&usersigs, SIGUSR2);
pthread_sigmask(SIG_BLOCK, &usersigs, NULL);
sem_init(&semsig1, 0, 1);
sem_init(&semsig2, 0, 1);
pthread_create(&th_work, NULL, test, NULL);
pthread_create(&th_sig1, NULL, sendsig1, NULL);
pthread_create(&th_sig2, NULL, sendsig2, NULL);
sleep(1);
doit = 0;
pthread_join(th_sig1, NULL);
pthread_join(th_sig2, NULL);
pthread_join(th_work, NULL);
printf("#threads %lu, #sigusr1's %lu, #sigusr2's %lu\n",
threaded_count, siguser1_count, siguser2_count);
return 0;
}
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2
2020-04-28 15:45 Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19 Joe Korty
@ 2020-04-30 16:18 ` Joe Korty
2020-05-04 15:41 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 4+ messages in thread
From: Joe Korty @ 2020-04-30 16:18 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: linux-rt-users, Sebastian Andrzej Siewior
[5.4.28-rt19] Cache up to one sigqueue per task.
This replaces an earlier version of this rt patch,
signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch
which is able to generate a kernel splat of the form,
virt_to_cache: Object is not a Slab page!
...
Call Trace:
free_uid+0x93/0xa0
__dequeue_signal+0x21c/0x230
dequeue_signal+0xb7/0x1b0
get_signal+0x266/0xce0
do_signal+0x36/0x640
This most easily happens when the test program, tb.c,
is executed. That test program can be found in an
earlier linux-rt-users posting,
Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19
Date: Tue, 28 Apr 2020 11:45:36 -0400
After poking around the original patch, not making any
progress, I tried revamping the patch down to its simpliest
possible implementation. This revamped version has been
tested in both an rt and nort compiled 5.4.28-rt19 kernel.
No issues have been seen so far.
Please consider
1) dropping the original cache-sigqueue patch, or
2) replace that patch with this new version, or
3) rewrite the original patch using your own criteria
for what a correct patch should be.
and backport whatever fix is selected to all versions of the
rt kernel that have the original patch applied.
Thanks,
Joe
---
signals: Allow tasks to cache one sigqueue struct, version 2
To avoid allocation overhead, allow all tasks to cache one sigqueue struct
in task struct.
[ This replaces version 1, which is unstable in several versions of the
rt patch. ]
Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
Index: b/include/linux/sched.h
===================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -935,6 +935,7 @@ struct task_struct {
/* Signal handlers: */
struct signal_struct *signal;
struct sighand_struct *sighand;
+ struct sigqueue *sigqueue_cache;
sigset_t blocked;
sigset_t real_blocked;
/* Restored if set_restore_sigmask() was used: */
Index: b/include/linux/signal.h
===================================================================
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -255,6 +255,7 @@ static inline void init_sigpending(struc
}
extern void flush_sigqueue(struct sigpending *queue);
+extern void flush_task_sigqueue(struct task_struct *tsk);
/* Test if 'sig' is valid signal. Use this instead of testing _NSIG directly */
static inline int valid_signal(unsigned long sig)
Index: b/kernel/exit.c
===================================================================
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -171,6 +171,7 @@ static void __exit_signal(struct task_st
flush_sigqueue(&sig->shared_pending);
tty_kref_put(tty);
}
+ flush_task_sigqueue(tsk);
}
static void delayed_put_task_struct(struct rcu_head *rhp)
Index: b/kernel/fork.c
===================================================================
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1587,6 +1587,7 @@ static int copy_signal(unsigned long clo
init_waitqueue_head(&sig->wait_chldexit);
sig->curr_target = tsk;
init_sigpending(&sig->shared_pending);
+ tsk->sigqueue_cache = NULL;
INIT_HLIST_HEAD(&sig->multiprocess);
seqlock_init(&sig->stats_lock);
prev_cputime_init(&sig->prev_cputime);
@@ -1924,6 +1925,7 @@ static __latent_entropy struct task_stru
spin_lock_init(&p->alloc_lock);
init_sigpending(&p->pending);
+ p->sigqueue_cache = NULL;
p->utime = p->stime = p->gtime = 0;
#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
Index: b/kernel/signal.c
===================================================================
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -431,7 +431,9 @@ __sigqueue_alloc(int sig, struct task_st
rcu_read_unlock();
if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
- q = kmem_cache_alloc(sigqueue_cachep, flags);
+ q = xchg(&t->sigqueue_cache, NULL);
+ if (!q)
+ q = kmem_cache_alloc(sigqueue_cachep, flags);
} else {
print_dropped_signal(sig);
}
@@ -454,7 +456,9 @@ static void __sigqueue_free(struct sigqu
return;
if (atomic_dec_and_test(&q->user->sigpending))
free_uid(q->user);
- kmem_cache_free(sigqueue_cachep, q);
+ q = xchg(¤t->sigqueue_cache, q);
+ if (q)
+ kmem_cache_free(sigqueue_cachep, q);
}
void flush_sigqueue(struct sigpending *queue)
@@ -469,6 +473,15 @@ void flush_sigqueue(struct sigpending *q
}
}
+void flush_task_sigqueue(struct task_struct *t)
+{
+ struct sigqueue *q;
+
+ q = xchg(&t->sigqueue_cache, NULL);
+ if (q)
+ kmem_cache_free(sigqueue_cachep, q);
+}
+
/*
* Flush all pending signals for this kthread.
*/
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2
2020-04-30 16:18 ` [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2 Joe Korty
@ 2020-05-04 15:41 ` Sebastian Andrzej Siewior
2020-05-04 16:27 ` Joe Korty
0 siblings, 1 reply; 4+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-04 15:41 UTC (permalink / raw)
To: Joe Korty; +Cc: Thomas Gleixner, linux-rt-users
On 2020-04-30 12:18:48 [-0400], Joe Korty wrote:
> [5.4.28-rt19] Cache up to one sigqueue per task.
>
> This replaces an earlier version of this rt patch,
>
> signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch
>
> which is able to generate a kernel splat of the form,
Does this
https://lkml.kernel.org/r/20200407095413.30039-1-matt@codeblueprint.co.uk
help the problem you see?
Sebastian
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2
2020-05-04 15:41 ` Sebastian Andrzej Siewior
@ 2020-05-04 16:27 ` Joe Korty
0 siblings, 0 replies; 4+ messages in thread
From: Joe Korty @ 2020-05-04 16:27 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: Thomas Gleixner, linux-rt-users
On Mon, May 04, 2020 at 05:41:48PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-04-30 12:18:48 [-0400], Joe Korty wrote:
> > [5.4.28-rt19] Cache up to one sigqueue per task.
> >
> > This replaces an earlier version of this rt patch,
> >
> > signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch
> >
> > which is able to generate a kernel splat of the form,
>
> Does this
> https://lkml.kernel.org/r/20200407095413.30039-1-matt@codeblueprint.co.uk
>
> help the problem you see?
Yes it does!
Thanks,
Joe
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-05-04 16:27 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-28 15:45 Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19 Joe Korty
2020-04-30 16:18 ` [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2 Joe Korty
2020-05-04 15:41 ` Sebastian Andrzej Siewior
2020-05-04 16:27 ` Joe Korty
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.