Linux-rt-users archive on lore.kernel.org
 help / color / Atom feed
* Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19
@ 2020-04-28 15:45 Joe Korty
  2020-04-30 16:18 ` [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2 Joe Korty
  0 siblings, 1 reply; 4+ messages in thread
From: Joe Korty @ 2020-04-28 15:45 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-rt-users, Sebastian Andrzej Siewior

Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19.

The following kernel splat can be made to occur when
running the attached test program:

    virt_to_cache: Object is not a Slab page!
    ...
    Call Trace:
     free_uid+0x93/0xa0
     __dequeue_signal+0x21c/0x230
     dequeue_signal+0xb7/0x1b0
     get_signal+0x266/0xce0
     do_signal+0x36/0x640

This was discovered on 5.4.28-rt19. That kernel was
compiled nort, with preemption, and with lots of debug
options selected.  Subsequent tests on the following
kernels show:

     5.4.26-rt17     passes
     5.4.28-rt18     fails
     5.4.28-rt19     fails
     5.4.34-rt21     fails
     5.6.4-rt3       passes

The attached test, tb.c, does a pthread_create-n-join in a
continuous loop, while from other threads in the process
group, SIGUSR1 and SIGUSR2 are continually being sent to
the process group.

When I revert this rt patch in 5.4.28-rt19,

     signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch

the kernel cannot be made to fail.  However, since two
of the above kernels pass without removing this patch,
I do not believe that removing it will address whatever
the root cause is.

Normally, it takes five or six runs for the test to oops
the kernel.  For some reason, when I run it as:

      sudo chrt -f 1 taskset 2 ./tb

it fails, if it is going to fail, on first try.

once I get a splat the system keeps on running.  But I
find I must reboot to be able to get another splat.

Regards,
Joe


/*
 * Test interaction between pthread_create(2) and kill(2).
 *
 * Algorithm: In a loop, as fast as possible, create
 * and wait on a NOP thread.  While doing so, from other
 * threads, send SIGUSR1 and SIGUSR2 to the process group
 * as fast as possible.  All threads except for the targeted
 * test thread are blocked from receiving SIGUSR[12].
 *
 * This test runs for one second.  It is known to generate
 * a kernel splat on 5.4.28-rt19 when compiled preempt,
 * nort, and with lots of debug options selected.  Fails at
 * most once per boot.  Test may have to be run several
 * times times before failure.  The below seems to trigger
 * the problem more easily:
 *
 *      chrt -f 1 taskset 2 ./tb
 *
 * This program is a revamp/simplification of the LTP test
 * pthread_create/14-1.c.
 *
 * Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <semaphore.h>
#include <pthread.h>

sigset_t usersigs;

sem_t semsig1;
sem_t semsig2;

volatile int doit = 1;

volatile unsigned long threaded_count, siguser1_count, siguser2_count;

/* thread to send SIGUSR1 serially to this process as fast as possible */
void *sendsig1(void *arg)
{
	pid_t pid = getpid();

	pthread_sigmask(SIG_BLOCK, &usersigs, NULL);

	while (doit) {
		siguser1_count++;
		sem_wait(&semsig1);
		kill(pid, SIGUSR1);
	}
	return NULL;
}

/* SIGUSR1 handler merely lets the sender know it can send another signal */
void siguser1(int sig)
{
	sem_post(&semsig1);
}

/* thread to send SIGUSR2 serially to this process as fast as possible */
void *sendsig2(void *arg)
{
	pid_t pid = getpid();

	pthread_sigmask(SIG_BLOCK, &usersigs, NULL);

	while (doit) {
		siguser2_count++;
		sem_wait(&semsig2);
		kill(pid, SIGUSR2);
	}
	return NULL;
}

void siguser2(int sig)
{
	sem_post(&semsig2);
}

/* dummy thread, does no work */
void *threaded(void *arg)
{
	return NULL;
}

void *test(void *arg)
{
	pthread_t child_id;

	/* expose this thread & its children to SIGUSR[12] */
	pthread_sigmask(SIG_UNBLOCK, &usersigs, NULL);

	/* create and re-create a thread as fast as possible */
	while (doit) {
		threaded_count++;
		pthread_create(&child_id, NULL, threaded, NULL);
		pthread_join(child_id, NULL);
	}
	 return NULL;
}

int main(void)
{
	pthread_t th_work, th_sig1, th_sig2;
	struct sigaction sa;

	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sa.sa_handler = siguser1;
	sigaction(SIGUSR1, &sa, NULL);
	sa.sa_handler = siguser2;
	sigaction(SIGUSR2, &sa, NULL);

	/* block the main thread from seeing SIGUSR[12] */
	sigemptyset(&usersigs);
	sigaddset(&usersigs, SIGUSR1);
	sigaddset(&usersigs, SIGUSR2);
	pthread_sigmask(SIG_BLOCK, &usersigs, NULL);

	sem_init(&semsig1, 0, 1);
	sem_init(&semsig2, 0, 1);

	pthread_create(&th_work, NULL, test, NULL);

	pthread_create(&th_sig1, NULL, sendsig1, NULL);
	pthread_create(&th_sig2, NULL, sendsig2, NULL);

	sleep(1);

	doit = 0;
	pthread_join(th_sig1, NULL);
	pthread_join(th_sig2, NULL);
	pthread_join(th_work, NULL);

	printf("#threads %lu, #sigusr1's %lu, #sigusr2's %lu\n",
		threaded_count, siguser1_count, siguser2_count);

	return 0;
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2
  2020-04-28 15:45 Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19 Joe Korty
@ 2020-04-30 16:18 ` Joe Korty
  2020-05-04 15:41   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 4+ messages in thread
From: Joe Korty @ 2020-04-30 16:18 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-rt-users, Sebastian Andrzej Siewior

[5.4.28-rt19] Cache up to one sigqueue per task.

This replaces an earlier version of this rt patch,

    signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch

which is able to generate a kernel splat of the form,

    virt_to_cache: Object is not a Slab page!
    ...
    Call Trace:
     free_uid+0x93/0xa0
     __dequeue_signal+0x21c/0x230
     dequeue_signal+0xb7/0x1b0
     get_signal+0x266/0xce0
     do_signal+0x36/0x640

This most easily happens when the test program, tb.c,
is executed.  That test program can be found in an
earlier linux-rt-users posting,

   Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19
   Date: Tue, 28 Apr 2020 11:45:36 -0400

After poking around the original patch, not making any
progress, I tried revamping the patch down to its simpliest
possible implementation.  This revamped version has been
tested in both an rt and nort compiled 5.4.28-rt19 kernel.
No issues have been seen so far.

Please consider

  1) dropping the original cache-sigqueue patch, or
  2) replace that patch with this new version, or
  3) rewrite the original patch using your own criteria
     for what a correct patch should be.

and backport whatever fix is selected to all versions of the
rt kernel that have the original patch applied.

Thanks,
Joe

---
signals: Allow tasks to cache one sigqueue struct, version 2

To avoid allocation overhead, allow all tasks to cache one sigqueue struct
in task struct.

[ This replaces version 1, which is unstable in several versions of the
rt patch. ]

Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>

Index: b/include/linux/sched.h
===================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -935,6 +935,7 @@ struct task_struct {
 	/* Signal handlers: */
 	struct signal_struct		*signal;
 	struct sighand_struct		*sighand;
+	struct sigqueue			*sigqueue_cache;
 	sigset_t			blocked;
 	sigset_t			real_blocked;
 	/* Restored if set_restore_sigmask() was used: */
Index: b/include/linux/signal.h
===================================================================
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -255,6 +255,7 @@ static inline void init_sigpending(struc
 }
 
 extern void flush_sigqueue(struct sigpending *queue);
+extern void flush_task_sigqueue(struct task_struct *tsk);
 
 /* Test if 'sig' is valid signal. Use this instead of testing _NSIG directly */
 static inline int valid_signal(unsigned long sig)
Index: b/kernel/exit.c
===================================================================
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -171,6 +171,7 @@ static void __exit_signal(struct task_st
 		flush_sigqueue(&sig->shared_pending);
 		tty_kref_put(tty);
 	}
+	flush_task_sigqueue(tsk);
 }
 
 static void delayed_put_task_struct(struct rcu_head *rhp)
Index: b/kernel/fork.c
===================================================================
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1587,6 +1587,7 @@ static int copy_signal(unsigned long clo
 	init_waitqueue_head(&sig->wait_chldexit);
 	sig->curr_target = tsk;
 	init_sigpending(&sig->shared_pending);
+	tsk->sigqueue_cache = NULL;
 	INIT_HLIST_HEAD(&sig->multiprocess);
 	seqlock_init(&sig->stats_lock);
 	prev_cputime_init(&sig->prev_cputime);
@@ -1924,6 +1925,7 @@ static __latent_entropy struct task_stru
 	spin_lock_init(&p->alloc_lock);
 
 	init_sigpending(&p->pending);
+	p->sigqueue_cache = NULL;
 
 	p->utime = p->stime = p->gtime = 0;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
Index: b/kernel/signal.c
===================================================================
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -431,7 +431,9 @@ __sigqueue_alloc(int sig, struct task_st
 	rcu_read_unlock();
 
 	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
-		q = kmem_cache_alloc(sigqueue_cachep, flags);
+		q = xchg(&t->sigqueue_cache, NULL);
+		if (!q)
+			q = kmem_cache_alloc(sigqueue_cachep, flags);
 	} else {
 		print_dropped_signal(sig);
 	}
@@ -454,7 +456,9 @@ static void __sigqueue_free(struct sigqu
 		return;
 	if (atomic_dec_and_test(&q->user->sigpending))
 		free_uid(q->user);
-	kmem_cache_free(sigqueue_cachep, q);
+	q = xchg(&current->sigqueue_cache, q);
+	if (q)
+		kmem_cache_free(sigqueue_cachep, q);
 }
 
 void flush_sigqueue(struct sigpending *queue)
@@ -469,6 +473,15 @@ void flush_sigqueue(struct sigpending *q
 	}
 }
 
+void flush_task_sigqueue(struct task_struct *t)
+{
+	struct sigqueue *q;
+
+	q = xchg(&t->sigqueue_cache, NULL);
+	if (q)
+		kmem_cache_free(sigqueue_cachep, q);
+}
+
 /*
  * Flush all pending signals for this kthread.
  */


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2
  2020-04-30 16:18 ` [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2 Joe Korty
@ 2020-05-04 15:41   ` Sebastian Andrzej Siewior
  2020-05-04 16:27     ` Joe Korty
  0 siblings, 1 reply; 4+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-04 15:41 UTC (permalink / raw)
  To: Joe Korty; +Cc: Thomas Gleixner, linux-rt-users

On 2020-04-30 12:18:48 [-0400], Joe Korty wrote:
> [5.4.28-rt19] Cache up to one sigqueue per task.
> 
> This replaces an earlier version of this rt patch,
> 
>     signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch
> 
> which is able to generate a kernel splat of the form,

Does this
	https://lkml.kernel.org/r/20200407095413.30039-1-matt@codeblueprint.co.uk

help the problem you see?

Sebastian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2
  2020-05-04 15:41   ` Sebastian Andrzej Siewior
@ 2020-05-04 16:27     ` Joe Korty
  0 siblings, 0 replies; 4+ messages in thread
From: Joe Korty @ 2020-05-04 16:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Thomas Gleixner, linux-rt-users

On Mon, May 04, 2020 at 05:41:48PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-04-30 12:18:48 [-0400], Joe Korty wrote:
> > [5.4.28-rt19] Cache up to one sigqueue per task.
> > 
> > This replaces an earlier version of this rt patch,
> > 
> >     signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch
> > 
> > which is able to generate a kernel splat of the form,
> 
> Does this
> 	https://lkml.kernel.org/r/20200407095413.30039-1-matt@codeblueprint.co.uk
> 
> help the problem you see?

Yes it does!
Thanks,
Joe

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-28 15:45 Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19 Joe Korty
2020-04-30 16:18 ` [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2 Joe Korty
2020-05-04 15:41   ` Sebastian Andrzej Siewior
2020-05-04 16:27     ` Joe Korty

Linux-rt-users archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rt-users/0 linux-rt-users/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rt-users linux-rt-users/ https://lore.kernel.org/linux-rt-users \
		linux-rt-users@vger.kernel.org
	public-inbox-index linux-rt-users

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rt-users


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git