linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19
@ 2020-04-28 15:45 Joe Korty
  2020-04-30 16:18 ` [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2 Joe Korty
  0 siblings, 1 reply; 4+ messages in thread
From: Joe Korty @ 2020-04-28 15:45 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-rt-users, Sebastian Andrzej Siewior

Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19.

The following kernel splat can be made to occur when
running the attached test program:

    virt_to_cache: Object is not a Slab page!
    ...
    Call Trace:
     free_uid+0x93/0xa0
     __dequeue_signal+0x21c/0x230
     dequeue_signal+0xb7/0x1b0
     get_signal+0x266/0xce0
     do_signal+0x36/0x640

This was discovered on 5.4.28-rt19. That kernel was
compiled nort, with preemption, and with lots of debug
options selected.  Subsequent tests on the following
kernels show:

     5.4.26-rt17     passes
     5.4.28-rt18     fails
     5.4.28-rt19     fails
     5.4.34-rt21     fails
     5.6.4-rt3       passes

The attached test, tb.c, does a pthread_create-n-join in a
continuous loop, while from other threads in the process
group, SIGUSR1 and SIGUSR2 are continually being sent to
the process group.

When I revert this rt patch in 5.4.28-rt19,

     signals-allow-rt-tasks-to-cache-one-sigqueue-struct.patch

the kernel cannot be made to fail.  However, since two
of the above kernels pass without removing this patch,
I do not believe that removing it will address whatever
the root cause is.

Normally, it takes five or six runs for the test to oops
the kernel.  For some reason, when I run it as:

      sudo chrt -f 1 taskset 2 ./tb

it fails, if it is going to fail, on first try.

once I get a splat the system keeps on running.  But I
find I must reboot to be able to get another splat.

Regards,
Joe


/*
 * Test interaction between pthread_create(2) and kill(2).
 *
 * Algorithm: In a loop, as fast as possible, create
 * and wait on a NOP thread.  While doing so, from other
 * threads, send SIGUSR1 and SIGUSR2 to the process group
 * as fast as possible.  All threads except for the targeted
 * test thread are blocked from receiving SIGUSR[12].
 *
 * This test runs for one second.  It is known to generate
 * a kernel splat on 5.4.28-rt19 when compiled preempt,
 * nort, and with lots of debug options selected.  Fails at
 * most once per boot.  Test may have to be run several
 * times times before failure.  The below seems to trigger
 * the problem more easily:
 *
 *      chrt -f 1 taskset 2 ./tb
 *
 * This program is a revamp/simplification of the LTP test
 * pthread_create/14-1.c.
 *
 * Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <semaphore.h>
#include <pthread.h>

sigset_t usersigs;

sem_t semsig1;
sem_t semsig2;

volatile int doit = 1;

volatile unsigned long threaded_count, siguser1_count, siguser2_count;

/* thread to send SIGUSR1 serially to this process as fast as possible */
void *sendsig1(void *arg)
{
	pid_t pid = getpid();

	pthread_sigmask(SIG_BLOCK, &usersigs, NULL);

	while (doit) {
		siguser1_count++;
		sem_wait(&semsig1);
		kill(pid, SIGUSR1);
	}
	return NULL;
}

/* SIGUSR1 handler merely lets the sender know it can send another signal */
void siguser1(int sig)
{
	sem_post(&semsig1);
}

/* thread to send SIGUSR2 serially to this process as fast as possible */
void *sendsig2(void *arg)
{
	pid_t pid = getpid();

	pthread_sigmask(SIG_BLOCK, &usersigs, NULL);

	while (doit) {
		siguser2_count++;
		sem_wait(&semsig2);
		kill(pid, SIGUSR2);
	}
	return NULL;
}

void siguser2(int sig)
{
	sem_post(&semsig2);
}

/* dummy thread, does no work */
void *threaded(void *arg)
{
	return NULL;
}

void *test(void *arg)
{
	pthread_t child_id;

	/* expose this thread & its children to SIGUSR[12] */
	pthread_sigmask(SIG_UNBLOCK, &usersigs, NULL);

	/* create and re-create a thread as fast as possible */
	while (doit) {
		threaded_count++;
		pthread_create(&child_id, NULL, threaded, NULL);
		pthread_join(child_id, NULL);
	}
	 return NULL;
}

int main(void)
{
	pthread_t th_work, th_sig1, th_sig2;
	struct sigaction sa;

	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sa.sa_handler = siguser1;
	sigaction(SIGUSR1, &sa, NULL);
	sa.sa_handler = siguser2;
	sigaction(SIGUSR2, &sa, NULL);

	/* block the main thread from seeing SIGUSR[12] */
	sigemptyset(&usersigs);
	sigaddset(&usersigs, SIGUSR1);
	sigaddset(&usersigs, SIGUSR2);
	pthread_sigmask(SIG_BLOCK, &usersigs, NULL);

	sem_init(&semsig1, 0, 1);
	sem_init(&semsig2, 0, 1);

	pthread_create(&th_work, NULL, test, NULL);

	pthread_create(&th_sig1, NULL, sendsig1, NULL);
	pthread_create(&th_sig2, NULL, sendsig2, NULL);

	sleep(1);

	doit = 0;
	pthread_join(th_sig1, NULL);
	pthread_join(th_sig2, NULL);
	pthread_join(th_work, NULL);

	printf("#threads %lu, #sigusr1's %lu, #sigusr2's %lu\n",
		threaded_count, siguser1_count, siguser2_count);

	return 0;
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-05-04 16:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-28 15:45 Kill(2) and pthread_create(2) interact poorly in 5.4.28-rt19 Joe Korty
2020-04-30 16:18 ` [PATCH] signals: Allow tasks to cache one sigqueue struct, version 2 Joe Korty
2020-05-04 15:41   ` Sebastian Andrzej Siewior
2020-05-04 16:27     ` Joe Korty

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).