Re: [PATCH] coroutine-sigaltstack: Keep SIGUSR2 handler up

From: Max Reitz <mreitz@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [PATCH] coroutine-sigaltstack: Keep SIGUSR2 handler up
Date: Fri, 22 Jan 2021 18:58:27 +0100	[thread overview]
Message-ID: <3e6b417c-eebb-dc6a-da7d-af2295118c6a@redhat.com> (raw)
In-Reply-To: <10be5fcc-5e7a-3e44-3229-8526ad3b4547@redhat.com>

On 22.01.21 17:38, Laszlo Ersek wrote:
> On 01/22/21 11:20, Max Reitz wrote:
>> Modifying signal handlers is a process-global operation.  When two
>> threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently,
>> they may interfere with each other: One of them may revert the SIGUSR2
>> handler back to the default between the other thread setting up
>> coroutine_trampoline() as the handler and raising SIGUSR2.  That SIGUSR2
>> will then lead to the process exiting.
>>
>> Outside of coroutine-sigaltstack, qemu does not use SIGUSR2.  We can
>> thus keep the signal handler installed all the time.
>> CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its
>> stack is set up so a new coroutine is to be launched (i.e., it should
>> invoke sigsetjmp()), or not (i.e., the signal came from an external
>> source and we should just perform the default action, which is to exit
>> the process).
>>
>> Note that in user-mode emulation, the guest can register signal handlers
>> for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2
>> handler, sigaltstack coroutines will break from then on.  However, we do
>> not use coroutines for user-mode emulation, so that is fine.
>>
>> Suggested-by: Laszlo Ersek <lersek@redhat.com>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   util/coroutine-sigaltstack.c | 56 +++++++++++++++++++-----------------
>>   1 file changed, 29 insertions(+), 27 deletions(-)
> 
> (1) With SIGUSR2 permanently dedicated to "coroutine-sigaltstack.c", the
> comment on the qemu_init_main_loop() declaration, in
> "include/qemu/main-loop.h", also needs to be updated. SIGUSR2 is no
> longer a "free for thread-internal usage" signal.

It’s possible that I haven’t understood that comment, but I haven’t 
adjusted it because I didn’t interpret it to mean that the signals 
listed there were free to use.  For example, SIGUSR1 (aliased to 
SIG_IPI) is generated by cpus_kick_thread() and caught by KVM and HVF, 
so it doesn’t seem like it would be free for thread-internal usage either.

Instead, I understood it to mean that the signals listed there do not 
have to be blocked by non-main threads, because it is OK for non-main 
threads to catch them.  I can’t think of a reason why SIGUSR2 should be 
blocked by any thread, so I decided not to change the comment.

But perhaps I just didn’t understand the whole comment.  That’s 
definitely possible, given that I don’t even see a place where non-main 
threads would block the signals not listed there.

>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
>> index aade82afb8..2d32afc322 100644
>> --- a/util/coroutine-sigaltstack.c
>> +++ b/util/coroutine-sigaltstack.c
>> @@ -59,6 +59,8 @@ typedef struct {
>>
>>   static pthread_key_t thread_state_key;
>>
>> +static void coroutine_trampoline(int signal);
>> +
>>   static CoroutineThreadState *coroutine_get_thread_state(void)
>>   {
>>       CoroutineThreadState *s = pthread_getspecific(thread_state_key);
>> @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void *opaque)
>>
>>   static void __attribute__((constructor)) coroutine_init(void)
>>   {
>> +    struct sigaction sa;
>>       int ret;
>>
>>       ret = pthread_key_create(&thread_state_key, qemu_coroutine_thread_cleanup);
>> @@ -87,6 +90,20 @@ static void __attribute__((constructor)) coroutine_init(void)
>>           fprintf(stderr, "unable to create leader key: %s\n", strerror(errno));
>>           abort();
>>       }
>> +
>> +    /*
>> +     * Establish the SIGUSR2 signal handler.  This is a process-wide
>> +     * operation, and so will apply to all threads from here on.
>> +     */
>> +    sa = (struct sigaction) {
>> +        .sa_handler = coroutine_trampoline,
>> +        .sa_flags   = SA_ONSTACK,
>> +    };
>> +
>> +    if (sigaction(SIGUSR2, &sa, NULL) != 0) {
>> +        perror("Unable to install SIGUSR2 handler");
>> +        abort();
>> +    }
>>   }
>>
>>   /* "boot" function
>> @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal)
>>       /* Get the thread specific information */
>>       coTS = coroutine_get_thread_state();
>>       self = coTS->tr_handler;
>> +
>> +    if (!self) {
>> +        /*
>> +         * This SIGUSR2 came from an external source, not from
>> +         * qemu_coroutine_new(), so perform the default action.
>> +         */
>> +        exit(0);
>> +    }
> 
> (2) exit() is generally unsafe to call in signal handlers.
> 
> We could reason whether or not it is safe in this particular case (POSIX
> describes the exact conditions --
> <https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03>),
> but it's much simpler to just call _exit().
> 
> 
> (3) "Performing the default action" would be slightly different from
> calling _exit(). When a process is terminated with a signal, the parent
> can distinguish that, when reaping the child. See waitpid() /
> WIFSIGNALED() / WTERMSIG(), versus WIFEXITED() / WEXITSTATUS().
> 
> So for the "default action", we'd have to:
> - restore the SIGUSR2 handler to SIG_DFL, and
> - re-raise the signal for the thread, and
> - because the signal being handled is automatically blocked unless
>    SA_NODEFER was set: unblock the signal for the thread.
> 
> The pthread_sigmask() call, made for the last step, would be the one
> that would not return.
> 
> *However*, all of this complexity is not really useful here. We don't
> really want to simulate being "forcefully terminated" by the unexpected
> (asynchronous) SIGUSR2. We just want to exit.
> 
> Therefore, _exit() is fine. But, we should use an exit code different
> from 0, as this is definitely not a pristine exit from QEMU. I'm not
> sure if a convention exists for nonzero exit codes in QEMU; if not, then
> just pass EXIT_FAILURE to _exit().

I’m fine with calling _exit().  I hope, Eric is, too (as long as the 
comment no longer claims this were the default behavior).

Given that SIGUSR2 is not something that qemu is prepared to receive 
from the outside, EXIT_FAILURE seems right to me.  (Even abort() could 
be justified, but I don’t think generating a core dump does any good here.)

(As for qemu-specific exit code conventions, the only ones I know of are 
for certain qemu-img subcommands.  I’m sure you grepped, too, but I 
can’t find anything for the system emulator.)

> (4) Furthermore, please update the comment, because "perform the default
> action" is not precise.

Sure, that’s definitely easier than to re-raise SIGUSR2.

>> +
>>       coTS->tr_called = 1;
>> +    coTS->tr_handler = NULL;
>>       co = &self->base;
>>
>>       /*
> 
> (5) I see that "tr_called" has type "volatile sig_atomic_t", which is
> great (I think that "sig_atomic_t" is not even necessary here, because
> of the careful signal masking that we do, and because the signal is
> raised synchronously).
> 
> "volatile" is certainly justified though (the compiler may not be able
> to trace the call through the signal), and that's what I'm missing from
> "tr_handler" too. IMO, the "tr_handler" field should be decalered as
> follow:
> 
>    volatile void * volatile tr_handler;
> 
> wherein the second "volatile" serves just the same purpose as the
> "volatile" in "tr_called", and the first "volatile" follows from *that*
> -- wherever we chase the *chain of pointers* rooted in "tr_handler"
> should not be optimized out by the compiler.
> 
> But: my point (5) is orthogonal to this patch. In practice, it may not
> matter at all. So feel free to ignore now, I guess.

I suppose signal handlers are indeed preempting functions (i.e., the 
compiler is not aware of them executing).  I overlooked that, given that 
logically, we more or less explicitly invoke the SIGUSR2 handler, but of 
course, technically, we just trigger the signal and the handler is then 
invoked preemptively.

I suspect the pthread_kill() function is sufficiently complex that the 
compiler can’t prove that it won’t access *coTS (e.g. because perhaps it 
contains a syscall?), but better be safe than sorry.

But I don’t really like the “volatile void *volatile tr_handler”, 
particularly the “volatile void *” combinations.  I find volatile voids 
weird.

Why is tr_handler even a void pointer, and not a pointer to 
CoroutineSigAltStack, if all it can point to is such an object? 
“volatile CoroutineSigAltStack *” would make more sense to me.

Given that perhaps the CoroutineSigAltStack object should be volatile, 
shouldn’t also the CoroutineThreadState object be volatile?  If it were, 
we could drop the volatile from tr_called and wouldn’t need to make the 
pointer value tr_handler volatile.

OTOH, if I look more through the code, making everything volatile seems 
a bit excessive to me.  I understand correctly that 
sigsetjmp()/siglongjmp() act as barriers to the compiler, don’t they?

The only value that I can see the in-coroutine code writing that the 
calling code reads (before the next sigsetjmp()) is tr_called.  So 
shouldn’t it be sufficient to insert a barrier() before the 
pthread_kill(), and then it’d be sufficient to keep tr_called volatile, 
but the rest could stay as it is?

>> @@ -150,12 +177,9 @@ Coroutine *qemu_coroutine_new(void)
>>   {
>>       CoroutineSigAltStack *co;
>>       CoroutineThreadState *coTS;
>> -    struct sigaction sa;
>> -    struct sigaction osa;
>>       stack_t ss;
>>       stack_t oss;
>>       sigset_t sigs;
>> -    sigset_t osigs;
>>       sigjmp_buf old_env;
>>
>>       /* The way to manipulate stack is with the sigaltstack function. We
>> @@ -172,24 +196,6 @@ Coroutine *qemu_coroutine_new(void)
>>       co->stack = qemu_alloc_stack(&co->stack_size);
>>       co->base.entry_arg = &old_env; /* stash away our jmp_buf */
>>
>> -    coTS = coroutine_get_thread_state();
>> -    coTS->tr_handler = co;
>> -
>> -    /*
>> -     * Preserve the SIGUSR2 signal state, block SIGUSR2,
>> -     * and establish our signal handler. The signal will
>> -     * later transfer control onto the signal stack.
>> -     */
>> -    sigemptyset(&sigs);
>> -    sigaddset(&sigs, SIGUSR2);
>> -    pthread_sigmask(SIG_BLOCK, &sigs, &osigs);
>> -    sa.sa_handler = coroutine_trampoline;
>> -    sigfillset(&sa.sa_mask);
>> -    sa.sa_flags = SA_ONSTACK;
>> -    if (sigaction(SIGUSR2, &sa, &osa) != 0) {
>> -        abort();
>> -    }
>> -
>>       /*
>>        * Set the new stack.
>>        */
>> @@ -207,6 +213,8 @@ Coroutine *qemu_coroutine_new(void)
>>        * signal can be delivered the first time sigsuspend() is
>>        * called.
>>        */
>> +    coTS = coroutine_get_thread_state();
>> +    coTS->tr_handler = co;
>>       coTS->tr_called = 0;
>>       pthread_kill(pthread_self(), SIGUSR2);
>>       sigfillset(&sigs);
>> @@ -230,12 +238,6 @@ Coroutine *qemu_coroutine_new(void)
>>           sigaltstack(&oss, NULL);
>>       }
>>
>> -    /*
>> -     * Restore the old SIGUSR2 signal handler and mask
>> -     */
>> -    sigaction(SIGUSR2, &osa, NULL);
>> -    pthread_sigmask(SIG_SETMASK, &osigs, NULL);
>> -
>>       /*
>>        * Now enter the trampoline again, but this time not as a signal
>>        * handler. Instead we jump into it directly. The functionally
>>
> 
> (6) This change, for qemu_coroutine_new(), is too heavy-handed, in my
> opinion. I suggest (a) removing only the sigaction() calls and their
> directly needed aux variables, and (b) *not* moving around operations.
> 
> In particular, you remove the blocking of SIGUSR2 for the thread -- the
> pthread_sigmask() call. That means the sigsuspend() later on becomes
> superfluous, as the signal will not be delivered inside sigsuspend(),
> but inside pthread_kill(). With the signal not blocked, *generation* and
> *delivery* will coalesce into a single event.

Are you saying that (a) is a problem because this is too heavy-handed of 
a change for a single patch, or because it would actually be a problem 
to deliver the signal inside pthread_kill()?

(Either way will result in me dropping the change from this patch, so 
it’s just a question out of curiosity.)

As for (b), just FYI, I deliberately moved around the operation, so that 
tr_handler is only set once the coroutine stack is set up.  Otherwise it 
might be a problem if an external SIGUSR2 comes in before then.

But if we keep SIGUSR2 blocked, there is no reason for that movement, 
hence the “just FYI”.

> The general logic should stay the same, only the signal action
> manipulation should be removed. IOW, for this function, I propose the
> following change only:
> 
>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
>> index aade82afb8c0..722fed7b2502 100644
>> --- a/util/coroutine-sigaltstack.c
>> +++ b/util/coroutine-sigaltstack.c
>> @@ -149,107 +149,97 @@ static void coroutine_trampoline(int signal)
>>   Coroutine *qemu_coroutine_new(void)
>>   {
>>       CoroutineSigAltStack *co;
>>       CoroutineThreadState *coTS;
>> -    struct sigaction sa;
>> -    struct sigaction osa;
>>       stack_t ss;
>>       stack_t oss;
>>       sigset_t sigs;
>>       sigset_t osigs;
>>       sigjmp_buf old_env;
>>
>>       /* The way to manipulate stack is with the sigaltstack function. We
>>        * prepare a stack, with it delivering a signal to ourselves and then
>>        * put sigsetjmp/siglongjmp where needed.
>>        * This has been done keeping coroutine-ucontext as a model and with the
>>        * pth ideas (GNU Portable Threads). See coroutine-ucontext for the basics
>>        * of the coroutines and see pth_mctx.c (from the pth project) for the
>>        * sigaltstack way of manipulating stacks.
>>        */
>>
>>       co = g_malloc0(sizeof(*co));
>>       co->stack_size = COROUTINE_STACK_SIZE;
>>       co->stack = qemu_alloc_stack(&co->stack_size);
>>       co->base.entry_arg = &old_env; /* stash away our jmp_buf */
>>
>>       coTS = coroutine_get_thread_state();
>>       coTS->tr_handler = co;
>>
>>       /*
>> -     * Preserve the SIGUSR2 signal state, block SIGUSR2,
>> -     * and establish our signal handler. The signal will
>> -     * later transfer control onto the signal stack.
>> +     * Block SIGUSR2. The signal will later transfer control onto the signal
>> +     * stack.
>>        */
>>       sigemptyset(&sigs);
>>       sigaddset(&sigs, SIGUSR2);
>>       pthread_sigmask(SIG_BLOCK, &sigs, &osigs);
>> -    sa.sa_handler = coroutine_trampoline;
>> -    sigfillset(&sa.sa_mask);
>> -    sa.sa_flags = SA_ONSTACK;
>> -    if (sigaction(SIGUSR2, &sa, &osa) != 0) {
>> -        abort();
>> -    }
>>
>>       /*
>>        * Set the new stack.
>>        */
>>       ss.ss_sp = co->stack;
>>       ss.ss_size = co->stack_size;
>>       ss.ss_flags = 0;
>>       if (sigaltstack(&ss, &oss) < 0) {
>>           abort();
>>       }
>>
>>       /*
>>        * Now transfer control onto the signal stack and set it up.
>>        * It will return immediately via "return" after the sigsetjmp()
>>        * was performed. Be careful here with race conditions.  The
>>        * signal can be delivered the first time sigsuspend() is
>>        * called.
>>        */
>>       coTS->tr_called = 0;
>>       pthread_kill(pthread_self(), SIGUSR2);
>>       sigfillset(&sigs);
>>       sigdelset(&sigs, SIGUSR2);
>>       while (!coTS->tr_called) {
>>           sigsuspend(&sigs);
>>       }
>>
>>       /*
>>        * Inform the system that we are back off the signal stack by
>>        * removing the alternative signal stack. Be careful here: It
>>        * first has to be disabled, before it can be removed.
>>        */
>>       sigaltstack(NULL, &ss);
>>       ss.ss_flags = SS_DISABLE;
>>       if (sigaltstack(&ss, NULL) < 0) {
>>           abort();
>>       }
>>       sigaltstack(NULL, &ss);
>>       if (!(oss.ss_flags & SS_DISABLE)) {
>>           sigaltstack(&oss, NULL);
>>       }
>>
>>       /*
>> -     * Restore the old SIGUSR2 signal handler and mask
>> +     * Restore the old mask
>>        */
>> -    sigaction(SIGUSR2, &osa, NULL);
>>       pthread_sigmask(SIG_SETMASK, &osigs, NULL);
>>
>>       /*
>>        * Now enter the trampoline again, but this time not as a signal
>>        * handler. Instead we jump into it directly. The functionally
>>        * redundant ping-pong pointer arithmetic is necessary to avoid
>>        * type-conversion warnings related to the `volatile' qualifier and
>>        * the fact that `jmp_buf' usually is an array type.
>>        */
>>       if (!sigsetjmp(old_env, 0)) {
>>           siglongjmp(coTS->tr_reenter, 1);
>>       }
>>
>>       /*
>>        * Ok, we returned again, so now we're finished
>>        */
>>
>>       return &co->base;
>>   }
> 
> 
> (7) The sigaction() call has not been moved entirely correctly from
> qemu_coroutine_new() to coroutine_init(), in my opinion.
> 
> Namely, the original call site fills the "sa_mask" member, meaning that,
> while the signal handler is executing, not only SIGUSR2 itself should be
> blocked, but *all* signals.
> 
> This is missing from the new call site, in coroutine_init() -- the
> "sa_mask" member is left zeroed.
> 
> Now, in practice, this may not matter a whole lot, because "sa_mask" is
> additive, and at the only place we allow the signal to be delivered,
> namely in sigsuspend(), we already have everything blocked, except for
> SIGUSR2. So when "sa_mask" is ORed with the set of blocked signals, upon
> delivery of SIGUSR2, there is no actual change to the signal mask.
> 
> *But*, I feel that such a change would really deserve its own argument,
> i.e. a separate patch, or at least a separate paragraph in the commit
> message. And I suggest not doing those; instead please just faithfully
> transfer the "sa_mask" setting too, to coroutine_init(). (My impression
> is that the effective removal of the "sa_mask" population was an
> oversight in this patch, not a conscious decision.)

Yes, it was totally an oversight.

Thanks a lot for your detailed review!  I have absolutely no experience 
with coroutine-sigaltstack in particular, and very little experience 
with signal handling in projects where correctness actually matters. 
I’m grateful that you inspect this patch with great scrutiny.

(Btw, that’s why I was very close to just ending a new version of the 
mutex patch just with the commit message adjusted, because that felt 
like I could do much less wrong than here.  Turns out I was right. O:))

Max