Re: [PATCH] coroutine-sigaltstack: Keep SIGUSR2 handler up

From: Laszlo Ersek <lersek@redhat.com>
To: Max Reitz <mreitz@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [PATCH] coroutine-sigaltstack: Keep SIGUSR2 handler up
Date: Fri, 22 Jan 2021 19:29:14 +0100	[thread overview]
Message-ID: <1121a803-98e7-6d41-119c-3d82717976ec@redhat.com> (raw)
In-Reply-To: <eef8237e-293a-b6e6-20be-fa004509fa05@redhat.com>

On 01/22/21 19:05, Max Reitz wrote:
> On 22.01.21 18:09, Laszlo Ersek wrote:
>> On 01/22/21 11:20, Max Reitz wrote:
>>> Modifying signal handlers is a process-global operation.  When two
>>> threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently,
>>> they may interfere with each other: One of them may revert the SIGUSR2
>>> handler back to the default between the other thread setting up
>>> coroutine_trampoline() as the handler and raising SIGUSR2.  That SIGUSR2
>>> will then lead to the process exiting.
>>>
>>> Outside of coroutine-sigaltstack, qemu does not use SIGUSR2.  We can
>>> thus keep the signal handler installed all the time.
>>> CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its
>>> stack is set up so a new coroutine is to be launched (i.e., it should
>>> invoke sigsetjmp()), or not (i.e., the signal came from an external
>>> source and we should just perform the default action, which is to exit
>>> the process).
>>>
>>> Note that in user-mode emulation, the guest can register signal handlers
>>> for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2
>>> handler, sigaltstack coroutines will break from then on.  However, we do
>>> not use coroutines for user-mode emulation, so that is fine.
>>>
>>> Suggested-by: Laszlo Ersek <lersek@redhat.com>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   util/coroutine-sigaltstack.c | 56 +++++++++++++++++++-----------------
>>>   1 file changed, 29 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
>>> index aade82afb8..2d32afc322 100644
>>> --- a/util/coroutine-sigaltstack.c
>>> +++ b/util/coroutine-sigaltstack.c
>>> @@ -59,6 +59,8 @@ typedef struct {
>>>     static pthread_key_t thread_state_key;
>>>   +static void coroutine_trampoline(int signal);
>>> +
>>>   static CoroutineThreadState *coroutine_get_thread_state(void)
>>>   {
>>>       CoroutineThreadState *s = pthread_getspecific(thread_state_key);
>>> @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void
>>> *opaque)
>>>     static void __attribute__((constructor)) coroutine_init(void)
>>>   {
>>> +    struct sigaction sa;
>>>       int ret;
>>>         ret = pthread_key_create(&thread_state_key,
>>> qemu_coroutine_thread_cleanup);
>>> @@ -87,6 +90,20 @@ static void __attribute__((constructor))
>>> coroutine_init(void)
>>>           fprintf(stderr, "unable to create leader key: %s\n",
>>> strerror(errno));
>>>           abort();
>>>       }
>>> +
>>> +    /*
>>> +     * Establish the SIGUSR2 signal handler.  This is a process-wide
>>> +     * operation, and so will apply to all threads from here on.
>>> +     */
>>> +    sa = (struct sigaction) {
>>> +        .sa_handler = coroutine_trampoline,
>>> +        .sa_flags   = SA_ONSTACK,
>>> +    };
>>> +
>>> +    if (sigaction(SIGUSR2, &sa, NULL) != 0) {
>>> +        perror("Unable to install SIGUSR2 handler");
>>> +        abort();
>>> +    }
>>>   }
>>>     /* "boot" function
>>> @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal)
>>>       /* Get the thread specific information */
>>>       coTS = coroutine_get_thread_state();
>>>       self = coTS->tr_handler;
>>> +
>>> +    if (!self) {
>>> +        /*
>>> +         * This SIGUSR2 came from an external source, not from
>>> +         * qemu_coroutine_new(), so perform the default action.
>>> +         */
>>> +        exit(0);
>>> +    }
>>> +
>>>       coTS->tr_called = 1;
>>> +    coTS->tr_handler = NULL;
>>>       co = &self->base;
>>>         /*
>>
>> (8) There's a further complication here, assuming we really want to
>> recognize the case when the handler is executing unexpectedly:
>>
>> - pthread_getspecific() is not necessarily async-signal-safe, according
>> to POSIX, so calling coroutine_get_thread_state() in the "unexpected"
>> case (e.g. in response to an asynchronously generated SIGUSR2) is
>> problematic in its own right,
> 
> That’s a shame.
> 
>> - if the SIGUSR2 is delivered to a thread that has never called
>> coroutine_get_thread_state() before, then we'll reach g_malloc0() inside
>> coroutine_get_thread_state(), in signal handler context, which is very
>> bad.
> 
> Could be solved with a coroutine_try_get_thread_state() that will never
> malloc, but return NULL then.
> 
>> You'd have to block SIGUSR2 for the entire process (all threads) at all
>> times, and only temporarily unblock it for a particular coroutine
>> thread, with the sigsuspend(). The above check would suffice, that way.
> 
> Yes, that’s what I was originally afraid of.  I feel like that may be
> the complexity drop that pushes this change too far out of my comfort
> zone.  (And as evidenced by your review, it already was pretty much
> outside as it was.)
> 
>> Such blocking is possible by calling pthread_sigmask() from the main
>> thread, before any other thread is created (the signal mask is inherited
>> across pthread_create()). I guess it could be done in coroutine_init()
>> too.
>>
>> And *then* the pthread_sigmask() calls should indeed be removed from
>> qemu_coroutine_new().
> 
> OTOH, that does sound rather simple...
> 
>> (Apologies if my feedback is difficult to understand, it's my fault. I
>> could propose a patch, if (and only if) you want that.)
> 
> I can’t say I wouldn’t be happy with a patch for this code that doesn’t
> bear my S-o-b. ;)
> 
> I feel conflicted.  I can send a v2 that addresses this (probably
> consisting of multiple patches then, e.g. I’d split the SIGUSR2 blocking
> off the main patch), but to me, this bug is really more of a nuisance
> that just blocks me from sending a pull request for my block branch...
> So I’d rather not drag it out forever.  OTOH, sending a quick and bad
> fix just because I can’t wait is just bad.
> 
> I suppose I’ll have to decide over the weekend.  Though if you’re
> itching to write a patch yourself, I’d definitely be grateful.

OK, I'll try my hand at it; I hope I won't be eating my words.

Laszlo