On Thu, Jan 13, 2022 at 1:17 PM Peter Maydell <peter.maydell@linaro.org> wrote:
On Sun, 9 Jan 2022 at 16:40, Warner Losh <imp@bsdimp.com> wrote:
>
> Implement host_signal_handler to handle signals generated by the host
> and to do safe system calls.
>
> Signed-off-by: Stacey Son <sson@FreeBSD.org>
> Signed-off-by: Kyle Evans <kevans@freebsd.org>
> Signed-off-by: Warner Losh <imp@bsdimp.com>
> ---
>  bsd-user/signal.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 105 insertions(+)
>
> diff --git a/bsd-user/signal.c b/bsd-user/signal.c
> index b1331f63d61..a6e07277fb2 100644
> --- a/bsd-user/signal.c
> +++ b/bsd-user/signal.c
> @@ -142,6 +142,111 @@ void force_sig_fault(int sig, int code, abi_ulong addr)
>
>  static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
>  {
> +    CPUState *cpu = thread_cpu;
> +    CPUArchState *env = cpu->env_ptr;
> +    int sig;
> +    target_siginfo_t tinfo;
> +    ucontext_t *uc = puc;
> +    uintptr_t pc = 0;
> +    bool sync_sig = false;
> +
> +    /*
> +     * Non-spoofed SIGSEGV and SIGBUS are synchronous, and need special
> +     * handling wrt signal blocking and unwinding.
> +     */
> +    if ((host_sig == SIGSEGV || host_sig == SIGBUS) && info->si_code > 0) {
> +        MMUAccessType access_type;
> +        uintptr_t host_addr;
> +        abi_ptr guest_addr;
> +        bool is_write;
> +
> +        host_addr = (uintptr_t)info->si_addr;
> +
> +        /*
> +         * Convert forcefully to guest address space: addresses outside
> +         * reserved_va are still valid to report via SEGV_MAPERR.
> +         */
> +        guest_addr = h2g_nocheck(host_addr);
> +
> +        pc = host_signal_pc(uc);
> +        is_write = host_signal_write(info, uc);
> +        access_type = adjust_signal_pc(&pc, is_write);
> +
> +        if (host_sig == SIGSEGV) {
> +            bool maperr = true;
> +
> +            if (info->si_code == SEGV_ACCERR && h2g_valid(host_addr)) {
> +                /* If this was a write to a TB protected page, restart. */
> +                if (is_write &&
> +                    handle_sigsegv_accerr_write(cpu, &uc->uc_sigmask,
> +                                                pc, guest_addr)) {
> +                    return;
> +                }
> +
> +                /*
> +                 * With reserved_va, the whole address space is PROT_NONE,
> +                 * which means that we may get ACCERR when we want MAPERR.
> +                 */
> +                if (page_get_flags(guest_addr) & PAGE_VALID) {
> +                    maperr = false;
> +                } else {
> +                    info->si_code = SEGV_MAPERR;
> +                }
> +            }
> +
> +            sigprocmask(SIG_SETMASK, &uc->uc_sigmask, NULL);
> +            cpu_loop_exit_sigsegv(cpu, guest_addr, access_type, maperr, pc);
> +        } else {
> +            sigprocmask(SIG_SETMASK, &uc->uc_sigmask, NULL);
> +            if (info->si_code == BUS_ADRALN) {
> +                cpu_loop_exit_sigbus(cpu, guest_addr, access_type, pc);
> +            }
> +        }
> +
> +        sync_sig = true;
> +    }
> +
> +    /* Get the target signal number. */
> +    sig = host_to_target_signal(host_sig);
> +    if (sig < 1 || sig > TARGET_NSIG) {
> +        return;
> +    }
> +    trace_user_host_signal(cpu, host_sig, sig);
> +
> +    host_to_target_siginfo_noswap(&tinfo, info);
> +
> +    queue_signal(env, sig, &tinfo);       /* XXX how to cope with failure? */

queue_signal() can't fail, so there is nothing to cope with.
(Your bsd-user version even has the right 'void' type --
linux-user's returns 1 always and we never look at the return
value, so we should really switch that to void return too.)

Submitted a patch to that last bit. I'll remove this comment in the rework I'm doing.
 
> +    /*
> +     * Linux does something else here -> the queue signal may be wrong, but
> +     * maybe not.  And then it does the rewind_if_in_safe_syscall
> +     */

I think you have here a bit of a mix of linux-user's current design
and some older (broken) version. This is how linux-user works today:

Yes. I think we forked bsd-user from linux-user around 2015 and the old
design dropped signal queueing in 2016 if I'm reading git log correctly.
 
 * queue_signal() is a little bit misnamed, because there is no
   "queue" here: there can only be at most one "queued" signal,
   and it lives in the TaskState struct (which is user-only specific
   information that hangs off the guest CPU struct) as the
   TaskState::sync_signal field. The reason
   we only have one at once is that queue_signal() is used only
   for signals generated by QEMU itself by calling queue_signal()
   directly or indirectly from the cpu_loop() code. The cpu loop
   always calls process_pending_signals() at the end of its loop,
   which will pick up a queued signal. We never call queue_signal()
   twice in a row before getting back to process_pending_signals(),
   so there's only ever at most one thing in the "queue".
 * for all signals we get from the host except SIGSEGV/SIGBUS,
   we track whether there's a host signal pending in the
   TaskState::sigtab[] array (which is indexed by signal number).
   We block all host signals except SIGSEGV/SIGBUS before calling
   cpu_exit(), so we know we're not going to get more than one
   of these at once (and it won't clash with a queue_signal()
   signal either, as those use the sync_signal field, not the
   sigtab[]).
 * for host-sent non-spoofed (ie not sent via 'kill()') SIGSEGV/SIGBUS,
   we know this was caused by a bit of generated code, so we just
   use cpu_loop_exit_restore() to turn this into an EXCP_INTERRUPT
   at the right guest PC

I feel fairly strongly that you definitely want to use the same
design as current linux-user does for signals:
 * getting this right is pretty tricky, and even if we get two
   different designs to both have the same semantics it's going
   to be pretty confusing
 * we thought quite hard about the linux-user code at the time
   and it's definitely less buggy than the previous design
 * It's much easier to review the bsd-user code as "yes this is
   doing the same thing linux-user does" than working through
   a different approach from first principles

I agree. I like that design better than what's in bsd-user today and
seems simpler to implement and get right. I'd add a 4th bullet point
"It's easier to share code, either directly or via copying"
 
I don't have as strong an opinion on whether we should try to get
it into the tree that way from the start, or to put in whatever
you have currently and then fix it later. (More accurately,
I would prefer to review patches which use the same design
as linux-user but if that's going to be massively painful/slow
for you to get something upstream doing it that way around
I can probably live with the other approach...)

I'm going to start down that path and see if I can rework the patches
and see if it solves some of the regressions we've seen in bsd-user
between 6.1 and 6.2 as the signal stuff was reworked in linux-user
to cope better with sigsegv and sigbus. If that works out OK, then
I'll move forward with adjusting the fork and reflect that back into
this patch series.
 
> +    /*
> +     * For synchronous signals, unwind the cpu state to the faulting
> +     * insn and then exit back to the main loop so that the signal
> +     * is delivered immediately.
> +     XXXX Should this be in queue_signal?

No, because queue_signal() is called for lots of ways to pend
a signal, most of which aren't real host signals.

OK. I now agree after reading the code.
 
> +     */
> +    if (sync_sig) {
> +        cpu->exception_index = EXCP_INTERRUPT;
> +        cpu_loop_exit_restore(cpu, pc);
> +    }
> +
> +    rewind_if_in_safe_syscall(puc);
> +
> +    /*
> +     * Block host signals until target signal handler entered. We
> +     * can't block SIGSEGV or SIGBUS while we're executing guest
> +     * code in case the guest code provokes one in the window between
> +     * now and it getting out to the main loop. Signals will be
> +     * unblocked again in process_pending_signals().
> +     */
> +    sigfillset(&uc->uc_sigmask);
> +    sigdelset(&uc->uc_sigmask, SIGSEGV);
> +    sigdelset(&uc->uc_sigmask, SIGBUS);
> +
> +    /* Interrupt the virtual CPU as soon as possible. */
> +    cpu_exit(thread_cpu);
>  }
>
>  void signal_init(void)

thanks

Thank you for the insight and places to focus on making this code better. It's
a bit hard to discover all this just from reading code in any sane amount of time
and it is both quite helpful and much appreciated.

Warner