On Thu, Jan 13, 2022 at 1:17 PM Peter Maydell <peter.maydell@linaro.org>
wrote:

> On Sun, 9 Jan 2022 at 16:40, Warner Losh <imp@bsdimp.com> wrote:
> >
> > Implement host_signal_handler to handle signals generated by the host
> > and to do safe system calls.
> >
> > Signed-off-by: Stacey Son <sson@FreeBSD.org>
> > Signed-off-by: Kyle Evans <kevans@freebsd.org>
> > Signed-off-by: Warner Losh <imp@bsdimp.com>
> > ---
> >  bsd-user/signal.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 105 insertions(+)
> >
> > diff --git a/bsd-user/signal.c b/bsd-user/signal.c
> > index b1331f63d61..a6e07277fb2 100644
> > --- a/bsd-user/signal.c
> > +++ b/bsd-user/signal.c
> > @@ -142,6 +142,111 @@ void force_sig_fault(int sig, int code, abi_ulong
> addr)
> >
> >  static void host_signal_handler(int host_sig, siginfo_t *info, void
> *puc)
> >  {
> > +    CPUState *cpu = thread_cpu;
> > +    CPUArchState *env = cpu->env_ptr;
> > +    int sig;
> > +    target_siginfo_t tinfo;
> > +    ucontext_t *uc = puc;
> > +    uintptr_t pc = 0;
> > +    bool sync_sig = false;
> > +
> > +    /*
> > +     * Non-spoofed SIGSEGV and SIGBUS are synchronous, and need special
> > +     * handling wrt signal blocking and unwinding.
> > +     */
> > +    if ((host_sig == SIGSEGV || host_sig == SIGBUS) && info->si_code >
> 0) {
> > +        MMUAccessType access_type;
> > +        uintptr_t host_addr;
> > +        abi_ptr guest_addr;
> > +        bool is_write;
> > +
> > +        host_addr = (uintptr_t)info->si_addr;
> > +
> > +        /*
> > +         * Convert forcefully to guest address space: addresses outside
> > +         * reserved_va are still valid to report via SEGV_MAPERR.
> > +         */
> > +        guest_addr = h2g_nocheck(host_addr);
> > +
> > +        pc = host_signal_pc(uc);
> > +        is_write = host_signal_write(info, uc);
> > +        access_type = adjust_signal_pc(&pc, is_write);
> > +
> > +        if (host_sig == SIGSEGV) {
> > +            bool maperr = true;
> > +
> > +            if (info->si_code == SEGV_ACCERR && h2g_valid(host_addr)) {
> > +                /* If this was a write to a TB protected page, restart.
> */
> > +                if (is_write &&
> > +                    handle_sigsegv_accerr_write(cpu, &uc->uc_sigmask,
> > +                                                pc, guest_addr)) {
> > +                    return;
> > +                }
> > +
> > +                /*
> > +                 * With reserved_va, the whole address space is
> PROT_NONE,
> > +                 * which means that we may get ACCERR when we want
> MAPERR.
> > +                 */
> > +                if (page_get_flags(guest_addr) & PAGE_VALID) {
> > +                    maperr = false;
> > +                } else {
> > +                    info->si_code = SEGV_MAPERR;
> > +                }
> > +            }
> > +
> > +            sigprocmask(SIG_SETMASK, &uc->uc_sigmask, NULL);
> > +            cpu_loop_exit_sigsegv(cpu, guest_addr, access_type, maperr,
> pc);
> > +        } else {
> > +            sigprocmask(SIG_SETMASK, &uc->uc_sigmask, NULL);
> > +            if (info->si_code == BUS_ADRALN) {
> > +                cpu_loop_exit_sigbus(cpu, guest_addr, access_type, pc);
> > +            }
> > +        }
> > +
> > +        sync_sig = true;
> > +    }
> > +
> > +    /* Get the target signal number. */
> > +    sig = host_to_target_signal(host_sig);
> > +    if (sig < 1 || sig > TARGET_NSIG) {
> > +        return;
> > +    }
> > +    trace_user_host_signal(cpu, host_sig, sig);
> > +
> > +    host_to_target_siginfo_noswap(&tinfo, info);
> > +
> > +    queue_signal(env, sig, &tinfo);       /* XXX how to cope with
> failure? */
>
> queue_signal() can't fail, so there is nothing to cope with.
> (Your bsd-user version even has the right 'void' type --
> linux-user's returns 1 always and we never look at the return
> value, so we should really switch that to void return too.)
>

Submitted a patch to that last bit. I'll remove this comment in the rework
I'm doing.


> > +    /*
> > +     * Linux does something else here -> the queue signal may be wrong,
> but
> > +     * maybe not.  And then it does the rewind_if_in_safe_syscall
> > +     */
>
> I think you have here a bit of a mix of linux-user's current design
> and some older (broken) version. This is how linux-user works today:
>

Yes. I think we forked bsd-user from linux-user around 2015 and the old
design dropped signal queueing in 2016 if I'm reading git log correctly.


>  * queue_signal() is a little bit misnamed, because there is no
>    "queue" here: there can only be at most one "queued" signal,
>    and it lives in the TaskState struct (which is user-only specific
>    information that hangs off the guest CPU struct) as the
>    TaskState::sync_signal field. The reason
>    we only have one at once is that queue_signal() is used only
>    for signals generated by QEMU itself by calling queue_signal()
>    directly or indirectly from the cpu_loop() code. The cpu loop
>    always calls process_pending_signals() at the end of its loop,
>    which will pick up a queued signal. We never call queue_signal()
>    twice in a row before getting back to process_pending_signals(),
>    so there's only ever at most one thing in the "queue".
>  * for all signals we get from the host except SIGSEGV/SIGBUS,
>    we track whether there's a host signal pending in the
>    TaskState::sigtab[] array (which is indexed by signal number).
>    We block all host signals except SIGSEGV/SIGBUS before calling
>    cpu_exit(), so we know we're not going to get more than one
>    of these at once (and it won't clash with a queue_signal()
>    signal either, as those use the sync_signal field, not the
>    sigtab[]).
>  * for host-sent non-spoofed (ie not sent via 'kill()') SIGSEGV/SIGBUS,
>    we know this was caused by a bit of generated code, so we just
>    use cpu_loop_exit_restore() to turn this into an EXCP_INTERRUPT
>    at the right guest PC
>
> I feel fairly strongly that you definitely want to use the same
> design as current linux-user does for signals:
>  * getting this right is pretty tricky, and even if we get two
>    different designs to both have the same semantics it's going
>    to be pretty confusing
>  * we thought quite hard about the linux-user code at the time
>    and it's definitely less buggy than the previous design
>  * It's much easier to review the bsd-user code as "yes this is
>    doing the same thing linux-user does" than working through
>    a different approach from first principles
>

I agree. I like that design better than what's in bsd-user today and
seems simpler to implement and get right. I'd add a 4th bullet point
"It's easier to share code, either directly or via copying"


> I don't have as strong an opinion on whether we should try to get
> it into the tree that way from the start, or to put in whatever
> you have currently and then fix it later. (More accurately,
> I would prefer to review patches which use the same design
> as linux-user but if that's going to be massively painful/slow
> for you to get something upstream doing it that way around
> I can probably live with the other approach...)
>

I'm going to start down that path and see if I can rework the patches
and see if it solves some of the regressions we've seen in bsd-user
between 6.1 and 6.2 as the signal stuff was reworked in linux-user
to cope better with sigsegv and sigbus. If that works out OK, then
I'll move forward with adjusting the fork and reflect that back into
this patch series.


> > +    /*
> > +     * For synchronous signals, unwind the cpu state to the faulting
> > +     * insn and then exit back to the main loop so that the signal
> > +     * is delivered immediately.
> > +     XXXX Should this be in queue_signal?
>
> No, because queue_signal() is called for lots of ways to pend
> a signal, most of which aren't real host signals.
>

OK. I now agree after reading the code.


> > +     */
> > +    if (sync_sig) {
> > +        cpu->exception_index = EXCP_INTERRUPT;
> > +        cpu_loop_exit_restore(cpu, pc);
> > +    }
> > +
> > +    rewind_if_in_safe_syscall(puc);
> > +
> > +    /*
> > +     * Block host signals until target signal handler entered. We
> > +     * can't block SIGSEGV or SIGBUS while we're executing guest
> > +     * code in case the guest code provokes one in the window between
> > +     * now and it getting out to the main loop. Signals will be
> > +     * unblocked again in process_pending_signals().
> > +     */
> > +    sigfillset(&uc->uc_sigmask);
> > +    sigdelset(&uc->uc_sigmask, SIGSEGV);
> > +    sigdelset(&uc->uc_sigmask, SIGBUS);
> > +
> > +    /* Interrupt the virtual CPU as soon as possible. */
> > +    cpu_exit(thread_cpu);
> >  }
> >
> >  void signal_init(void)
>
> thanks
>

Thank you for the insight and places to focus on making this code better.
It's
a bit hard to discover all this just from reading code in any sane amount
of time
and it is both quite helpful and much appreciated.

Warner